You are working under the assumption that this tech operates in an O(n)-or-better computational regime.
Ask ChatGPT:
“Assume the perspective of an expert in CS and Deep Learning. What are the scaling characteristics of deep learning (use LLMs and Transformer models if you need to be specific)? Express the answer in Big O notation. Tabulate results in two rows, ‘training’ and ‘inference’. For columns, provide the scaling characteristic for CPU, IO, Network, Disk Space, and time.”
This should get you big Os for n being the size of the input (i.e. context size). You can then ask a follow-up with n being the model size.
Spoiler: the best scaling number in that entire estimate set is quadratic. Be “scared” when a breakthrough in model architecture and pipeline gets near linear.
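To make the quadratic claim concrete, here is a minimal back-of-the-envelope sketch (my own illustrative constants, not exact FLOP counts): in a standard transformer layer, self-attention costs O(n²·d) in sequence length n, while the feed-forward block costs O(n·d²) in hidden dimension d. Doubling the context quadruples the attention cost but only doubles the MLP cost.

```python
# Rough FLOP scaling of one transformer layer.
# n = sequence (context) length, d = hidden dimension.
# Constant factors are illustrative; only the growth rates matter.

def attention_flops(n: int, d: int) -> int:
    # QK^T score matrix plus attention-weighted values: both ~n^2 * d
    return 2 * n * n * d

def mlp_flops(n: int, d: int) -> int:
    # Feed-forward block with the usual 4x expansion: ~n * d^2
    return 8 * n * d * d

d = 1024
for n in (1_000, 2_000, 4_000):
    print(f"n={n:>5}  attention={attention_flops(n, d):.2e}  mlp={mlp_flops(n, d):.2e}")
```

Running this shows attention FLOPs growing 4x per doubling of n while MLP FLOPs grow 2x, which is the quadratic-in-context behavior the estimate set above reports.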