Hacker News

"unused tokens" are the force driving token cost down. If everyone used all of the tokens they thought they were paying for, prices would explode. People with subscriptions that don't get out everything they can are subsidizing the system.

There are ways to use LLM service providers that leave no tokens unused: simply bill per token. Unsurprisingly, this quickly becomes much more expensive than a subscription.
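As a rough sketch of why metered billing bites heavy users (every price here is a made-up placeholder, not any provider's actual rate):

```python
# Illustrative break-even between a flat subscription and per-token billing.
# All numbers are hypothetical placeholders, not real provider pricing.
SUBSCRIPTION_USD_PER_MONTH = 20.0
PER_TOKEN_USD = 15.0 / 1_000_000  # e.g. $15 per million tokens

def metered_cost(tokens_per_month: int) -> float:
    """Monthly cost if every token is billed individually."""
    return tokens_per_month * PER_TOKEN_USD

# Past this volume, the flat subscription is the cheaper option.
break_even_tokens = SUBSCRIPTION_USD_PER_MONTH / PER_TOKEN_USD
print(f"break-even: {break_even_tokens:,.0f} tokens/month")
print(f"cost at 10M tokens: ${metered_cost(10_000_000):.2f}")
```

Anyone burning well past the break-even volume on a flat plan is exactly the user the provider hopes stays rare.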



And that is why the only winning move is owning a GPU.


With current GPU prices, I find it difficult to find hardware that can run competent models. gemma4's 26B MoE model seems to offer the best performance per gigabyte of RAM, but it's not good enough to use the way one would use cloud models.
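For a rough sense of the memory side of that tradeoff, a back-of-the-envelope weight-size estimate (the 1.2 overhead factor for KV cache and runtime is my own assumption):

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to serve a model; overhead covers KV cache etc."""
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A ~26B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{model_memory_gb(26, bits):.0f} GB")
```

Even 4-bit quantization keeps a model this size out of reach of most consumer GPUs' VRAM.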

The big, impressive models all scale well in multi-customer setups because of the efficiency batching provides, but the base cost to run models like that, even as a small business, is incredibly high. If you can't saturate your LLM hardware almost 24/7, the time to earn back your investment is long unless you settle for inferior models that are worse at their job.
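The utilization point can be made concrete with a toy payback model (every figure below is an assumption for illustration; swap in your own hardware cost, throughput, and token price):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600

def payback_months(hardware_usd: float, tokens_per_sec: float,
                   utilization: float, usd_per_million_tokens: float) -> float:
    """Months to recoup hardware cost, ignoring power and maintenance."""
    tokens_per_month = tokens_per_sec * utilization * SECONDS_PER_MONTH
    revenue_per_month = tokens_per_month / 1e6 * usd_per_million_tokens
    return hardware_usd / revenue_per_month

# Same rig, different utilization -- saturation is everything:
for u in (1.0, 0.3, 0.05):
    print(f"{u:.0%} busy: {payback_months(10_000, 500, u, 0.50):.0f} months")
```

Payback time scales inversely with utilization, which is why shared cloud fleets beat a single under-used box.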


Assuming one does not value privacy, sovereignty, etc.

But also, the 128 GB Strix Halo is pretty hard to beat.


I think about this sometimes: does it really make sense, financially? These are just my impressions, and I'm glad to be corrected if someone has hard numbers and experience going this route:

At the moment, LLM vendors are in market-grab mode and take a loss on big subscription users. They are starting to move toward profit, but they must move carefully so as not to let a competitor steal their users, so we will still have "cheap" tokens for a while.

Even if prices go up a bit, they have scale in their favor to optimize costs.

If commercial model providers push their prices into "not competitive" territory compared to open models, wouldn't it always be cheaper to use an open-model inference provider? They can take advantage of scale as well, and with no model moat, competition should keep prices honest.

And as a last resort, renting GPU time in the cloud seems like a safer bet to me than buying a GPU.
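A similarly hedged rent-vs-buy comparison (the hourly rate, purchase price, and hardware lifetime are all placeholder assumptions):

```python
def rent_vs_buy(hours_per_month: float, rent_usd_per_hour: float,
                purchase_usd: float, lifetime_months: int = 36) -> str:
    """Compare total rental cost over the hardware's useful life to buying."""
    total_rent = hours_per_month * rent_usd_per_hour * lifetime_months
    return "rent" if total_rent < purchase_usd else "buy"

# Light use favors renting; near-constant use favors buying.
print(rent_vs_buy(hours_per_month=20, rent_usd_per_hour=2.0, purchase_usd=10_000))   # -> rent
print(rent_vs_buy(hours_per_month=600, rent_usd_per_hour=2.0, purchase_usd=10_000))  # -> buy
```

Renting also dodges the depreciation risk of owning hardware that next year's models may outgrow.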



