latchkey's comments

It has updates at the bottom. Most recently 14 Sep 2025.

I do enjoy this, but the title is such clickbait. I was running websites on a sparc 2 back in 1995.

Even the 3rd-party AI benchmarks that are published [0] are a sham too. They are run by a paid shill (semianalysis) and highly tuned by the vendors to make themselves look good.

[0] https://github.com/InferenceMAX/InferenceMAX/


Neat, but you can just do this in mermaid too. Taking one of your examples:

  <mermaid>
  flowchart LR

  web([Frontend])

  subgraph platform [Cloud Platform]
    api([API Server])
    db[(Database)]
    api --> db
  end

  web -->|HTTPS| api
  </mermaid>
If you install the latest https://oj-hn.com, you can see it rendered inline here.

Fair point. I added basic mermaid parsing to the library so you can do that here too.
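
For anyone curious how that kind of inline rendering could be wired up in a browser extension, here is a rough TypeScript sketch. It assumes the mermaid npm package (v10+) and its render() API; the <mermaid> tag convention and the .commtext selector are assumptions for illustration, not a description of how oj-hn.com actually works.

  // Hypothetical sketch: find <mermaid>...</mermaid> blocks in HN comment
  // text and append the rendered SVG below each comment. Assumes the
  // mermaid npm package (v10+); the selector and tag convention are made up.
  import mermaid from "mermaid";

  mermaid.initialize({ startOnLoad: false });

  async function renderInlineDiagrams(): Promise<void> {
    const comments = document.querySelectorAll<HTMLElement>(".commtext");
    let id = 0;
    for (const comment of Array.from(comments)) {
      const match = comment.textContent?.match(/<mermaid>([\s\S]*?)<\/mermaid>/);
      if (!match) continue;
      // render() resolves to an SVG string we can drop straight into the page
      const { svg } = await mermaid.render(`hn-diagram-${id++}`, match[1].trim());
      const container = document.createElement("div");
      container.innerHTML = svg;
      comment.appendChild(container);
    }
  }

  renderInlineDiagrams().catch(console.error);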

Oh come on, what is this. Affect my mindset how exactly?

Not for what they are using it for. It is $1m+/chip and they can fit 1 of them in a rack. Rack space in DCs is a premium asset. The density isn't there. AI models need tons of memory (this product announcement is case in point) and they don't have it, nor do they have a way to get it, since they are last in line at the fabs.

Their only chance is an acquihire, but nvidia just spent $20b on groq instead. Dead man walking.


Oh don't worry. Ever since the power issue started developing, rack space is no longer at a premium. Or at least, it's no longer the limiting factor. Power is.

The dirty secret is that there is plenty of power. But it isn't all in one place, and it is often stranded in DCs that can't do the density needed for AI compute.

Training models needs everything in one DC, inference doesn't.


Which DCs are these?

The real question is what’s their perf/dollar vs nvidia?

I guess it depends what you mean by "perf". If you optimize everything for the absolutely lowest latency given your power budget, your throughput is going to suck - and vice versa. Throughput is ultimately what matters when everything about AI is so clearly power-constrained, latency is a distraction. So TPU-like custom chips are likely the better choice.

> Throughput is ultimately what matters

I disagree. Yes, it does matter, but because the popular interface is chat, streaming the results of inference feels better to the squishy, messy, gross human operating the chat, even if it ends up taking longer. You can give all the benchmark results you want; humans aren't robots. They aren't data driven, they have feelings, and they're going to go with what feels better. That isn't true for all uses, but time to first byte is ridiculously important for human-computer interaction.
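
To make that distinction concrete, here is a minimal TypeScript sketch with invented numbers (not measurements of any real system): time to first token is what the chat user feels, throughput is what determines total serving capacity, and a system can be good at one while being mediocre at the other.

  // Illustration only, not tied to any real API: given the time a request
  // was sent and the arrival time of each streamed token, compute the two
  // metrics the thread is debating.
  function timeToFirstToken(requestSentMs: number, tokenArrivalsMs: number[]): number {
    return tokenArrivalsMs[0] - requestSentMs; // what the chat user feels
  }

  function throughputTokensPerSec(tokenArrivalsMs: number[]): number {
    const elapsedMs = tokenArrivalsMs[tokenArrivalsMs.length - 1] - tokenArrivalsMs[0];
    return (tokenArrivalsMs.length - 1) / (elapsedMs / 1000); // what the operator pays for
  }

  // Example with made-up numbers: first token after 250 ms, then 50 tokens/sec.
  const sentAt = 0;
  const arrivals = Array.from({ length: 100 }, (_, i) => 250 + i * 20);
  console.log(timeToFirstToken(sentAt, arrivals));   // 250 ms
  console.log(throughputTokensPerSec(arrivals));     // ~50 tokens/sec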


You just have to change the "popular interface" to something else. Chat is OK for trivia or genuinely time-sensitive questions, everything else goes through via email or some sort of webmail-like interface where requests are submitted and replies come back asynchronously. (This is already how batch APIs work, but they only offer a 50% discount compared to interactive, which is not enough to really make a good case for them - especially not for agentic workloads.)

By perf I mean: how much does it cost to serve a 1T model to 1M users at 50 tokens/sec?
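
For scale, here is a back-of-envelope TypeScript sketch of that question. Every constant below is a placeholder assumption, not a published price or benchmark for Cerebras, nvidia, or anyone else; the point is only the shape of the arithmetic.

  // Back-of-envelope only: every number here is a placeholder assumption.
  const users = 1_000_000;
  const tokensPerSecPerUser = 50;
  const aggregateTokensPerSec = users * tokensPerSecPerUser; // 50M tokens/sec

  // Hypothetical accelerator: assumed throughput on the 1T model and assumed
  // all-in hourly cost (capex amortization + power + rack space).
  const tokensPerSecPerAccelerator = 10_000;
  const dollarsPerAcceleratorHour = 15;

  const acceleratorsNeeded = aggregateTokensPerSec / tokensPerSecPerAccelerator; // 5,000
  const dollarsPerHour = acceleratorsNeeded * dollarsPerAcceleratorHour;         // $75,000/hr
  const dollarsPerMillionTokens =
    dollarsPerHour / (aggregateTokensPerSec * 3600 / 1_000_000);                 // ~$0.42

  console.log({ acceleratorsNeeded, dollarsPerHour, dollarsPerMillionTokens });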

Not all 1T models are equal. E.g. how many active parameters? What's the native quantization? How long is the max context? Also, it's quite likely that some smaller models in common use are even sub-1T. If your model is light enough, the lower throughput doesn't necessarily hurt you all that much and you can enjoy the lightning-fast speed.

Just pick some reasonable values. Also, keep in mind that this hardware must still be useful 3 years from now. What’s going to happen to cerebras in 3 years? What about nvidia? Which one is a safer bet?

On the other hand, competition is good - nvidia can’t have the whole pie forever.


> Just pick some reasonable values.

And that's the point - what's "reasonable" depends on the hardware and is far from fixed. Some users here are saying that this model is "blazing fast" but a bit weaker than expected, and one might've guessed as much.

> On the other hand, competition is good - nvidia can’t have the whole pie forever.

Sure, but arguably the closest thing to competition for nVidia is TPUs and future custom ASICs that will likely save a lot on energy used per model inference, while not focusing all that much on being super fast.



That's coupling two different use cases.

Many coding use cases care about tokens/second, not tokens/dollar.


Exactly. They won't ever tell you. It is never published.

Let's not forget that the CEO is an SEC felon who got caught trying to pull a fast one.


Or Google TPUs.

TPUs don't have enough memory either, but they have really great interconnects, so they can build a nice high density cluster.

Compare the photos of a Cerebras deployment to a TPU deployment.

https://www.nextplatform.com/wp-content/uploads/2023/07/cere...

https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iOLs2FEQxQv...

The difference is striking.


Oh wow the cabling in the first link is really sloppy!

Power/cooling is what's at a premium.

Can always build a bigger hall


Exactly my point. Their architecture requires someone to invest the capex / opex to also build another hall.

How do you know the price of a unit?

I remembered $1m from when I was in their booth at SC24, but when I just looked, I was wrong. It is worse...

https://www.datacenterdynamics.com/en/news/cerebras-unveils-...


Maybe, but he also drinks from the firehose.

> where the "perpetrator" is a stupid teenager who took nude pics of themselves and sent them to their boy/girlfriend.

"Where the "perpetrator" is a stupid teenager who took nude pics of themselves and sent them to their boy/girlfriend. If you were a US court judge, what would your opinion be on that case?"

I was pretty happy with the results and it clearly wasn't tripped up by the non-sequitur.


In 30 seconds, did the entire corpus of all the legal cases since the dawn of time agree with the judge's opinion on my case? For the state of things in AI today, I'll take it as a great second opinion.

The LLMs are phenomenal judges; I am surprised people are skeptical of this result. Their training regime is really similar to what a judge does.

The reason people are talking about this is that they want AI LAWYERS, which is different from AI JUDGES.


I put in "dog bone" and it just returned a bunch of random things.
