Still parroting the same uninformed takes from January >DeepSeek claimed to have...

yorwba · on May 14, 2025

What DeepSeek actually claimed:

"DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data." https://arxiv.org/pdf/2412.19437

They don't even claim to have spent $5M (since they own their GPUs instead of renting them by the hour), it's a purely notional figure suitable for an academic paper. But when R1 got released and started generating hype, it was the only dollar figure anyone had to go on, so it got interpreted as more significant than it is.
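
The quoted figure is just one multiplication, which a minimal sketch makes explicit. The GPU-hour count and the $2 rental rate come from the quote above; the function name is hypothetical and only there for illustration.

```python
def notional_training_cost(gpu_hours: int, usd_per_gpu_hour: float) -> float:
    """Notional rental cost: GPU hours times an assumed hourly rate."""
    return gpu_hours * usd_per_gpu_hour


# Figures taken from the DeepSeek-V3 paper quote above.
GPU_HOURS = 2_788_000      # 2.788M H800 GPU hours
USD_PER_GPU_HOUR = 2.0     # assumed rental price stated in the paper

cost = notional_training_cost(GPU_HOURS, USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")  # $5.576M
```

Changing the assumed hourly rate scales the result linearly, which is why the number says little about what was actually spent.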

tempeler · on May 14, 2025

The accuracy of this comparison is highly speculative. One should not ignore the possibility that dominant firms in the market might be inflating their cost figures to block new entrants and extract more capital from investors through such narratives. When you compare electricity prices in China with those in the U.S., such a large gap would require a truly extraordinary breakthrough to be justified. After all, these are privately held commercial firms, and they are not obligated to disclose their financials accurately.

jiggawatts · on May 14, 2025

Electricity is the cheapest input by far! People and large networked GPU clusters are far more expensive.

Typical models are now trained on clusters of roughly 20K GPUs. Even if you get a volume discount you still need cabling, switches, etc…

The minimum entry price to play in the game at this level is about 200-500 million dollars.

Meta spent something like $10B on their AI compute, for comparison.
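
As a rough sanity check on the numbers in this comment, dividing the stated entry price by the stated cluster size gives the implied all-in budget per GPU (hardware plus cabling, switches, and so on). This is only a sketch of that division; the function name is hypothetical, and the inputs are the comment's own rough estimates.

```python
def implied_cost_per_gpu(total_usd: float, gpu_count: int) -> float:
    """All-in budget per GPU implied by a total cluster cost."""
    return total_usd / gpu_count


CLUSTER_SIZE = 20_000  # "roughly 20K GPUs" from the comment above

low = implied_cost_per_gpu(200e6, CLUSTER_SIZE)   # lower bound of the range
high = implied_cost_per_gpu(500e6, CLUSTER_SIZE)  # upper bound of the range
print(f"${low:,.0f} to ${high:,.0f} per GPU")  # $10,000 to $25,000 per GPU
```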

SV_BubbleTime · on May 14, 2025

Agreed. For all anyone really knows, it was 5% for OpenAI compared to DeepSeek.

If you believe that DeepSeek was released to undercut US AI value (duh), it makes no sense to take the official line as the absolute truth.

charleshn · on May 14, 2025

Indeed, see https://semianalysis.com/2025/01/31/deepseek-debates/ for a well researched article about the actual cost.

oefrha · on May 14, 2025

The provided source for every concrete figure on DS in that article is "we are confident that", "we believe" or something equivalent. How is it any better researched than any other article with a conflicting set of beliefs?

Der_Einzige · on May 14, 2025

There are times when "just trust me bro" is okay. Semianalysis articles are one of those times. You are free to pull the contrarian "source please" shit, but the reality is that they are far more accurate at most types of GPU, Cloud, or AI analysis than almost anyone on this website or anywhere else.

Just trust them bro. Unironically.

oefrha · on May 14, 2025

We're talking about a narrative that's affecting tens of billions of investment dollars, possibly more. I'm not gonna "trust me bro" anyone on this.

yorwba · on May 14, 2025

Then you definitely shouldn't trust the unsourced, false claim that "DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4." DeepSeek never claimed this.

Der_Einzige · on May 14, 2025

Don't act like the ambiguity of their language wasn't intentional. It was. They purposely worded it in the paper so that investors mass freaked out about the idea of GPT-4 for a few million while AI researchers laughed as they read the gigantic "this is only the final training run" asterisk.

oefrha · on May 14, 2025

Yes, I don’t trust that either.

saagarjha · on May 14, 2025

And you expect a random Hacker News commenter to tell you how they're allocating their billions?

oefrha · on May 14, 2025

Don’t know what you’re talking about. I read an HN comment saying here’s a well researched article, followed the link, and found a trust-me-bro article. Where did I ask for advice from an HN commenter?

saagarjha · on May 19, 2025

You aren't going to get non-"trust me bro" insight except by asking Sam Altman himself. I assume he has more interesting things to do with his time at the moment. Barring that, you're going to get the guy who wines and dines GPU suppliers and sees if he can connect any crumbs of information that drop.

AustinCarrBW · on May 14, 2025

This is in the Businessweek story

juujian · on May 14, 2025

What do we know now that we did not know in Jan? Is there some information on this that I have missed?

fzzzy · on May 14, 2025

They didn't include the costs for developing v3, the base model.

[edit] also they seem to be saying r1 is a base model, which it is not. Very sloppy.

irjustin · on May 14, 2025

Yeah, apparently they parked the cost of hardware, 50k GPUs and model development under another entity, High-Flyer, because it was a "shared resource".

xnx · on May 14, 2025

Didn't they also train off of ChatGPT API output?

nextaccountic · on May 14, 2025

I don't see the issue here. OpenAI trained ChatGPT off my own comments, and your comments, and the comments from the person you replied to. I didn't authorize it and you probably didn't either.

Meta was caught pirating over 80TB of books to train their AI, and they are claiming not only training AI on other people's stuff is legal, but piracy is also legal (well at least, piracy done by US tech giants is legal)

xnx · on May 14, 2025

For sure. I was just pointing out that DeepSeek is not going to "beat" ChatGPT if DeepSeek relies on it.

natrys · on May 14, 2025

You could maybe make that accusation about V3 (to the extent that it's a bad thing and not fair use, especially considering the amoral origin of OpenAI's models in the first place), but I don't think the claim makes sense for R1, since OpenAI's o1 did not expose its CoT traces even in the API.

They published about GRPO (the key algorithm behind R1) a full year before[1] they scaled it for R1. Given the research they do in the open, it's not far-fetched to think they had the talent and technical know-how to achieve R1 on their own.

[1] https://arxiv.org/abs/2402.03300

ipsum2 · on May 14, 2025

This is a rumor that has not been confirmed by either OpenAI or DeepSeek.

AustinCarrBW · on May 14, 2025

You're misreading. The article is referring to V3 when it cites the base model behind R1. It does not say R1 is the base model.

zeroq · on May 14, 2025

I'll put my tinfoil hat on and say it plays to the current US vs China "propaganda" tune: that the US is winning on all fronts, but the ice is thin and we have to support local tech behemoths to the full extent to secure our position in this world-defining struggle.

cookiemonsieur · on May 14, 2025

It's not US vs China. It increasingly looks like China vs a conglomerate of multinational companies with foreign-born billionaire CEOs whose HQs happen to be located in the USA.

cedws · on May 14, 2025

China vs Chinese migrants in the US

AustinCarrBW · on May 14, 2025

The article specifically says "it's likely this sum referred only to the final training run—a data-refinement process that transforms a model’s previous prototypes into a complete product—but many people perceived it as an insanely low budget for the entire project." The article also delves into the SemiAnalysis report, as well as denials from ex-DeepSeek employees.

Analemma_ · on May 14, 2025

Bloomberg still has not retracted (or even really commented on) the Supermicro spy chip story, preferring to hope people just forget about it if they maintain total silence. They're fine if you need to look up where the Nasdaq closed yesterday, but don't expect serious tech reporting from them.

protimewaster · on May 14, 2025

> not retracted (or even really commented on) the Supermicro spy chip story

They doubled down on it. They did a follow-up claiming that a cyber security researcher from a US-based firm had been called in to investigate suspicious traffic at a US telecom. The investigators claimed to have logs and a bunch of other evidence. The investigators also claimed that Bloomberg was misleading people by focusing on SuperMicro, as they'd reportedly seen it with other manufacturers too.

NicoJuicy · on May 14, 2025

Hikvision literally sends hidden packets from their cameras to China.

Discovered by our security team where I work. It's the reason our VMS doesn't have support for Hikvision cameras.

ta20240528 · on May 14, 2025

"Discovered by our security team where I work"

Is it published anywhere?

NicoJuicy · on May 14, 2025

Internal audit at the company, for customers (it's a security company).

A couple of years later, our US customer reached the same conclusion. That one was probably published (US government).

ta20240528 · on May 21, 2025

Anecdotes are not data.

Really - they aren't.

NicoJuicy · on May 24, 2025

We literally have a hardware device that blocks any "spyware" from communicating externally because of it.

People in the security industry (cameras) may know it.

mannyv · on May 14, 2025

It's interesting because servethehome.com found some oddly labeled chips at one point:

https://www.servethehome.com/dude-dell-hpe-ami-american-mega...

But yeah, saying the chips are everywhere is BS.

Spooky23 · on May 14, 2025

You’re really naive. Bloomberg is one of the better news outlets, and I would put no weight on Supermicro’s denials, as they have a pattern of lying about financials and have a sweepingly broad supply chain vulnerable to sabotage and counterfeiting.

The Federal government and some banks hire companies to do supply chain integrity inspection and management. They find bad parts all of the time, especially in the channel.

There’s a pretty obvious reason why they wouldn’t want to talk about a detected case of foreign espionage embedded in servers after publishing.