"DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of
the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that
the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs
associated with prior research and ablation experiments on architectures, algorithms, or data." https://arxiv.org/pdf/2412.19437
They don't even claim to have spent $5M (since they own their GPUs instead of renting them by the hour); it's a purely notional figure suitable for an academic paper. But when R1 got released and started generating hype, it was the only dollar figure anyone had to go on, so it got interpreted as more significant than it is.
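For anyone who wants to check the arithmetic behind that figure, here is a minimal sketch. It uses only the two numbers from the quoted passage (2.788M GPU hours and an assumed $2 per GPU hour); the function name and the alternative rates in the loop are made up purely for illustration and come from neither the paper nor any real price list.

```python
# Figures taken from the quoted DeepSeek-V3 passage.
GPU_HOURS = 2.788e6          # H800 GPU hours for the official training run
RENTAL_USD_PER_HOUR = 2.0    # the paper's assumed rental price, not an actual expense


def notional_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Return the notional rental cost in USD (hypothetical helper)."""
    return gpu_hours * usd_per_gpu_hour


cost = notional_cost(GPU_HOURS, RENTAL_USD_PER_HOUR)
print(f"${cost / 1e6:.3f}M")  # matches the $5.576M in the quote

# Hypothetical rates, only to show how directly the headline number
# depends on the assumed rental price.
for rate in (1.0, 2.0, 4.0):
    print(f"${rate:.2f}/hr -> ${notional_cost(GPU_HOURS, rate) / 1e6:.3f}M")
```

The point is that the dollar figure is just GPU hours times an assumed hourly rate, so it says nothing about hardware purchases, salaries, or earlier experiments.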
The accuracy of this comparison is highly speculative. One should not ignore the possibility that dominant firms in the market might be inflating their cost figures to block new entrants and extract more capital from investors through such narratives. When you compare electricity prices in China with those in the U.S., such a large gap would require a truly extraordinary breakthrough to be justified. After all, these are privately held commercial firms, and they are not obligated to disclose their financials accurately.
The provided source for every concrete figure on DS in that article is "we are confident that", "we believe" or something equivalent. How is it any better researched than any other article with a conflicting set of beliefs?
There are times when "just trust me bro" is okay. Semianalysis articles are one of those times. You are free to pull the contrarian "source please" shit, but the reality is that they are far more accurate at most types of GPU, Cloud, or AI analysis than almost anyone on this website or anywhere else.
Then you definitely shouldn't trust the unsourced, false claim that "DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4." DeepSeek never claimed this.
Don't act like the ambiguity of their language wasn't intentional. It was. They worded it in the paper on purpose so that investors freaked out en masse about the idea of GPT-4 for a few million, while AI researchers laughed as they read the gigantic "this is only the final training run" asterisk.
Don’t know what you’re talking about. I read an HN comment saying here’s a well researched article, followed the link, and found a trust-me-bro article. Where did I ask for advice from an HN commenter?
You aren't going to get non-"trust me bro" insight except by asking Sam Altman himself. I assume he has more interesting things to do with his time at the moment. Barring that, you're going to get the guy who wines and dines GPU suppliers and sees if he can connect any crumbs of information that drop.
Yeah, apparently they parked the cost of hardware, 50k GPUs, and model development under another entity, High-Flyer, because it was a "shared resource".
I don't see the issue here. OpenAI trained ChatGPT off my own comments, and your comments, and the comments from the person you replied to. I didn't authorize it and you probably didn't either.
Meta was caught pirating over 80TB of books to train their AI, and they are claiming not only that training AI on other people's stuff is legal, but that piracy is also legal (well, at least piracy done by US tech giants is legal).
You could maybe make that accusation about V3 (to the extent that it's a bad thing and not fair use, especially considering the amoral origin of OpenAI's models in the first place), but I don't think the claim makes sense for R1, since OpenAI's o1 did not expose its CoT traces even in the API.
They published about GRPO (key algorithm behind R1) a full year before[1] they scaled it for R1. Given the research they do in open, it's not far-fetched to think they had the talent and technical know-how to achieve R1 on their own.
I'll put my tinfoil hat on and say it plays to the current US vs China "propaganda" tune: that the US is winning on all fronts, but the ice is thin and we have to support local tech behemoths to the full extent to secure our position in this world-defining struggle.
It's not US vs China. It increasingly looks like China vs a conglomerate of multinational companies with foreign-born billionaire CEOs whose HQs happen to be located in the USA.
The article specifically says "it's likely this sum referred only to the final training run—a data-refinement process that transforms a model’s previous prototypes into a complete product—but many people perceived it as an insanely low budget for the entire project." The article also delves into the SemiAnalysis report, as well as denials from ex-DeepSeek employees.
Bloomberg still has not retracted (or even really commented on) the Supermicro spy chip story, preferring to hope people just forget about it if they maintain total silence. They're fine if you need to look up where the Nasdaq closed yesterday, but don't expect serious tech reporting from them.
> not retracted (or even really commented on) the Supermicro spy chip story
They doubled down on it. They did a follow-up claiming that a cyber security researcher from a US-based firm had been called in to investigate suspicious traffic at a US telecom. The investigators claimed to have logs and a bunch of other evidence. The investigators also claimed that Bloomberg was misleading people by focusing on SuperMicro, as they'd reportedly seen it with other manufacturers too.
You’re really naive. Bloomberg is one of the better news outlets, and I would put no weight on Supermicro’s denials, as they have a pattern of lying about financials and have a sweepingly broad supply chain vulnerable to sabotage and counterfeiting.
The Federal government and some banks hire companies to do supply chain integrity inspection and management. They find bad parts all of the time, especially in the channel.
There’s a pretty obvious reason why they wouldn’t want to talk about a detected case of foreign espionage embedded in servers after publishing.
>DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4