That was tremendously more informative than I expected from a post entitled “FAQ”: a great dive into everything surrounding DeepSeek, including how it was technically pulled off. Thanks for sharing!
For anyone who’s not familiar with the blog (like I was): the article covers not only the technical aspects but also the economic ones, especially economic policy. The author is a business analyst who essentially argues for freer trade with China and also believes that AGI is close. I don’t fully agree with either position, but I still appreciated the technical details.
Just piggybacking on this to say that Ben Thompson’s blog and various podcasts are definitely worth following even if you’re not politically aligned with his takes; his insights are really in-depth and high quality.
In that context, the level of censorship is also an interesting aspect to consider. LLMs that are tweaked toward, or trained on, biased data can have additional negative consequences: whether the bias concerns minority protection or historical revisionism, it all gets encoded into the model.
DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.
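The “only makes sense on H800s” claim can be made concrete with a toy back-of-envelope model. This is a sketch under loudly stated assumptions, not DeepSeek’s actual scheduling: compute time is assumed to scale inversely with the number of SMs doing math, communication driven by the reserved SMs is assumed to overlap perfectly with compute, and the millisecond figures are hypothetical. Only the 132 and 20 come from the quote; the function names are illustrative.

```python
# Toy model of reserving SMs for communication; NOT DeepSeek's real scheduler.
# Assumptions (hypothetical): compute time scales as 1 / (number of compute SMs),
# and communication driven by the reserved SMs overlaps perfectly with compute.

TOTAL_SMS = 132  # SMs per H800, per the quoted article
COMM_SMS = 20    # SMs reportedly dedicated to cross-chip communication


def step_time_serial(compute_ms: float, comm_ms: float) -> float:
    """All SMs compute, then the step stalls while data crosses chips."""
    return compute_ms + comm_ms


def step_time_overlapped(compute_ms: float, comm_ms: float,
                         total_sms: int = TOTAL_SMS,
                         comm_sms: int = COMM_SMS) -> float:
    """Fewer SMs do math (so compute slows proportionally) while the
    reserved SMs move data concurrently; the step takes the longer of the two."""
    slowed_compute = compute_ms * total_sms / (total_sms - comm_sms)
    return max(slowed_compute, comm_ms)


def break_even_comm_ms(compute_ms: float,
                       total_sms: int = TOTAL_SMS,
                       comm_sms: int = COMM_SMS) -> float:
    """Communication time above which reserving SMs beats the serial schedule."""
    return compute_ms * comm_sms / (total_sms - comm_sms)


if __name__ == "__main__":
    for comm in (5.0, 40.0):  # hypothetical per-step communication times (ms)
        print(f"comm={comm:>4} ms  serial={step_time_serial(100.0, comm):.1f} ms  "
              f"overlapped={step_time_overlapped(100.0, comm):.1f} ms")
    print(f"break-even at {break_even_comm_ms(100.0):.1f} ms of communication")
```

With a hypothetical 100 ms of compute per step, reserving the SMs loses when communication costs 5 ms (105.0 vs 117.9 ms) and wins when it costs 40 ms (140.0 vs 117.9 ms); the crossover is about 17.9 ms. In other words, under this model, giving up roughly 15% of the SMs (20/132) only pays off when communication would otherwise eat more than about 18% of compute time, which is consistent with the article’s point that the trick is worth it on a bandwidth-limited chip.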
Is it insane to drop to assembly? Were past mega training jobs not optimizing the hell out of it in order to save half a billion?
Nope. H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.
Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.
No, it’s not insane to drop to PTX, and DeepSeek is certainly not the first project to do it, especially for multi-node training.
This was amusing to read and feels almost like the premise for a novel: a country bans the export of an important resource, and its rival discovers a way to use a cheaper resource to reach parity.
It reminds me of the innovation coming out of Australia (rsync, squid, transparent proxying) because we were paying 19c/meg for international traffic.
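For anyone wondering how rsync saved so much international traffic: its core trick, from Andrew Tridgell and Paul Mackerras’s rsync algorithm, is a cheap “rolling” weak checksum that lets one side find blocks it already has at any byte offset, so only the differences need to cross the link. Below is a minimal Python sketch of that checksum. It is a simplification, not rsync’s code: real rsync confirms candidate matches with a strong hash and runs over its own network protocol, whereas here a direct byte comparison stands in for the strong hash, and the function names are illustrative.

```python
# Sketch of an rsync-style rolling weak checksum (simplified; illustrative names).
M = 1 << 16  # modulus used for both checksum halves


def weak_checksum(block: bytes) -> tuple[int, int]:
    """a = plain byte sum; b = position-weighted sum (first byte weighted n)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte right in O(1) instead of rehashing it."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(data: bytes, block: bytes) -> list[int]:
    """Offsets where `block` occurs in `data`, scanning with the rolling checksum.

    Real rsync confirms weak-checksum hits with a strong hash; a direct byte
    comparison stands in for that here."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    hits = []
    for k in range(len(data) - n + 1):
        if k > 0:
            # Window moves from data[k-1:k-1+n] to data[k:k+n].
            a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
        if (a, b) == target and data[k:k + n] == block:
            hits.append(k)
    return hits


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog"
    print(find_matches(text, b"the "))  # prints [0, 31]
```

The design point is that `roll` updates both halves of the checksum in constant time per byte, so scanning a whole file for known blocks costs little CPU, and CPU was far cheaper than international bandwidth at the time.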
And maybe (although probably not) the open-source implementation of SMB, which I mention here as a good excuse to ask the following question:
Somewhere online, Andrew Tridgell tells the story of how Samba was inspired by Linus Torvalds being bitten by a penguin at Canberra Zoo, but I’ve lost my bookmark to it and can’t find it. Does anyone have a reference to it, please?
My memory is that the penguin bite inspired the Linux logo. I thought tridge was already working on Samba at that point, but I could be wrong.
Ah yes. Thank you!
Still looking for what tridge has written about it, in case any of youse know. I’m pretty sure he’s said something about how it (also) relates to Samba.
Rumor out there is that lots of Chinese companies have bought plenty of Nvidia hardware via Singapore. There are few users of Nvidia cards in that location, yet it accounts for 15% of Nvidia’s worldwide sales.
I am not saying those are DeepSeek’s. I actually believe the architecture and training scheme they use could be more efficient. Given how novel LLMs are, few things have been attempted at scale, so it’s logical there is some low-hanging fruit in terms of performance improvements.
Not saying it’s not China, but there are a lot of wholesalers in Singapore that distribute to APAC. As an Australian consumer it can often be cheapest to get grey market electronics from Singapore and/or HK - and local NVIDIA stock levels have been pretty tight for quite a while now.
Few users? Singapore has a huge tech footprint, with both data centres and fully staffed offices for every major player.
Meanwhile, HK has been confirmed to be turning a blind eye to sanctions, and as a result it’s in the process of losing the last of its sweetheart trade benefits with the West.
HK is part of China, for good or ill. Didn’t the sanctions apply there too?
No, because not everyone has got the message that One Country, Two Systems is dead.
I mean, Louisiana uses the Napoleonic Code rather than common law. That doesn’t mean there are border checks.
Louisiana isn’t Hong Kong and the US isn’t China, and the domestic and international frameworks that govern Louisiana and Hong Kong are barely comparable. Case in point: Louisiana hasn’t received special carveouts in either international treaties or bilateral trade policies.
Look, I’m happy to talk about this in excruciating detail, but I can’t tell what background, if any, you have in this topic.
I’m not sure this level of snark is appropriate given that you’re flatly wrong: the A100/H100/A800/H800 export controls apply to China and Hong Kong. While Chinese companies continue to be able to acquire them through various means (and they’re not illegal in China, just harder to get due to the American export controls), there is no difference in American export controls on GPUs between China and Hong Kong.
HK has been treated as part of China since the 2020 executive order (EO), which Biden also renewed every year.
However, the export controls themselves are mostly targeted at Chinese-based entities. (Here’s the Federal Register link.) Moreover, HK-based entities still have a heap of exemptions and licenses.
But, back to the original point: as you said, the controls have only raised the price and difficulty. No one just straight-up buys a bunch of a controlled item and reports it as a direct shipment. Look at how HK copped another round when heaps of transshipment to Russia for its war against Ukraine were revealed.
Basically, export controls are not a simple binary. I say this having inadvertently run into them a few times.
This jumped out at me as well. Just last week I had read DHH describing, in Founders at Work, how constraints ended up contributing to success, so the concept was on my mind.