DeepSeek's GPU Infrastructure
- Initially acquired 10,000 GPUs in 2021
- Estimated to have grown to around 50,000 GPUs in total
- Used 2,048 H800 GPUs for DeepSeek-V3 pre-training
- Shares this infrastructure with its parent, the quantitative trading fund High-Flyer
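GPU count and training time are linked by simple arithmetic: wall-clock time is total GPU-hours divided by the number of GPUs running in parallel. This sketch assumes the roughly 2.664M H800 GPU-hours that the DeepSeek-V3 technical report gives for pre-training, and perfect parallel scaling. The function name is hypothetical.

```python
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Convert a GPU-hour budget into wall-clock days.

    Assumes every GPU is busy for the whole run (perfect scaling, no downtime).
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus / 24.0


# Assumed input: ~2.664M H800 GPU-hours for DeepSeek-V3 pre-training.
days = wall_clock_days(2_664_000, 2_048)
print(f"{days:.1f} days")  # 54.2 days
```

That works out to roughly 54 days on 2,048 GPUs; real runs also lose time to restarts and evaluation, which this ignores.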
Initial Export Control Framework
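- The US government initially restricted chips on two parameters:
  - Computing power (FLOPS)
  - Interconnect bandwidth between GPUs
- Because a chip was controlled only if it exceeded both limits, a design could keep full compute and cut only interconnect bandwidth, leaving room for software optimisation to compensate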
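The first-phase rule can be expressed as a two-condition test: a chip was controlled only if it met both the compute and the interconnect thresholds, so cutting either one was enough to escape it. This sketch assumes thresholds approximating the October 2022 rule (a processing performance of 4,800, computed as dense TOPS × operand bit length, and 600 GB/s of interconnect) and approximate published SXM figures (about 989 dense FP16 TFLOPS for both chips; 900 GB/s NVLink on the H100 versus 400 GB/s on the H800). The function names are hypothetical.

```python
COMPUTE_THRESHOLD = 4800          # assumed: dense TOPS x bit length
INTERCONNECT_THRESHOLD_GBS = 600  # assumed: aggregate bidirectional GB/s


def processing_performance(dense_tops: float, bit_length: int) -> float:
    """Compute metric: dense TOPS multiplied by operand bit length."""
    return dense_tops * bit_length


def controlled_phase1(dense_tops: float, bit_length: int, interconnect_gbs: float) -> bool:
    """Phase 1: a chip is controlled only if BOTH limits are met or exceeded."""
    fast_compute = processing_performance(dense_tops, bit_length) >= COMPUTE_THRESHOLD
    fast_link = interconnect_gbs >= INTERCONNECT_THRESHOLD_GBS
    return fast_compute and fast_link


# Assumed spec figures: (dense FP16 TFLOPS, bit length, NVLink GB/s).
chips = {
    "H100": (989, 16, 900),
    "H800": (989, 16, 400),  # same compute, reduced interconnect
}
for name, (tops, bits, link) in chips.items():
    print(name, "controlled" if controlled_phase1(tops, bits, link) else "exportable")
```

Because the test joins the two conditions with `and`, a chip with full compute passes on interconnect alone, which is the opening the two-factor design left.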
H800 GPU Restrictions and Adaptations
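- The H800 was NVIDIA's China-market version of the H100 GPU
- It was shaped by the same two US restriction factors:
  - Chip compute (FLOPS)
  - Interconnect bandwidth
- The H800 was designed with:
  - Full compute (FLOPS) for AI workloads, the same as the H100
  - Restricted interconnect bandwidth (NVLink cut from 900 GB/s to 400 GB/s)
- DeepSeek worked around the interconnect limit with custom communication kernels and specialised SM (Streaming Multiprocessor) scheduling, reserving a subset of each GPU's SMs (20 in DeepSeek-V3) for cross-GPU traffic so that communication overlaps with computation
- This kept the GPUs' compute units busy despite the restricted interconnect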
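The benefit of dedicating SMs to communication can be illustrated with a toy two-stage pipeline model. When transfers run on a reserved subset of SMs, the communication for one chunk of work can overlap with computation on the next, so step time approaches the larger of compute and communication time rather than their sum. Everything here is an assumption for illustration: the chunk timings, the 132-SM count (the H100 SXM configuration), the 20 reserved SMs (the figure reported for DeepSeek-V3), and the function names. It is not DeepSeek's actual kernel design.

```python
TOTAL_SMS = 132  # assumed SM count (H100/H800 SXM configuration)


def serial_step_ms(n_chunks: int, compute_ms: float, comm_ms: float) -> float:
    """No overlap: every chunk computes, then waits for its transfer."""
    return n_chunks * (compute_ms + comm_ms)


def pipelined_step_ms(n_chunks: int, compute_ms: float, comm_ms: float) -> float:
    """Two-stage pipeline: chunk i's transfer overlaps chunk i+1's compute."""
    if n_chunks == 0:
        return 0.0
    return compute_ms + (n_chunks - 1) * max(compute_ms, comm_ms) + comm_ms


def compute_ms_with_reserved(compute_ms_all_sms: float, reserved_sms: int,
                             total_sms: int = TOTAL_SMS) -> float:
    """Compute slows in proportion to the SMs taken away for communication."""
    return compute_ms_all_sms * total_sms / (total_sms - reserved_sms)


# Hypothetical per-chunk timings.
n, compute, comm = 8, 10.0, 8.0
baseline = serial_step_ms(n, compute, comm)
overlapped = pipelined_step_ms(n, compute_ms_with_reserved(compute, 20), comm)
print(f"serial: {baseline:.1f} ms, overlapped: {overlapped:.1f} ms")
```

Reserving SMs makes each chunk's compute slower, but hiding the transfer more than pays for it with these timings; the trade-off flips when communication is short relative to compute.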
Export Control Evolution
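- First phase (October 2022):
  - Dual restrictions on compute (FLOPS) and interconnect bandwidth
  - The H800 was allowed into China because its interconnect fell below the limit
- Second phase (October 2023):
  - The US government recognised that the dual-restriction approach could be designed around by cutting interconnect alone
  - The rules were simplified to test compute (FLOPS) only, dropping the interconnect criterion
  - H800 sales to China were banned outright in late 2023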
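The second-phase rule drops the interconnect condition, so compute alone decides whether a chip is controlled. This sketch models only a single compute threshold, assumed here to be a total processing performance (TPP) of 4,800; the actual October 2023 rule adds performance-density tiers that are not modelled. The spec figures (about 989 dense FP16 TFLOPS for the H800, about 148 for the H20) and the function names are assumptions.

```python
TPP_THRESHOLD = 4800  # assumed single threshold; the real rule also has density tiers


def total_processing_performance(dense_tops: float, bit_length: int) -> float:
    """TPP: dense TOPS multiplied by operand bit length."""
    return dense_tops * bit_length


def controlled_phase2(dense_tops: float, bit_length: int) -> bool:
    """Phase 2: interconnect is ignored; only compute is tested."""
    return total_processing_performance(dense_tops, bit_length) >= TPP_THRESHOLD


# Assumed dense FP16 TFLOPS figures.
for name, tops in {"H800": 989, "H20": 148}.items():
    tpp = total_processing_performance(tops, 16)
    status = "controlled" if controlled_phase2(tops, 16) else "exportable"
    print(f"{name}: TPP {tpp:.0f} -> {status}")
```

Under this test the H800's reduced interconnect no longer helps, while a chip with deliberately cut compute such as the H20 stays under the line.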
H20 Architecture Adaptation
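- The newer H20 was designed specifically for the Chinese market:
  - Restricted compute (FLOPS), keeping it under the control threshold
  - Larger, faster memory (96 GB of HBM3 at about 4.0 TB/s, versus 80 GB at 3.35 TB/s on the H100 SXM)
  - Full interconnect bandwidth retained (900 GB/s NVLink)
- Can outperform the H100 on memory-bandwidth-bound workloads such as LLM inference decoding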
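Why a lower-FLOPS chip can beat the H100 on memory-bound work follows from the roofline model: attainable throughput is the lesser of peak compute and memory bandwidth × arithmetic intensity (FLOPs per byte moved). This sketch assumes approximate published SXM figures of 989 dense BF16 TFLOPS and 3.35 TB/s for the H100, and 148 TFLOPS and 4.0 TB/s for the H20. The function name is hypothetical.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tbs: float, intensity: float) -> float:
    """Roofline model: min(peak compute, bandwidth x FLOPs-per-byte).

    TB/s multiplied by FLOP/byte gives TFLOP/s, so the units line up.
    """
    return min(peak_tflops, bandwidth_tbs * intensity)


# Assumed spec figures: (peak dense BF16 TFLOPS, HBM bandwidth in TB/s).
specs = {"H100": (989.0, 3.35), "H20": (148.0, 4.0)}

# Low intensity ~ small-batch LLM decoding; high intensity ~ large matrix multiplies.
for intensity in (1, 10, 1000):
    row = {name: attainable_tflops(p, bw, intensity) for name, (p, bw) in specs.items()}
    print(intensity, row)
```

With these figures the H20 comes out ahead whenever intensity is below about 44 FLOPs/byte (148 / 3.35), where the H100's bandwidth line overtakes the H20's compute ceiling; above that, the H100 pulls far ahead.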
Sources: Gemini, Seeking Alpha, Forrester, SemiAnalysis