
DeepSeek's Architecture Adaptation to Export Controls

DeepSeek's GPU Infrastructure

  • Reportedly acquired about 10,000 GPUs in 2021
  • Estimated to have since grown to around 50,000 GPUs in total
  • Used roughly 2,000 H800 GPUs (2,048 per DeepSeek's V3 technical report) for V3 model pre-training
  • Shares infrastructure with its quantitative trading fund operations

Initial Export Control Framework

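  • The US government initially restricted two parameters:
    • Computing power (FLOPS)
    • Interconnect bandwidth between GPUs
  • Because a chip was restricted only when it exceeded both limits, a design that stayed under just one of them could still be exported, which created an opportunity for optimisation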
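The opportunity can be made concrete with a minimal Python sketch. It assumes the initial rule applied only when a chip reached both limits at once; the `Chip` type, the threshold constants, and the chip figures are all hypothetical and use arbitrary units, not the actual regulatory numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    compute: float       # relative compute (FLOPS), arbitrary units
    interconnect: float  # relative GPU-to-GPU bandwidth, arbitrary units


# Hypothetical thresholds in arbitrary units (assumption, not the real rule text).
COMPUTE_LIMIT = 1.0
INTERCONNECT_LIMIT = 1.0


def restricted_two_factor(chip: Chip) -> bool:
    """Restricted only if BOTH parameters reach their limits (assumed AND rule)."""
    return chip.compute >= COMPUTE_LIMIT and chip.interconnect >= INTERCONNECT_LIMIT


# Two made-up chips with identical compute but different interconnect.
full_chip = Chip("full-spec", compute=1.0, interconnect=1.5)
cut_link_chip = Chip("reduced-interconnect", compute=1.0, interconnect=0.7)

for chip in (full_chip, cut_link_chip):
    print(chip.name, "restricted:", restricted_two_factor(chip))
```

Under this assumed rule, lowering the interconnect figure alone is enough to move a chip out of the restricted set while leaving its compute untouched.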

H800 GPU Restrictions and Adaptations

  • The H800 was Nvidia's China-market version of the H100 GPU
  • Two key restriction factors from the US government:
    • Chip compute (FLOPS)
    • Interconnect bandwidth
  • H800 was designed with:
    • Full FLOPS capability (same as H100)
    • Restricted interconnect bandwidth
  • DeepSeek developed specialised SM (Streaming Multiprocessor) scheduling techniques to work around the interconnect limitations
  • Reportedly kept the GPUs close to fully utilised despite the interconnect restrictions
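Why scheduling can offset a slower interconnect is easier to see with a toy timing model. The sketch below is not DeepSeek's method; it only assumes that communication can run concurrently with compute if a share of the SMs is set aside to drive it. The function names, the 10% SM share, and the timings are hypothetical.

```python
def step_time(compute_s: float, comm_s: float, overlap: bool,
              comm_sm_fraction: float = 0.0) -> float:
    """Time for one training step under a simple two-phase model.

    compute_s: compute time with every SM doing compute (hypothetical value)
    comm_s: time to move data over the restricted interconnect
    overlap: if True, communication runs concurrently with compute
    comm_sm_fraction: share of SMs assumed reserved to drive communication
    """
    if not overlap:
        # Compute waits for communication, so the two times add up.
        return compute_s + comm_s
    # Fewer SMs are left for compute, so compute slows proportionally.
    slowed_compute_s = compute_s / (1.0 - comm_sm_fraction)
    return max(slowed_compute_s, comm_s)


def utilisation(compute_s: float, step_s: float) -> float:
    """Fraction of the step that corresponds to useful full-GPU compute."""
    return compute_s / step_s


COMPUTE_S = 1.0  # hypothetical compute time per step
COMM_S = 0.6     # hypothetical communication time per step

sequential = utilisation(COMPUTE_S, step_time(COMPUTE_S, COMM_S, overlap=False))
overlapped = utilisation(
    COMPUTE_S, step_time(COMPUTE_S, COMM_S, overlap=True, comm_sm_fraction=0.1)
)
print(f"sequential: {sequential:.3f}")
print(f"overlapped: {overlapped:.3f}")
```

In this model the slow link costs little as long as communication time stays below the (slightly slowed) compute time, which is the general idea behind hiding interconnect latency with careful scheduling.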



Export Control Evolution

  1. First Phase:
    • Dual restrictions on FLOPS and interconnect
    • The H800, with its limited interconnect, was allowed to be sold in China
  2. Second Phase:
    • The US government identified flaws in the dual-restriction approach
    • Simplified the rules to focus only on FLOPS restrictions, dropping the interconnect criterion
    • The H800 was eventually banned from sale to China in late 2023

H20 Architecture Adaptation

  • The newer H20 chip was designed specifically for the Chinese market:
    • Has restricted FLOPS (to comply with controls)
    • Improved memory bandwidth and capacity
    • Maintained interconnect capabilities
    • In some respects outperforms the H100 on memory-bound operations
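A simple roofline-style calculation illustrates how a chip with fewer FLOPS can still win on memory-heavy work. This is a sketch under stated assumptions: the two chip profiles use made-up relative numbers (not real H100 or H20 specifications), and `attainable_throughput` is a hypothetical helper.

```python
def attainable_throughput(peak_compute: float, memory_bandwidth: float,
                          intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic.

    intensity is the number of operations performed per byte moved.
    """
    return min(peak_compute, memory_bandwidth * intensity)


# Hypothetical relative profiles (arbitrary units, NOT real chip specs).
HIGH_COMPUTE_CHIP = {"peak_compute": 1.0, "memory_bandwidth": 1.0}
HIGH_MEMORY_CHIP = {"peak_compute": 0.2, "memory_bandwidth": 1.2}

# Low intensity: few operations per byte, so memory bandwidth is the limit.
low_a = attainable_throughput(**HIGH_COMPUTE_CHIP, intensity=0.1)
low_b = attainable_throughput(**HIGH_MEMORY_CHIP, intensity=0.1)

# High intensity: many operations per byte, so peak compute is the limit.
high_a = attainable_throughput(**HIGH_COMPUTE_CHIP, intensity=10.0)
high_b = attainable_throughput(**HIGH_MEMORY_CHIP, intensity=10.0)

print("memory-bound :", low_a, low_b)
print("compute-bound:", high_a, high_b)
```

With these illustrative figures the higher-bandwidth chip comes out ahead on the memory-bound workload and well behind on the compute-bound one, matching the trade-off described above.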

Source: Gemini, Seeking Alpha, Forrester, SemiAnalysis

