Friday, January 31, 2025

DeepSeek's Architectural Adaptation to Export Controls

DeepSeek's GPU Infrastructure

  • Initially acquired 10,000 GPUs in 2021
  • Estimated to have grown to around 50,000 GPUs in total
  • Used 2,000 H800 GPUs specifically for V3 model pre-training
  • Shares infrastructure with its quantitative trading fund operations

Initial Export Control Framework

  • The US government initially restricted two parameters:
    • Computing power (FLOPS)
    • Interconnect bandwidth between GPUs
  • Because the restriction depended on both factors, a chip could stay compliant by cutting only one of them, which created an opportunity for optimisation

H800 GPU Restrictions and Adaptations

  • The H800 was Nvidia's China-market version of the H100 GPU
  • Two key restriction factors from the US government:
    • Chip compute (FLOPS)
    • Interconnect bandwidth
  • H800 was designed with:
    • Full FLOPS capability (same as H100)
    • Restricted interconnect bandwidth
  • DeepSeek developed specialised SM (Streaming Multiprocessor) scheduling techniques to work around the interconnect limitations
  • It managed to achieve full GPU utilisation despite the interconnect restrictions
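
Why SM scheduling helps can be shown with a toy timing model. This is only a sketch, not DeepSeek's actual scheduler: the function names, the `comm_sm_fraction` parameter and every number are hypothetical. It assumes that reserving a share of SMs for communication slows the math slightly but lets transfers run at the same time as compute.

```python
def step_time_ms(compute_ms: float, comm_ms: float, comm_sm_fraction: float = 0.0) -> float:
    """Estimate the time of one training step in milliseconds.

    comm_sm_fraction is the (hypothetical) share of SMs reserved for
    communication. At 0.0 nothing is reserved and communication runs after
    compute. Above 0.0, compute slows down because fewer SMs do math, but
    communication runs at the same time as compute.
    """
    if not 0.0 <= comm_sm_fraction < 1.0:
        raise ValueError("comm_sm_fraction must be in [0, 1)")
    if comm_sm_fraction == 0.0:
        return compute_ms + comm_ms  # serial: every SM waits on the interconnect
    slowed_compute_ms = compute_ms / (1.0 - comm_sm_fraction)
    return max(slowed_compute_ms, comm_ms)  # overlapped: the longer phase dominates


def comm_time_ms(full_bandwidth_comm_ms: float, bandwidth_ratio: float) -> float:
    """Scale communication time when only a fraction of the bandwidth is available."""
    return full_bandwidth_comm_ms / bandwidth_ratio


if __name__ == "__main__":
    compute_ms = 100.0                 # illustrative, not a measurement
    comm_ms = comm_time_ms(30.0, 0.5)  # halved bandwidth doubles the transfer time
    print("serial:    ", step_time_ms(compute_ms, comm_ms))
    print("overlapped:", step_time_ms(compute_ms, comm_ms, comm_sm_fraction=0.2))
```

With these made-up inputs the overlapped step is shorter than the serial one even though a fifth of the SMs no longer do math. As long as the slowed compute phase is longer than the transfer, the reduced interconnect bandwidth does not add to the step time.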



Export Control Evolution

  1. First Phase:
    • Dual restrictions on FLOPS and interconnect
    • The H800 was allowed in China because of its limited interconnect
  2. Second Phase:
    • The US government identified flaws in the dual-restriction approach
    • Simplified the rules to focus only on FLOPS restrictions
    • The H800 was eventually banned completely in late 2023
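
The two phases can be written as two small predicates. This is a sketch under stated assumptions: the thresholds and chip figures are placeholder values in arbitrary units, not the legal limits or vendor specifications, and the first phase is assumed to restrict a chip only when it reaches both limits, which is how the H800's treatment above reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    flops: float         # relative compute, arbitrary units
    interconnect: float  # relative GPU-to-GPU bandwidth, arbitrary units


# Placeholder thresholds for illustration only, not the real regulatory values.
FLOPS_LIMIT = 1.0
INTERCONNECT_LIMIT = 1.0


def restricted_phase_one(chip: Chip) -> bool:
    """First phase: restricted only if BOTH factors reach their limits."""
    return chip.flops >= FLOPS_LIMIT and chip.interconnect >= INTERCONNECT_LIMIT


def restricted_phase_two(chip: Chip) -> bool:
    """Second phase: compute alone decides."""
    return chip.flops >= FLOPS_LIMIT


# Illustrative stand-ins for the three design points discussed in the post.
H100_LIKE = Chip("full compute, full interconnect", flops=1.0, interconnect=1.0)
H800_LIKE = Chip("full compute, cut interconnect", flops=1.0, interconnect=0.5)
H20_LIKE = Chip("cut compute, full interconnect", flops=0.2, interconnect=1.0)

if __name__ == "__main__":
    for chip in (H100_LIKE, H800_LIKE, H20_LIKE):
        print(f"{chip.name}: phase one restricted={restricted_phase_one(chip)}, "
              f"phase two restricted={restricted_phase_two(chip)}")
```

Under the first predicate the H800-like design passes because only one factor is at the limit. Under the second it is restricted, and only the design with reduced compute remains exportable.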

H20 Architecture Adaptation

  • The newer H20 chip was designed specifically for the Chinese market:
    • Has restricted FLOPS (to comply with controls)
    • Offers improved memory bandwidth and capacity
    • Keeps its interconnect capabilities
    • In some ways outperforms the H100 on memory-bound operations
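
A simple roofline estimate shows how a chip with lower FLOPS can still win on memory-bound work. This is a minimal sketch: the two chip profiles use round illustrative numbers rather than vendor specifications, and `attainable_tflops` is a hypothetical helper name.

```python
def attainable_tflops(peak_tflops: float, mem_bandwidth_tbs: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic.

    TB/s multiplied by FLOP/byte gives TFLOP/s, so both caps share one unit.
    """
    memory_bound_cap = mem_bandwidth_tbs * intensity_flop_per_byte
    return min(peak_tflops, memory_bound_cap)


# Illustrative profiles only (peak TFLOPS, memory bandwidth in TB/s).
HIGH_COMPUTE_CHIP = (1000.0, 3.0)   # more FLOPS, less memory bandwidth
HIGH_MEMORY_CHIP = (150.0, 4.0)     # fewer FLOPS, more memory bandwidth

if __name__ == "__main__":
    for intensity in (10.0, 500.0):  # FLOP per byte moved from memory
        a = attainable_tflops(*HIGH_COMPUTE_CHIP, intensity)
        b = attainable_tflops(*HIGH_MEMORY_CHIP, intensity)
        print(f"intensity {intensity}: high-compute={a}, high-memory={b}")
```

At low arithmetic intensity, where each byte fetched feeds only a few operations, the chip with more memory bandwidth comes out ahead. At high intensity the FLOPS cap takes over and the ordering reverses.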

Sources: Gemini, Seeking Alpha, Forrester, SemiAnalysis

