Jensen Huang, the CEO of Nvidia, described how xAI built a 100,000-GPU Nvidia H100 AI cluster in 19 days. Other customers need about a year (365 days, roughly 20 times as long) to install a cluster of the same size. Jensen called the xAI achievement superhuman. Nvidia's customers have to work with Nvidia's hardware, software, data center and network engineering teams on the project plans and timelines.
xAI will capitalize on this advantage in the 2025-2026 timeframe for a 10-20X advantage in AI training. I describe how the procurement of Nvidia B200s is happening now and when they will be installed and ready to start training at the different companies.
xAI will have its first B200-trained LLM installed, trained and released before OpenAI and Meta have finished installing their B200s. xAI will be done before OpenAI and Meta reach the starting line for training. xAI could triple or quadruple its training time, increasing the overall training, and still beat the others to market.
This xAI data center build advantage will last for the next two years or more. Why do I assert this? The AI data center builds are billion-dollar, and soon multi-billion-dollar, projects involving perhaps thousands of people. They are comparable to building a skyscraper or a gigawatt coal or natural gas power plant. In large, complex energy and IT infrastructure projects, you will not be able to greatly change the construction time of your next skyscraper or coal plant.
Dr Alan Thompson has the statistics and track record for the recent large language model projects. Nvidia has provided the installation and setup times before training starts. Training time is the wall-clock time (days). The chip count and chip type tell us the scale of each LLM build. Total training (chip-years) is the chip count multiplied by the wall-clock training time.
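The chip-year arithmetic can be sketched in a few lines (the example numbers below are illustrative, not figures from Dr Thompson's data):

```python
def chip_years(chip_count: int, wall_clock_days: float) -> float:
    """Total training = chip count x wall-clock training time, in chip-years."""
    return chip_count * wall_clock_days / 365.0

# Illustrative example: 100,000 GPUs training for 100 days
print(round(chip_years(100_000, 100)))  # → 27397 chip-years
```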
OpenAI and Meta are taking about one year to set up their large GPU clusters, while xAI took 19 days. Even including planning and other activities, xAI took 122 days to complete the entire installation and setup.
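These setup times imply very different effective install rates for a 100,000-GPU cluster:

```python
# Install rates implied by the reported setup times for a 100,000-GPU cluster
XAI_DAYS = 19      # xAI's reported install/setup time
OTHERS_DAYS = 365  # roughly a year for OpenAI/Microsoft and Meta
CLUSTER = 100_000

xai_rate = CLUSTER / XAI_DAYS        # chips installed per day at xAI's pace
others_rate = CLUSTER / OTHERS_DAYS  # chips installed per day at the others' pace
print(round(xai_rate), round(others_rate), round(xai_rate / others_rate))
# → 5263 274 19  (xAI installs about 19X faster)
```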
OpenAI is taking about 9 months for post-training, testing and certifying new models like GPT-4. OpenAI has taken at least 6 months to test and certify GPT-5 because training finished in April 2024.
Meta took 3 months for post-training, testing and certifying Llama 3.
xAI took 2 months for post-training, testing and certifying Grok 2.
Nvidia B200 Deliveries over the Next 6 Months
Microsoft and OpenAI have the major orders and early deliveries for Nvidia B200 chips.
Blackwell chip production ramp-up began in early 4Q24. Considering yield rates and testing efficiency, estimated shipments are about 150,000–200,000 units in 4Q24, with significant growth projected at 200–250% QoQ to 500,000–550,000 units in Q1 2025. Microsoft is currently the most aggressive customer in procuring GB200.
There will be 650,000 to 750,000 Nvidia B200s delivered by the end of March 2025. The first 100,000 units could be delivered to Microsoft and OpenAI by December 2024. Installation and setup would then take 12 months because of the difficulty of radically changing and accelerating large, complex projects.
xAI would likely get 300,000 B200s by the end of February 2025. Meta would get their 100,000 B200s in January.
Each LLM Version Uses About 3-12X More Total Training Time than the Prior Version
The biggest leap in total training time was between GPT-3 and GPT-4, roughly a 15X leap from about 400 to 6,000 chip-years. However, the time between GPT-3 and GPT-4 stretched to 2 years. A 3X improvement in total training time is achievable where Nvidia chips can be delivered faster and the wall-clock training time is about 50 days. We are in a faster upgrade cycle, so a 3X increase in total training time is the target between versions. If the overall project takes longer to install, then the company could go for a bigger leap in total training time.
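The 3X-per-version target implies a simple relationship between the new chip count and the wall-clock training time needed; a minimal sketch, with illustrative baseline numbers that are assumptions rather than the article's data:

```python
def next_wall_clock_days(prev_chip_years: float, target_multiple: float,
                         new_chip_count: int) -> float:
    """Wall-clock days so the new run totals target_multiple x the prior chip-years."""
    return target_multiple * prev_chip_years * 365.0 / new_chip_count

# Illustrative: tripling a 6,000 chip-year run using 150,000 chips (assumed counts)
print(round(next_wall_clock_days(6_000, 3, 150_000)))  # → 44 days
```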
Putting All of This Together Lets Us Determine When LLMs Will Be Delivered
We know within 1-3 months when Nvidia B200s will be supplied in 100k, 300k and 1M volumes. We know the installation and setup time is about 19 days (roughly 5,000 chips per day) for xAI and 365 days (roughly 300 chips per day) for OpenAI/Microsoft and Meta. We know what they will target for chip-year increases, which determines the training time.
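A minimal timeline model combining these pieces, where delivery dates, install rates and training days are the article's estimates and the post-training durations are assumptions based on the Llama 3 and Grok 2 figures above:

```python
from datetime import date, timedelta

def release_date(delivery: date, chips: int, install_rate_per_day: float,
                 training_days: int, post_training_days: int) -> date:
    """Delivery -> install -> train -> post-train/test/certify, run sequentially."""
    install_days = round(chips / install_rate_per_day)
    return delivery + timedelta(days=install_days + training_days + post_training_days)

# xAI: 300k B200s by end of Feb 2025, ~5,000 chips/day, 150 training days,
# ~60 days post-training (assumed, based on the Grok 2 figure)
print(release_date(date(2025, 2, 28), 300_000, 5_000, 150, 60))   # 2025-11-25

# Meta: 100k B200s in Jan 2025, ~300 chips/day, 100 training days (assumed),
# ~90 days post-training (assumed, based on the Llama 3 figure)
print(release_date(date(2025, 1, 31), 100_000, 300, 100, 90))     # 2026-07-08
```

Under these assumptions, the install-rate gap alone pushes Meta's release well into 2026 even with a shorter training run.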
xAI will be releasing upgraded models 2-3 times faster than OpenAI and Meta.
xAI will get the lead in early 2025 and then will massively pull away in 2026.
Grok 6, trained on 300,000 B200s for 150 days, will have 10-20 times more total training than Meta's Llama 5 and OpenAI's GPT-6. xAI will beat Meta and OpenAI on both installation and training for B200-trained models.