xAI Grok 4 benchmarks show it as the leading model. On Humanity's Last Exam it had scores of 35 and 45 for reasoning, a big improvement over about 21 for other top models.
If these leaked Grok 4 benchmarks are correct (95 on AIME, 88 on GPQA, 75 on SWE-bench), then xAI has the most powerful model on the market.