Here’s an updated AI Large Language Model Ranking (2026) based on the most recent benchmark and leaderboard data available as of late February 2026. This ranking synthesizes multiple independent sources, including LMArena community Elo scores, MMLU/GPQA accuracy metrics, and overall performance evaluations, to give a rounded view of how leading models stack up in performance, accuracy, and popularity/usage.

| Rank | Model | Performance (0–10) | Accuracy (0–10) | Popularity (0–10) |
|---|---|---|---|---|
| 1 | GPT-5.2 Pro (OpenAI) | 9.8 | 9.7 | 10 |
| 2 | Gemini 3 Pro (Google) | 9.6 | 9.5 | 9.3 |
| 3 | Claude Opus 4.6 (Anthropic) | 9.5 | 9.4 | 9.2 |
| 4 | Claude Sonnet 4.6 (Anthropic) | 9.3 | 9.2 | 9.0 |
| 5 | GPT-5.2 (OpenAI) | 9.2 | 9.1 | 9.0 |
| 6 | Grok-4.1 (xAI) | 8.9 | 8.8 | 8.6 |
| 7 | DeepSeek V3.1 (DeepSeek) | 8.6 | 8.5 | 8.3 |
| 8 | Llama 4-Scout (Meta) | 8.3 | 8.2 | 8.0 |
| 9 | Gemini 3 Flash (Google) | 8.0 | 7.9 | 7.8 |
| 10 | Qwen3-Max (Alibaba) | 7.8 | 7.7 | 7.5 |

List created and curated by ChatGPT.
Last updated February 25, 2026.
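As an illustration of how a composite ranking like this can be assembled from sub-scores, here is a minimal sketch that ranks models by the unweighted mean of the three columns. The equal weighting and the abbreviated `models` data are assumptions for illustration; the source does not state how its overall ordering was derived.

```python
# Hypothetical sketch: rank models by the equal-weight mean of their
# (performance, accuracy, popularity) sub-scores on a 0-10 scale.
# The equal weighting is an assumption, not the source's actual method.

models = {
    "GPT-5.2 Pro": (9.8, 9.7, 10.0),
    "Gemini 3 Pro": (9.6, 9.5, 9.3),
    "Claude Opus 4.6": (9.5, 9.4, 9.2),
}

def composite(scores):
    """Return the unweighted mean of the sub-score tuple."""
    return sum(scores) / len(scores)

# Sort models by composite score, highest first.
ranked = sorted(models.items(), key=lambda kv: composite(kv[1]), reverse=True)

for rank, (name, scores) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {composite(scores):.2f}")
```

With these three entries the ordering matches the table above; a real aggregation would likely weight benchmark accuracy and community Elo differently, which is exactly why leaderboards should document their weighting.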