footballarena.ai /Leaderboard

Extended Leaderboard

AI models vs. rule-based reference strategies · FIFA World Cup 2026

The central question is whether frontier LLMs add genuine predictive value beyond simple, rule-based strategies. Baselines are non-AI reference strategies that put AI scores in context: a model that loses to "Always Home Win" performs worse than a strategy that uses zero domain knowledge.

This comparison is part of a research project on LLM calibration and domain-specific reasoning. See the full methodology for the complete baseline framework (B1–B11) including Elo ratings, betting market odds, and ensemble voting strategies.

AI Models + Baselines

11 models · 1 consensus · 3 baselines
Master ranking — every track combined into one score.
| # | MODEL / STRATEGY | TYPE | GAMES | HIT % | EXACT % | VS BEST BL | OUTCOME PTS | TOTAL PTS |
|---|---|---|---|---|---|---|---|---|
| 1 | Mistral Large 3 (mistralai/mistral-large-2512) | AI | 32 | 63% | 13% | +4% | 20 | 24 |
| 2 | GLM-5.1 (z-ai/glm-5.1) | AI | 32 | 59% | 16% | | 19 | 24 |
| 3 | Grok 4.3 (x-ai/grok-4.3) | AI | 32 | 56% | 13% | -3% | 18 | 22 |
| 4 | Gemma 4 31B (google/gemma-4-31b-it) | AI | 32 | 56% | 13% | -3% | 18 | 22 |
| 5 | MiMo v2.5-Pro (xiaomi/mimo-v2.5-pro) | AI | 32 | 56% | 9% | -3% | 18 | 21 |
| 6 | AI Consensus | ENSEMBLE | 32 | 56% | 9% | -3% | 18 | 21 |
| 7 | Kimi K2.6 (moonshotai/kimi-k2.6) | AI | 32 | 53% | 13% | -6% | 17 | 21 |
| 8 | Claude Opus 4.8 (anthropic/claude-opus-4-8) | AI | 32 | 53% | 9% | -6% | 17 | 20 |
| 9 | Gemini 3.1 Pro (google/gemini-3.1-pro-preview) | AI | 32 | 53% | 9% | -6% | 17 | 20 |
| 10 | GPT-5.5 High (openai/gpt-5.5) | AI | 32 | 53% | 9% | -6% | 17 | 20 |
| 11 | DeepSeek V4 Pro (deepseek/deepseek-v4-pro) | AI | 32 | 53% | 9% | -6% | 17 | 20 |
| 12 | Squad Value | STRUCTURED | 32 | 59% | | BEST BL | 19 | 19 |
| 13 | Odds Favorite | MARKET | 32 | 59% | | BEST BL | 19 | 19 |
| 14 | Gemini 3.5 Flash (google/gemini-3.5-flash) | AI | 32 | 53% | 6% | -6% | 17 | 19 |
| 15 | Always Home Win | NAIVE | 32 | 53% | | | 17 | 17 |

AI Consensus: Majority vote across all AI models for each match.
Squad Value: Picks the team with higher Transfermarkt squad value (frozen at tournament start).
Odds Favorite: Picks the outcome with the lowest closing odds (highest implied probability) per match.
Always Home Win: Predicts the home team wins every match, regardless of opponent or relative strength.
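
The reference strategies in the leaderboard are simple enough to sketch in a few lines. The Python below is an illustration only, not the site's implementation: the `Match` type, its field names, the "home"/"draw"/"away" labels, and the tie-breaking rules (equal squad values go to the home team, tied votes go to the pick seen first) are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical match record; field names are assumptions for this sketch."""
    home: str
    away: str
    squad_value: dict[str, float]  # team name -> squad value, frozen at tournament start
    odds: dict[str, float]         # closing decimal odds keyed by "home", "draw", "away"


def always_home_win(match: Match) -> str:
    # Naive baseline: ignores the opponent and relative strength entirely.
    return "home"


def squad_value_pick(match: Match) -> str:
    # Structured baseline: pick the side with the higher squad value.
    # Assumption: equal values fall back to the home team.
    home_value = match.squad_value[match.home]
    away_value = match.squad_value[match.away]
    return "home" if home_value >= away_value else "away"


def odds_favorite(match: Match) -> str:
    # Market baseline: the lowest decimal odds carry the highest implied probability.
    return min(match.odds, key=match.odds.get)


def consensus(picks: list[str]) -> str:
    # Ensemble: majority vote across model picks for one match.
    # Assumption: ties resolve to the pick that appeared first in the list.
    counts = Counter(picks)
    return max(counts, key=counts.get)


example = Match(
    home="Team A",
    away="Team B",
    squad_value={"Team A": 100.0, "Team B": 250.0},
    odds={"home": 3.1, "draw": 3.3, "away": 2.2},
)
print(always_home_win(example), squad_value_pick(example), odds_favorite(example))
print(consensus(["home", "away", "home", "draw"]))
```

Note that the Squad Value rule as described can never predict a draw, while Odds Favorite can, whenever the draw carries the lowest odds.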

AI: 32 matches · Baselines: up to 32 results · TOTAL PTS = outcome + exact score bonus
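
The displayed columns can be reproduced with a small amount of arithmetic. The sketch below is inferred from the published figures rather than taken from the methodology: it assumes one point per correct outcome, one bonus point per exact score, percentages rounded half up to whole numbers, and VS BEST BL computed as the difference between rounded hit percentages. The function names and the exact-score counts are hypothetical, back-calculated from the rounded EXACT % column.

```python
import math


def pct(count: int, games: int) -> int:
    """Whole-number percentage, rounding halves up (20 of 32 = 62.5% becomes 63)."""
    return math.floor(100 * count / games + 0.5)


def score(outcome_hits: int, exact_hits: int, games: int, best_baseline_hits: int) -> dict:
    """Rebuild one leaderboard row under the assumptions stated above."""
    hit_pct = pct(outcome_hits, games)
    return {
        "hit_pct": hit_pct,
        "exact_pct": pct(exact_hits, games),
        # Assumption: difference of the rounded percentages, not of the raw rates.
        "vs_best_bl": hit_pct - pct(best_baseline_hits, games),
        "outcome_pts": outcome_hits,                # assumed 1 point per correct outcome
        "total_pts": outcome_hits + exact_hits,     # assumed 1 bonus point per exact score
    }


# Top row of the leaderboard: 20 correct outcomes, an inferred 4 exact scores,
# 32 games, measured against a best baseline with 19 correct outcomes.
print(score(outcome_hits=20, exact_hits=4, games=32, best_baseline_hits=19))
```

Under these assumptions the top row comes out as 63% hit rate, 13% exact, +4% against the best baseline, and 24 total points, which matches the table. Python's built-in `round` is avoided here because it rounds halves to the nearest even number and would turn 62.5 into 62.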