🤖 LLM Benchmark Dashboard

Latest Open Source Models (April 2026) — Optimized for 64 GB RAM

📊 Benchmark Explanations

MMLU (Massive Multitask Language Understanding)

A multiple-choice exam covering 57 subjects, including math, history, law, and medicine. Higher is better: 80%+ = excellent general knowledge, 70-80% = good, below 70% = average.
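Reporting conventions for MMLU vary (some averages run over all questions, others over subjects); a minimal sketch of the per-subject macro average, with a hypothetical input format:

```python
def mmlu_macro_accuracy(per_subject: dict[str, tuple[int, int]]) -> float:
    """Macro-averaged accuracy: each subject's accuracy counts equally,
    given {subject: (num_correct, num_questions)}."""
    accuracies = [correct / total for correct, total in per_subject.values()]
    return sum(accuracies) / len(accuracies)

# Two subjects at 80% and 60% macro-average to 70%,
# regardless of how many questions each subject has.
```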

HumanEval

Measures code-generation ability on 164 hand-written Python problems, each scored by unit tests. Higher is better: 80%+ = excellent coding ability, 70-80% = good, below 70% = basic.
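HumanEval results are usually reported as pass@k: the chance that at least one of k sampled completions passes a problem's unit tests. A sketch of the unbiased estimator from the original HumanEval paper (the function name and calling convention here are illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples were generated,
    c of them passed the tests; returns P(at least one of k draws passes)."""
    if n - c < k:  # fewer failing samples than draws: a pass is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 with 1 passing sample out of 2 is 0.5
```

The score for a model is this quantity averaged over all 164 problems.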

Summary counters (values populated live): Total Models · CPU Compatible · New This Month

Best Overall: Gemma 4
Model table columns: Model | Size | RAM | VRAM | MMLU | HumanEval | Speed | Type | Compare

📈 Charts

MMLU Score by Model · HumanEval Score by Model · RAM Requirements · Models by Organization
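A rough rule of thumb behind the RAM figures: a model's footprint is roughly parameter count times bytes per weight (about 2 for FP16, about 0.5 for 4-bit quantization), plus overhead for activations and the KV cache. A minimal sketch; the 20% overhead factor is an assumption, not a measured value:

```python
def estimate_ram_gb(params_billions: float, bytes_per_param: float,
                    overhead: float = 1.2) -> float:
    """Back-of-envelope RAM estimate in GB: weight memory inflated by
    an assumed overhead factor covering activations and KV cache."""
    return params_billions * bytes_per_param * overhead

# A 70B model at 4-bit (~0.5 bytes/param) lands near 42 GB,
# comfortably inside a 64 GB machine.
```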

⚖️ Model Comparison

Select models in the table view to compare them here; the best value for each metric is highlighted in green.
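The highlighting rule can be sketched as picking the best value per metric, where "best" means highest for scores like MMLU and lowest for resource use like RAM (the helper name is illustrative):

```python
def best_value(values: list[float], higher_is_better: bool) -> float:
    """Value to highlight for one metric across the selected models."""
    return max(values) if higher_is_better else min(values)

# MMLU scores (higher is better) vs RAM in GB (lower is better)
```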

🙏 Open Source Credits