AI Benchmarks
Live Data

Methodology

All benchmark data is independently measured by Artificial Analysis. Pricing data comes from OpenRouter.

🧠
Intelligence Index
A composite score aggregating 10 independent evaluations: GPQA Diamond, MMLU-Pro, HLE, AIME 2025, LiveCodeBench, SciCode, IFBench, AA-LCR, AA-Omniscience, and Terminal-Bench Hard. Higher is better.
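The exact weighting Artificial Analysis uses is not stated above; as a minimal sketch, assuming an unweighted mean of normalized (0–100) per-evaluation scores, the aggregation might look like:

```python
# Hypothetical sketch: composite index as an equal-weight average.
# The real Artificial Analysis weighting scheme is an assumption here.
def intelligence_index(scores: dict[str, float]) -> float:
    """Average normalized (0-100) scores across all evaluations."""
    return sum(scores.values()) / len(scores)

# Illustrative (made-up) scores for the ten evaluations:
scores = {
    "GPQA Diamond": 70.0, "MMLU-Pro": 80.0, "HLE": 20.0,
    "AIME 2025": 90.0, "LiveCodeBench": 75.0, "SciCode": 40.0,
    "IFBench": 60.0, "AA-LCR": 55.0, "AA-Omniscience": 30.0,
    "Terminal-Bench Hard": 35.0,
}
print(round(intelligence_index(scores), 1))  # → 55.5
```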
💻
Coding Index
Evaluated on LiveCodeBench (competitive programming problems from LeetCode, AtCoder, and Codeforces) and SciCode (Python for scientific computing). Measures practical coding capability.
📐
Math Index
Evaluated on AIME 2025 (30 problems from the American Invitational Mathematics Examination) and MATH-500. Tests competition-level mathematical reasoning.
⚡
Output Speed
Median output tokens per second, measured with a medium-length prompt across multiple providers. Represents real-world user experience.
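Taking the median (rather than the mean) keeps one unusually slow or fast provider from skewing the figure. A minimal sketch, with made-up per-provider measurements:

```python
from statistics import median

def output_speed(samples_tok_per_s: list[float]) -> float:
    """Median output tokens/sec across per-provider measurements."""
    return median(samples_tok_per_s)

# Hypothetical measurements from five providers for one model:
print(output_speed([85.2, 92.1, 78.4, 110.5, 95.0]))  # → 92.1
```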
💰
Price per 1M Tokens
Input and output token prices from OpenRouter's public API. Blended price = (input × 3 + output × 1) / 4, representing a typical 3:1 input/output ratio.
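The blended-price formula above translates directly to code; the example prices are illustrative, not real OpenRouter figures:

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blended $/1M tokens: (input × 3 + output × 1) / 4,
    i.e. a 3:1 input/output token ratio."""
    return (input_per_m * 3 + output_per_m) / 4

# Hypothetical model priced at $3/1M input, $15/1M output:
print(blended_price(3.0, 15.0))  # → 6.0
```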
🏆
Value Score
Intelligence Index divided by blended price per 1M tokens. Higher = better performance per dollar. Models with exceptional value scores (>800) earn the 'Best Value' badge.
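Combining the two definitions above, the value score and the badge cutoff can be sketched as follows (the helper names and example numbers are illustrative):

```python
def value_score(intelligence_index: float, blended_price_per_m: float) -> float:
    """Intelligence Index per dollar of blended price (higher is better)."""
    return intelligence_index / blended_price_per_m

def earns_best_value_badge(score: float, threshold: float = 800.0) -> bool:
    """'Best Value' badge for scores above the stated >800 cutoff."""
    return score > threshold

# Hypothetical cheap model: index 60, blended price $0.05/1M tokens.
score = value_score(60.0, 0.05)
print(score, earns_best_value_badge(score))  # → 1200.0 True
```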
🔄
Update Frequency
Data is refreshed automatically via GitHub Actions every Monday and Thursday at 06:00 UTC. Performance benchmarks from Artificial Analysis are updated when new evaluations are published.
Attribution
Intelligence and performance data: Artificial Analysis (artificialanalysis.ai)
Pricing data: OpenRouter (openrouter.ai/api/v1/models)