Methodology

Blind listening tests

Provider and model names are hidden until after you vote, so brand recognition and prior experience with a provider cannot sway your choice. You judge purely on how the voice sounds.

Randomized A/B order

The order of the two clips (A vs B) is randomized for each round. This reduces position bias—the tendency to prefer the first or second option regardless of quality.
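A minimal sketch of per-round order randomization, assuming each round pairs exactly two models and draws from a `random.Random` instance. The helper name `assign_positions` is hypothetical and is not taken from the site's code. Each model is equally likely to land in slot A, so across many rounds neither model systematically benefits from being heard first.

```python
import random


def assign_positions(model_x: str, model_y: str, rng: random.Random) -> dict[str, str]:
    """Randomly map two models to the A/B slots for one round (hypothetical helper)."""
    pair = [model_x, model_y]
    # Uniform shuffle of two items: each ordering has probability 1/2.
    rng.shuffle(pair)
    return {"A": pair[0], "B": pair[1]}


rng = random.Random(42)
slots = assign_positions("model-x", "model-y", rng)
# slots["A"] is "model-x" in roughly half of all rounds.
```

Only the labels "A" and "B" would be shown to the listener. The mapping back to model names stays hidden until the vote is cast.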

Controlled comparison

Both clips use the same sentence. This keeps the comparison fair: you're evaluating how different models render the same text, not different content.

Elo rating system

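Rankings are updated with the Elo rating system, the same method used to rate chess players. When you vote for one model over another, both ratings shift by an amount that depends on how likely that result was given their current ratings. A higher-rated model gains only a few points for beating a lower-rated one, while an upset moves both ratings much more.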
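The update rule can be sketched with the textbook Elo formulas. This is a minimal sketch, not the site's implementation: the K-factor of 32, the 400-point scale, and the absence of a "tie" outcome are all assumptions, since the page does not state its parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one vote.

    k = 32 is an assumed K-factor. The winner gains exactly what the loser
    loses, and the gain shrinks as the winner's expected score approaches 1.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta
```

With equal ratings of 1500, a win moves the winner to 1516 and the loser to 1484. If a 1400-rated model beats a 1600-rated one, it gains about 24 points; if the favorite wins the same matchup, it gains only about 8.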

Auditability

Every vote is logged, so we can trace how ratings evolved and verify the integrity of the leaderboard over time.
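One way a vote log supports auditing is by replaying it: starting every model from the same initial rating and reapplying each vote in order should reproduce the published ratings. The sketch below assumes a chronological log of winner/loser pairs, an initial rating of 1500, and a K-factor of 32. The `Vote` record and `replay` function are hypothetical illustrations, not the site's actual log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    winner: str  # model the listener preferred (hypothetical log field)
    loser: str   # the other model in the round


def replay(votes: list[Vote], initial: float = 1500.0, k: float = 32.0):
    """Recompute Elo ratings from a chronological vote log.

    Returns the final ratings and a snapshot of all ratings after each vote,
    so the trajectory of any model can be inspected step by step.
    """
    ratings: dict[str, float] = {}
    history: list[dict[str, float]] = []
    for vote in votes:
        rw = ratings.get(vote.winner, initial)
        rl = ratings.get(vote.loser, initial)
        # Winner's expected score under the standard Elo model.
        expected = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected)
        ratings[vote.winner] = rw + delta
        ratings[vote.loser] = rl - delta
        history.append(dict(ratings))  # copy, so later votes don't mutate it
    return ratings, history
```

Because the replay is deterministic, an auditor could compare its final ratings against the live leaderboard, and any mismatch would point to a missing, altered, or out-of-order vote.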
