Methodology
Blind listening tests
Provider and model names are hidden until after you vote. This prevents bias from brand recognition or prior experience. You judge purely on how the voice sounds.
Randomized A/B order
The order of the two clips (A vs. B) is randomized for each round. This reduces position bias, the tendency to prefer the first or second option regardless of quality.
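As a rough illustration of per-round randomization, here is a minimal sketch. The function name `assign_slots` and the dictionary layout are hypothetical and are not taken from the actual implementation; it only shows one simple way to shuffle two clips into the A and B slots.

```python
import random


def assign_slots(clip_x, clip_y, rng=random):
    """Place two clips into slots A and B in random order.

    `rng` may be the `random` module or a `random.Random` instance,
    which makes the shuffle reproducible in tests.
    """
    pair = [clip_x, clip_y]
    rng.shuffle(pair)  # in-place shuffle; each ordering is equally likely
    return {"A": pair[0], "B": pair[1]}


# Example: each round draws a fresh ordering.
round_slots = assign_slots("clip-from-model-1", "clip-from-model-2")
```

Because the shuffle runs once per round, neither model is systematically heard first.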
Controlled comparison
Both clips use the same sentence. This keeps the comparison fair: you're evaluating how different models render the same text, not different content.
Elo rating system
Rankings update using an Elo rating system, similar to the one used in chess. When you vote for one model over another, both ratings adjust according to how expected that outcome was. A stronger model gains little from beating a weaker one; an upset moves both ratings more.
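The update can be sketched with the standard Elo formulas. The K-factor of 32 and the 400-point scale below are the conventional chess defaults and are assumptions here; the leaderboard's actual constants are not stated.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner_rating, loser_rating, k=32.0):
    """Return (new_winner_rating, new_loser_rating) after one vote.

    k is an assumed K-factor, not the leaderboard's confirmed value.
    """
    expected_win = expected_score(winner_rating, loser_rating)
    delta = k * (1.0 - expected_win)  # small if the win was expected
    return winner_rating + delta, loser_rating - delta


# A favorite beating an underdog gains less than an underdog beating a favorite.
favorite_wins = update_elo(1600.0, 1400.0)
underdog_wins = update_elo(1400.0, 1600.0)
```

With these assumed constants, two equally rated models exchange 16 points per vote, and the points the winner gains are exactly the points the loser gives up.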
Auditability
Votes are logged, so we can trace how ratings evolved and verify the integrity of the leaderboard over time.
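One way a vote log supports this kind of check is by replaying it from scratch and comparing the result to the published ratings. The sketch below assumes a hypothetical log format of `(winner, loser)` pairs in chronological order, a starting rating of 1500, and a K-factor of 32; none of these details are confirmed by the actual system.

```python
def replay_votes(votes, initial=1500.0, k=32.0):
    """Recompute ratings by replaying (winner, loser) votes in order.

    The log format, starting rating, and K-factor are illustrative
    assumptions.
    """
    ratings = {}
    for winner, loser in votes:
        r_win = ratings.get(winner, initial)
        r_lose = ratings.get(loser, initial)
        # Standard Elo expected score for the winner.
        expected_win = 1.0 / (1.0 + 10 ** ((r_lose - r_win) / 400.0))
        delta = k * (1.0 - expected_win)
        ratings[winner] = r_win + delta
        ratings[loser] = r_lose - delta
    return ratings


# Replaying the same log always yields the same ratings.
log = [("model-1", "model-2"), ("model-2", "model-3"), ("model-1", "model-3")]
recomputed = replay_votes(log)
```

If the recomputed ratings differ from the ones on the leaderboard, either the log or the ratings have been altered.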