Production research tool

Model Eval Index

Rank models by benchmark performance per projected token dollar without hiding sparse evidence.

Current recommendation

Separates pure token-dollar efficiency from premium model preference.

Source intelligence

Imported rows drive rankings; researched sources show the next best places to improve coverage and cost signal.

Benchmark coverage

Each benchmark family shows current rows, pricing coverage, source status, and whether it is active in the score.

Performance vs projected token cost

Points that sit higher and farther left deliver more benchmark performance per token dollar.

Compare set

Select up to four rows to compare efficiency, score, and projected workload cost.

Ranked models

Default rank is adjusted benchmark points per projected token dollar.
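
As a rough illustration of the default rank, the sketch below divides each model's adjusted benchmark points by a projected token cost and sorts in descending order. The field names, the per-million-token pricing unit, and the workload shape (a fixed count of input and output tokens) are assumptions made for this example; the adjustment itself is treated as an input because the page does not define it here.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    # All fields are hypothetical names chosen for this sketch.
    name: str
    adjusted_points: float        # benchmark score after whatever adjustment applies
    input_price_per_mtok: float   # assumed unit: USD per million input tokens
    output_price_per_mtok: float  # assumed unit: USD per million output tokens


def projected_cost(row: ModelRow, input_tokens: int, output_tokens: int) -> float:
    """Projected USD cost of one workload profile for a given model row."""
    total = (input_tokens * row.input_price_per_mtok
             + output_tokens * row.output_price_per_mtok)
    return total / 1_000_000


def rank_by_efficiency(rows, input_tokens, output_tokens):
    """Return (name, points per projected dollar) pairs, best first."""
    scored = []
    for row in rows:
        cost = projected_cost(row, input_tokens, output_tokens)
        if cost <= 0:
            # Rows without usable pricing cannot be ranked on this metric.
            continue
        scored.append((row.name, row.adjusted_points / cost))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Illustrative rows only; these are not real models or prices.
rows = [
    ModelRow("model-a", adjusted_points=80.0,
             input_price_per_mtok=10.0, output_price_per_mtok=30.0),
    ModelRow("model-b", adjusted_points=60.0,
             input_price_per_mtok=1.0, output_price_per_mtok=2.0),
]
ranking = rank_by_efficiency(rows, input_tokens=1_000_000, output_tokens=1_000_000)
```

In this made-up case the lower-scoring model ranks first because its projected cost is far smaller, which is the trade-off the default rank is meant to surface.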

Compare Rank Model Composite Evidence Quality Benchmarks Published cost Profile cost Cost / point Efficiency

Source posture

Methodology