Evals
Letter EStructured tests that measure model performance on actual tasks. Not vibes. Not demos.
Updated:
Working definition
Evals are structured tests that measure model performance on actual tasks. Not vibes. Not demos. If your team can’t show you their evals, they’re guessing.