Posts tagged “evals” - Ken Ashe | AI Optimist

Multimodal models still change answers when you shuffle the evidence - Jun 25, 2026
Self-distillation can make models better on the first try and worse on the fifth - Jun 25, 2026
Agent Success Rate is the only number that matters when a new model drops - May 31, 2026
Marketers are still vibe-checking prompts. Frontier devs run evals before lunch. - May 30, 2026
Stop Vibe-Checking New Models. Build a 50-Prompt Eval Set Instead. - May 28, 2026
The Frustration Index: A Cheap Eval Most Teams Skip - May 20, 2026