Concept

Evals as Spec

Treating an executable evaluation suite as the real specification — a criterion two competent reviewers would agree on, made decidable before generation.

Also: The eval is the spec

A prompt describes intent; an eval renders a verdict. Once a non-human author is in the loop, intent under-determines behaviour, so the acceptance criterion has to become executable. A good eval is one where two domain experts would independently reach the same pass/fail verdict — which is simply the operational definition of an acceptance criterion.

Moving the craft from prompt scaffolding to the eval is how the acceptance gap is closed in practice. Depth: The Eval Is the Spec: Why Acceptance Criteria Become Executable Tests in Agentic Delivery.

Defined in

The Eval Is the Spec: Why Acceptance Criteria Become Executable Tests in Agentic Delivery

Related concepts

part of Harness Architecture

Underpins

The model is rented; the harness is owned.

Appears in

Scrum in the Agentic Era: Which Ceremonies Survive The Eval Is the Spec: Why Acceptance Criteria Become Executable Tests in Agentic Delivery

Explore in depth

Related thinking