A prompt describes intent; an eval renders a verdict. Once a non-human author is in the loop, intent under-determines behaviour, so the acceptance criterion has to become executable. A good eval is one where two domain experts would independently reach the same pass/fail verdict — which is simply the operational definition of an acceptance criterion.
Moving the craft from prompt scaffolding to the eval is how the acceptance gap is closed in practice. Depth: The Eval Is the Spec: Why Acceptance Criteria Become Executable Tests in Agentic Delivery.