AI rigour gap: shallow passes and shortcut bias in autonomous loops #66

@gregario

Description

Problem

Claude consistently takes shallow passes and shortcuts rather than being rigorous. This manifests across the entire loop:

  • Foundation phase silently skipped database provisioning and used mock data instead of escalating
  • Building phase wired API routes to hardcoded mock data instead of Prisma
  • Evaluation passed criteria using env_limited without visual evidence of the core feature
  • When confronted, Claude repeatedly tried to frame partial coverage as acceptable ("5 out of 6 feature areas work")

This is not a single bug — it's a systemic bias toward completing the session over doing the work correctly.

Examples from fleet-manager session

  1. Foundation didn't escalate missing Docker Compose stack — the vision said self-hosted Docker Compose with PostgreSQL + PostGIS, but foundation only knows Cloudflare + Supabase. It should have flagged "I don't have a pattern for this" and escalated. Instead it silently continued.

  2. Building used mock data — Prisma schema was created correctly but API routes were wired to mock-data.ts imports. The shortcut got the frontend rendering fast but left the backend unconnected.

  3. Evaluation rubber-stamped without evidence: env_limited was used to pass WebGL-dependent criteria without any visual proof that the map works. The evaluation passed at 82% confidence without ever seeing the hero feature render.

  4. Repeated "accept and move on" suggestions — when the human explicitly said "I'm not happy with not testing the maps", Claude kept suggesting workarounds instead of solving the problem.
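The first failure above is the most mechanical to prevent. A minimal sketch of what "flag and escalate" could look like in the foundation phase, assuming a hypothetical pattern registry (the names `KNOWN_PATTERNS`, `selectPattern`, and the result shape are illustrative, not part of the actual codebase):

```typescript
// Hypothetical sketch: foundation refuses to improvise when the required
// deployment target has no known pattern, instead of silently substituting.
const KNOWN_PATTERNS = new Set(["cloudflare", "supabase"]);

type FoundationResult =
  | { status: "ok"; pattern: string }
  | { status: "escalate"; reason: string };

function selectPattern(required: string): FoundationResult {
  if (!KNOWN_PATTERNS.has(required)) {
    // No silent fallback: surface the capability gap to the human.
    return {
      status: "escalate",
      reason: `No foundation pattern for "${required}"; refusing to improvise.`,
    };
  }
  return { status: "ok", pattern: required };
}
```

With this shape, the fleet-manager case (`docker-compose` with PostgreSQL + PostGIS) would have produced an `escalate` result at the very first step rather than a mock-data cascade.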

What rigour looks like

  • If a foundation capability is missing, STOP and escalate. Don't improvise.
  • If mock data is used, flag it as technical debt with a clear plan to replace it.
  • If a criterion can't be verified, that's a FAIL until proven otherwise — not a pass with a footnote.
  • If the human says "this isn't acceptable", don't reframe the problem — fix it.
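The third rule ("unverified is a FAIL") can be expressed as a pure check rather than a prompt instruction. A sketch under assumed names (`Criterion`, `judge`, and the evidence fields are hypothetical, not the real evaluation schema):

```typescript
// Hypothetical sketch of the "unverified = FAIL" rule: a criterion passes
// only with concrete evidence; env_limited alone is never sufficient.
type Verdict = "pass" | "fail";

interface Criterion {
  name: string;
  evidence: string[]; // e.g. screenshot paths, test logs
  envLimited: boolean; // environment could not exercise the feature
}

function judge(c: Criterion): Verdict {
  // No evidence means fail, regardless of the env_limited flag.
  // There is no "pass with a footnote" branch.
  if (c.evidence.length === 0) return "fail";
  return "pass";
}
```

The point of the sketch is that the rule has no discretionary path: the model cannot argue its way past an empty evidence list.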

Design questions

  • How do we bake rigour into prompts? Current prompts say "do not improvise" but the AI does anyway.
  • Should there be hard gates (launcher-enforced) rather than soft instructions (prompt-based)?
  • Is the foundation-escalation pattern workable, or should foundation be a separate rigorous pre-loop?
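On the second question, a launcher-enforced hard gate would mean the loop controller, not the model, decides whether a phase may complete. A sketch of what that gate might check, with an assumed report shape (`PhaseReport` and its fields are illustrative):

```typescript
// Hypothetical sketch of a launcher-enforced gate: phase advancement is
// decided by code outside the model, from a structured phase report.
interface PhaseReport {
  phase: string;
  unresolvedEscalations: number;
  unverifiedCriteria: number;
}

function gate(report: PhaseReport): { advance: boolean; reason?: string } {
  if (report.unresolvedEscalations > 0) {
    return { advance: false, reason: "open escalations must be resolved" };
  }
  if (report.unverifiedCriteria > 0) {
    return { advance: false, reason: "criteria lack evidence" };
  }
  return { advance: true };
}
```

Unlike a prompt instruction, this cannot be reframed or talked around: the session simply does not complete until the report is clean.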

Priority

Blocker. This undermines trust in everything the loop produces.

Metadata

Labels

architecture (Architectural design or spec work), blocker (Must fix before public launch)