Real-world validation: on 2026-05-21, Bing Webmaster Tools flagged 26 URLs as failing on work-smart.ai. Investigation showed 18 of them were soft-404s: each returned HTTP 200 with a ~14KB empty SPA shell, no title, and no H1. citable still scored work-smart.ai 100/A because it only audits URLs listed in the sitemap, and the dead URLs are not in the sitemap. The soft-404 pattern is therefore invisible to citable today.
Documented as known limitation DEF-7 in AUDIT-2026-05-20.md. Bing report validates real demand.
Proposed v0.3.0 check (C-25 Soft-404 Detection):
- Probe a known-fake URL on the audited site (e.g., {root}/citable_test_404)
- Compare body length + title presence + H1 presence to real pages crawled
- If a crawled page returns 200 but its body is indistinguishable from the fake-URL response (size within 10% of the fake response, missing title, missing H1) -> FAIL
- Catches: SPA catch-all pattern, WordPress empty-permalink pattern, custom 404 pages returning 200
- Severity: P0 (silently degrades AI crawler trust)
- ~50 lines of Python in checks.py
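A minimal sketch of the comparison logic described above, assuming the caller has already fetched both the fake-URL probe and the real page. All names here (PageSignals, extract_signals, is_soft_404) are hypothetical and not part of citable's existing checks.py; the 10% tolerance comes from the proposal, and HTTP fetching is deliberately left out so the logic can be tested offline.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class _SignalParser(HTMLParser):
    """Records whether the document has a non-empty <title> and <h1>."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.has_title = False
        self.has_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if not data.strip():
            return  # whitespace-only text does not count as content
        if self._current == "title":
            self.has_title = True
        elif self._current == "h1":
            self.has_h1 = True


@dataclass
class PageSignals:
    status: int
    body_len: int  # body size in bytes (UTF-8)
    has_title: bool
    has_h1: bool


def extract_signals(status, body):
    """Reduce an HTTP status and HTML body to the signals the check compares."""
    parser = _SignalParser()
    parser.feed(body)
    return PageSignals(status, len(body.encode("utf-8")),
                       parser.has_title, parser.has_h1)


def is_soft_404(page, fake, tolerance=0.10):
    """True if `page` looks like the response to a known-fake URL.

    If the fake URL itself does not return 200, the site handles missing
    routes correctly and this probe gives no basis for a FAIL.
    """
    if page.status != 200 or fake.status != 200:
        return False
    similar_size = abs(page.body_len - fake.body_len) <= tolerance * fake.body_len
    return similar_size and not page.has_title and not page.has_h1
```

In use, the check would build the fake baseline once per site, for example from the response to {root}/citable_test_404, and then call is_soft_404 for each crawled page. Requiring all three signals together (similar size, no title, no H1) is a conservative choice intended to keep small but legitimate pages from being flagged.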
Pair with the firewall-blocking detection check (separate issue); both catch responses that look fine but aren't.
Related: fixed on work-smart.ai itself via build-time route allowlist middleware. Pattern documented in the auto-memory file feedback-spa-soft-404-allowlist-pattern.md.