Add soft-404 detection check (catches SPA catch-all returning 200 + empty shell) #3

@WorkSmartAI-alt

Real-world validation: on 2026-05-21, Bing Webmaster Tools flagged 26 URLs on work-smart.ai as failing. Investigation showed that 18 of them were soft-404s: HTTP 200 with a ~14KB empty SPA shell, no title, and no H1. citable still scored work-smart.ai 100/A because it audits only the URLs listed in the sitemap, and the sitemap does not include the dead URLs. The soft-404 pattern is therefore invisible to citable today.

Documented as known limitation DEF-7 in AUDIT-2026-05-20.md. Bing report validates real demand.

Proposed v0.3.0 check (C-25 Soft-404 Detection):

  • Probe a known-fake URL on the audited site (e.g., {root}/citable_test_404)
  • Compare the fake-URL response's body length, title presence, and H1 presence against each real page crawled
  • If a crawled page returns 200 but is indistinguishable from the fake-URL response (body size within 10% of the fake response, missing title, missing H1) -> FAIL
  • Catches: SPA catch-all pattern, WordPress empty-permalink pattern, custom 404 pages returning 200
  • Severity: P0 (silently degrades AI crawler trust)
  • ~50 lines of Python in checks.py
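
A minimal sketch of the comparison described above, using only the standard library. The names `PageFingerprint`, `fingerprint`, and `is_soft_404` are hypothetical and are not part of citable's existing API. Fetching the probe URL and the crawled pages is left to the caller, and the 10% tolerance is taken from the proposal.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class _TitleH1Parser(HTMLParser):
    """Records whether the document has a non-empty <title> and <h1>."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.found = {"title": False, "h1": False}

    def handle_starttag(self, tag, attrs):
        if tag in self.found:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Only count the element if it contains visible text.
        if self._current and data.strip():
            self.found[self._current] = True


@dataclass
class PageFingerprint:
    status: int
    body_length: int
    has_title: bool
    has_h1: bool


def fingerprint(status: int, body: str) -> PageFingerprint:
    """Reduce a response to the signals the proposed check compares."""
    parser = _TitleH1Parser()
    parser.feed(body)
    return PageFingerprint(
        status, len(body), parser.found["title"], parser.found["h1"]
    )


def is_soft_404(page: PageFingerprint, probe: PageFingerprint,
                tolerance: float = 0.10) -> bool:
    """True if `page` looks like the response to the known-fake probe URL.

    Assumption: if the probe itself does not return 200, the site serves
    real 404s and the soft-404 pattern cannot be inferred from it.
    """
    if page.status != 200 or probe.status != 200:
        return False
    size_close = (
        abs(page.body_length - probe.body_length)
        <= tolerance * probe.body_length
    )
    return size_close and not page.has_title and not page.has_h1
```

In this sketch a check in checks.py would build `probe` once from the `{root}/citable_test_404` response, then call `is_soft_404` for each crawled page and report FAIL on any match.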

Pair with the firewall-blocking detection check (separate issue): both catch responses that look fine but aren't.

Related: fixed on work-smart.ai itself via build-time route allowlist middleware. Pattern documented in the auto-memory file feedback-spa-soft-404-allowlist-pattern.md.

Labels: enhancement (New feature or request)
