docs: add self_audit direction notes to module docstring #591

Merged
neoneye merged 1 commit into main from docs/self-audit-direction-notes
Apr 17, 2026

neoneye commented Apr 17, 2026

Summary

Adds notes to the top-of-module docstring in self_audit.py capturing directions worth exploring:

  • Dedicated VIOLATES_KNOWN_PHYSICS prompt. Observation: elephant-alpha in particular flags non-physics plans as physics violations ("faster than light travel" on documents unrelated to FTL), while doing fine on other checklist items. A separate tailored prompt might reduce false positives without disturbing the rest of the audit.
  • New checklist item — Fabricated evidence. Flag hallucinated laws, articles, or citations that don't exist.
  • New checklist item — Fake confidence. Flag assertions made without sufficient evidence to back them.
  • New checklist item — Vague filler language. Flag placeholder phrases like "develop a robust strategy" or "implement a communication strategy" where the actual strategy is never specified.
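
As a rough illustration of the first bullet, here is a minimal sketch of how one checklist item could carry its own system prompt while the others keep a shared one. Everything except the `VIOLATES_KNOWN_PHYSICS` key is an assumption made for illustration: the `ChecklistItem` class, the `prompt_for` helper, the prompt texts, and the keys for the three proposed items are hypothetical and are not taken from self_audit.py.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared prompt; the real prompt text in self_audit.py is not shown in this PR.
DEFAULT_SYSTEM_PROMPT = (
    "You are auditing a plan. Decide whether the checklist item applies, "
    "and justify the verdict by quoting the plan."
)

# Hypothetical tailored prompt aimed at reducing physics false positives.
PHYSICS_SYSTEM_PROMPT = (
    "Flag a physics violation only if the plan itself depends on something "
    "physically impossible. If the plan never mentions such a mechanism, "
    "answer that the item does not apply."
)


@dataclass(frozen=True)
class ChecklistItem:
    key: str
    description: str
    # None means "use the shared prompt"; a string overrides it for this item only.
    system_prompt: Optional[str] = None


def prompt_for(item: ChecklistItem) -> str:
    """Return the item's dedicated prompt if it has one, else the shared prompt."""
    if item.system_prompt is not None:
        return item.system_prompt
    return DEFAULT_SYSTEM_PROMPT


# Only the first key appears in the PR; the other three keys are invented names
# for the proposed checklist items.
CHECKLIST = [
    ChecklistItem(
        key="VIOLATES_KNOWN_PHYSICS",
        description="The plan relies on something physically impossible.",
        system_prompt=PHYSICS_SYSTEM_PROMPT,
    ),
    ChecklistItem(
        key="FABRICATED_EVIDENCE",
        description="Cites laws, articles, or citations that do not exist.",
    ),
    ChecklistItem(
        key="FAKE_CONFIDENCE",
        description="Asserts claims without sufficient evidence.",
    ),
    ChecklistItem(
        key="VAGUE_FILLER_LANGUAGE",
        description="Uses placeholder phrases without specifying the strategy.",
    ),
]
```

With this shape, adding a tailored prompt touches one item and leaves the rest of the audit on the shared prompt, which matches the stated goal of not disturbing the other checklist items.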

Why commit these as a docstring

Keeps future-direction notes co-located with the code they affect, so anyone touching self_audit.py sees them immediately. These are deliberately notes, not implementation — related PR #585 covers the ongoing work on the physics false-positive.

🤖 Generated with Claude Code

Captures thoughts on future improvements to the self_audit checklist:
- A dedicated system prompt for VIOLATES_KNOWN_PHYSICS to reduce
  false positives (elephant-alpha model especially prone to flagging
  non-physics plans as physics violations).
- Proposed new checklist items:
  * Find fabricated evidence (hallucinated laws, articles, citations).
  * Call out fake confidence where evidence is insufficient.
  * Attack vague filler language ("develop a robust strategy",
    "implement a communication strategy") that doesn't specify the
    actual strategy.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
neoneye merged commit 96bee77 into main on Apr 17, 2026
3 checks passed
neoneye deleted the docs/self-audit-direction-notes branch on April 17, 2026 13:05