rootkiller6788/OpenClaude-Mythos

Claude Mythos: Observable Behavioral Architecture Inference

Reverse-engineering Claude's system-level behavioral architecture from publicly observable signals.

This is not reverse engineering of Claude's internal neural-network weights or source code. Anthropic has not publicly disclosed Claude's internal model architecture (layer count, attention type, hidden dimension, etc.). What IS publicly known are Claude's capabilities, product mechanisms, safety philosophy, tool-use behavior, extended thinking, and system-level interaction patterns, all observable through Claude's API, documentation, official blog posts, and Claude Code.

This project infers a behavioral architecture from those observables: what system components must likely exist to produce the observed behaviors. Internal layer details (specific layer counts, MoE structure, activation functions) are speculative research implementations and are labeled accordingly.


Evidence Hierarchy

Every component in this project is tagged with its evidence level:

| Tag | Meaning | Example Source |
| --- | --- | --- |
| [Observed] | Directly observable through API behavior, black-box testing, or Claude Code usage | Tool use always responds to a user command; extended thinking produces visible chain-of-thought |
| [Reported] | Stated in official Anthropic blog posts, research papers, system cards, or reliable media | Constitutional AI paper; Claude model card; Anthropic research blog |
| [Inferred] | System components reasonably deduced from observed/reported behaviors | A tool router must exist to parse tool calls; a safety classifier must gate harmful outputs |
| [Speculative] | Pure architectural hypothesis / speculative implementation for research | Specific hidden dimension, layer count, MoE expert count, attention window size |
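The tagging convention above can be encoded as a small helper, for example (a hypothetical utility sketched here for illustration, not part of the repository):

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    """The four evidence levels used throughout this project."""
    OBSERVED = "Observed"
    REPORTED = "Reported"
    INFERRED = "Inferred"
    SPECULATIVE = "Speculative"


@dataclass
class Component:
    name: str
    evidence: tuple  # one or more Evidence tags
    rationale: str


# Example entry mirroring the evidence-mapping table below.
safety_gate = Component(
    name="Safety gating",
    evidence=(Evidence.REPORTED, Evidence.OBSERVED),
    rationale="Constitutional AI paper; refusals are observable via the API",
)
```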

What Is Publicly Known About Claude

Capabilities (from official docs, system cards, and Claude Code)

[Observed] through API and Claude Code:

  • Agentic coding with tool orchestration (read, write, edit, bash, search)
  • Extended thinking with visible chain-of-thought reasoning
  • Multi-turn conversational memory across long contexts
  • Tool use gated by user commands (Claude does not autonomously plan without user intent)
  • Structured output (JSON mode, tool call format)
  • Long context windows (200K tokens)
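The structured tool-call format in the list above is visible in API responses as `tool_use` content blocks, which the client answers with `tool_result` blocks (the shape follows Anthropic's public Messages API docs; the field values here are placeholders):

```python
# Illustrative tool_use content block, shaped like what the Messages API
# returns when the model decides to call a tool.
tool_call = {
    "type": "tool_use",
    "id": "toolu_example",          # placeholder id
    "name": "bash",
    "input": {"command": "ls -la"},
}

# The client executes the tool locally, then feeds the observation back
# to the model as a tool_result block referencing the call's id.
tool_result = {
    "type": "tool_result",
    "tool_use_id": tool_call["id"],
    "content": "total 8\n...",
}
```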

[Reported] from official Anthropic sources:

  • Constitutional AI — safety via a constitution of principles, used during RLHF training (arxiv:2212.08073)
  • Reliable, interpretable, steerable systems — Anthropic's stated core design philosophy
  • RLHF + RLAIF training pipeline (not architectural layers)
  • Claude Code — agentic coding system with tool loop architecture
  • Model Context Protocol (MCP) — standard for tool/resource integration
  • Multi-modal support (text + vision input)

Inferred System-Level Architecture

Based on observable behaviors, we infer these system-level components (not necessarily internal model layers):

Data Flow (behavioral)

User Input
    │
    ▼
[Inferred]  Policy / Safety Classifier
[Reported]  Constitutional AI principles govern rejection/refusal behavior
[Observed]  Claude refuses harmful requests before generating a response
    │
    ▼
[Observed]  Unified Input Processing
[Observed]  Accepts text + images + tool results in a single context
[Reported]  Multi-modal via vision encoder + text tokenizer
    │
    ▼
[Inferred]  Context Management
[Observed]  Long context (200K), maintains coherence across very long conversations
[Speculative] Chunked context encoding for efficiency
    │
    ▼
[Observed]  Extended Thinking / Chain-of-Thought
[Observed]  Visible reasoning traces in Claude Code and API ("thinking" blocks)
[Inferred]  Iterative / recurrent reasoning process
[Speculative] RDT-like weight reuse or recurrent depth mechanism
    │
    ▼
[Observed]  Tool Use / Agentic Loop
[Observed]  Claude Code orchestrates read → write → bash → grep → etc.
[Inferred]  Tool router + executor + observation injection into context
[Speculative] Tool call prediction head with user-command gating
    │
    ▼
[Observed]  Memory / State Tracking
[Observed]  Maintains context across tool calls within a session
[Inferred]  Session-level state management
[Speculative] Recurrent memory state fusion
    │
    ▼
[Reported]  Output Safety Filter
[Reported]  Constitutional AI also constrains outputs post-generation
[Inferred]  Output compliance verification
    │
    ▼
Response
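The behavioral flow above can be sketched as a minimal staged pipeline. This is a pure-Python illustration of the inferred ordering only; the stage names and logic are assumptions, not Claude's actual implementation:

```python
from typing import Callable, Dict, List

Request = Dict[str, object]


def safety_gate(req: Request) -> Request:
    # [Inferred] Refusal happens before any generation work.
    if req.get("harmful"):
        req["refused"] = True
    return req


def extended_thinking(req: Request) -> Request:
    # [Speculative] Iterative reasoning, shown here as a fixed step count.
    if not req.get("refused"):
        req["thinking_steps"] = [f"step {i}" for i in range(3)]
    return req


def tool_loop(req: Request) -> Request:
    # [Observed] Tools run only when the user's command calls for them.
    if not req.get("refused") and req.get("wants_tools"):
        req["tool_calls"] = ["read", "edit", "bash"]
    return req


# Stage order mirrors the inferred data flow diagram above.
PIPELINE: List[Callable[[Request], Request]] = [
    safety_gate,
    extended_thinking,
    tool_loop,
]


def respond(req: Request) -> Request:
    for stage in PIPELINE:
        req = stage(req)
    return req
```

A harmful request is short-circuited at the first stage, while a benign tool-using request flows through all three.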

Project Structure

openclaude/
├── src/
│   ├── __init__.py
│   ├── model.py                      # Toy assembly of all speculated components
│   └── layers/
│       ├── __init__.py
│       ├── common.py                 # [Speculative] LayerNorm — standard in all transformers
│       ├── safety.py                 # [Inferred] Input/Output policy gates
│       ├── embedding.py              # [Speculative] Unified text + modal + position encoding
│       ├── chunk_encoder.py          # [Speculative] Chunked context encoder (toy impl)
│       ├── attention.py              # [Speculative] Causal local-window attention (naive)
│       ├── rdt_block.py              # [Speculative] Recurrent depth transformer block
│       ├── memory.py                 # [Speculative] Recurrent memory state fusion (toy)
│       ├── moe.py                    # [Speculative] Sparse MoE layer (naive ref impl)
│       └── tool_layer.py             # [Inferred] Tool call prediction head
├── demo/
│   └── infer_demo.py                 # Forward pass verification (toy model)
├── docs/
│   ├── ARCHITECTURE.md               # Detailed architecture discussion
│   └── ascii_arch.txt                # ASCII behavioral flow diagram
├── assets/
├── requirements.txt
├── .gitignore
├── LICENSE
└── openclaude.txt                    # Project specification (Chinese)

Component Evidence Mapping

| Component | File | Evidence | Rationale |
| --- | --- | --- | --- |
| Safety gating | `safety.py` | [Reported] + [Observed] | Constitutional AI paper; Claude observably refuses harmful queries |
| Unified input encoding | `embedding.py` | [Observed] + [Reported] | Claude accepts mixed text/image input; tokenizer + vision encoder reported |
| Chunked context | `chunk_encoder.py` | [Speculative] | Long context is observed, but the chunking mechanism is pure hypothesis |
| Causal attention | `attention.py` | [Inferred] | Any autoregressive LM needs causal masking; the window size is speculative |
| Recurrent/iterative depth | `rdt_block.py` | [Speculative] | Extended thinking suggests iteration; weight reuse is one possible implementation |
| Memory/state fusion | `memory.py` | [Speculative] | Session persistence is observed; EMA-based fusion is a toy design choice |
| Sparse MoE | `moe.py` | [Speculative] | MoE is a common industry technique; expert count/routing is purely hypothetical |
| Tool call head | `tool_layer.py` | [Inferred] + [Observed] | Claude Code demonstrates tool orchestration; a prediction head is one architectural option |
| LayerNorm | `common.py` | [Inferred] | Universally present in transformer architectures |
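To make the [Speculative] weight-reuse idea behind `rdt_block.py` concrete, the same parameters can be applied for a variable number of iterations, trading depth for repeated computation. A toy scalar version (not the repository's code, and not Claude's actual mechanism):

```python
def rdt_step(state: float, weight: float = 0.5, bias: float = 1.0) -> float:
    # One "block" application; the SAME weight is reused every iteration,
    # which is the weight-reuse hypothesis in miniature.
    return weight * state + bias


def recurrent_depth(state: float, depth: int) -> float:
    # Iterating one shared block `depth` times; here the state converges
    # toward the fixed point bias / (1 - weight) = 2.0.
    for _ in range(depth):
        state = rdt_step(state)
    return state
```

The point of the sketch is that "more thinking" can mean more iterations of the same block rather than more distinct layers.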

Key Design Philosophy (Anthropic's Public Stance)

[Reported] Anthropic's design principles, from official communications:

  • Reliable — outputs should be factually grounded, not hallucinated
  • Interpretable — model decisions should be explainable (mechanistic interpretability research)
  • Steerable — behavior controllable via constitution, system prompts, and RLHF
  • Safety-first — Constitutional AI as the core alignment approach, not an afterthought

These principles inform the behavioral architecture — they explain WHY Claude behaves the way it does, even though the internal implementation details remain unknown.


What This Project Is NOT

  • NOT a leaked or reverse-engineered source of Claude's internal model
  • NOT a reproduction of Anthropic's training pipeline or weights
  • NOT a claim that Claude uses RDT, 64-expert MoE, or any specific internal architecture
  • NOT a production model — purely speculative research implementations

Setup and Run

```shell
# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate       # Windows
# source venv/bin/activate  # Linux/macOS

# Install dependencies
pip install -r requirements.txt

# Run the inference demo (verifies the toy model's forward pass)
python demo/infer_demo.py
```

References

Official Anthropic Sources (public)

  • Constitutional AI: Harmlessness from AI Feedback (arxiv:2212.08073)
  • Anthropic Research Blog (anthropic.com/research)
  • Claude Model Card / System Card
  • Claude Code documentation
  • Model Context Protocol (MCP) specification

External Reporting

  • Various tech media coverage of Claude capabilities
  • Third-party benchmarks and evaluations of Claude models

About

Claude Mythos: behavioral architecture inference for Claude from public information, including a speculative recurrent-depth transformer toy implementation. For learning and study of model structure.
