Reverse-engineering Claude's system-level behavioral architecture from publicly observable signals.
This is not a reverse-engineering of Claude's internal neural network weights or source code. Anthropic has not publicly disclosed Claude's internal model architecture (layer count, attention type, hidden dim, etc.). What IS publicly known are Claude's capabilities, product mechanisms, safety philosophy, tool-use behavior, extended thinking, and system-level interaction patterns — all observable through Claude's API, documentation, official blog posts, and Claude Code.
This project infers a behavioral architecture from those observables: what system components must likely exist to produce the observed behaviors. Internal layer details (specific layer counts, MoE structure, activation functions) are speculative research implementations and are labeled accordingly.
Every component in this project is tagged with its evidence level:
| Tag | Meaning | Example Source |
|---|---|---|
| [Observed] | Directly observable through API behavior, black-box testing, or Claude Code usage | Tool use always responds to user command; extended thinking produces visible chain-of-thought |
| [Reported] | Stated in official Anthropic blog posts, research papers, system cards, or reliable media | Constitutional AI paper; Claude model card; Anthropic research blog |
| [Inferred] | System components reasonably deduced from observed/reported behaviors | A tool router must exist to parse tool calls; a safety classifier must gate harmful outputs |
| [Speculative] | Pure architectural hypothesis / speculative implementation for research | Specific hidden dimension, layer count, MoE expert count, attention window size |
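These tags can also be carried in code. A hypothetical helper (this project's own convention, not anything from Anthropic) might look like:

```python
# Hypothetical helper (project convention only): the four evidence tags as a
# Python enum, so speculated modules in src/ can declare their own level.
from enum import Enum


class Evidence(Enum):
    OBSERVED = "observed"        # directly visible via the API or Claude Code
    REPORTED = "reported"        # stated in official Anthropic publications
    INFERRED = "inferred"        # deduced from observed/reported behavior
    SPECULATIVE = "speculative"  # pure research hypothesis


# Example: per-module tags mirroring the mapping table further below
MODULE_EVIDENCE = {
    "layers/safety.py": [Evidence.REPORTED, Evidence.OBSERVED],
    "layers/moe.py": [Evidence.SPECULATIVE],
}
```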
[Observed] through API and Claude Code:
- Agentic coding with tool orchestration (read, write, edit, bash, search)
- Extended thinking with visible chain-of-thought reasoning
- Multi-turn conversational memory across long contexts
- Tool use gated by user commands (Claude does not autonomously plan without user intent)
- Structured output (JSON mode, tool call format)
- Long context windows (200K tokens)
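To illustrate the structured output and tool-use behavior listed above, a tool call surfaces through the Messages API in roughly this shape (simplified; Anthropic's tool-use documentation is the authoritative schema, and the tool name here is invented):

```python
# Simplified illustration of the observable API shapes (not a complete schema).
# When Claude decides to call a tool, the assistant turn carries a "tool_use"
# content block; the caller executes the tool and returns a "tool_result" block.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "The user wants the file inspected first..."},
        {"type": "tool_use", "id": "toolu_01", "name": "read_file",   # tool name invented
         "input": {"path": "src/model.py"}},
    ],
}

# The tool result is injected back into the context as part of the next user turn,
# which is what the "observation injection" component inferred below refers to.
tool_result_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "file contents here"},
    ],
}
```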
[Reported] from official Anthropic sources:
- Constitutional AI — safety via a constitution of principles applied during RL fine-tuning (arxiv:2212.08073)
- Reliable, interpretable, steerable systems — Anthropic's stated core design philosophy
- RLHF + RLAIF training pipeline (not architectural layers)
- Claude Code — agentic coding system with tool loop architecture
- Model Context Protocol (MCP) — standard for tool/resource integration
- Multi-modal support (text + vision input)
Based on observable behaviors, we infer these system-level components (not necessarily internal model layers):
```
User Input
│
▼
[Inferred] Policy / Safety Classifier
[Reported] Constitutional AI principles govern rejection/refusal behavior
[Observed] Claude refuses harmful requests before generating a response
│
▼
[Observed] Unified Input Processing
[Observed] Accepts text + images + tool results in a single context
[Reported] Multi-modal via vision encoder + text tokenizer
│
▼
[Inferred] Context Management
[Observed] Long context (200K), maintains coherence across very long conversations
[Speculative] Chunked context encoding for efficiency
│
▼
[Observed] Extended Thinking / Chain-of-Thought
[Observed] Visible reasoning traces in Claude Code and API ("thinking" blocks)
[Inferred] Iterative / recurrent reasoning process
[Speculative] RDT-like weight reuse or recurrent depth mechanism
│
▼
[Observed] Tool Use / Agentic Loop
[Observed] Claude Code orchestrates read → write → bash → grep → etc.
[Inferred] Tool router + executor + observation injection into context
[Speculative] Tool call prediction head with user-command gating
│
▼
[Observed] Memory / State Tracking
[Observed] Maintains context across tool calls within a session
[Inferred] Session-level state management
[Speculative] Recurrent memory state fusion
│
▼
[Reported] Output Safety Filter
[Reported] Constitutional AI also constrains outputs post-generation
[Inferred] Output compliance verification
│
▼
Response
```
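The flow above can also be read as pseudocode. Below is a minimal runnable sketch of how this repo's toy model could compose those stages (every function and class name is hypothetical; nothing here describes Claude's actual implementation):

```python
# Hypothetical composition of the stages above (a toy mirror of src/model.py).
# Every name and signature here is a project-level assumption, not Anthropic's code.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Session-level state carried across turns and tool calls ([Inferred])."""
    context: list[str] = field(default_factory=list)


def policy_gate(text: str) -> bool:
    # [Inferred] input-side safety classifier. Real refusal behavior is shaped by
    # Constitutional AI training; this keyword check is only a toy stand-in.
    return "harmful request" not in text.lower()


def extended_thinking(context: list[str]) -> str:
    # [Observed] visible reasoning stage; here just a placeholder plan string.
    return "plan: answer directly, no tools needed"


def tool_loop(plan: str, session: Session, tools: dict[str, Callable[[str], str]]) -> str:
    # [Observed] agentic loop: tool results are injected back into the context.
    if "read_file" in plan and "read_file" in tools:
        session.context.append(tools["read_file"]("README.md"))
    return f"draft response based on: {plan}"


def output_filter(draft: str) -> str:
    # [Reported] output-side constitutional constraints (a no-op in this toy).
    return draft


def respond(user_input: str, session: Session, tools: dict) -> str:
    if not policy_gate(user_input):
        return "I can't help with that."            # [Observed] refusal behavior
    session.context.append(user_input)              # [Observed] unified context
    plan = extended_thinking(session.context)       # [Observed] extended thinking
    draft = tool_loop(plan, session, tools)         # [Observed] tool orchestration
    return output_filter(draft)                     # [Reported] output safety filter


if __name__ == "__main__":
    print(respond("Summarize this repo", Session(), tools={}))
```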
```
openclaude/
├── src/
│   ├── __init__.py
│   ├── model.py             # Toy assembly of all speculated components
│   └── layers/
│       ├── __init__.py
│       ├── common.py        # [Inferred] LayerNorm — standard in all transformers
│       ├── safety.py        # [Inferred] Input/output policy gates
│       ├── embedding.py     # [Speculative] Unified text + modal + position encoding
│       ├── chunk_encoder.py # [Speculative] Chunked context encoder (toy impl)
│       ├── attention.py     # [Speculative] Causal local-window attention (naive)
│       ├── rdt_block.py     # [Speculative] Recurrent depth transformer block
│       ├── memory.py        # [Speculative] Recurrent memory state fusion (toy)
│       ├── moe.py           # [Speculative] Sparse MoE layer (naive ref impl)
│       └── tool_layer.py    # [Inferred] Tool call prediction head
├── demo/
│   └── infer_demo.py        # Forward pass verification (toy model)
├── docs/
│   ├── ARCHITECTURE.md      # Detailed architecture discussion
│   └── ascii_arch.txt       # ASCII behavioral flow diagram
├── assets/
├── requirements.txt
├── .gitignore
├── LICENSE
└── openclaude.txt           # Project specification (Chinese)
```
| Component | File | Evidence | Rationale |
|---|---|---|---|
| Safety gating | safety.py | [Reported] + [Observed] | Constitutional AI paper; Claude observably refuses harmful queries |
| Unified input encoding | embedding.py | [Observed] + [Reported] | Claude accepts mixed text/image; tokenizer + vision encoder reported |
| Chunked context | chunk_encoder.py | [Speculative] | Long context observed, but the chunking mechanism is pure hypothesis |
| Causal attention | attention.py | [Inferred] | Any autoregressive LM needs causal masking; the window size is speculative |
| Recurrent/iterative depth | rdt_block.py | [Speculative] | Extended thinking suggests iteration; weight reuse is one possible implementation |
| Memory/state fusion | memory.py | [Speculative] | Session persistence observed; EMA-based fusion is a toy design choice |
| Sparse MoE | moe.py | [Speculative] | MoE is a common industry technique; expert count/routing is purely hypothetical |
| Tool call head | tool_layer.py | [Inferred] + [Observed] | Claude Code demonstrates tool orchestration; a prediction head is one architectural option |
| LayerNorm | common.py | [Inferred] | Universally present in transformer architectures |
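To make one row concrete: the [Inferred] tool call head could be sketched as below (a minimal sketch assuming a PyTorch implementation, as the layer modules suggest; the hard gate is this project's [Speculative] way to mimic the observed "no tool use without user intent" behavior, not a claim about Claude):

```python
import torch
import torch.nn as nn


class ToolCallHead(nn.Module):
    """[Inferred] head that scores which tool, if any, to invoke next.
    The hard gate below is a [Speculative] toy mechanism, not Claude's."""

    def __init__(self, d_model: int, num_tools: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, num_tools + 1)  # last index = "no tool"

    def forward(self, hidden: torch.Tensor, user_requested_tools: bool) -> torch.Tensor:
        # hidden: (seq_len, d_model) final hidden states for the current turn
        logits = self.scorer(hidden[-1])                 # decide from the last position
        if not user_requested_tools:
            gate = torch.full_like(logits, float("-inf"))
            gate[-1] = 0.0                               # only "no tool" stays reachable
            logits = logits + gate
        return logits.softmax(dim=-1)                    # probabilities over tools + "no tool"


# Toy usage: without user intent, all probability mass falls on "no tool"
head = ToolCallHead(d_model=16, num_tools=4)
probs = head(torch.randn(8, 16), user_requested_tools=False)
assert probs[-1].item() > 0.999
```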
[Reported] Anthropic's design principles, from official communications:
- Reliable — outputs should be factually grounded, not hallucinated
- Interpretable — model decisions should be explainable (mechanistic interpretability research)
- Steerable — behavior controllable via constitution, system prompts, and RLHF
- Safety-first — Constitutional AI as the core alignment approach, not an afterthought
These principles inform the behavioral architecture — they explain WHY Claude behaves the way it does, even though the internal implementation details remain unknown.
To be explicit, this project is:
- NOT a leaked or reverse-engineered copy of Claude's internal model
- NOT a reproduction of Anthropic's training pipeline or weights
- NOT a claim that Claude uses RDT, 64-expert MoE, or any specific internal architecture
- NOT a production model — purely speculative research implementations
Quick start:

```bash
# Create and activate venv
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/macOS

# Install dependencies
pip install -r requirements.txt

# Run inference demo (verify toy model forward pass)
python demo/infer_demo.py
```

Sources:
- Constitutional AI: Harmlessness from AI Feedback (arxiv:2212.08073)
- Anthropic Research Blog (anthropic.com/research)
- Claude Model Card / System Card
- Claude Code documentation
- Model Context Protocol (MCP) specification
- Various tech media coverage of Claude capabilities
- Third-party benchmarks and evaluations of Claude models