Reverse-engineering Claude's system-level behavioral architecture from publicly observable signals.
This is not a reverse-engineering of Claude's internal neural network weights or source code. Anthropic has not publicly disclosed Claude's internal model architecture (layer count, attention type, hidden dim, etc.). What IS publicly known are Claude's capabilities, product mechanisms, safety philosophy, tool-use behavior, extended thinking, and system-level interaction patterns — all observable through Claude's API, documentation, official blog posts, and Claude Code.
This project infers a behavioral architecture from those observables: what system components must likely exist to produce the observed behaviors. Internal layer details (specific layer counts, MoE structure, activation functions) are speculative research implementations and are labeled accordingly.
Every component in this project is tagged with its evidence level:
| Tag | Meaning | Example Source |
|---|---|---|
| [Observed] | Directly observable through API behavior, black-box testing, or Claude Code usage | Tool use always responds to user command; extended thinking produces visible chain-of-thought |
| [Reported] | Stated in official Anthropic blog posts, research papers, system cards, or reliable media | Constitutional AI paper; Claude model card; Anthropic research blog |
| [Inferred] | System components reasonably deduced from observed/reported behaviors | A tool router must exist to parse tool calls; a safety classifier must gate harmful outputs |
| [Speculative] | Pure architectural hypothesis / speculative implementation for research | Specific hidden dimension, layer count, MoE expert count, attention window size |
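These tags can also be carried in code. A hypothetical helper (this project's own convention, not anything from Anthropic) might look like:

```python
# Hypothetical helper (project convention only): the four evidence tags as a
# Python enum, so speculated modules in src/ can declare their own level.
from enum import Enum


class Evidence(Enum):
    OBSERVED = "observed"        # directly visible via the API or Claude Code
    REPORTED = "reported"        # stated in official Anthropic publications
    INFERRED = "inferred"        # deduced from observed/reported behavior
    SPECULATIVE = "speculative"  # pure research hypothesis


# Example: per-module tags mirroring the mapping table further below
MODULE_EVIDENCE = {
    "layers/safety.py": [Evidence.REPORTED, Evidence.OBSERVED],
    "layers/moe.py": [Evidence.SPECULATIVE],
}
```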
[Observed] through API and Claude Code:
- Agentic coding with tool orchestration (read, write, edit, bash, search)
- Extended thinking with visible chain-of-thought reasoning
- Multi-turn conversational memory across long contexts
- Tool use gated by user commands (Claude does not autonomously plan without user intent)
- Structured output (JSON mode, tool call format)
- Long context windows (200K tokens)
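To illustrate the structured output and tool-use behavior listed above, a tool call surfaces through the Messages API in roughly this shape (simplified; Anthropic's tool-use documentation is the authoritative schema, and the tool name here is invented):

```python
# Simplified illustration of the observable API shapes (not a complete schema).
# When Claude decides to call a tool, the assistant turn carries a "tool_use"
# content block; the caller executes the tool and returns a "tool_result" block.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "The user wants the file inspected first..."},
        {"type": "tool_use", "id": "toolu_01", "name": "read_file",   # tool name invented
         "input": {"path": "src/model.py"}},
    ],
}

# The tool result is injected back into the context as part of the next user turn,
# which is what the "observation injection" component inferred below refers to.
tool_result_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "file contents here"},
    ],
}
```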
[Reported] from official Anthropic sources:
- Constitutional AI — safety via a constitution of principles applied during RL fine-tuning (arxiv:2212.08073)
- Reliable, interpretable, steerable systems — Anthropic's stated core design philosophy
- RLHF + RLAIF training pipeline (not architectural layers)
- Claude Code — agentic coding system with tool loop architecture
- Model Context Protocol (MCP) — standard for tool/resource integration
- Multi-modal support (text + vision input)
Based on observable behaviors, we infer these system-level components (not necessarily internal model layers):
```
User Input
│
▼
[Inferred] Policy / Safety Classifier
[Reported] Constitutional AI principles govern rejection/refusal behavior
[Observed] Claude refuses harmful requests before generating a response
│
▼
[Observed] Unified Input Processing
[Observed] Accepts text + images + tool results in a single context
[Reported] Multi-modal via vision encoder + text tokenizer
│
▼
[Inferred] Context Management
[Observed] Long context (200K), maintains coherence across very long conversations
[Speculative] Chunked context encoding for efficiency
│
▼
[Observed] Extended Thinking / Chain-of-Thought
[Observed] Visible reasoning traces in Claude Code and API ("thinking" blocks)
[Inferred] Iterative / recurrent reasoning process
[Speculative] RDT-like weight reuse or recurrent depth mechanism
│
▼
[Observed] Tool Use / Agentic Loop
[Observed] Claude Code orchestrates read → write → bash → grep → etc.
[Inferred] Tool router + executor + observation injection into context
[Speculative] Tool call prediction head with user-command gating
│
▼
[Observed] Memory / State Tracking
[Observed] Maintains context across tool calls within a session
[Inferred] Session-level state management
[Speculative] Recurrent memory state fusion
│
▼
[Reported] Output Safety Filter
[Reported] Constitutional AI also constrains outputs post-generation
[Inferred] Output compliance verification
│
▼
Response
```
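The flow above can also be read as pseudocode. Below is a minimal runnable sketch of how this repo's toy model could compose those stages (every function and class name is hypothetical; nothing here describes Claude's actual implementation):

```python
# Hypothetical composition of the stages above (a toy mirror of src/model.py).
# Every name and signature here is a project-level assumption, not Anthropic's code.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Session-level state carried across turns and tool calls ([Inferred])."""
    context: list[str] = field(default_factory=list)


def policy_gate(text: str) -> bool:
    # [Inferred] input-side safety classifier. Real refusal behavior is shaped by
    # Constitutional AI training; this keyword check is only a toy stand-in.
    return "harmful request" not in text.lower()


def extended_thinking(context: list[str]) -> str:
    # [Observed] visible reasoning stage; here just a placeholder plan string.
    return "plan: answer directly, no tools needed"


def tool_loop(plan: str, session: Session, tools: dict[str, Callable[[str], str]]) -> str:
    # [Observed] agentic loop: tool results are injected back into the context.
    if "read_file" in plan and "read_file" in tools:
        session.context.append(tools["read_file"]("README.md"))
    return f"draft response based on: {plan}"


def output_filter(draft: str) -> str:
    # [Reported] output-side constitutional constraints (a no-op in this toy).
    return draft


def respond(user_input: str, session: Session, tools: dict) -> str:
    if not policy_gate(user_input):
        return "I can't help with that."            # [Observed] refusal behavior
    session.context.append(user_input)              # [Observed] unified context
    plan = extended_thinking(session.context)       # [Observed] extended thinking
    draft = tool_loop(plan, session, tools)         # [Observed] tool orchestration
    return output_filter(draft)                     # [Reported] output safety filter


if __name__ == "__main__":
    print(respond("Summarize this repo", Session(), tools={}))
```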
```
openclaude/
├── src/
│   ├── __init__.py
│   ├── model.py             # Toy assembly of all speculated components
│   └── layers/
│       ├── __init__.py
│       ├── common.py        # [Inferred] LayerNorm — standard in all transformers
│       ├── safety.py        # [Inferred] Input/output policy gates
│       ├── embedding.py     # [Speculative] Unified text + modal + position encoding
│       ├── chunk_encoder.py # [Speculative] Chunked context encoder (toy impl)
│       ├── attention.py     # [Speculative] Causal local-window attention (naive)
│       ├── rdt_block.py     # [Speculative] Recurrent depth transformer block
│       ├── memory.py        # [Speculative] Recurrent memory state fusion (toy)
│       ├── moe.py           # [Speculative] Sparse MoE layer (naive ref impl)
│       └── tool_layer.py    # [Inferred] Tool call prediction head
├── demo/
│   └── infer_demo.py        # Forward pass verification (toy model)
├── docs/
│   ├── ARCHITECTURE.md      # Detailed architecture discussion
│   └── ascii_arch.txt       # ASCII behavioral flow diagram
├── assets/
├── requirements.txt
├── .gitignore
├── LICENSE
└── openclaude.txt           # Project specification (Chinese)
```
| Component | File | Evidence | Rationale |
|---|---|---|---|
| Safety gating | safety.py | [Reported] + [Observed] | Constitutional AI paper; Claude observably refuses harmful queries |
| Unified input encoding | embedding.py | [Observed] + [Reported] | Claude accepts mixed text/image; tokenizer + vision encoder reported |
| Chunked context | chunk_encoder.py | [Speculative] | Long context observed, but the chunking mechanism is pure hypothesis |
| Causal attention | attention.py | [Inferred] | Any autoregressive LM needs causal masking; the window size is speculative |
| Recurrent/iterative depth | rdt_block.py | [Speculative] | Extended thinking suggests iteration; weight reuse is one possible implementation |
| Memory/state fusion | memory.py | [Speculative] | Session persistence observed; EMA-based fusion is a toy design choice |
| Sparse MoE | moe.py | [Speculative] | MoE is a common industry technique; expert count/routing is purely hypothetical |
| Tool call head | tool_layer.py | [Inferred] + [Observed] | Claude Code demonstrates tool orchestration; a prediction head is one architectural option |
| LayerNorm | common.py | [Inferred] | Universally present in transformer architectures |
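To make one row concrete: the [Inferred] tool call head could be sketched as below (a minimal sketch assuming a PyTorch implementation, as the layer modules suggest; the hard gate is this project's [Speculative] way to mimic the observed "no tool use without user intent" behavior, not a claim about Claude):

```python
import torch
import torch.nn as nn


class ToolCallHead(nn.Module):
    """[Inferred] head that scores which tool, if any, to invoke next.
    The hard gate below is a [Speculative] toy mechanism, not Claude's."""

    def __init__(self, d_model: int, num_tools: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, num_tools + 1)  # last index = "no tool"

    def forward(self, hidden: torch.Tensor, user_requested_tools: bool) -> torch.Tensor:
        # hidden: (seq_len, d_model) final hidden states for the current turn
        logits = self.scorer(hidden[-1])                 # decide from the last position
        if not user_requested_tools:
            gate = torch.full_like(logits, float("-inf"))
            gate[-1] = 0.0                               # only "no tool" stays reachable
            logits = logits + gate
        return logits.softmax(dim=-1)                    # probabilities over tools + "no tool"


# Toy usage: without user intent, all probability mass falls on "no tool"
head = ToolCallHead(d_model=16, num_tools=4)
probs = head(torch.randn(8, 16), user_requested_tools=False)
assert probs[-1].item() > 0.999
```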
[Reported] Anthropic's design principles, from official communications:
- Reliable — outputs should be factually grounded, not hallucinated
- Interpretable — model decisions should be explainable (mechanistic interpretability research)
- Steerable — behavior controllable via constitution, system prompts, and RLHF
- Safety-first — Constitutional AI as the core alignment approach, not an afterthought
These principles inform the behavioral architecture — they explain WHY Claude behaves the way it does, even though the internal implementation details remain unknown.
To be explicit, this project is:
- NOT a leaked or reverse-engineered copy of Claude's internal model
- NOT a reproduction of Anthropic's training pipeline or weights
- NOT a claim that Claude uses RDT, 64-expert MoE, or any specific internal architecture
- NOT a production model — purely speculative research implementations
Quick start:

```bash
# Create and activate venv
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/macOS

# Install dependencies
pip install -r requirements.txt

# Run inference demo (verify toy model forward pass)
python demo/infer_demo.py
```

Sources:
- Constitutional AI: Harmlessness from AI Feedback (arxiv:2212.08073)
- Anthropic Research Blog (anthropic.com/research)
- Claude Model Card / System Card
- Claude Code documentation
- Model Context Protocol (MCP) specification
- Various tech media coverage of Claude capabilities
- Third-party benchmarks and evaluations of Claude models