Self-Developed Algorithm · Iterative Refinement Architecture · Spiral Memory Mechanism
Apex is a fully self-developed iterative refinement reasoning framework. Its core innovation lies in formalizing the Proposal → Review → Refinement multi-round self-correction pipeline as a trainable neural network loop. By leveraging self-critique and verification feedback, the model continuously refines its output during inference, overcoming the single-pass forward limitation of standard Transformer architectures.
Despite their scale, major LLMs (GPT, Claude, LLaMA, etc.) share fundamental architectural weaknesses:
| Issue | Description |
|---|---|
| Single-Pass Inference | Each token is processed once; no built-in self-correction |
| No Feedback Loop | Output does not feed back into input for revision |
| Error Accumulation | Early-token errors compound along the autoregressive chain |
| Linear Reasoning | Chain-of-Thought strategies are unidirectional, lacking divergent review |
Rather than scaling up parameters and data, Apex innovates at the reasoning architecture level:
Standard LLM: Input → [Transformer × N] → Output (single pass)
Apex: Input → Prelude Encoding → [Refinement Loop × K]:
├─ Proposal Head
├─ Review Head
├─ Refinement Head
├─ Scoring Verifier
└─ Spiral Memory Update
→ Decode Output (self-corrective reasoning)
A differentiable self-correction loop that generates three distinct representations per step — Proposal (candidate generation), Review (defect detection), and Refinement (fusion and improvement):
All three heads share the same Transformer backbone and project the same hidden state into different subspaces, forming a self-adversarial yet convergent refinement loop. The number of steps K is configurable via `loop_steps` (default 3).
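A minimal PyTorch sketch of this three-way projection, using illustrative module and parameter names rather than the exact `apex/model/heads.py` API:

```python
import torch
import torch.nn as nn

class ReasoningHeads(nn.Module):
    """Project one shared hidden state into Proposal, Review, and Refinement subspaces."""
    def __init__(self, dim: int):
        super().__init__()
        self.proposal = nn.Linear(dim, dim)         # candidate generation
        self.review = nn.Linear(dim, dim)           # defect detection
        self.refine = nn.Linear(2 * dim, dim)       # fusion and improvement

    def forward(self, hidden: torch.Tensor):
        p = self.proposal(hidden)                   # (batch, seq, dim)
        c = self.review(hidden)
        r = self.refine(torch.cat([p, c], dim=-1))  # refine from proposal + review
        return p, c, r
```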
Unlike standard RNNs that update state from only the previous step, Spiral Memory compresses five streams of information into a unified memory state:

M_{t+1} = Compress(M_t, Proposal_t ⊕ Review_t ⊕ Refinement_t ⊕ Score_t ⊕ Invariant)

where ⊕ denotes concatenation, Score_t is the verifier score for step t, and Invariant is the input representation preserved unchanged across steps.
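A sketch of this five-way compression, assuming a gated update and illustrative shapes (the actual `apex/model/memory.py` may differ):

```python
import torch
import torch.nn as nn

class SpiralMemory(nn.Module):
    """Compress Proposal, Review, Refinement, verifier Score, and the Invariant
    input context into a single memory state carried to the next refinement step."""
    def __init__(self, dim: int):
        super().__init__()
        self.compress = nn.Linear(4 * dim + 1, dim)  # P ⊕ C ⊕ R ⊕ score ⊕ invariant
        self.gate = nn.Linear(2 * dim, dim)          # blend new update with old memory

    def forward(self, memory, proposal, review, refinement, score, invariant):
        # memory / proposal / review / refinement / invariant: (batch, seq, dim)
        # score: (batch, 1, 1) verifier score, broadcast across the sequence
        score = score.expand(proposal.size(0), proposal.size(1), 1)
        packed = torch.cat([proposal, review, refinement, score, invariant], dim=-1)
        update = torch.tanh(self.compress(packed))
        g = torch.sigmoid(self.gate(torch.cat([memory, update], dim=-1)))
        return g * update + (1.0 - g) * memory       # gated "spiral" update
```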
Verifier(refinement) → score ∈ [0, 1]
A trainable scoring network that assesses the quality of each round's refinement output. The score is fed back into the Spiral Memory, driving the model to improve low-scoring outputs in subsequent steps — forming a closed-loop feedback system for reasoning quality.
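A plausible shape for such a scorer is a pooled MLP with a sigmoid output; this is illustrative, the real interface lives in `apex/runtime/verifier.py`:

```python
import torch
import torch.nn as nn

class ScoringVerifier(nn.Module):
    """Map a refinement representation to a quality score in [0, 1]."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.SiLU(),
            nn.Linear(dim // 2, 1),
        )

    def forward(self, refinement: torch.Tensor) -> torch.Tensor:
        pooled = refinement.mean(dim=1)          # (batch, dim): average over tokens
        return torch.sigmoid(self.mlp(pooled))   # (batch, 1): score in [0, 1]
```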
| Mechanism | Function |
|---|---|
| GQA (Grouped Query Attention) | Reduces KV cache footprint, accelerates inference |
| Sliding Window Attention | Local attention with O(wn) complexity |
| Full Attention (every 4 layers) | Global attention every 4th layer, balancing local efficiency with global context |
| Memory Cross-Attention | Cross-attends to Spiral Memory at every layer, injecting historical reasoning context |
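The layer layout in the table above can be summarized with a small helper; only the "full attention every 4th layer" interleaving is taken from the table, the naming is illustrative:

```python
def attention_pattern(num_layers: int, full_every: int = 4) -> list[str]:
    """Sliding-window attention by default; full (global) attention on every
    `full_every`-th layer. Memory cross-attention is applied in every layer."""
    return ["full" if (i + 1) % full_every == 0 else "sliding_window"
            for i in range(num_layers)]

print(attention_pattern(8))
# ['sliding_window', 'sliding_window', 'sliding_window', 'full',
#  'sliding_window', 'sliding_window', 'sliding_window', 'full']
```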
Replaces the standard FFN with SwiGLU for enhanced non-linear expressiveness: FFN(x) = (SiLU(x·W_gate) ⊙ x·W_up)·W_down.
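A minimal PyTorch version of that block (the gate/up/down weight names follow common convention and are not necessarily the project's own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """FFN(x) = (SiLU(x W_gate) ⊙ x W_up) W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```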
Self-implemented RoPE with zero external dependencies, supporting extrapolation to sequence lengths unseen during training.
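For orientation, a compact rotary-embedding sketch; `apex/model/rope.py` may organize this differently (for example by caching the angles):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (batch, seq, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    freqs = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]          # (1, seq, 1, half)
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```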
┌──────────────────────┐
Input ────────────▶ │ SimpleTokenizer │
└──────────┬───────────┘
│
┌──────────▼───────────┐
│ Embedding Table │
└──────────┬───────────┘
│
┌────────────────▼────────────────┐
│       Prelude Transformer       │
│ (GQA + SWA + Cross-Attn) × N │
└────────────────┬────────────────┘
│
┌──────────────────────▼──────────────────────┐
│ Refinement Loop × K │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Shared Core Transformer Blocks │ │
│ │ (GQA + SWA/Full + Mem Cross-Attn) │ │
│ └─────────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ Proposal Review Refinement │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Verifier │──▶ Score │
│ └───────┬───────┘ │
│ │ │
│ ┌──────────────▼──────────────┐ │
│ │ Spiral Memory Update │ │
│        │  (P⊕C⊕R⊕Score⊕Invariant)   │      │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ (next step) │
└─────────────────────────────────────────────┘
│
┌──────────▼───────────┐
│ Linear Decoder │
└──────────┬───────────┘
│
┌──────────▼───────────┐
│ Output Text │
└──────────────────────┘
| Feature | Standard Transformer | Chain-of-Thought | Tree-of-Thought | Apex (Self-Dev) |
|---|---|---|---|---|
| Inference passes | 1 | 1 + prompt | N tree searches | K loop steps |
| Self-correction | ❌ | ❌ (prompt-dependent) | Partial | ✅ Built-in |
| Quality feedback | ❌ | ❌ | External | ✅ Built-in scorer |
| Cross-step memory | ❌ | ❌ | ❌ | ✅ Spiral Memory |
| Compute overhead | Baseline | +prompt length | +tree nodes | +K × shared layers |
| End-to-end differentiable | ✅ | ✅ (no special design) | ❌ | ✅ Full pipeline |
Apex/
├── README.md # This file (English)
├── README-CN.md # Chinese documentation
├── LICENSE # MIT License
├── pyproject.toml # Python package config
├── requirements.txt # Dependencies (torch>=2.0.0)
├── configs/ # Hyperparameter configs
│
├── apex/ # Core package
│ ├── model/ # Model components
│ │ ├── rope.py # Rotary Position Embedding (self-dev)
│ │ ├── attention.py # GQA + Sliding Window + SwiGLU (self-dev)
│ │ ├── transformer.py # Shared Transformer Block (self-dev)
│ │ ├── memory.py # Spiral Memory compression (core innovation)
│ │ ├── heads.py # Three-way reasoning heads + decoder (core innovation)
│ │ ├── dialectic.py # Refinement step + ApexMVP model (core innovation)
│ │ └── recurrent.py # Gated recurrent state cell
│ │
│ ├── runtime/ # Runtime control
│ │ ├── loop.py # Training / validation loop
│ │ ├── verifier.py # Scoring verifier interface (core innovation)
│ │ ├── scheduler.py # Loop-step / LR scheduler (self-dev)
│ │ └── controller.py # Inference controller
│ │
│ ├── data/ # Data pipeline
│ │ ├── tokenizer.py # Character-level tokenizer (self-dev, zero deps)
│ │ ├── dataset.py # Dataset loaders
│ │ └── preprocess.py # Preprocessing utilities
│ │
│ ├── train/ # Training system
│ │ ├── trainer.py # Trainer
│ │ ├── losses.py # Combined loss (verification + consistency)
│ │ └── optim.py # Optimizer factory
│ │
│ └── utils/ # Utility functions
│
├── scripts/ # Run scripts
│ ├── train.sh # Training entrypoint
│ ├── eval.sh # Evaluation entrypoint
│ └── benchmark.sh # Performance benchmark
│
├── examples/ # Usage examples
│ ├── code_repair.py # Code repair example
│ ├── math_reasoning.py # Math reasoning example
│ └── verifier_loop.py # Verifier loop analysis
│
├── docs/ # Detailed docs
│ ├── architecture.md # Architecture design doc
│ ├── runtime.md # Runtime mechanics
│ └── training.md # Training guide
│
├── benchmarks/ # Evaluation benchmarks
├── experiments/ # Experiment configs
├── checkpoints/ # Model checkpoints
├── outputs/ # Output directory
└── datasets/ # Dataset directory
- Python >= 3.10
- PyTorch >= 2.0.0
git clone <repo-url>
cd Apex
pip install -r requirements.txt
# Code repair
python examples/code_repair.py
# Math reasoning
python examples/math_reasoning.py
# Detailed verifier loop analysis
python examples/verifier_loop.py
To train, run:
bash scripts/train.sh
Or in Python:
from apex import ApexMVP
from apex.data import make_toy_dataset
from apex.train import Trainer
model = ApexMVP(
vocab_size=32000,
dim=512,
prelude_layers=2,
shared_layers=4,
num_heads=8,
num_kv_heads=2,
window_size=128,
loop_steps=3,
)
dataset = make_toy_dataset()
trainer = Trainer(model, device="cuda", lr=1e-4)
trainer.fit(dataset, epochs=50)
For inference:
from apex import ApexMVP
from apex.runtime import RuntimeController
model = ApexMVP(dim=512, loop_steps=3)
controller = RuntimeController(model, device="cuda")
result, scores, history = controller.run("Your question...")
print(f"Result: {result}")
print(f"Verification scores: {[round(s.item(), 4) for s in scores]}")
Given an input text, inference proceeds in four steps:
Step 1: Tokenization & Embedding
Step 2: Prelude Encoding
Step 3: Iterative Refinement Loop (repeated K times)
At each step t, the shared core blocks produce a hidden state h_t, from which:

Proposal_t, Review_t, Refinement_t = Heads(h_t)
Score_t = Verifier(Refinement_t)
M_{t+1} = Compress(M_t, Proposal_t ⊕ Review_t ⊕ Refinement_t ⊕ Score_t ⊕ Invariant)

where ⊕ denotes concatenation and M_t is the Spiral Memory state that conditions the next step through memory cross-attention.
Step 4: Final Decoding
The final hidden state is fused with the spiral memory via residual addition before the linear decoder generates the output.
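A tiny, self-contained illustration of that fusion step (shapes chosen arbitrarily for the example):

```python
import torch
import torch.nn as nn

dim, vocab_size = 512, 32000
decoder = nn.Linear(dim, vocab_size)     # linear decoder over the vocabulary

hidden = torch.randn(1, 16, dim)         # final refinement hidden state
memory = torch.randn(1, 16, dim)         # spiral memory state
logits = decoder(hidden + memory)        # residual fusion, then decode
print(logits.shape)                      # torch.Size([1, 16, 32000])
```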
Apex uses a three-component loss:
| Term | Formula | Purpose |
|---|---|---|
| Task loss | CrossEntropy(logits, targets) | Standard next-token prediction |
| Verification loss | see `apex/train/losses.py` | Encourages high verifier scores on refinements |
| Consistency loss | MSE(Proposal, Refinement) + MSE(Review, Refinement) | Keeps representations semantically consistent |
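A hedged sketch of how the three terms might be combined; the weights and the exact form of the verification term are assumptions, and `apex/train/losses.py` is the authoritative implementation:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, targets, proposal, review, refinement, scores,
                  w_verify: float = 0.1, w_consist: float = 0.1):
    """Task cross-entropy + verification term + proposal/review consistency terms."""
    task = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    verify = (1.0 - torch.stack(list(scores))).mean()   # push verifier scores toward 1
    consist = F.mse_loss(proposal, refinement) + F.mse_loss(review, refinement)
    return task + w_verify * verify + w_consist * consist
```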
| Parameter | Default | Description |
|---|---|---|
| `dim` | 512 | Hidden dimension |
| `prelude_layers` | 2 | Number of prelude Transformer layers |
| `shared_layers` | 4 | Number of shared core layers |
| `num_heads` | 8 | Number of query attention heads |
| `num_kv_heads` | 2 | Number of KV heads (GQA groups) |
| `window_size` | 128 | Sliding window size |
| `loop_steps` | 3 | Number of refinement loop steps |
| `vocab_size` | 32000 | Vocabulary size |
| Domain | Apex Advantage |
|---|---|
| Code Repair | Multi-round self-review detects and fixes defects |
| Math Reasoning | Verifier scores intermediate conclusions, selects correct paths |
| Logical Reasoning | Review head identifies inconsistencies in reasoning chains |
| Text Quality | Iteratively corrects grammar, logic, and style |
| Multi-Step Planning | Spiral Memory stores intermediate planning states |
@misc{apex2025,
title={Apex: Self-Refining Reasoning via Iterative Refinement Loops and Spiral Memory},
author={Apex Contributors},
year={2025},
note={Self-developed innovative algorithm},
}
Roadmap:
- MVP core architecture (iterative refinement loop + spiral memory + scoring verifier)
- GQA + sliding window attention + SwiGLU activation
- Complete train / eval / inference pipeline
- Real dataset training (CodeNet, GSM8K)
- Dynamic loop-step scheduling
- Multi-task fine-tuning support
- Distributed training (FSDP)
- ONNX / TensorRT inference acceleration
- Open-weight release
MIT License. See LICENSE.