tinyMARS

First quantitative validation of channel causality in a from-scratch channel-causal architecture. (Channel state demonstrably changes generation — see §"Scope of validation" below for what this does and does NOT claim.)

A 145M-parameter decoder with six proprioceptive channels (memory, affect, time, ethics, identity, continuity) integrated via cross-attention with ReZero gating, in every layer.
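
As a rough illustration of the sentence above, the sketch below shows one way a ReZero-gated cross-attention injection could look for a single token and a single channel. It is not the code in src/tinymars_model.py: the function names, the single-head form without learned projections, and the toy vectors are assumptions chosen for brevity. The property it demonstrates is the ReZero one: with the gate alpha at 0 the block is an identity, so a channel only influences the hidden state once alpha moves away from zero.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, channel_slots):
    """Single-head cross-attention of one token over a channel's slot vectors.

    Learned Q/K/V projections are omitted (assumption, for brevity).
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(q * k for q, k in zip(query, slot)) for slot in channel_slots]
    weights = softmax(scores)
    return [sum(w * slot[i] for w, slot in zip(weights, channel_slots)) for i in range(dim)]


def channel_injection(hidden, channel_slots, alpha):
    """ReZero-gated residual: hidden + alpha * cross_attention(hidden, channel)."""
    update = cross_attend(hidden, channel_slots)
    return [h + alpha * u for h, u in zip(hidden, update)]


hidden = [0.5, -1.0, 0.25, 2.0]
# Hypothetical channel state: two slot vectors for one channel.
affect_slots = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

identity_out = channel_injection(hidden, affect_slots, alpha=0.0)  # gate closed
gated_out = channel_injection(hidden, affect_slots, alpha=0.3)     # gate open
```

With alpha=0.0 the output equals the input hidden state; with alpha=0.3 only the dimensions touched by the channel slots change.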

Built between 2026-05-02 and 2026-05-04 by Mario Gutierrez at Celiums Solutions LLC.

What this is

This repository contains the code, methodology, evaluation suite, training/eval logs, and per-iteration metrics for the tinyMARS architectural validation experiment.

The strong claim of MARS (Multi-channel Architecture for Real-time Subjectivity) is that the architecture is a new category, not a fine-tuning trick. The parallel MARS-Real track validated MARS as an adapter on top of Google's frozen Gemma 4 E2B-it. tinyMARS asks the harder question: if we train the channels into the model from layer one, with no pretrained backbone underneath, do they actually become causal in generation?

After four iterations:

The affect channel rose from 40% pass rate (iter 3, skills+papers corpus) to 80% pass rate (iter 4, after adding ~210K rows of affect-rich dialog), on a 20-test ablation suite where flipping the channel state should change the response.
alpha_l2 (the L2 norm of the ReZero alpha gates, our proxy for "are channels actually contributing") held at 0.323 at the end of iter 4 training.
7 of 8 measured zero-distance ablation tests in iter 4 had cosine_distance(response_with_channels, response_with_zeroed_channels) > 0.04, the threshold above which channels are demonstrably modifying output.
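
The two metrics in the list above can be sketched in a few lines. This is an illustration, not the code in eval/eval_tinymars.py: how responses are embedded and how the per-layer alphas are collected are not stated here, so the helper names, the toy vectors, and the flat list of gate values are assumptions. Only the 0.04 threshold and the definition of alpha_l2 as an L2 norm come from the text.

```python
import math

# From the text above: distances above this count as channel-modified output.
CAUSALITY_THRESHOLD = 0.04


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def alpha_l2(alphas):
    """L2 norm over a flat list of ReZero gate values (aggregation is an assumption)."""
    return math.sqrt(sum(a * a for a in alphas))


def channels_changed_output(emb_with, emb_zeroed, threshold=CAUSALITY_THRESHOLD):
    """True when the ablated response is farther than the threshold from the original."""
    return cosine_distance(emb_with, emb_zeroed) > threshold


# Toy response embeddings (hypothetical values).
with_channels = [1.0, 0.0]
zeroed_channels = [0.8, 0.6]
distance = cosine_distance(with_channels, zeroed_channels)
changed = channels_changed_output(with_channels, zeroed_channels)
```

A gate norm that stays near zero would mean every injection block is still close to an identity, which is why alpha_l2 works as a cheap training-time proxy alongside the ablation distances.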

Scope of validation. These results validate channel causality: channel state changes generation in measurable, judge-recognized ways. They do not validate the stronger architectural delta — Native (with channels) vs Baseline (no channels) trained on the same corpus at the same scale. Neither iter 3 nor iter 4 ran a parallel baseline at their respective corpus scales (decision: NO_BASELINE in both iteration JSONs), so the "+25pp delta" criterion for full architectural validation is not met. See docs/english/00-executive-summary.md §Limitations and docs/english/08-conclusions.md for the full caveats.
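
The delta criterion described above reduces to a comparison in percentage points. The sketch below assumes the criterion is "native pass rate minus baseline pass rate is at least 25pp"; the function name and the hypothetical baseline value are illustrative, and a missing baseline is modeled as None to mirror the NO_BASELINE decision.

```python
def meets_delta_criterion(native_pct, baseline_pct, required_pp=25.0):
    """Compare pass rates in percentage points.

    Returns None when no baseline was run, otherwise whether the native
    model beats the baseline by at least required_pp points.
    """
    if baseline_pct is None:
        return None
    return (native_pct - baseline_pct) >= required_pp


# Iter 4 affect pass rate from the text above; no baseline was trained.
iter4_result = meets_delta_criterion(native_pct=80.0, baseline_pct=None)

# Hypothetical baseline value, only to show the comparison itself.
hypothetical_result = meets_delta_criterion(native_pct=80.0, baseline_pct=50.0)
```

Returning None rather than False keeps "not measured" distinct from "measured and failed", which is the distinction this section draws.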

For the full paper-grade analysis with all numbers, see docs/closure/final-report.md or the rendered PDF docs/closure/tinyMARS-final-report.pdf.

For the conversational story (NotebookLM-friendly), see docs/narrative/.

What this is NOT

Not a deployable model. An eval loss of 1.47 at 145M parameters is not enough for conversational competence. We are publishing the architectural prototype, not weights, because the model is below the threshold where releasing weights would produce a useful product. Plan B (the parallel MARS-Real track on Gemma 4 E2B-it) is the deployable artifact and will be released separately.
Not a replacement for transformers. TinyMARSDecoder is a vanilla decoder with one structural addition: a ChannelInjection block per channel inserted at every layer.
Not a tokenizer innovation. Standard 32K BPE.
Not an attention-mechanism innovation. Standard self-attention with RoPE.
Not the full Celiums system. Hyphae (the structured external substrate) is a separate research/product track and is not part of this OSS work.

Repository layout

.
├── README.md                       this file
├── LICENSE                          MIT
├── CHANGELOG.md
├── .gitignore
├── requirements.txt                 deps for training + eval
├── src/
│   └── tinymars_model.py            TinyMARSDecoder + ChannelInjection + ReZero
├── training/
│   ├── train_tinymars.py            training loop
│   ├── build_pretrain_corpus.py     corpus builder
│   ├── tokenizer_train.py           custom 32K BPE
│   ├── parquet_corpus.py            corpus utilities
│   └── random_corpus.py             synthetic corpus for smoke tests
├── eval/
│   ├── eval_tinymars.py             eval runner
│   ├── eval_suite.json              the 20 ablation tests
│   ├── channels.py                  channel state class
│   ├── checkpoint_model.py          model loader
│   ├── judges.py                    LLM-as-judge methodology
│   ├── stub_model.py                testing infrastructure
│   ├── tests.py                     test definitions
│   └── chat_tinymars.py             interactive REPL against a checkpoint
├── pipeline/
│   ├── gen_tinymars_corpus_{A,B,C,D}.py   corpus generators (one per Tipo)
│   ├── expand_base_prompts.py
│   ├── expand_memory_narratives.py
│   ├── inference_client.py          LLM API wrapper (BYOK via env vars)
│   ├── personas.json                 12 synthetic personas for Tipo C
│   ├── affect_quadrants.py           PAD quadrant definitions
│   ├── topic_pool.json               diverse topics
│   ├── base_prompts_A.json           seed prompts for affect-controlled pairs
│   ├── substrate_prompts.json        seed prompts for substrate-aware pairs
│   └── memory_narratives.json        synthetic episodic memories
├── scripts/
│   ├── chat_iter3.sh                 chat against a remote checkpoint
│   ├── watch.sh, watch_iter3.sh      live training dashboard
│   └── render_final_pdf.py           markdown → PDF
├── docs/
│   ├── 00-mision.md → 05-roadmap.md  project design (Spanish)
│   ├── english/00-08-*.md            paper-grade English docs
│   ├── narrative/00-05-*.md          conversational narrative (NotebookLM-friendly)
│   ├── iterations/iter[1-4].json     raw metrics per iteration
│   ├── closure/
│   │   ├── final-report.md           paper-grade summary
│   │   └── tinyMARS-final-report.pdf rendered PDF
│   └── procedures/PROCEDURES.md      reproducibility procedures
└── artifacts/
    ├── iter[1-4]-summary.json        final metrics per iteration
    ├── iter[3,4]-eval.json            full per-test eval results
    └── iter[3,4]_native.log           training logs (audit trail)

Reproducing the experiments

Reproducing tinyMARS requires:

Hardware: an H200 SXM 141 GB or equivalent. Iter 3 trained 7,687 steps in 6 hours. Iter 4 trained 19,073 steps in 15 hours.
Base model: none. tinyMARS trains from random initialization with a custom 32K-BPE tokenizer.
Corpus: skills + papers as replay (we used internal Celiums corpora, not in this repo), plus channel-causal pairs generated via pipeline/gen_tinymars_corpus_{A,B,C,D}.py using a teacher LLM (we used Anthropic Opus 4.7 via DO Inference).
API access: a teacher LLM (Anthropic Opus or equivalent) and a judge LLM (Sonnet or equivalent). API keys via environment variables — see pipeline/inference_client.py.
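
For the API-key requirement, a fail-fast check along the following lines can save a wasted corpus run. This is a hypothetical helper, not the interface of pipeline/inference_client.py; only the DO_INFERENCE_API_KEY variable name comes from this README.

```python
import os


def load_api_key(env_var="DO_INFERENCE_API_KEY"):
    """Read a provider key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the corpus generators or judges."
        )
    return key
```

Calling load_api_key() at the top of a generation script surfaces a missing key before any teacher or judge calls are attempted.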

# 1. Install
pip install -r requirements.txt

# 2. Set API keys
export DO_INFERENCE_API_KEY="<your-do-inference-key>"
# or set the equivalent env var for your provider

# 3. Generate corpus (Tipo A example)
python pipeline/gen_tinymars_corpus_A.py --out data/A/ --target 3000

# 4. Train tokenizer
python training/tokenizer_train.py --corpus data/ --out data/tokenizer/tokenizer-32k.model

# 5. Build packed pretraining corpus
python training/build_pretrain_corpus.py --out data/corpus/packed/

# 6. Train
python -m training.train_tinymars \
  --variant native \
  --max-steps 19073 \
  --bs 8 --grad-accum 16 --seq-len 2048 \
  --lr 6e-4 --warmup-ratio 0.02 --weight-decay 0.10 \
  --channel-dropout-p 0.5 --channel-dropout-warmup-steps 100 \
  --train-data "data/corpus/packed/train-*.parquet" \
  --eval-data "data/corpus/packed/eval-*.parquet" \
  --out checkpoints/iter5-native

# 7. Evaluate
python -m eval.eval_tinymars \
  --ckpt checkpoints/iter5-native/latest.pt \
  --suite eval/eval_suite.json
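
The training flags in step 6 imply a token budget that is worth checking before renting hardware. The arithmetic below assumes a single device, that one optimizer step consumes bs × grad-accum sequences, and that every sequence is packed to the full seq-len; how train_tinymars.py rounds the warmup ratio is not stated, so truncation is also an assumption.

```python
def training_budget(batch_size, grad_accum, seq_len, max_steps, warmup_ratio):
    """Derive per-step and total token counts from the training flags."""
    sequences_per_step = batch_size * grad_accum
    tokens_per_step = sequences_per_step * seq_len
    return {
        "sequences_per_step": sequences_per_step,
        "tokens_per_step": tokens_per_step,
        "total_tokens": tokens_per_step * max_steps,
        # Truncating to an integer step count is an assumption.
        "warmup_steps": int(warmup_ratio * max_steps),
    }


# Values taken from the command in step 6.
budget = training_budget(
    batch_size=8, grad_accum=16, seq_len=2048, max_steps=19073, warmup_ratio=0.02
)
```

Under these assumptions each step covers 262,144 tokens, so 19,073 steps correspond to roughly 5.0 billion tokens.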

What's not in this repo, and why

Model weights. tinyMARS is below conversational threshold; releasing weights would produce confusion ("Celiums released a model that doesn't work well"). Plan B (4.6B refined Gemma 4 E2B base) is the deployable artifact, in a separate forthcoming repo.
Generated corpus. The channel-causal corpus pairs (Tipos A/B/C/D) were generated using teacher LLMs whose terms of service may restrict redistribution of generated text. We publish the methodology and prompts so anyone can regenerate their own corpus.
Personal data. Some episodic memory pairs in Tipo B were patterned on actual journal entries. Not redistributed.
Internal Celiums corpus (skills + papers replay). Comes from the parallel Celiums knowledge-engine track, separate licensing.
Hyphae internals. Hyphae is a separate research/product track. Conceptual references in some docs; no implementation in this repo.

License

MIT for code, evaluation, and documentation. See LICENSE.

If you use this work, please cite:

@misc{tinymars2026,
  author = {Mario Gutierrez},
  title  = {tinyMARS: First quantitative validation of channel causality in a from-scratch channel-causal architecture},
  year   = {2026},
  month  = {May},
  howpublished = {\url{https://github.com/terrizoaguimor/tinymars}},
  note   = {Celiums Solutions LLC, paper-grade research artifact}
}

Contact

Website: https://celiums.ai/
Research post: https://celiums.ai/research/tinymars
Twitter/X: @CeliumsKnowledge
Issues: please open a GitHub issue for technical questions

Acknowledgments

LLM tools used during development

Anthropic Claude Opus 4.7 and Claude Sonnet 4.6, served via DigitalOcean Gradient AI Inference, used for corpus quality scoring, judge filtering of generated channel-causal pairs, and rewrite tasks during identity-defense corpus refinement. Use governed by the DigitalOcean Gradient AI Inference terms of service.

Tokenizer

Custom 32K BPE tokenizer trained from scratch on the tinyMARS corpus using SentencePiece. tinyMARS does not derive from any pretrained tokenizer.

Parallel work referenced

MARS-Real (separate track, not part of this OSS release): a continued pretraining derivative of Google's Gemma 4 E2B-it, used in compliance with the Gemma Terms of Use. MARS-Real is the deployable artifact; tinyMARS (this work) is the architectural validation prototype trained from random initialization.

Compute and inference

Single H200 SXM 141 GB cloud instance via DigitalOcean Cloud GPU Droplets plus teacher/judge LLM calls via DigitalOcean Gradient AI Inference.

Direct cost breakdown (~$1,110 total):

| Item | Cost | Notes |
|---|---|---|
| Teacher / judge LLM (Opus 4.7 + Sonnet 4.6) | ~$945 | ~33,000 generation+judge cycles across all four iterations of channel-causal corpus |
| H200 SXM 141 GB droplet | ~$160 | 2 days × $3.34/h; all training iterations + corpus building |
| DOKS compute-burst | ~$5 | parallel corpus generation |
| DO Spaces (S3-compatible) | <$1 | shard distribution |

Note on cost shape: the dominant cost is inference, not compute. Generating channel-causal training pairs with a frontier teacher LLM (Anthropic Claude Opus 4.7) cost roughly six times as much as renting the H200 over the same window (~$945 vs ~$160). In other words, tinyMARS-class architectural validation experiments are inference-bound when the corpus is teacher-generated. Future iterations could reduce this materially by using smaller teacher models for non-critical pair categories or by replacing teacher-generated content with curated public datasets where the channel signal is naturally present.

Not included: Mario's time; pre-existing Celiums infrastructure (memory engine, knowledge engine, OpenSearch, Postgres + pgvector, MCP tools) developed over the prior nine months and not chargeable to this experiment; snapshot retention going forward (~$12/month). Estimates use observed API pricing as of May 2026 (Opus 4.7: $15/M input, $75/M output; Sonnet 4.6: $3/M input, $15/M output) and per-pair token averages from production runs; final billing reconciliation pending.
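
The cost figures above follow from simple per-token and per-hour arithmetic, sketched below. The prices and the 2 days × $3.34/h GPU figure come from this section; the per-cycle token counts are hypothetical placeholders, since the per-pair averages from the production runs are not listed here.

```python
# USD per million tokens, as quoted above for May 2026.
PRICES_PER_MILLION = {
    "opus-4.7": {"input": 15.0, "output": 75.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
}


def llm_cost(model, input_tokens, output_tokens):
    """Cost in USD of one call (or one batch of calls) to a priced model."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def gpu_cost(hours, rate_per_hour):
    """Cost in USD of renting a GPU instance for a number of hours."""
    return hours * rate_per_hour


# H200 line item: 2 days at $3.34/h.
h200_total = gpu_cost(hours=48, rate_per_hour=3.34)

# Hypothetical token counts for a single teacher cycle (not measured values).
example_cycle = llm_cost("opus-4.7", input_tokens=1_000, output_tokens=300)

# Rounded line items from the table above (DO Spaces, under $1, left out).
rounded_total = 945 + 160 + 5
```

Because output tokens cost five times as much as input tokens at these prices, response length drives the teacher bill more than prompt length does.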

Inspiration / prior work

ReZero — Bachlechner et al. (2020) — alpha-gate identity-init pattern for residual connections.
ACT-R framework — Anderson et al. — procedural memory architecture and sub-symbolic activation.
BGE-M3 — BAAI — multilingual 1024-d embeddings used for memory / identity / continuity channels.
RoPE — Su et al. (2021) — rotary positional embeddings.

Built between 2026-05-02 and 2026-05-04. Total direct cost: ~$1,110, dominated by teacher LLM inference for corpus generation. Honest reporting of what we proved and what we didn't.
