docs: extract stable knowledge into ADRs and trim plan.md #2
Merged
Conversation
Add parseLlamaConfig for model_type "llama" mapping HuggingFace field names to ModelMetadata. Default rope_theta is 500000 (Llama 3 default). Register in DefaultArchConfigRegistry. Add testdata/llama3_config.json fixture based on Llama 3.1 8B config. Table-driven tests + fixture-based test verify all field mappings including RoPE scaling. Implements T57.2 (S57.2.1 through S57.2.4). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add parseMistralConfig for model_type "mistral" (default rope_theta 10000, adds sliding_window). Add parseQwenConfig for model_type "qwen2" (attention_bias always true, default rope_theta 1000000, YaRN rope scaling support). Register both in DefaultArchConfigRegistry. Add testdata fixtures: mistral7b_config.json, qwen25_7b_config.json. Table-driven + fixture-based tests verify all field mappings. Implements T57.3 (S57.3.1 through S57.3.4).
Add parsePhiConfig for model_type "phi3"/"phi" with partial_rotary_factor (default 1.0) and tie_word_embeddings. Add parseDeepSeekConfig for model_type "deepseek_v3" with MLA fields (kv_lora_rank, q_lora_rank, qk_rope_head_dim) and MoE fields (n_routed_experts, num_experts_per_tok, n_shared_experts). Register all in DefaultArchConfigRegistry. Extend ModelMetadata with 7 new fields: PartialRotaryFactor, KVLoRADim, QLoRADim, QKRopeHeadDim, NumExperts, NumExpertsPerToken, NumSharedExperts. Add testdata fixtures: phi4_config.json, deepseek_v3_config.json. Table-driven + fixture-based tests verify all field mappings. Implements T57.4 (S57.4.1 through S57.4.4).
Update loadMetadata to unmarshal config.json into raw map, then dispatch to DefaultArchConfigRegistry for architecture-specific field mapping. Overlays chat_template from raw JSON (internal field not in HuggingFace configs). Existing Gemma loading continues to work. Llama, Mistral, Qwen, Phi, and DeepSeek configs now parsed correctly via their registered parsers. Unknown model_type falls back to generic parsing. Implements T57.5 (S57.5.1 through S57.5.3).
ParamResolver interface maps architecture-specific weight names to canonical names used by Zerfoo layers. Phi renames dense_proj to o_proj; all other families (Llama, Gemma, Mistral, Qwen, DeepSeek) use canonical naming. Generic ResolveAll helper creates alias maps. 100% coverage on param_resolver.go.
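A sketch of the interface and helper described above, assuming illustrative method and type names (only the Phi `dense_proj` → `o_proj` rename comes from the commit text):

```go
package main

import "fmt"

// ParamResolver maps architecture-specific weight names to the canonical
// names layers expect. Interface shape is illustrative.
type ParamResolver interface {
	Resolve(name string) string
}

// phiResolver renames Phi's dense_proj to the canonical o_proj; all other
// names pass through unchanged.
type phiResolver struct{}

func (phiResolver) Resolve(name string) string {
	if name == "dense_proj" {
		return "o_proj"
	}
	return name
}

// ResolveAll builds an alias map from original to canonical names,
// omitting names that already resolve to themselves.
func ResolveAll(r ParamResolver, names []string) map[string]string {
	aliases := make(map[string]string)
	for _, n := range names {
		if c := r.Resolve(n); c != n {
			aliases[n] = c
		}
	}
	return aliases
}

func main() {
	fmt.Println(ResolveAll(phiResolver{}, []string{"dense_proj", "q_proj"})) // map[dense_proj:o_proj]
}
```

Families that already use canonical naming (Llama, Gemma, Mistral, Qwen, DeepSeek) would simply return every name unchanged, producing an empty alias map.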
Add variadic BuildOption with WithParamResolver to BuildFromZMF. When supplied, the resolver adds canonical parameter name aliases via ResolveAll so layer builders can look up parameters by canonical name even when the ZMF file uses architecture-specific names. Backward compatible: no options = no resolver = existing behavior.
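The variadic-option pattern can be sketched as follows; `BuildFromZMF`'s real signature and internals differ, so the types and the simplified parameter map below are assumptions for illustration:

```go
package main

import "fmt"

// buildConfig collects options; a nil resolver means existing behavior.
type buildConfig struct {
	resolver func(string) string
}

// BuildOption is the variadic functional option consumed by BuildFromZMF.
type BuildOption func(*buildConfig)

// WithParamResolver supplies a resolver used to add canonical-name aliases.
func WithParamResolver(r func(string) string) BuildOption {
	return func(c *buildConfig) { c.resolver = r }
}

// BuildFromZMF applies options, then aliases parameters when a resolver is
// present. No options means no resolver and no aliasing (backward compatible).
func BuildFromZMF(params map[string][]float32, opts ...BuildOption) map[string][]float32 {
	var cfg buildConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.resolver != nil {
		for name, w := range params {
			if canon := cfg.resolver(name); canon != name {
				params[canon] = w // alias, not copy: both names share the slice
			}
		}
	}
	return params
}

func main() {
	params := map[string][]float32{"dense_proj": {1, 2}}
	out := BuildFromZMF(params, WithParamResolver(func(n string) string {
		if n == "dense_proj" {
			return "o_proj"
		}
		return n
	}))
	fmt.Println(len(out["o_proj"])) // 2
}
```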
Move dirRegistry, forward pass, greedy decode, and generation test patterns into helpers_test.go. Simplify Gemma 3 test files to use the shared helpers, eliminating code duplication.
… tests. Env-gated by LLAMA3_ZMF_PATH / LLAMA3_MODEL_DIR; tests skip gracefully when model files are absent. Covers forward-pass shape validation, 5-step greedy decode, and inference API generation (greedy determinism, stream parity, chat).
Extract modelParityConfig, runModelForwardPass, runModelGreedyDecode, runModelGeneration helpers to eliminate structural duplication across model test files. Migrate Gemma 3 and Llama 3 tests to the new pattern, and merge gemma3_generation_test.go into gemma3_test.go.
… tests. Three env-gated parity tests for Mistral models using the shared modelParityConfig pattern. Skips gracefully when MISTRAL_ZMF_PATH and MISTRAL_MODEL_DIR are not set.
The buildGroupedQueryAttention registry builder now looks up optional bias parameters (name_wq_bias, name_wk_bias, name_wv_bias, name_wo_bias) and passes them to NewDenseFromParams. When absent, behavior is unchanged (backward compatible). This enables Qwen models which use attention_bias=true.
WithYaRNScaling(factor, origMaxLen) modifies inverse frequencies:
- Low frequency (wavelength > factor*origMaxLen): scaled by 1/factor
- Intermediate (origMaxLen <= wavelength <= factor*origMaxLen): interpolated
- High frequency (wavelength < origMaxLen): unchanged

AttentionScaleFactor() returns sqrt(1 + ln(factor)/ln(origMaxLen)). Without YaRN, behavior is unchanged (backward compatible).
The buildGroupedQueryAttention function now reads optional rope_scaling_type, rope_scaling_factor, and rope_scaling_orig_max_len attributes and passes WithYaRNScaling to the RoPE constructor.
WithGlobalAttributes injects extra attributes into every node during graph construction. The inference loader now converts ModelMetadata.RopeScaling into global attributes so GQA nodes receive YaRN parameters without the ZMF file needing to carry them.
Forward pass, greedy decode, and generation tests gated by QWEN25_ZMF_PATH and QWEN25_MODEL_DIR environment variables.
WithRotaryDimFraction controls what fraction of head dimensions receive rotation. Default is 1.0 (all dims rotated). When fraction < 1.0, the Forward/Backward methods split into rotated and pass-through portions. Phi-4 uses 0.75 for partial RoPE.
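The rotated/pass-through split can be sketched as a slicing helper; the function below is illustrative (the real Forward/Backward operate on tensors), assuming rotation applies to the leading dimensions in pairs:

```go
package main

import "fmt"

// splitRotary partitions a head vector into the rotated prefix and the
// pass-through suffix according to the rotary dim fraction. Fraction 1.0
// rotates every dimension (the default); Phi-4 uses 0.75.
func splitRotary(head []float32, fraction float64) (rotated, passthrough []float32) {
	rotDims := int(fraction * float64(len(head)))
	rotDims -= rotDims % 2 // RoPE rotates dimension pairs
	return head[:rotDims], head[rotDims:]
}

func main() {
	head := make([]float32, 128)
	rot, pass := splitRotary(head, 0.75)
	fmt.Println(len(rot), len(pass)) // 96 32
}
```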
The buildGroupedQueryAttention function now reads an optional partial_rotary_factor attribute and passes WithRotaryDimFraction to the RoPE constructor when fraction < 1.0.
When ModelMetadata.PartialRotaryFactor is set (0 < f < 1), inject it as a global attribute so GQA nodes receive partial RoPE configuration.
NewTiedLMHead creates an LMHead that reuses the token embedding weight matrix (transposed) instead of owning its own. The tied LMHead has no trainable parameters since the embedding layer owns the weight.
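Weight tying can be sketched as follows, assuming illustrative types (the real LMHead operates on the engine's tensor types): the head borrows the embedding matrix and multiplies against its transpose, so logits come from dot products with embedding rows and the head reports no parameters of its own.

```go
package main

import "fmt"

// TiedLMHead computes logits against the transposed embedding matrix,
// sharing the weight rather than owning it. emb is [vocab][hidden].
type TiedLMHead struct {
	emb [][]float32 // owned by the embedding layer, borrowed here
}

func NewTiedLMHead(embedding [][]float32) *TiedLMHead {
	return &TiedLMHead{emb: embedding}
}

// Forward returns hidden · embᵀ: since emb stores one row per token,
// the transpose reduces to a dot product against each embedding row.
func (h *TiedLMHead) Forward(hidden []float32) []float32 {
	logits := make([]float32, len(h.emb))
	for v, row := range h.emb {
		var dot float32
		for i, x := range hidden {
			dot += x * row[i]
		}
		logits[v] = dot
	}
	return logits
}

// Params returns nil: the embedding layer owns the weight, so the tied
// head has no trainable parameters.
func (h *TiedLMHead) Params() [][]float32 { return nil }

func main() {
	emb := [][]float32{{1, 0}, {0, 1}}
	fmt.Println(NewTiedLMHead(emb).Forward([]float32{3, 4})) // [3 4]
}
```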
Forward pass, greedy decode, and generation tests gated by PHI4_ZMF_PATH and PHI4_MODEL_DIR environment variables.
Implements MLA as used in DeepSeek V3/R1. Compresses KV into a low-rank latent vector via down-projection, then up-projects to K and V. Includes RoPE integration and SDPA.
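The KV compression path can be sketched as two projections; the weight names follow the commit (W_DKV, W_UK, W_UV), but the vector shapes and helper below are assumptions, and RoPE integration and SDPA are omitted:

```go
package main

import "fmt"

// matVec computes w·x for w of shape [rows][cols].
func matVec(w [][]float32, x []float32) []float32 {
	out := make([]float32, len(w))
	for r, row := range w {
		for c, v := range row {
			out[r] += v * x[c]
		}
	}
	return out
}

// mlaKV sketches the KV path of Multi-head Latent Attention: compress the
// hidden state to a kv_lora_dim latent via the down-projection wDKV, then
// up-project the latent to full K and V via wUK and wUV.
func mlaKV(hidden []float32, wDKV, wUK, wUV [][]float32) (k, v []float32) {
	latent := matVec(wDKV, hidden) // [kv_lora_dim]
	return matVec(wUK, latent), matVec(wUV, latent)
}

func main() {
	hidden := []float32{1, 2, 3, 4}
	wDKV := [][]float32{{1, 0, 0, 0}, {0, 1, 0, 0}} // hidden 4 -> latent 2
	wUK := [][]float32{{1, 0}, {0, 1}, {1, 1}}      // latent 2 -> head_dim 3
	wUV := [][]float32{{2, 0}, {0, 2}, {0, 0}}
	k, v := mlaKV(hidden, wDKV, wUK, wUV)
	fmt.Println(k, v) // [1 2 3] [2 4 0]
}
```

The cache-size win of MLA comes from storing only the small latent per position instead of full K and V.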
Reads num_heads, head_dim, kv_lora_dim, max_seq_len from attributes and loads W_Q, W_DKV, W_UK, W_UV, W_O from node parameters. Includes tests for missing attributes, missing params, and custom rope_base.
When SharedExpert is non-nil, its output is added to the weighted routed expert sum for every token. When nil, behavior is unchanged (backward compatible).
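The shared-expert addition can be sketched as below, with illustrative types (the real layer operates on batched tensors with a router): the weighted routed sum is computed first, and the shared expert's output, when present, is added unconditionally.

```go
package main

import "fmt"

// Expert is any feed-forward block; signature illustrative.
type Expert func(x []float32) []float32

// moeForward sums weighted routed-expert outputs and, when sharedExpert is
// non-nil, adds its output for every token; with nil sharedExpert the
// routed sum alone is returned (backward compatible).
func moeForward(x []float32, routed []Expert, weights []float32, sharedExpert Expert) []float32 {
	out := make([]float32, len(x))
	for i, e := range routed {
		for d, v := range e(x) {
			out[d] += weights[i] * v
		}
	}
	if sharedExpert != nil {
		for d, v := range sharedExpert(x) {
			out[d] += v // shared expert applies to every token, unweighted
		}
	}
	return out
}

func main() {
	double := Expert(func(x []float32) []float32 { return []float32{2 * x[0]} })
	identity := Expert(func(x []float32) []float32 { return []float32{x[0]} })
	// routed contribution 0.5*6 = 3, plus shared contribution 3.
	fmt.Println(moeForward([]float32{3}, []Expert{double}, []float32{0.5}, identity)) // [6]
}
```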
Env-gated by DEEPSEEK_ZMF_PATH / DEEPSEEK_MODEL_DIR. Includes forward pass, greedy decode, and generation tests.
Update docs/plan.md to mark E69 (final verification) complete. Add Section 12 to docs/design.md documenting supported model families, architecture-specific features, and config registry.
Trim plan.md from 3058 to 272 lines by extracting stable design decisions into 6 ADR files and replacing ~2500 lines of detailed task breakdowns with phase summaries and ADR cross-references.

New files:
- docs/adr/001-enterprise-production-readiness.md (Phases 4+7)
- docs/adr/002-distributed-training-protocol.md (Phase 5)
- docs/adr/003-open-weights-model-import.md (Phase 6)
- docs/adr/004-embeddable-inference-library.md (Phase 8)
- docs/adr/005-multi-architecture-support.md (Phase 9)
- docs/adr/006-gpu-engine-architecture.md (Phases 2-3)

Updates to design.md (+25 lines):
- Section 7.2: added 6 model families' parity test env vars
- Section 12.4: added parameter name resolver note
- Section 13: added ADR index table

Preserved in plan.md: blocked item E29 (GPU validation), operating procedure, hand-off notes, final scorecard, and new packages table.
Summary
- Trimmed docs/plan.md from 3058 to 272 lines (91% reduction) by extracting stable design decisions into 6 ADR files
- Updated docs/design.md (+25 lines)
- New ADR files under docs/adr/

Test plan