docs: extract stable knowledge into ADRs and trim plan.md #2

Merged
dndungu merged 51 commits into main from docs/extract-adrs-trim-plan
Mar 3, 2026

Conversation


dndungu commented Mar 3, 2026

Summary

  • Trimmed docs/plan.md from 3058 to 272 lines (91% reduction) by extracting stable design decisions into 6 ADR files
  • Replaced ~2500 lines of completed task breakdowns with phase summaries and ADR cross-references
  • Added ADR index table (Section 13) and minor updates to docs/design.md (+25 lines)
  • Preserved blocked item E29 (GPU validation), operating procedure, hand-off notes, final scorecard, and new packages table in plan.md

New ADR files (docs/adr/)

ADR   Title                              Phase(s)
001   Enterprise Production Readiness    4+7
002   Distributed Training Protocol      5
003   Open Weights Model Import          6
004   Embeddable Inference Library       8
005   Multi-Architecture Support         9
006   GPU Engine Architecture            2-3

Test plan

  • Verify all ADR cross-references from plan.md and design.md resolve to existing files
  • Verify blocked item E29 is fully described in plan.md
  • Verify no code changes (docs only)

dndungu and others added 30 commits March 3, 2026 09:00

Add parseLlamaConfig for model_type "llama" mapping HuggingFace field
names to ModelMetadata. Default rope_theta is 500000 (Llama 3 default).
Register in DefaultArchConfigRegistry.

Add testdata/llama3_config.json fixture based on Llama 3.1 8B config.
Table-driven tests + fixture-based test verify all field mappings
including RoPE scaling.

Implements T57.2 (S57.2.1 through S57.2.4).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
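
The field mapping this commit describes can be sketched as follows. This is a minimal illustration: the struct fields, the function shape, and the subset of config keys shown are simplified assumptions, not the actual Zerfoo definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelMetadata is an illustrative subset of the canonical metadata
// fields; the real Zerfoo struct has many more.
type ModelMetadata struct {
	HiddenSize int
	NumHeads   int
	RopeTheta  float64
}

// parseLlamaConfig maps HuggingFace config.json fields (decoded into a
// raw map) onto ModelMetadata. rope_theta defaults to 500000, the
// Llama 3 default, when the field is absent.
func parseLlamaConfig(raw map[string]any) ModelMetadata {
	m := ModelMetadata{RopeTheta: 500000}
	// encoding/json decodes all JSON numbers into float64.
	if v, ok := raw["hidden_size"].(float64); ok {
		m.HiddenSize = int(v)
	}
	if v, ok := raw["num_attention_heads"].(float64); ok {
		m.NumHeads = int(v)
	}
	if v, ok := raw["rope_theta"].(float64); ok {
		m.RopeTheta = v
	}
	return m
}

func main() {
	var raw map[string]any
	json.Unmarshal([]byte(`{"model_type":"llama","hidden_size":4096,"num_attention_heads":32}`), &raw)
	fmt.Printf("%+v\n", parseLlamaConfig(raw))
}
```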

Add parseMistralConfig for model_type "mistral" (default rope_theta
10000, adds sliding_window). Add parseQwenConfig for model_type "qwen2"
(attention_bias always true, default rope_theta 1000000, YaRN rope
scaling support). Register both in DefaultArchConfigRegistry.

Add testdata fixtures: mistral7b_config.json, qwen25_7b_config.json.
Table-driven + fixture-based tests verify all field mappings.

Implements T57.3 (S57.3.1 through S57.3.4).

Add parsePhiConfig for model_type "phi3"/"phi" with partial_rotary_factor
(default 1.0) and tie_word_embeddings. Add parseDeepSeekConfig for
model_type "deepseek_v3" with MLA fields (kv_lora_rank, q_lora_rank,
qk_rope_head_dim) and MoE fields (n_routed_experts, num_experts_per_tok,
n_shared_experts). Register all in DefaultArchConfigRegistry.

Extend ModelMetadata with 7 new fields: PartialRotaryFactor, KVLoRADim,
QLoRADim, QKRopeHeadDim, NumExperts, NumExpertsPerToken, NumSharedExperts.

Add testdata fixtures: phi4_config.json, deepseek_v3_config.json.
Table-driven + fixture-based tests verify all field mappings.

Implements T57.4 (S57.4.1 through S57.4.4).

Update loadMetadata to unmarshal config.json into raw map, then dispatch
to DefaultArchConfigRegistry for architecture-specific field mapping.
Overlays chat_template from raw JSON (internal field not in HuggingFace
configs). Existing Gemma loading continues to work. Llama, Mistral,
Qwen, Phi, and DeepSeek configs now parsed correctly via their
registered parsers. Unknown model_type falls back to generic parsing.

Implements T57.5 (S57.5.1 through S57.5.3).
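
The dispatch-with-fallback described here might look roughly like this; the registry shape and type names are illustrative stand-ins for DefaultArchConfigRegistry, not the real API.

```go
package main

import "fmt"

// Metadata and ConfigParser are simplified stand-ins for the real types.
type Metadata struct{ Family string }
type ConfigParser func(raw map[string]any) Metadata

// registry maps a config.json model_type to its parser, mirroring the
// role of DefaultArchConfigRegistry.
var registry = map[string]ConfigParser{
	"llama":   func(raw map[string]any) Metadata { return Metadata{Family: "llama"} },
	"mistral": func(raw map[string]any) Metadata { return Metadata{Family: "mistral"} },
}

// parseConfig dispatches on model_type and falls back to generic
// parsing for unknown families, as loadMetadata does.
func parseConfig(raw map[string]any) Metadata {
	mt, _ := raw["model_type"].(string)
	if p, ok := registry[mt]; ok {
		return p(raw)
	}
	return Metadata{Family: "generic"} // unknown model_type: generic fallback
}

func main() {
	fmt.Println(parseConfig(map[string]any{"model_type": "llama"}).Family)
	fmt.Println(parseConfig(map[string]any{"model_type": "gpt_bigcode"}).Family)
}
```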

ParamResolver interface maps architecture-specific weight names to
canonical names used by Zerfoo layers. Phi renames dense_proj to
o_proj; all other families (Llama, Gemma, Mistral, Qwen, DeepSeek)
use canonical naming. Generic ResolveAll helper creates alias maps.
100% coverage on param_resolver.go.
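
A rough sketch of the resolver idea; the interface shape and the ResolveAll-style helper signature are guesses for illustration, not the Zerfoo signatures.

```go
package main

import "fmt"

// ParamResolver maps architecture-specific weight names to the
// canonical names Zerfoo layers expect (illustrative interface shape).
type ParamResolver interface {
	Resolve(name string) (canonical string, ok bool)
}

// phiResolver renames Phi's dense_proj to the canonical o_proj;
// other names need no aliasing.
type phiResolver struct{}

func (phiResolver) Resolve(name string) (string, bool) {
	if name == "dense_proj" {
		return "o_proj", true
	}
	return "", false
}

// resolveAll builds an alias map (canonical -> original) for every
// parameter the resolver knows how to rename.
func resolveAll(r ParamResolver, names []string) map[string]string {
	aliases := make(map[string]string)
	for _, n := range names {
		if canonical, ok := r.Resolve(n); ok {
			aliases[canonical] = n
		}
	}
	return aliases
}

func main() {
	fmt.Println(resolveAll(phiResolver{}, []string{"dense_proj", "q_proj"}))
}
```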

Add variadic BuildOption with WithParamResolver to BuildFromZMF.
When supplied, the resolver adds canonical parameter name aliases
via ResolveAll so layer builders can look up parameters by canonical
name even when the ZMF file uses architecture-specific names.
Backward compatible: no options = no resolver = existing behavior.

Move dirRegistry, forward pass, greedy decode, and generation test
patterns into helpers_test.go. Simplify Gemma 3 test files to use
the shared helpers, eliminating code duplication.

… tests

Env-gated by LLAMA3_ZMF_PATH / LLAMA3_MODEL_DIR. Tests skip
gracefully when model files are absent. Covers forward pass shape
validation, 5-step greedy decode, and inference API generation
(greedy determinism, stream parity, chat).

Extract modelParityConfig, runModelForwardPass, runModelGreedyDecode,
runModelGeneration helpers to eliminate structural duplication across
model test files. Migrate Gemma 3 and Llama 3 tests to the new pattern,
and merge gemma3_generation_test.go into gemma3_test.go.

… tests

Three env-gated parity tests for Mistral models using the shared
modelParityConfig pattern. Skips gracefully when MISTRAL_ZMF_PATH
and MISTRAL_MODEL_DIR are not set.

The buildGroupedQueryAttention registry builder now looks up optional
bias parameters (name_wq_bias, name_wk_bias, name_wv_bias, name_wo_bias)
and passes them to NewDenseFromParams. When absent, behavior is unchanged
(backward compatible). This enables Qwen models which use attention_bias=true.

WithYaRNScaling(factor, origMaxLen) modifies inverse frequencies:
- Low frequency (wavelength > factor*origMaxLen): scaled by 1/factor
- Intermediate (origMaxLen <= wavelength <= factor*origMaxLen): interpolated
- High frequency (wavelength < origMaxLen): unchanged

AttentionScaleFactor() returns sqrt(1 + ln(factor)/ln(origMaxLen)).
Without YaRN, behavior is unchanged (backward compatible).
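
The three bands can be sketched directly from the description above. Only the band boundaries and the attention-scale formula come from the commit message; the linear interpolation in the intermediate band is an assumed detail, and the function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// yarnScale adjusts RoPE inverse frequencies per the three-band rule:
// high-frequency dims are untouched, low-frequency dims are scaled by
// 1/factor, and the intermediate band blends between the two.
func yarnScale(invFreq []float64, factor, origMaxLen float64) []float64 {
	out := make([]float64, len(invFreq))
	for i, f := range invFreq {
		wavelength := 2 * math.Pi / f
		switch {
		case wavelength < origMaxLen:
			out[i] = f // high frequency: unchanged
		case wavelength > factor*origMaxLen:
			out[i] = f / factor // low frequency: fully scaled
		default:
			// intermediate: linear blend between unscaled and scaled
			t := (wavelength - origMaxLen) / ((factor - 1) * origMaxLen)
			out[i] = (1-t)*f + t*f/factor
		}
	}
	return out
}

// attentionScaleFactor matches the formula in the commit message:
// sqrt(1 + ln(factor)/ln(origMaxLen)). With factor == 1 it is 1.
func attentionScaleFactor(factor, origMaxLen float64) float64 {
	return math.Sqrt(1 + math.Log(factor)/math.Log(origMaxLen))
}

func main() {
	inv := []float64{1.0, 0.0001} // wavelengths ~6.3 and ~62832
	fmt.Println(yarnScale(inv, 4, 4096))
	fmt.Println(attentionScaleFactor(4, 4096))
}
```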

The buildGroupedQueryAttention function now reads optional
rope_scaling_type, rope_scaling_factor, and rope_scaling_orig_max_len
attributes and passes WithYaRNScaling to the RoPE constructor.

WithGlobalAttributes injects extra attributes into every node during
graph construction. The inference loader now converts
ModelMetadata.RopeScaling into global attributes so GQA nodes receive
YaRN parameters without the ZMF file needing to carry them.

dndungu added 21 commits March 3, 2026 11:07

Forward pass, greedy decode, and generation tests gated by
QWEN25_ZMF_PATH and QWEN25_MODEL_DIR environment variables.

WithRotaryDimFraction controls what fraction of head dimensions receive
rotation. Default is 1.0 (all dims rotated). When fraction < 1.0, the
Forward/Backward methods split into rotated and pass-through portions.
Phi-4 uses 0.75 for partial RoPE.
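
A sketch of the dimension split; the even-pair rounding is an assumption (RoPE rotates dimensions in pairs), and the function name is hypothetical.

```go
package main

import "fmt"

// splitRotary returns how many of headDim dimensions receive rotation
// for a given fraction; the remainder passes through unchanged in
// both Forward and Backward.
func splitRotary(headDim int, fraction float64) (rot, pass int) {
	rot = int(float64(headDim) * fraction)
	if rot%2 != 0 {
		rot-- // RoPE rotates dims in pairs, so keep the count even
	}
	return rot, headDim - rot
}

func main() {
	rot, pass := splitRotary(128, 0.75) // Phi-4-style partial RoPE
	fmt.Println(rot, pass)
}
```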

The buildGroupedQueryAttention function now reads an optional
partial_rotary_factor attribute and passes WithRotaryDimFraction
to the RoPE constructor when fraction < 1.0.

When ModelMetadata.PartialRotaryFactor is set (0 < f < 1), inject it
as a global attribute so GQA nodes receive partial RoPE configuration.

NewTiedLMHead creates an LMHead that reuses the token embedding weight
matrix (transposed) instead of owning its own. The tied LMHead has no
trainable parameters since the embedding layer owns the weight.
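
The weight-tying idea reduces to multiplying by the transposed embedding matrix. A toy sketch, with illustrative names and shapes:

```go
package main

import "fmt"

// tiedLogits projects a hidden state onto vocabulary logits using the
// embedding matrix transposed, so the LM head owns no weights of its
// own. embedding is vocab x hiddenDim; hidden has length hiddenDim.
func tiedLogits(hidden []float64, embedding [][]float64) []float64 {
	logits := make([]float64, len(embedding))
	for v, row := range embedding {
		var s float64
		for i, w := range row {
			s += hidden[i] * w // dot product with token v's embedding
		}
		logits[v] = s
	}
	return logits
}

func main() {
	emb := [][]float64{{1, 0}, {0, 2}} // 2-token vocab, hidden dim 2
	fmt.Println(tiedLogits([]float64{3, 4}, emb))
}
```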

Forward pass, greedy decode, and generation tests gated by
PHI4_ZMF_PATH and PHI4_MODEL_DIR environment variables.

Implements MLA as used in DeepSeek V3/R1. Compresses KV into a
low-rank latent vector via down-projection, then up-projects to
K and V. Includes RoPE integration and SDPA.

Reads num_heads, head_dim, kv_lora_dim, max_seq_len from attributes
and loads W_Q, W_DKV, W_UK, W_UV, W_O from node parameters. Includes
tests for missing attributes, missing params, and custom rope_base.
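
The compress-then-expand KV path can be sketched with toy matrices. Weight names follow the commit message; the plain matrix-vector form and tiny shapes are simplifications that ignore RoPE and the multi-head structure.

```go
package main

import "fmt"

// matVec multiplies a rows x cols matrix by a vector.
func matVec(w [][]float64, x []float64) []float64 {
	out := make([]float64, len(w))
	for r, row := range w {
		for c, v := range row {
			out[r] += v * x[c]
		}
	}
	return out
}

// mlaKV sketches the MLA KV path: down-project the hidden state into a
// low-rank latent (size kv_lora_rank), then up-project the latent into
// K and V.
func mlaKV(x []float64, wDKV, wUK, wUV [][]float64) (k, v []float64) {
	latent := matVec(wDKV, x) // compress to kv_lora_rank dims
	return matVec(wUK, latent), matVec(wUV, latent)
}

func main() {
	wDKV := [][]float64{{1, 0, 0, 0}} // hidden 4 -> latent 1
	wUK := [][]float64{{1}, {2}}      // latent 1 -> K dim 2
	wUV := [][]float64{{3}, {4}}      // latent 1 -> V dim 2
	k, v := mlaKV([]float64{5, 0, 0, 0}, wDKV, wUK, wUV)
	fmt.Println(k, v)
}
```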

When SharedExpert is non-nil, its output is added to the weighted
routed expert sum for every token. When nil, behavior is unchanged
(backward compatible).
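
A minimal sketch of the combine step; the function name and the dense per-expert vectors are illustrative, since the real layer operates on tensors.

```go
package main

import "fmt"

// combineExperts sums router-weighted routed-expert outputs and, when
// a shared expert is present, adds its output unconditionally for the
// token. A nil shared slice reproduces the routed-only behavior.
func combineExperts(routed [][]float64, weights []float64, shared []float64) []float64 {
	out := make([]float64, len(routed[0]))
	for e, w := range weights {
		for i, v := range routed[e] {
			out[i] += w * v
		}
	}
	if shared != nil {
		for i, v := range shared {
			out[i] += v // shared expert contributes to every token
		}
	}
	return out
}

func main() {
	routed := [][]float64{{1, 1}, {3, 1}}
	fmt.Println(combineExperts(routed, []float64{0.5, 0.5}, nil))
	fmt.Println(combineExperts(routed, []float64{0.5, 0.5}, []float64{1, 0}))
}
```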

Env-gated by DEEPSEEK_ZMF_PATH / DEEPSEEK_MODEL_DIR. Includes
forward pass, greedy decode, and generation tests.

Update docs/plan.md to mark E69 (final verification) complete.

Add Section 12 to docs/design.md documenting supported model families,
architecture-specific features, and config registry.

Trim plan.md from 3058 to 272 lines by extracting stable design
decisions into 6 ADR files and replacing ~2500 lines of detailed
task breakdowns with phase summaries and ADR cross-references.

New files:
- docs/adr/001-enterprise-production-readiness.md (Phases 4+7)
- docs/adr/002-distributed-training-protocol.md (Phase 5)
- docs/adr/003-open-weights-model-import.md (Phase 6)
- docs/adr/004-embeddable-inference-library.md (Phase 8)
- docs/adr/005-multi-architecture-support.md (Phase 9)
- docs/adr/006-gpu-engine-architecture.md (Phases 2-3)

Updates to design.md (+25 lines):
- Section 7.2: added 6 model families' parity test env vars
- Section 12.4: added parameter name resolver note
- Section 13: added ADR index table

Preserved in plan.md: blocked item E29 (GPU validation),
operating procedure, hand-off notes, final scorecard, and
new packages table.

dndungu merged commit b16fd47 into main Mar 3, 2026
1 of 2 checks passed
dndungu deleted the docs/extract-adrs-trim-plan branch March 3, 2026 20:59