
[WIP] Implement custom SpectralLinear layer and SharedProjector module #10

Draft
Copilot wants to merge 1 commit into main from copilot/replace-standard-nn-linear-layers

Conversation


Copilot AI commented Apr 5, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Stage 1: The Spectral Weights & Aliasing

Goal: Replace standard nn.Linear layers with "Spectral" layers and tie the Attention and FFN weights.

Copy-paste this into Copilot:

"I am building a high-density 'Fractal Compressor' for the Parameter Golf challenge (16MB limit). I need to implement a custom PyTorch layer called SpectralLinear.

Instead of storing a full weight matrix $W$, this layer should store a small set of learnable coefficients $C$ (a low-rank matrix) and use a fixed, non-learnable 2D basis (like a Discrete Cosine Transform or a fixed Hadamard matrix) to reconstruct the effective weight matrix during the forward pass. The goal is to achieve the expressive power of a large matrix with 1/10th of the parameters.

Additionally, implement 'Parameter Aliasing': create a SharedProjector module where the Query projection ($W_q$) and the FFN-up projection ($W_{up}$) share the same base SpectralLinear weights, but are differentiated by a small, learnable bias vector (a 'shift vector').

Please write the SpectralLinear class and the SharedProjector class. Ensure the reconstruction happens efficiently in the forward pass."
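
As a reference point, here is a minimal PyTorch sketch of what an answer to this prompt could look like. It assumes a square `rank × rank` coefficient matrix, an orthonormal DCT-II basis, and that $W_q$ and $W_{up}$ share a `d_model → d_model` shape; the helper name `dct_basis` and the method names `query`/`ffn_up` are illustrative, not code from this PR.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def dct_basis(n: int, k: int) -> torch.Tensor:
    """First k rows of an orthonormal DCT-II basis, shape (k, n)."""
    i = torch.arange(n, dtype=torch.float32)
    f = torch.arange(k, dtype=torch.float32).unsqueeze(1)
    basis = torch.cos(math.pi / n * (i + 0.5) * f) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)  # DC row needs extra 1/sqrt(2) for orthonormality
    return basis


class SpectralLinear(nn.Module):
    """Stores only a small learnable coefficient matrix C; the effective
    weight W = B_out @ C @ B_in is rebuilt from fixed DCT bases each forward."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.coeff = nn.Parameter(torch.randn(rank, rank) * 0.02)  # learnable C
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Fixed, non-learnable bases live in buffers (saved, never trained).
        self.register_buffer("basis_out", dct_basis(out_features, rank).t())  # (out, r)
        self.register_buffer("basis_in", dct_basis(in_features, rank))        # (r, in)

    def forward(self, x, shift=None):
        # Reconstruction is two small matmuls, so it stays cheap per step.
        weight = self.basis_out @ self.coeff @ self.basis_in  # (out, in)
        bias = self.bias if shift is None else self.bias + shift
        return F.linear(x, weight, bias)


class SharedProjector(nn.Module):
    """Parameter aliasing: W_q and W_up alias one SpectralLinear base and are
    differentiated only by per-role learnable shift vectors."""

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        self.base = SpectralLinear(d_model, d_model, rank)
        self.shift_q = nn.Parameter(torch.zeros(d_model))
        self.shift_up = nn.Parameter(torch.zeros(d_model))

    def query(self, x):
        return self.base(x, shift=self.shift_q)

    def ffn_up(self, x):
        return self.base(x, shift=self.shift_up)
```

For roughly the 1/10th parameter target, pick rank ≈ d_model / √10: for d_model = 512, rank ≈ 162 gives 162² ≈ 26k coefficients versus 512² ≈ 262k for a dense matrix.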


Stage 2: Dynamic Sequence Folding (The "Squeeze" Layer)

Goal: Create the mechanism that merges tokens as the model goes deeper into the recurrence.

Copy-paste this into Copilot:

"Now, I need to implement a SequenceFolder module to be used within a recurrent transformer loop.

The goal is to reduce the sequence length $L$ as the model iterates. Every $K$ iterations, the SequenceFolder should merge adjacent tokens or use a learned pooling mechanism (e.g., a 1D convolution with stride 2 or a weighted average) to compress the sequence.

This must be differentiable. Provide a FoldedRecurrentBlock that wraps a standard transformer block and the SequenceFolder. The block should track the current iteration count and trigger the folding operation only when iteration % K == 0. Ensure that the positional embeddings are updated or handled correctly after the sequence length changes."
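
A sketch of one way the folding could work, assuming the learned pooling is the strided-2 `Conv1d` option from the prompt and that absolute positional embeddings are simply re-added after each fold. `fold_every` stands in for $K$, and the wrapped `block` is any callable transformer block; none of this is code from the PR.

```python
class SequenceFolder(nn.Module):
    """Differentiable 'squeeze': halves the sequence length with a learned
    strided-2 1D convolution over adjacent token pairs."""

    def __init__(self, d_model: int):
        super().__init__()
        self.pool = nn.Conv1d(d_model, d_model, kernel_size=2, stride=2)

    def forward(self, x):                        # x: (batch, seq, d_model)
        if x.size(1) % 2:                        # zero-pad odd lengths
            x = F.pad(x, (0, 0, 0, 1))
        return self.pool(x.transpose(1, 2)).transpose(1, 2)


class FoldedRecurrentBlock(nn.Module):
    """Wraps one transformer block; every `fold_every` iterations the sequence
    is folded and positional embeddings are refreshed for the new length."""

    def __init__(self, block: nn.Module, d_model: int, max_len: int,
                 fold_every: int = 2):
        super().__init__()
        self.block = block
        self.folder = SequenceFolder(d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.fold_every = fold_every             # K in the prompt

    def forward(self, x, iteration: int):
        if iteration > 0 and iteration % self.fold_every == 0:
            x = self.folder(x)
            # Old positions no longer match the shorter sequence, so re-add.
            positions = torch.arange(x.size(1), device=x.device)
            x = x + self.pos(positions)
        return self.block(x)
```

Because the same `Conv1d` is reused at every fold, repeated folding costs no extra parameters, which matters under the 16MB cap.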


Stage 3: The Final Assembly (Byte-Level + Recurrence)

Goal: Integrate everything into the main model and swap the large embedding matrix for a byte-level system.

Copy-paste this into Copilot:

"Finally, let's assemble the full model. Replace the standard large token embedding matrix with a 'Byte-Level Virtual Vocab' approach.

  1. Use a small embedding matrix for 256 bytes.
  2. Implement a tiny 'Assembler' (a small 2-layer MLP) that projects these byte embeddings into the model dimension $d_{model}$.
  3. Integrate the FoldedRecurrentBlock and the SharedProjector (from previous steps) into a final FractalCompressor class.
  4. Implement the recurrent loop: for i in range(num_iterations): x = self.recurrent_block(x, i).
  5. Tie the final LM Head weights back to the Assembler's projection weights to minimize parameters.

Optimize the entire architecture to fit under 16MB while maximizing $d_{model}$. Please provide the complete model class and the forward pass logic."
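
Composing the previous two sketches, the assembly might look like the following. The inner block is a stand-in `nn.TransformerEncoderLayer` (the real model would build its block from SpectralLinear/SharedProjector layers), and the weight tying reuses the Assembler's up-projection plus the byte table as the LM head, which is one plausible reading of step 5; all names and defaults here are illustrative.

```python
class FractalCompressor(nn.Module):
    """Byte-level virtual vocab + Assembler MLP + recurrent folded block,
    with the LM head tied back to existing weights."""

    def __init__(self, d_model=512, d_byte=64, num_iterations=8,
                 max_len=2048, fold_every=2):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_byte)   # tiny 256-entry table
        # 'Assembler': 2-layer MLP lifting byte embeddings into d_model.
        self.asm_up = nn.Linear(d_byte, d_model)
        self.asm_out = nn.Linear(d_model, d_model)
        # Stand-in block; swap in a SpectralLinear-based block for the real model.
        block = nn.TransformerEncoderLayer(d_model, nhead=8,
                                           dim_feedforward=d_model,
                                           batch_first=True)
        self.recurrent_block = FoldedRecurrentBlock(block, d_model, max_len,
                                                    fold_every)
        self.num_iterations = num_iterations

    def forward(self, byte_ids):                  # (batch, seq) ints in 0..255
        x = self.asm_out(F.gelu(self.asm_up(self.byte_embed(byte_ids))))
        for i in range(self.num_iterations):      # the recurrence (step 4)
            x = self.recurrent_block(x, i)
        # Tied head (step 5): reuse asm_up.weight (d_model, d_byte) as the
        # down-projection and the byte table as the vocabulary -- zero new params.
        return (x @ self.asm_up.weight) @ self.byte_embed.weight.t()
```

On the budget: 16MB of float32 weights is 16·2²⁰ / 4 ≈ 4.19M parameters (≈ 8.4M at float16), so every dense d_model × d_model matrix (≈ 262k parameters at d_model = 512) must be spectrally compressed or aliased to leave room to grow $d_{model}$.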

Copilot AI requested a review from kailean April 5, 2026 15:45
Copilot stopped work on behalf of kailean due to an error April 5, 2026 15:45