
[WIP] Implement custom SpectralLinear layer and SharedProjector module #10

Draft
Copilot wants to merge 1 commit into main from copilot/replace-standard-nn-linear-layers

Conversation


Copilot AI commented Apr 5, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Stage 1: The Spectral Weights & Aliasing

Goal: Replace standard nn.Linear layers with "Spectral" layers and tie the Attention and FFN weights.

Copy-paste this into Copilot:

"I am building a high-density 'Fractal Compressor' for the Parameter Golf challenge (16MB limit). I need to implement a custom PyTorch layer called SpectralLinear.

Instead of storing a full weight matrix $W$, this layer should store a small set of learnable coefficients $C$ (a low-rank matrix) and use a fixed, non-learnable 2D basis (like a Discrete Cosine Transform or a fixed Hadamard matrix) to reconstruct the effective weight matrix during the forward pass. The goal is to achieve the expressive power of a large matrix with 1/10th of the parameters.

Additionally, implement 'Parameter Aliasing': create a SharedProjector module where the Query projection ($W_q$) and the FFN-up projection ($W_{up}$) share the same base SpectralLinear weights, but are differentiated by a small, learnable bias vector (a 'shift vector').

Please write the SpectralLinear class and the SharedProjector class. Ensure the reconstruction happens efficiently in the forward pass."
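
As a reference point, here is a minimal PyTorch sketch of what an answer to this prompt could look like. It assumes a square `rank × rank` coefficient matrix, an orthonormal DCT-II basis, and that $W_q$ and $W_{up}$ share a `d_model → d_model` shape; the helper name `dct_basis` and the method names `query`/`ffn_up` are illustrative, not code from this PR.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def dct_basis(n: int, k: int) -> torch.Tensor:
    """First k rows of an orthonormal DCT-II basis, shape (k, n)."""
    i = torch.arange(n, dtype=torch.float32)
    f = torch.arange(k, dtype=torch.float32).unsqueeze(1)
    basis = torch.cos(math.pi / n * (i + 0.5) * f) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)  # DC row needs extra 1/sqrt(2) for orthonormality
    return basis


class SpectralLinear(nn.Module):
    """Stores only a small learnable coefficient matrix C; the effective
    weight W = B_out @ C @ B_in is rebuilt from fixed DCT bases each forward."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.coeff = nn.Parameter(torch.randn(rank, rank) * 0.02)  # learnable C
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Fixed, non-learnable bases live in buffers (saved, never trained).
        self.register_buffer("basis_out", dct_basis(out_features, rank).t())  # (out, r)
        self.register_buffer("basis_in", dct_basis(in_features, rank))        # (r, in)

    def forward(self, x, shift=None):
        # Reconstruction is two small matmuls, so it stays cheap per step.
        weight = self.basis_out @ self.coeff @ self.basis_in  # (out, in)
        bias = self.bias if shift is None else self.bias + shift
        return F.linear(x, weight, bias)


class SharedProjector(nn.Module):
    """Parameter aliasing: W_q and W_up alias one SpectralLinear base and are
    differentiated only by per-role learnable shift vectors."""

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        self.base = SpectralLinear(d_model, d_model, rank)
        self.shift_q = nn.Parameter(torch.zeros(d_model))
        self.shift_up = nn.Parameter(torch.zeros(d_model))

    def query(self, x):
        return self.base(x, shift=self.shift_q)

    def ffn_up(self, x):
        return self.base(x, shift=self.shift_up)
```

For roughly the 1/10th parameter target, pick rank ≈ d_model / √10: for d_model = 512, rank ≈ 162 gives 162² ≈ 26k coefficients versus 512² ≈ 262k for a dense matrix.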


Stage 2: Dynamic Sequence Folding (The "Squeeze" Layer)

Goal: Create the mechanism that merges tokens as the model goes deeper into the recurrence.

Copy-paste this into Copilot:

"Now, I need to implement a SequenceFolder module to be used within a recurrent transformer loop.

The goal is to reduce the sequence length $L$ as the model iterates. Every $K$ iterations, the SequenceFolder should merge adjacent tokens or use a learned pooling mechanism (e.g., a 1D convolution with stride 2 or a weighted average) to compress the sequence.

This must be differentiable. Provide a FoldedRecurrentBlock that wraps a standard transformer block and the SequenceFolder. The block should track the current iteration count and trigger the folding operation only when iteration % K == 0. Ensure that the positional embeddings are updated or handled correctly after the sequence length changes."
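
A sketch of one way the folding could work, assuming the learned pooling is the strided-2 `Conv1d` option from the prompt and that absolute positional embeddings are simply re-added after each fold. `fold_every` stands in for $K$, and the wrapped `block` is any callable transformer block; none of this is code from the PR.

```python
class SequenceFolder(nn.Module):
    """Differentiable 'squeeze': halves the sequence length with a learned
    strided-2 1D convolution over adjacent token pairs."""

    def __init__(self, d_model: int):
        super().__init__()
        self.pool = nn.Conv1d(d_model, d_model, kernel_size=2, stride=2)

    def forward(self, x):                        # x: (batch, seq, d_model)
        if x.size(1) % 2:                        # zero-pad odd lengths
            x = F.pad(x, (0, 0, 0, 1))
        return self.pool(x.transpose(1, 2)).transpose(1, 2)


class FoldedRecurrentBlock(nn.Module):
    """Wraps one transformer block; every `fold_every` iterations the sequence
    is folded and positional embeddings are refreshed for the new length."""

    def __init__(self, block: nn.Module, d_model: int, max_len: int,
                 fold_every: int = 2):
        super().__init__()
        self.block = block
        self.folder = SequenceFolder(d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.fold_every = fold_every             # K in the prompt

    def forward(self, x, iteration: int):
        if iteration > 0 and iteration % self.fold_every == 0:
            x = self.folder(x)
            # Old positions no longer match the shorter sequence, so re-add.
            positions = torch.arange(x.size(1), device=x.device)
            x = x + self.pos(positions)
        return self.block(x)
```

Because the same `Conv1d` is reused at every fold, repeated folding costs no extra parameters, which matters under the 16MB cap.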


Stage 3: The Final Assembly (Byte-Level + Recurrence)

Goal: Integrate everything into the main model and swap the large embedding matrix for a byte-level system.

Copy-paste this into Copilot:

"Finally, let's assemble the full model. Replace the standard large token embedding matrix with a 'Byte-Level Virtual Vocab' approach.

  1. Use a small embedding matrix for 256 bytes.
  2. Implement a tiny 'Assembler' (a small 2-layer MLP) that projects these byte embeddings into the model dimension $d_{model}$.
  3. Integrate the FoldedRecurrentBlock and the SharedProjector (from previous steps) into a final FractalCompressor class.
  4. Implement the recurrent loop: for i in range(num_iterations): x = self.recurrent_block(x, i).
  5. Tie the final LM Head weights back to the Assembler's projection weights to minimize parameters.

Optimize the entire architecture to fit under 16MB while maximizing $d_{model}$. Please provide the complete model class and the forward pass logic."
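
Composing the previous two sketches, the assembly might look like the following. The inner block is a stand-in `nn.TransformerEncoderLayer` (the real model would build its block from SpectralLinear/SharedProjector layers), and the weight tying reuses the Assembler's up-projection plus the byte table as the LM head, which is one plausible reading of step 5; all names and defaults here are illustrative.

```python
class FractalCompressor(nn.Module):
    """Byte-level virtual vocab + Assembler MLP + recurrent folded block,
    with the LM head tied back to existing weights."""

    def __init__(self, d_model=512, d_byte=64, num_iterations=8,
                 max_len=2048, fold_every=2):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_byte)   # tiny 256-entry table
        # 'Assembler': 2-layer MLP lifting byte embeddings into d_model.
        self.asm_up = nn.Linear(d_byte, d_model)
        self.asm_out = nn.Linear(d_model, d_model)
        # Stand-in block; swap in a SpectralLinear-based block for the real model.
        block = nn.TransformerEncoderLayer(d_model, nhead=8,
                                           dim_feedforward=d_model,
                                           batch_first=True)
        self.recurrent_block = FoldedRecurrentBlock(block, d_model, max_len,
                                                    fold_every)
        self.num_iterations = num_iterations

    def forward(self, byte_ids):                  # (batch, seq) ints in 0..255
        x = self.asm_out(F.gelu(self.asm_up(self.byte_embed(byte_ids))))
        for i in range(self.num_iterations):      # the recurrence (step 4)
            x = self.recurrent_block(x, i)
        # Tied head (step 5): reuse asm_up.weight (d_model, d_byte) as the
        # down-projection and the byte table as the vocabulary -- zero new params.
        return (x @ self.asm_up.weight) @ self.byte_embed.weight.t()
```

On the budget: 16MB of float32 weights is 16·2²⁰ / 4 ≈ 4.19M parameters (≈ 8.4M at float16), so every dense d_model × d_model matrix (≈ 262k parameters at d_model = 512) must be spectrally compressed or aliased to leave room to grow $d_{model}$.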

Copilot AI requested a review from kailean April 5, 2026 15:45
Copilot stopped work on behalf of kailean due to an error April 5, 2026 15:45