Merged
…and what pruning does
…omments for train and prune tokenizer
…in_config.yaml; deleted extra config.
Add pruning functionality to tokenizer creation to remove zero-frequency tokens
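The pruning step described above can be sketched as follows. This is a hypothetical helper, not the repository's actual API: the function name, the `Counter` of corpus frequencies, and the contiguous re-indexing scheme are all assumptions. The idea is simply to drop vocabulary entries that never occur in the training data while preserving special tokens.

```python
from collections import Counter


def prune_zero_frequency_tokens(
    vocab: dict[str, int],
    corpus_counts: Counter,
    special_tokens: set[str],
) -> dict[str, int]:
    """Drop tokens never seen in the training corpus, keeping special
    tokens, then re-index the remaining tokens contiguously.
    (Hypothetical sketch of the pruning step.)"""
    kept = [t for t in vocab if corpus_counts[t] > 0 or t in special_tokens]
    return {tok: i for i, tok in enumerate(kept)}
```

For example, a vocab entry with zero observed frequency is removed, while `[UNK]` survives even if unseen.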
* Add rms_norm_eps argument to config with default 1e-6
* Fix type issues
* Add convert_to_hf.py script and tests
* Re-enable tests in CI
* Use torch>=2.6 in pyproject.toml
* Improve naming
* Fix type errors
* Added conversion scripts and corresponding tests
* Fixed pyright issues
* Marked a test as slow since it downloads all models from HF
* Revert "Marked a test as slow since it downloads all models from HF" (reverts commit 2c9aedb — wrong commit with pytest!)
* Marked a test as slow since it downloads all models from HF
* Corrected the docstring of a test case; made it more verbose to mention the backward compatibility

Co-authored-by: chandanms <[email protected]>
* Reset train dataloader when depleted
* Fix pyright errors
* Cast instead of isinstance
* Update pinned torch version
* Factor out gpt2 and make general train.py
* Prefix wandb run name with model_id
* Create gpt2 hf converters
* Create push_to_hf
* Upload tokenizer to hf too
* Refactor gpt conversions
…ens; Made the data to tokenizer training iterable.
…_test_cases Tokenizer test cases and reformatting of tokenizer training file
Fix for accidentally adding the updated README to the wrong folder
Added a note that mentions where the implementation deviates from the original paper
* Play around with no ln
* Fix weights_only=True
* Add ability to finetune without ln
* Revert to old llama from_pretrained
* Minor fixes for ln ablation finetune
* Fix pyright error
- Add gpt2_simple-pile, gpt2_simple-pile-1L, gpt2_simple-pile-2L model configs
- Use GPT-2 tokenizer vocab (50259 = 50257 + [UNK] + [EOS])
- Same architecture as SimpleStories versions (4L/1L/2L, 4 heads, 128 embd)
- Add train_config_pile.yaml for training on monology/pile-uncopyrighted
- Uses NeelNanda/pile-10k for validation (smaller, faster)
- Streaming enabled for large Pile dataset

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Use underscore notation for num_iterations (100_000)
- Consolidate model config comment to single line
When streaming datasets are loaded (e.g., the Pile), the IterableDataset.features attribute is None, causing a TypeError when trying to iterate over it. This fix skips the column filtering for streaming datasets without features, since the map() function with the remove_columns parameter will handle column management properly during tokenization. Also updated type hints to accept Dataset | IterableDataset.
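The guard described in this commit can be sketched like so. This is a minimal illustration, not the repository's actual code: the helper name `keep_single_column` mirrors the `_keep_single_column` mentioned below, but its exact signature is an assumption. The key move is checking whether `.features` is `None` (as it is for a streaming `IterableDataset`) before iterating over it.

```python
def keep_single_column(dataset, column_name: str):
    """Drop all columns except `column_name`.

    For streaming datasets, `.features` is None, so we skip filtering
    here and let map(..., remove_columns=...) handle columns during
    tokenization. (Hypothetical sketch of the fix.)"""
    features = getattr(dataset, "features", None)
    if features is None:
        # Streaming IterableDataset: no feature schema to iterate over.
        return dataset
    extra = [col for col in features if col != column_name]
    return dataset.remove_columns(extra)
```

Without the `None` check, `for col in features` raises a TypeError on streaming datasets, which is the failure this commit fixes.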
The tokenize_and_concatenate function was not receiving the column_name parameter from the dataset config, causing it to use the default value 'story' instead of the actual column name (e.g., 'text' for the Pile dataset). This caused _keep_single_column to remove all columns, since it was looking for 'story' but the actual column was 'text'.
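The fix amounts to threading `column_name` from the dataset config into the tokenization call instead of relying on the `'story'` default. A sketch, with illustrative names (the wrapper `make_tokenize_fn` and the plain-dict config are assumptions; only `tokenize_and_concatenate` and `column_name` come from the commit):

```python
from typing import Any, Callable


def make_tokenize_fn(
    tokenize_and_concatenate: Callable,
    dataset_config: dict[str, Any],
) -> Callable:
    """Bind the config's column_name into the tokenization call so the
    default 'story' is only used when the config omits it."""
    column_name = dataset_config.get("column_name", "story")

    def tokenize(dataset, tokenizer):
        return tokenize_and_concatenate(
            dataset, tokenizer, column_name=column_name
        )

    return tokenize
```

With `{"column_name": "text"}` in the config, the Pile's `text` column is kept rather than stripped away.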
The Pile dataset uses zstd compression, which requires the zstandard Python package. Without it, dataset loading fails with:

ValueError: Compression type zstd not supported
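A defensive check for this dependency can be written before loading the dataset. The helper name below is made up for illustration; the underlying `importlib.util.find_spec` call is standard library.

```python
import importlib.util


def has_zstd_support() -> bool:
    """Return True if the zstandard package is importable.

    The Hugging Face datasets loader needs it to decompress the
    Pile's .zst shards. (Hypothetical pre-flight check.)"""
    return importlib.util.find_spec("zstandard") is not None
```

If this returns False, `pip install zstandard` resolves the ValueError above.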
- Add scripts/slurm_train.py for generating and submitting SLURM jobs
- Support single-node (1-8 GPUs) and multi-node (>8 GPUs) DDP training
- Add SLURM documentation to README
- Add slurm_scripts/ and slurm_logs/ to .gitignore
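The single-node vs. multi-node split described above can be sketched as a small header generator. This is a hypothetical simplification of what a script like scripts/slurm_train.py might emit, not its actual contents; the function name and the fixed 8-GPUs-per-node assumption are mine.

```python
def sbatch_header(num_gpus: int, gpus_per_node: int = 8) -> str:
    """Build an sbatch header: one node for <=8 GPUs, otherwise
    spread full nodes for multi-node DDP. (Illustrative sketch.)"""
    nodes = max(1, -(-num_gpus // gpus_per_node))  # ceiling division
    per_node = num_gpus if nodes == 1 else gpus_per_node
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{per_node}",
        f"#SBATCH --ntasks-per-node={per_node}",
    ])
```

A 4-GPU job stays on one node; a 16-GPU job becomes two full 8-GPU nodes.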
Use hyphen (gpt2_simple-pile) not underscore (gpt2_simple_pile)
The validation loss was different between 1-GPU and 8-GPU runs because the validation dataset was being split across ranks, but val_loss was only computed on rank 0 without all-reduce. This meant 8-GPU runs were evaluating on 1/8th of the validation data (different samples than 1-GPU).

Now validation data is not split, so all ranks see the same validation samples, making val_loss comparable across different GPU configurations.
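The effect described in this commit can be demonstrated with a small sketch: when per-sample losses are sharded rank-by-rank (as a DistributedSampler-style split would do) and only one rank's mean is reported, the result depends on the world size; when every rank averages over the full validation set, all ranks agree. The function below is purely illustrative and not from the repository.

```python
def mean_val_loss(
    per_sample_losses: list[float],
    rank: int,
    world_size: int,
    split: bool,
) -> float:
    """Mean validation loss as one rank would report it.

    split=True mimics sharding the validation set across ranks
    (the old, buggy behavior); split=False gives every rank the
    full set (the fix). (Illustrative sketch.)"""
    samples = (
        per_sample_losses[rank::world_size] if split else per_sample_losses
    )
    return sum(samples) / len(samples)
```

With the split, rank 0 on 8 GPUs sees only every 8th sample, so its reported loss differs from the 1-GPU run; without the split, every configuration reports the same number.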
# Conflicts:
#	simple_stories_train/models/model_configs.py
Restructure and handle Pile training