
Amber: a 7B English language model with the LLaMA architecture.

amber logo



Overview

All LLM360 models are trained and released to make LLM training knowledge accessible to all. This repo contains the complete training process and details we used to train Amber.

Reproduce Amber

The repo is organized into subfolders by function.

To reproduce the entire training process, the proper order is:

  1. Begin by pretraining the model
  2. Determine the model's performance through evaluations and benchmarks
  3. Improve the base model with chat-specific functionality via finetuning
  4. Interact with the model by downloading Amber for inference (see the sketch below)
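
For step 4, the minimal sketch below loads Amber with Hugging Face `transformers` and generates text. It assumes the published `LLM360/Amber` checkpoint on the Hugging Face Hub; the scripts in the `inference` subfolder may load the model differently (local checkpoints, quantization, a serving stack, etc.).

```python
# Minimal sketch: load Amber and generate text with Hugging Face transformers.
# Assumes the LLM360/Amber checkpoint on the Hugging Face Hub; the repo's own
# inference scripts may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/Amber"  # or a local path to a downloaded checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the 7B model within a single-GPU memory budget
    device_map="auto",
)

prompt = "Amber is a 7B English language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```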

Repository Organization

Examples are organized into folders by topic:

| Subfolder | Description |
| --- | --- |
| reproduce amber | Instructions to fully reproduce Amber, from data prep to trained model |
| finetuning | Scripts to finetune Amber for chat, with SFT and DPO alignment options |
| inference | Scripts to deploy Amber for inference locally |
| evaluations and benchmarks | Scripts to evaluate Amber and compare against LLM360's results |

About Amber

Amber is a 7B English language model with the LLaMA architecture.

Training Details

Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Total Parameters | 6.7B |
| Hidden Size | 4096 |
| Intermediate Size (MLPs) | 11008 |
| Number of Attention Heads | 32 |
| Number of Hidden Layers | 32 |
| RMSNorm ɛ | 1e-6 |
| Max Seq Length | 2048 |
| Vocab Size | 32000 |

Data Mix

| Subset | Tokens (Billion) |
| --- | --- |
| Arxiv | 30.00 |
| Book | 28.86 |
| C4 | 197.67 |
| Refined-Web | 665.01 |
| StarCoder | 291.92 |
| StackExchange | 21.75 |
| Wikipedia | 23.90 |
| Total | 1259.13 |
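
The architecture hyperparameters above map directly onto a standard LLaMA configuration. As a rough sketch (not the repo's actual training configuration, which lives under reproduce amber), they could be expressed with `transformers.LlamaConfig` as follows:

```python
# Sketch only: the table's architecture hyperparameters expressed as a
# transformers LlamaConfig. The repo's own pretraining configs may use a
# different framework or file format.
from transformers import LlamaConfig

amber_config = LlamaConfig(
    vocab_size=32000,              # Vocab Size
    hidden_size=4096,              # Hidden Size
    intermediate_size=11008,       # Intermediate Size (MLPs)
    num_hidden_layers=32,          # Number of Hidden Layers
    num_attention_heads=32,        # Number of Attention Heads
    max_position_embeddings=2048,  # Max Seq Length
    rms_norm_eps=1e-6,             # RMSNorm ɛ
)
# These settings give roughly 6.7B trainable parameters, matching the table.
```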

About LLM360

LLM360 is an initiative for comprehensive and fully open-sourced LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community. Our goal is to advance the field by inviting the community to deepen the understanding of LLMs together. As the first step of the project LLM360, we release all intermediate model checkpoints, our fully-prepared pre-training dataset, all source code and configurations, and training details. We are committed to continually pushing the boundaries of LLMs through this open-source effort.

Get access now at the LLM360 site.

Citation

BibTeX:

@misc{liu2023llm360,
      title={LLM360: Towards Fully Transparent Open-Source LLMs}, 
      author={Zhengzhong Liu and Aurick Qiao and Willie Neiswanger and Hongyi Wang and Bowen Tan and Tianhua Tao and Junbo Li and Yuqi Wang and Suqi Sun and Omkar Pangarkar and Richard Fan and Yi Gu and Victor Miller and Yonghao Zhuang and Guowei He and Haonan Li and Fajri Koto and Liping Tang and Nikhil Ranjan and Zhiqiang Shen and Xuguang Ren and Roberto Iriondo and Cun Mu and Zhiting Hu and Mark Schulze and Preslav Nakov and Tim Baldwin and Eric P. Xing},
      year={2023},
      eprint={2312.06550},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
