Chikuma_10.7B - V2 (Enhanced with DPO) [For Experiments]

Chikuma

This model is the DPO fine-tuned version of Chikuma_10.7B, which was a depth-upscaled merge of:

The name "Chikuma" is inspired by the Chikuma River, the longest in Japan, known for its continuous flow and meandering path. This metaphorically represents the model's depth, fluidity, and adaptability in processing and understanding language.

Dataset used for Fine Tuning

Dataset: argilla/distilabel-intel-orca-dpo-pairs

The dataset contained roughly 3,000 samples, but they were high quality (according to chosen_score).

The following filters were applied to the original dataset:

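from datasets import load_dataset

# Load the preference pairs, then drop ties, low scores, and GSM8K train overlap
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)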
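To make the effect of the filter concrete, here is a minimal sketch that applies the same predicate to a few hand-made rows. The rows and the `id` field are invented for illustration and are not taken from the dataset; only the three field names used in the predicate come from the filter above.

```python
def keep(row):
    """Same predicate as the filter above: drop ties, low scores, and GSM8K train overlap."""
    return (
        row["status"] != "tie"
        and row["chosen_score"] >= 8
        and not row["in_gsm8k_train"]
    )

# Hypothetical rows, made up for illustration only.
rows = [
    {"id": "a", "status": "swapped", "chosen_score": 9.0, "in_gsm8k_train": False},
    {"id": "b", "status": "tie", "chosen_score": 9.0, "in_gsm8k_train": False},
    {"id": "c", "status": "unchanged", "chosen_score": 7.5, "in_gsm8k_train": False},
    {"id": "d", "status": "unchanged", "chosen_score": 8.0, "in_gsm8k_train": True},
]

kept = [r["id"] for r in rows if keep(r)]
print(kept)
```

Only row "a" passes all three conditions: "b" is a tie, "c" scores below 8, and "d" overlaps with the GSM8K training split.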

Chat Template

The chat template for Chikuma_10.7B - V2 is a modified version of ChatML, optimized for improved interaction and engagement:

<|im_start|>GPT4 Correct system:
{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>
<|im_start|>GPT4 Correct user:
{user}<|im_end|>
<|im_start|>GPT4 Correct Assistant:
{assistant}<|im_end|>
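As a rough illustration of how the template above is filled in, the sketch below builds a prompt string by hand and leaves the assistant turn open for generation. `render_prompt` is a hypothetical helper, and the exact whitespace is an assumption read off the template as printed here; in practice the tokenizer's own `apply_chat_template` (shown under Usage) is the safer route.

```python
def render_prompt(system, user):
    """Fill the template shown above, ending at the open assistant turn."""
    return (
        "<|im_start|>GPT4 Correct system:\n"
        f"{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>\n"
        "<|im_start|>GPT4 Correct user:\n"
        f"{user}<|im_end|>\n"
        "<|im_start|>GPT4 Correct Assistant:\n"
    )

prompt = render_prompt("You are a helpful assistant chatbot.", "Who invented LLMs?")
print(prompt)
```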

Nous Benchmark Evaluation

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| SynthIQ-7b | 42.67 | 73.71 | 56.51 | 44.59 | 54.37 |
| openchat/openchat-3.5-0106 | 44.17 | 73.72 | 52.53 | 44.4 | 53.71 |
| Chikuma_10.7B | 42.41 | 73.41 | 56.69 | 43.5 | 54.00 |
| Chikuma_10.7B_v2 | 42.77 | 73.81 | 58.83 | 44.83 | 55.06 |
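The Average column appears to be the unweighted mean of the four benchmark scores. The short check below recomputes it from the numbers in the table above; that the average is a plain mean is an assumption, not something the card states.

```python
# Scores copied from the table above, in order: AGIEval, GPT4All, TruthfulQA, Bigbench.
scores = {
    "SynthIQ-7b": [42.67, 73.71, 56.51, 44.59],
    "openchat/openchat-3.5-0106": [44.17, 73.72, 52.53, 44.4],
    "Chikuma_10.7B": [42.41, 73.41, 56.69, 43.5],
    "Chikuma_10.7B_v2": [42.77, 73.81, 58.83, 44.83],
}

# Unweighted mean per model (assumed definition of the Average column).
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
best = max(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
print("highest average:", best)
```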

OpenLLM Leaderboard

| Benchmark Name | Performance |
|---|---|
| ARC | 66.38 |
| HellaSwag | 85 |
| MMLU | 65.27 |
| TruthfulQA | 58.83 |
| Winogrande | 78.77 |
| GSM8K | 63.68 |
| Average | 69.65 |

Training Environment

  • Hardware: A single A100 80GB GPU on RunPod, used for approximately 1.5 hours.
  • Training Script: Available as a Google Colab notebook. Special thanks to mlabonne for providing the template.
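The training script itself is not reproduced in this card. For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. The `beta=0.1` default and all numeric inputs are assumptions chosen for illustration; they are not the hyperparameters used to train this model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(beta * margin))."""
    # How much more likely each response is under the policy than under the reference.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: margin is 0, so the loss is log(2).
same = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy prefers the chosen response more than the reference does: lower loss.
better = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(same, better)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the frozen reference model.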

Usage

import transformers
from transformers import AutoTokenizer

# Model repository on the Hugging Face Hub
new_model = "sethuiyer/Chikuma_10.7B_v2"
tokenizer = AutoTokenizer.from_pretrained(new_model)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    device="cuda"
)

# Format prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "Who invented LLMs?"}
]

prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Generate text
sequences = pipeline(
    prompt,
    max_new_tokens=512
)
print(sequences[0]['generated_text'])
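By default the text-generation pipeline returns the prompt followed by the completion in `generated_text`. The sketch below shows one way to keep only the assistant's reply; `extract_reply` is a hypothetical helper, and whether the end markers actually appear in the decoded text depends on the tokenizer configuration, which is an assumption here. Passing `return_full_text=False` to the pipeline call is an alternative that skips the prompt echo.

```python
def extract_reply(generated_text, prompt,
                  stop_markers=("<|im_end|>", "<|end_of_turn|>")):
    """Drop the echoed prompt and cut the reply at the earliest end marker, if any."""
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text
    for marker in stop_markers:
        idx = reply.find(marker)
        if idx != -1:
            reply = reply[:idx]
    return reply.strip()

# Made-up strings standing in for a real prompt and pipeline output.
prompt = "<|im_start|>GPT4 Correct user:\nHi<|im_end|>\n<|im_start|>GPT4 Correct Assistant:\n"
raw = prompt + "Hello there!<|end_of_turn|> trailing text"
reply = extract_reply(raw, prompt)
print(reply)
```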

Acknowledgements

A heartfelt appreciation goes to the vibrant open-source community, particularly:

  • The Intel team for publishing a great open dataset and showing how well it worked in the first place.
  • Teknium and NousResearch for their awesome work and models.
  • Maxime for sharing such great resources.
  • Argilla for publishing argilla/distilabel-intel-orca-dpo-pairs

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 68.87 |
| AI2 Reasoning Challenge (25-Shot) | 66.38 |
| HellaSwag (10-Shot) | 85.14 |
| MMLU (5-Shot) | 64.70 |
| TruthfulQA (0-shot) | 59.20 |
| Winogrande (5-shot) | 79.40 |
| GSM8k (5-shot) | 58.38 |