Replies: 2 comments
- Support for Gemma was just added and released in 2.9.0 15 hours ago. #1734
- @ithax-wb config.yaml:

```yaml
- name: gemma-2b-it
  context_size: 2048
  f16: true
  gpu_layers: 90
  mmap: true
  trimsuffix:
  - "\n"
  parameters:
    model: gemma-2b-it-q8_0.gguf
    # temperature: 0.2
    # top_k: 40
    # top_p: 0.95
    # seed: -1
  template:
    chat_message: chat
    chat: chat-block
    completion: completion
```

chat.tmpl:

```
<start_of_turn>{{if eq .RoleName "assistant"}}model{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<end_of_turn>
```

chat-block.tmpl:

```
<bos>{{.Input}}
<start_of_turn>model
```

completion.tmpl:

```
{{.Input}}
```
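To sanity-check a config like this, LocalAI exposes an OpenAI-compatible API. The following is a minimal sketch, assuming LocalAI is running on its default port 8080 and the model name matches the `name:` field above:

```python
# Minimal sketch: call LocalAI's OpenAI-compatible chat endpoint for the
# gemma-2b-it model defined in the config above. Assumes LocalAI is listening
# on localhost:8080 (its default) and the model loaded successfully.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gemma-2b-it",
        "messages": [{"role": "user", "content": "knock knock"}],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```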
- I would like to use Google's new open-source models Gemma 2B, Gemma 2B Instruct, Gemma 7B, and Gemma 7B Instruct with LocalAI. I tried to build the YAML file myself, but I just can't get it to work. Can somebody help me?
Here is some information provided by Google regarding the prompt format: https://github.com/huggingface/blog/blob/main/gemma.md#prompt-format
Prompt format
The base models have no prompt format. Like other base models, they can be used to continue an input sequence with a plausible continuation or for zero-shot/few-shot inference. They are also a great foundation for fine-tuning on your own use cases. The Instruct versions have a very simple conversation structure:
```
<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
LaMDA<end_of_turn>
<start_of_turn>model
LaMDA who?<end_of_turn>
```
This format has to be exactly reproduced for effective use. We’ll later show how easy it is to reproduce the instruct prompt with the chat template available in transformers:
```python
from transformers import AutoTokenizer, pipeline
import torch

model = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    add_special_tokens=True,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
print(outputs[0]["generated_text"][len(prompt):])
```
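When writing LocalAI templates by hand, it helps to see the string that `apply_chat_template` is expected to produce. The following is a rough sketch of that format based on the turns shown above; `build_gemma_prompt` is a hypothetical helper for illustration, not part of transformers or LocalAI:

```python
# Rough sketch of the Gemma instruct prompt format quoted above.
# build_gemma_prompt is a hypothetical helper, for illustration only.
def build_gemma_prompt(messages, add_generation_prompt=True):
    role_map = {"user": "user", "assistant": "model"}  # Gemma calls the assistant turn "model"
    prompt = "<bos>"
    for msg in messages:
        role = role_map.get(msg["role"], msg["role"])
        prompt += f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n"
    if add_generation_prompt:
        prompt += "<start_of_turn>model\n"  # leave the model's turn open for generation
    return prompt

messages = [{"role": "user", "content": "Who are you? Please, answer in pirate-speak."}]
print(build_gemma_prompt(messages))
```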