Replies: 2 comments
- Support for Gemma was just added and released in 2.9.0 15 hours ago. #1734
- @ithax-wb config.yaml:

```yaml
- name: gemma-2b-it
  context_size: 2048
  f16: true
  gpu_layers: 90
  mmap: true
  trimsuffix:
  - "\n"
  parameters:
    model: gemma-2b-it-q8_0.gguf
    # temperature: 0.2
    # top_k: 40
    # top_p: 0.95
    # seed: -1
  template:
    chat_message: chat
    chat: chat-block
    completion: completion
```

chat.tmpl:

```
<start_of_turn>{{if eq .RoleName "assistant"}}model{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<end_of_turn>
```

chat-block.tmpl:

```
<bos>{{.Input}}
<start_of_turn>model
```

completion.tmpl:

```
{{.Input}}
```
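To sanity-check a config like this, LocalAI exposes an OpenAI-compatible API. The following is a minimal sketch, assuming LocalAI is running on its default port 8080 and the model name matches the `name:` field above:

```python
# Minimal sketch: call LocalAI's OpenAI-compatible chat endpoint for the
# gemma-2b-it model defined in the config above. Assumes LocalAI is listening
# on localhost:8080 (its default) and the model loaded successfully.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gemma-2b-it",
        "messages": [{"role": "user", "content": "knock knock"}],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```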
- I would like to use Google's new open-source models Gemma 2B, Gemma 2B Instruct, Gemma 7B, and Gemma 7B Instruct with LocalAI. I tried to build the YAML file myself, but I just can't get it to work. Can somebody help me?
Here is some information provided by Google regarding the prompt format: https://github.com/huggingface/blog/blob/main/gemma.md#prompt-format
Prompt format
The base models have no prompt format. Like other base models, they can be used to continue an input sequence with a plausible continuation or for zero-shot/few-shot inference. They are also a great foundation for fine-tuning on your own use cases. The Instruct versions have a very simple conversation structure:
```
<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
LaMDA<end_of_turn>
<start_of_turn>model
LaMDA who?<end_of_turn>
```
This format has to be exactly reproduced for effective use. We’ll later show how easy it is to reproduce the instruct prompt with the chat template available in transformers:
```python
from transformers import AutoTokenizer, pipeline
import torch

model = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    add_special_tokens=True,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
print(outputs[0]["generated_text"][len(prompt):])
```
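When writing LocalAI templates by hand, it helps to see the string that `apply_chat_template` is expected to produce. The following is a rough sketch of that format based on the turns shown above; `build_gemma_prompt` is a hypothetical helper for illustration, not part of transformers or LocalAI:

```python
# Rough sketch of the Gemma instruct prompt format quoted above.
# build_gemma_prompt is a hypothetical helper, for illustration only.
def build_gemma_prompt(messages, add_generation_prompt=True):
    role_map = {"user": "user", "assistant": "model"}  # Gemma calls the assistant turn "model"
    prompt = "<bos>"
    for msg in messages:
        role = role_map.get(msg["role"], msg["role"])
        prompt += f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n"
    if add_generation_prompt:
        prompt += "<start_of_turn>model\n"  # leave the model's turn open for generation
    return prompt

messages = [{"role": "user", "content": "Who are you? Please, answer in pirate-speak."}]
print(build_gemma_prompt(messages))
```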