Description
We are building a GPTScript integration in one of our products that uses OSS LLMs.
We access these models via an OpenAI-compatible API; we primarily use ollama for inference.
We've noticed that GPTScript sets the (first) chat completion message role to system instead of user, thus making it a system prompt.
This appears to have a bad effect on some OSS models that are fine-tuned for tool calling; specifically, we noticed it with the llama3.1:8b-instruct-q8_0 model.
It's worth mentioning that GPTScript works as expected with OpenAI without any modifications; we've noticed the problems described in this issue only when interacting with OSS LLMs.
For some reason, setting the completion message role to system instead of user prevents the following GPTScript from running correctly (i.e. from returning the expected results).
Here's the script that exhibits the strange behaviour:
model: llama3.1:8b-instruct-q8_0 from http://localhost:8080/v1
tools: sys.exec, sys.read, sys.write, sys.ls
description: Find out the current time
Find out the current time and return it. You must only use the defined tools. Do not guess answers.
When we run the above script with gptscript v0.9.5 like so (NOTE: we disable streaming and the cache):
GPTSCRIPT_PROVIDER_LOCALHOST_API_KEY=XXXX GPTSCRIPT_INTERNAL_OPENAI_STREAMING="false" gptscript --debug-messages --disable-cache hlx.gpt
We get the following output:
14:32:26 started [main]
14:32:27 sent [main]
content [1] content | Waiting for model response...
14:32:27 ended [main]
OUTPUT:
Here is the payload we observed GPTScript sending to ollama:
{
"messages": [
{
"content": "Find out current time and return it. Use only available tools, Do not make up answers.",
"role": "system"
}
],
"model": "llama3.1:8b-instruct-q8_0",
"stream": true,
"tools": [
{
"function": {
"description": "Execute a command and get the output of the command",
"name": "exec",
"parameters": {
"properties": {
"command": {
"description": "The command to run including all applicable arguments",
"type": "string"
},
"directory": {
"description": "The directory to use as the current working directory of the command. The current directory \".\" will be used if no argument is passed",
"type": "string"
}
},
"type": "object"
}
},
"type": "function"
},
{
"function": {
"description": "Reads the contents of a file. Can only read plain text files, not binary files",
"name": "read",
"parameters": {
"properties": {
"filename": {
"description": "The name of the file to read",
"type": "string"
}
},
"type": "object"
}
},
"type": "function"
},
{
"function": {
"description": "Write the contents to a file",
"name": "write",
"parameters": {
"properties": {
"content": {
"description": "The content to write",
"type": "string"
},
"filename": {
"description": "The name of the file to write to",
"type": "string"
}
},
"type": "object"
}
},
"type": "function"
},
{
"function": {
"description": "Lists the contents of a directory",
"name": "ls",
"parameters": {
"properties": {
"dir": {
"description": "The directory to list",
"type": "string"
}
},
"type": "object"
}
},
"type": "function"
}
]
}
And here's the response that came back:
{
"id": "chatcmpl-156",
"object": "chat.completion",
"created": 1733157640,
"model": "llama3.1:8b-instruct-q8_0",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": ""
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 69,
"completion_tokens": 1,
"total_tokens": 70,
"completion_tokens_details": null
},
"system_fingerprint": "fp_ollama"
}
As you can see, we get back an empty response, i.e. no tool is called. This happens consistently, regardless of how many times gptscript is run.
After a long debugging session we noticed that if we change the role from system (thus no longer sending a system prompt) to user, llama suddenly starts responding in a way that makes GPTScript work as expected. Here's the response we get back; the only change in the request payload is the role of the prompt message:
{
"id": "chatcmpl-168",
"object": "chat.completion",
"created": 1733157657,
"model": "llama3.1:8b-instruct-q8_0",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"index": 0,
"id": "call_6zerrtfx",
"type": "function",
"function": {
"name": "exec",
"arguments": "{\"command\":\"date\",\"directory\":\".\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 395,
"completion_tokens": 21,
"total_tokens": 416,
"completion_tokens_details": null
},
"system_fingerprint": "fp_ollama"
}
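To double-check that this isn't a GPTScript quirk, the role sensitivity can be reproduced standalone. Below is a minimal reproduction sketch in Python (stdlib only); the endpoint URL and model match our setup above, the tool definition is the exec tool from the captured payload, and everything else is an assumption of this sketch rather than anything GPTScript does:

```python
# Minimal reproduction sketch. Assumes ollama's OpenAI-compatible
# endpoint at http://localhost:8080/v1, as in our setup, and that
# llama3.1:8b-instruct-q8_0 is available. Only the "exec" tool from
# the captured payload is included, for brevity.
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "exec",
        "description": "Execute a command and get the output of the command",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The command to run including all applicable arguments",
                },
            },
        },
    },
}]

def chat(role):
    # Identical request both times; only the role of the single message differs.
    payload = {
        "model": "llama3.1:8b-instruct-q8_0",
        "stream": False,
        "messages": [{
            "role": role,
            "content": "Find out the current time and return it. "
                       "You must only use the defined tools. Do not guess answers.",
        }],
        "tools": TOOLS,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]

for role in ("system", "user"):
    msg = chat(role)
    # In our runs: role=system -> empty content and no tool_calls;
    # role=user   -> a tool_calls entry invoking "exec" with {"command": "date"}.
    print(role, "->", msg.get("tool_calls") or repr(msg.get("content")))
```

In our runs, the system variant consistently comes back empty while the user variant returns the exec tool call shown above.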
Which brings us to the question asked in this issue: is there any particular reason gptscript sets the prompt role to system? Changing it to user seems to make things work, at least with the llama family of models.
Would you be open to making the prompt role configurable/overridable, say, via an environment variable?
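In the meantime, a workaround matching what we did by hand during debugging is to put a small rewriting proxy between GPTScript and ollama. The sketch below is just an illustration of that idea, not GPTScript functionality: the listen address, the upstream address (ollama's default port), and the choice to rewrite only the leading message are all assumptions of the sketch, and it handles only non-streaming POSTs to /chat/completions (we run with GPTSCRIPT_INTERNAL_OPENAI_STREAMING="false" anyway).

```python
# Illustrative role-rewriting proxy (stdlib only). Point GPTScript's
# provider URL at http://localhost:8080/v1 and set UPSTREAM to wherever
# ollama actually listens; both addresses are assumptions of this sketch.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:11434"  # ollama's default port; path is forwarded as-is
LISTEN = ("127.0.0.1", 8080)

class RoleRewriter(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path.endswith("/chat/completions"):
            payload = json.loads(body)
            messages = payload.get("messages", [])
            # Rewrite only a leading system prompt; later turns are left intact.
            if messages and messages[0].get("role") == "system":
                messages[0]["role"] = "user"
            body = json.dumps(payload).encode()
        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
            status = resp.status
            ctype = resp.headers.get("Content-Type", "application/json")
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(LISTEN, RoleRewriter).serve_forever()
```

An env var honoured by GPTScript itself would obviously be much nicer than this kind of shim, hence the question.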