When I"m trying the same SLM model, e.g. Qwen/Qwen3-0.6B, from llama-cli, the result is great; but when I use the same GGUF, the same prompt and temperature of 0, the response from iOS is bad, sometime the answers are long. For example, this is my llama-cli query:
llama-cli -m models-custom/Qwen3-0.6B-Q4_K_M.gguf \
  --temp 0 \
  --repeat_penalty 1.1 \
  -p "You are a helpful assistant with access to the following tools:

tools:
- name: get_current_weather
  description: Get the current weather in a given location.
  parameters:
    location: (string) The city and state, e.g., San Francisco, CA.
    unit: (string, optional) The unit of temperature, either \"celsius\" or \"fahrenheit\".

You can use these tools to answer user questions. When you need to use a tool, respond with a JSON object in the following format:

{ \"tool_name\": \"name of the tool\", \"tool_args\": { \"argument1\": \"value1\", \"argument2\": \"value2\" } }

Do not use any other format. If you can answer without a tool, respond directly to the user.

User: what is the weather in Seattle? Answer with unit of fahrenheit
Assistant: "
The result from llama-cli is great:
<think>
Okay, the user is asking for the weather in Seattle and wants the unit as Fahrenheit. I need to use the get_current_weather tool. The location parameter should be "Seattle" and the unit set to "fahrenheit". Let me make sure I format the JSON correctly with those arguments.
</think>
{
  "tool_name": "get_current_weather",
  "tool_args": {
    "location": "Seattle",
    "unit": "fahrenheit"
  }
}
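For comparison, this is roughly the sampler setup I would expect on the iOS side. It is only a minimal sketch against llama.cpp's C sampler API as imported into Swift (the 4-argument llama_sampler_init_penalties call below matches recent checkouts, but that signature has changed across versions, so check it against your build):

import llama  // llama.cpp's C API, as in the llama.swiftui example

// Sampler chain intended to mirror `--temp 0 --repeat_penalty 1.1`.
let chain = llama_sampler_chain_init(llama_sampler_chain_default_params())

// repeat_penalty 1.1 over a 64-token window, no frequency/presence penalty.
llama_sampler_chain_add(chain, llama_sampler_init_penalties(64, 1.1, 0.0, 0.0))

// temp 0 in llama-cli falls back to greedy (argmax) decoding, so add the
// greedy sampler explicitly rather than llama_sampler_init_temp(0).
llama_sampler_chain_add(chain, llama_sampler_init_greedy())

// per decoded token:
// let token = llama_sampler_sample(chain, ctx, -1)

If the iOS wrapper builds its chain differently (e.g. top-k/top-p plus llama_sampler_init_dist), or applies a chat template where llama-cli is doing raw completion, could that alone explain the divergence?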