
[BUG] Sequential tool calls unreliable with LiteLLM ollama_chat #938

Open
strtgbb opened this issue Mar 11, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@strtgbb

strtgbb commented Mar 11, 2025

Describe the bug
I'm trying to create an example of using ToolCallingAgent to solve a puzzle that requires multiple function calls.

I haven't been able to determine exactly why it doesn't work; here are my observations:

After the first or second tool call, I start getting the following error for most of the remaining tool calls:

Error in generating tool call with model:
Model did not call any tools. Call `final_answer` tool to return a final answer.

Looking into the logs, I see entries like

'model_output_message': ChatMessage(role='assistant',
  content="Calling tools:\n[{'id': '...', 'type': 'function', 'function': {'name': 'get_state', 'arguments': {}}}]",

The only thing that looks wrong to me here is the prepended 'Calling tools:\n'.

The first thing I tried was a custom tool_parser, as I noticed that parse_json_tool_call doesn't handle this prefix.
However, the custom tool parser appears to never get called.

Since the model seems to be learning this format from the chat history, I also tried removing the prefix in ActionStep.to_messages. This fixed the extra prefix, but the tool call is still returned as a plain string under ChatMessage(content=.
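
Roughly the workaround I mean, written here as a monkeypatch rather than an edit to the library source. Treat it as a sketch only: the exact shape of the messages that ActionStep.to_messages returns (a plain string vs. a list of {'type': 'text', ...} chunks) varies between smolagents versions.

from smolagents.memory import ActionStep

PREFIX = "Calling tools:\n"
_original_to_messages = ActionStep.to_messages

def _to_messages_without_prefix(self, *args, **kwargs):
    # Strip the prefix from the assistant messages rebuilt from memory,
    # so the model stops imitating it in later steps.
    messages = _original_to_messages(self, *args, **kwargs)
    for message in messages:
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = content.removeprefix(PREFIX)
        elif isinstance(content, list):
            for chunk in content:
                if isinstance(chunk, dict) and isinstance(chunk.get("text"), str):
                    chunk["text"] = chunk["text"].removeprefix(PREFIX)
    return messages

ActionStep.to_messages = _to_messages_without_prefix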

I got the same error regardless of whether the function accepts arguments; I am trying tools without args to narrow down the issue. I also noticed that the system prompt warns against calling a function multiple times with the same argument. That is not relevant for my use case, so I tried removing that part, but the above issues remain.
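
For reference, removing that warning looked roughly like this. Whether the agent exposes prompt_templates, initialize_system_prompt, and memory.system_prompt under these names, and the exact wording of the sentence, depend on the smolagents version, so this is only a sketch:

agent = ToolCallingAgent(tools=[...], model=model)
# Drop the "don't repeat a call with the same parameters" style instruction from
# the system prompt template (exact wording may differ in your version).
agent.prompt_templates["system_prompt"] = agent.prompt_templates["system_prompt"].replace(
    "Never re-do a tool call that you previously did with the exact same parameters.", ""
)
# Depending on the version, the already-rendered prompt may also need refreshing.
agent.system_prompt = agent.initialize_system_prompt()
agent.memory.system_prompt.system_prompt = agent.system_prompt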

I also tried CodeAgent, but it only succeeds if I set planning_interval; otherwise the LLM writes an invalid solution and claims it succeeded.
However, if I set planning_interval, then ToolCallingAgent is also able to make a correct plan; it just fails to execute it.

Code to reproduce the error
I've seen the exact same issue with multiple models, including qwen2.5:7b/14b, qwen2.5-coder:7b/14b, and mistral-nemo:12b.

# Imports assumed for a self-contained snippet; the module providing
# parse_json_tool_call may differ between smolagents versions.
from smolagents import LiteLLMModel, ToolCallingAgent
from smolagents.models import parse_json_tool_call

model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5:7b",
    api_base="http://localhost:11434",
    num_ctx=32768,
)

def better_tool_parser(text: str):
    # Strip the prefix the model picked up from the chat history before parsing.
    text = text.removeprefix("Calling tools:\n")
    print(text)
    return parse_json_tool_call(text)

agent = ToolCallingAgent(
    tools=[...],  # puzzle-specific tools omitted
    model=model,
    tool_parser=better_tool_parser,
)
agent.run(prompt)  # prompt defined elsewhere

Error logs (if any)

Expected behavior

  • The failed tool call message should be more detailed
  • A model that can call a tool successfully once should be able to keep calling tools*
  • Changing the formatting of the tool call history shouldn't affect how the model calls tools*
  • tool_parser should warn that it doesn't do anything

*I understand these two might fall under "LLM is dumb", but I don't have enough information to determine if the issue is with the model, smolagents, or LiteLLM.

Packages version:

smolagents==1.10.0
litellm==1.63.3

Additional context
I have only tested this with LiteLLM ollama_chat; I don't know if the issue exists with other providers.

@strtgbb strtgbb added the bug Something isn't working label Mar 11, 2025
@strtgbb
Author

strtgbb commented Mar 12, 2025

Minimal reproduction:

import pprint
from smolagents import tool, ToolCallingAgent, LiteLLMModel

model_id = "qwen2.5:7b"
model = LiteLLMModel(
    model_id=f"ollama_chat/{model_id}",
    api_base="http://localhost:11434",
)

counter = 0

@tool
def count() -> str:
    """Increment the counter by one and return the new value."""
    global counter
    counter += 1
    return str(counter)

agent = ToolCallingAgent(
    tools=[count],
    model=model,
)

try:
    agent.run("Using the given tool, count to three.", max_steps=5)
finally:
    with open("agent_succinct.log", "w") as f:
        f.write(
            pprint.pformat(
                agent.memory.get_succinct_steps(),
                width=200,
            )
        )

Large models will try to call count() three times at once, which isn't supported by this setup. But regardless of the model chosen, follow-up tool calls will eventually start using the Calling tools:\n prefix, which isn't treated as a tool call.

@sysradium
Contributor

Have a look at #962.
I tested that with TransformerLLM, though; it was failing in the same way without the fix in my PR.

@strtgbb
Author

strtgbb commented Mar 12, 2025

Thanks for looking into it. I've tested your PR and I see no difference.
My issue is unrelated to whether or not a function takes arguments.

You can swap this tool into my example above:

@tool
def count(value: int) -> int:
    """
    Return the given value incremented by one.
    
    Args:
      value (int): The current value of the counter.

    Returns:
      int: The new value of the counter.
    """

    return value + 1

The model uses the tool correctly the first time, but still fails the second time and never recovers:

╭──────────────────────────────────── New run ────────────────────────────────────╮
│                                                                                 │
│ Using the given tool, count to three.                                           │
│                                                                                 │
╰─ LiteLLMModel - ollama_chat/qwen2.5:7b ─────────────────────────────────────────╯
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╭─────────────────────────────────────────────────────────────────────────────────╮
│ Calling tool: 'count' with arguments: {'value': 0}                              │
╰─────────────────────────────────────────────────────────────────────────────────╯
Observations: 1
[Step 1: Duration 3.84 seconds| Input tokens: 1,095 | Output tokens: 19]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Error in generating tool call with model:
Model did not call any tools. Call `final_answer` tool to return a final answer.
[Step 2: Duration 1.78 seconds| Input tokens: 2,307 | Output tokens: 87]

...

The logs for step 2:

 {'action_output': None,
  'duration': 1.7822458744049072,
  'end_time': 1741821158.4100177,
  'error': {'message': 'Error in generating tool call with model:\nModel did not call any tools. Call `final_answer` tool to return a final answer.', 'type': 'AgentGenerationError'},
  'model_output': None,
  'model_output_message': ChatMessage(role='assistant',
                                      content="Calling tools:\n[{'id': '5a2d0f28-dc6a-4234-a274-ba925a7bddd1', 'type': 'function', 'function': {'name': 'count', 'arguments': {'value': 1}}}]\n\n",
                                      tool_calls=None,
                                      raw=ModelResponse(id='chatcmpl-912b93ed-fdf4-4860-b0d4-213c9f1fad9a', created=1741821158, model='ollama_chat/qwen2.5:7b', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Calling tools:\n[{'id': '5a2d0f28-dc6a-4234-a274-ba925a7bddd1', 'type': 'function', 'function': {'name': 'count', 'arguments': {'value': 1}}}]\n\n", role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=68, prompt_tokens=1212, total_tokens=1280, completion_tokens_details=None, prompt_tokens_details=None))),
  'observations': None,
  'start_time': 1741821156.6277719,
  'step': 2,
  'tool_calls': []}

@sysradium
Contributor

Interesting. Maybe that is indeed ollama-specific, so I ran:

import pprint

from smolagents import (
    LiteLLMModel,
    ToolCallingAgent,
    tool,
)


model = LiteLLMModel(
    model_id="ollama_chat/llama3.2:latest",
    api_base="http://localhost:11434",
    num_ctx=32768,
)

counter = 0


@tool
def count() -> str:
    """Increment the counter by one and return the new value."""
    global counter
    counter += 1
    return str(counter)


agent = ToolCallingAgent(
    tools=[count],
    model=model,
)

try:
    agent.run("Using the given tool, count to three.", max_steps=5)
finally:
    with open("agent_succinct.log", "w") as f:
        f.write(
            pprint.pformat(
                agent.memory.get_succinct_steps(),
                width=200,
            )
        )

The result is:

[Image: screenshot of the agent run output]

@sysradium
Contributor

sysradium commented Mar 13, 2025

It works worse with an argument, i.e. when:

@tool
def count(value: int) -> int:
    # ...

Specifically, it was having a hard time and kept passing a string instead of a number, but it eventually got there.

There is a claim that it all works better with CodeAgent and a model that is good at producing code. For example:

[Image: screenshot of a CodeAgent run]

@strtgbb
Author

strtgbb commented Mar 13, 2025

I see the example working with llama3.2:3b on my end.
For my use case of prompting LLMs to solve puzzles, 3B models aren't smart enough, and CodeAgent only finds a solution with planning steps enabled.
It seems strange that both ToolCallingAgent and CodeAgent can make a good plan, but only one can execute it.

I tried tool calling using ollama-python for comparison; it has no issue with sequential tool calls out of the box. I formatted the tool calls in the message history as

 {'content': 1, 'name': 'count', 'role': 'tool'}
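
That comparison looked roughly like the sketch below. It assumes a recent ollama Python client where plain functions can be passed as tools and tool calls are read off response.message.tool_calls; older client versions return plain dicts instead.

import ollama

counter = 0

def count() -> str:
    """Increment the counter by one and return the new value."""
    global counter
    counter += 1
    return str(counter)

messages = [{"role": "user", "content": "Using the given tool, count to three."}]

for _ in range(10):
    response = ollama.chat(model="qwen2.5:7b", messages=messages, tools=[count])
    messages.append(response.message)
    if not response.message.tool_calls:
        break
    for tool_call in response.message.tool_calls:
        result = count()  # only one tool here, so no dispatch needed
        # Tool results go back into the history in the format shown above.
        messages.append({"role": "tool", "name": tool_call.function.name, "content": result})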

I did accidentally find a way to prompt the model through ollama-python that caused it to write tool calls with the following format, which did not get recognized as a valid tool call.

Let's try ...

<tool_call>
{"name": "my_tool", "arguments": {"arg_1": "1}}
</tool_call>

It must be something this library is doing that confuses the agent.
The content="Calling tools:\n... line in the log suggests to me that the issue is with how the chat history is read by the agent.
