
[BUG] Sequential tool calls unreliable with LiteLLM ollama_chat #938

Open
strtgbb opened this issue Mar 11, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@strtgbb

strtgbb commented Mar 11, 2025

Describe the bug
I'm trying to create an example of using ToolCallingAgent to solve a puzzle that requires multiple function calls.

I haven't been able to determine exactly why it doesn't work; here are my observations:

After the first or second tool call, I start getting the following error for most of the remaining tool calls:

Error in generating tool call with model:
Model did not call any tools. Call `final_answer` tool to return a final answer.

Looking into the logs, I see entries like

'model_output_message': ChatMessage(role='assistant',
  content="Calling tools:\n[{'id': '...', 'type': 'function', 'function': {'name': 'get_state', 'arguments': {}}}]",

The only thing that looks wrong to me here is the prepended 'Calling tools:\n'.

The first thing I tried was a custom tool_parser, as I noticed that parse_json_tool_call doesn't handle this prefix.
However, the custom tool parser appears to never get called.

Since the model seems to be learning this format from the chat history, I also tried removing the prefix in ActionStep.to_messages. This fixed the extra prefix, but the tool call is still returned as a plain string under ChatMessage(content=.
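
Roughly the workaround I mean, written here as a monkeypatch rather than an edit to the library source. Treat it as a sketch only: the exact shape of the messages that ActionStep.to_messages returns (a plain string vs. a list of {'type': 'text', ...} chunks) varies between smolagents versions.

from smolagents.memory import ActionStep

PREFIX = "Calling tools:\n"
_original_to_messages = ActionStep.to_messages

def _to_messages_without_prefix(self, *args, **kwargs):
    # Strip the prefix from the assistant messages rebuilt from memory,
    # so the model stops imitating it in later steps.
    messages = _original_to_messages(self, *args, **kwargs)
    for message in messages:
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = content.removeprefix(PREFIX)
        elif isinstance(content, list):
            for chunk in content:
                if isinstance(chunk, dict) and isinstance(chunk.get("text"), str):
                    chunk["text"] = chunk["text"].removeprefix(PREFIX)
    return messages

ActionStep.to_messages = _to_messages_without_prefix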

I got the same error regardless of whether the function accepts arguments; I am trying tools without args to narrow down the issue. I also noticed that the system prompt warns against calling a function multiple times with the same argument. That is not relevant for my use case, so I tried removing that part, but the above issues remain.
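
For reference, removing that warning looked roughly like this. Whether the agent exposes prompt_templates, initialize_system_prompt, and memory.system_prompt under these names, and the exact wording of the sentence, depend on the smolagents version, so this is only a sketch:

agent = ToolCallingAgent(tools=[...], model=model)
# Drop the "don't repeat a call with the same parameters" style instruction from
# the system prompt template (exact wording may differ in your version).
agent.prompt_templates["system_prompt"] = agent.prompt_templates["system_prompt"].replace(
    "Never re-do a tool call that you previously did with the exact same parameters.", ""
)
# Depending on the version, the already-rendered prompt may also need refreshing.
agent.system_prompt = agent.initialize_system_prompt()
agent.memory.system_prompt.system_prompt = agent.system_prompt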

I also tried CodeAgent, but it only succeeds if I set planning_interval; otherwise the LLM writes an invalid solution and claims it succeeded.
However, if I set planning_interval, then ToolCallingAgent is also able to make a correct plan; it just fails to execute it.

Code to reproduce the error
I've seen the exact same issue with multiple models, including qwen2.5:7b/14b, qwen2.5-coder:7b/14b, and mistral-nemo:12b.

# Imports assumed for a self-contained snippet; the module providing
# parse_json_tool_call may differ between smolagents versions.
from smolagents import LiteLLMModel, ToolCallingAgent
from smolagents.models import parse_json_tool_call

model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5:7b",
    api_base="http://localhost:11434",
    num_ctx=32768,
)

def better_tool_parser(text: str):
    # Strip the prefix the model picked up from the chat history before parsing.
    text = text.removeprefix("Calling tools:\n")
    print(text)
    return parse_json_tool_call(text)

agent = ToolCallingAgent(
    tools=[...],  # puzzle-specific tools omitted
    model=model,
    tool_parser=better_tool_parser,
)
agent.run(prompt)  # prompt defined elsewhere

Error logs (if any)

Expected behavior

  • The failed tool call message should be more detailed
  • A model that can call a tool successfully once should be able to keep calling tools*
  • Changing the formatting of the tool call history shouldn't affect how the model calls tools*
  • tool_parser should warn that it doesn't do anything

*I understand these two might fall under "LLM is dumb", but I don't have enough information to determine if the issue is with the model, smolagents, or LiteLLM.

Packages version:

smolagents==1.10.0
litellm==1.63.3

Additional context
I have only tested this with LiteLLM ollama_chat; I don't know if the issue exists with other providers.

@strtgbb strtgbb added the bug Something isn't working label Mar 11, 2025
@strtgbb
Author

strtgbb commented Mar 12, 2025

Minimal reproduction:

import pprint
from smolagents import tool, ToolCallingAgent, LiteLLMModel

model_id = "qwen2.5:7b"
model = LiteLLMModel(
    model_id=f"ollama_chat/{model_id}",
    api_base="http://localhost:11434",
)

counter = 0

@tool
def count() -> str:
    """Increment the counter by one and return the new value."""
    global counter
    counter += 1
    return str(counter)

agent = ToolCallingAgent(
    tools=[count],
    model=model,
)

try:
    agent.run("Using the given tool, count to three.", max_steps=5)
finally:
    with open("agent_succinct.log", "w") as f:
        f.write(
            pprint.pformat(
                agent.memory.get_succinct_steps(),
                width=200,
            )
        )

Large models will try to call count() three times at once, which isn't supported by this setup. But regardless of the model chosen, follow-up tool calls will eventually start using the Calling tools:\n prefix, which isn't treated as a tool call.

@sysradium
Contributor

Have a look at #962.
I tested that with TransformerLLM, though; it was failing in the same way without the fix in my PR.

@strtgbb
Author

strtgbb commented Mar 12, 2025

Thanks for looking into it. I've tested your PR and I see no difference.
My issue is unrelated to whether or not a function takes arguments.

You can swap this tool into my example above:

@tool
def count(value: int) -> int:
    """
    Return the given value incremented by one.
    
    Args:
      value (int): The current value of the counter.

    Returns:
      int: The new value of the counter.
    """

    return value + 1

The model uses the tool correctly the first time, but still fails the second time and never recovers:

╭──────────────────────────────────── New run ────────────────────────────────────╮
│                                                                                 │
│ Using the given tool, count to three.                                           │
│                                                                                 │
╰─ LiteLLMModel - ollama_chat/qwen2.5:7b ─────────────────────────────────────────╯
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╭─────────────────────────────────────────────────────────────────────────────────╮
│ Calling tool: 'count' with arguments: {'value': 0}                              │
╰─────────────────────────────────────────────────────────────────────────────────╯
Observations: 1
[Step 1: Duration 3.84 seconds| Input tokens: 1,095 | Output tokens: 19]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Error in generating tool call with model:
Model did not call any tools. Call `final_answer` tool to return a final answer.
[Step 2: Duration 1.78 seconds| Input tokens: 2,307 | Output tokens: 87]

...

The logs for step 2:

 {'action_output': None,
  'duration': 1.7822458744049072,
  'end_time': 1741821158.4100177,
  'error': {'message': 'Error in generating tool call with model:\nModel did not call any tools. Call `final_answer` tool to return a final answer.', 'type': 'AgentGenerationError'},
  'model_output': None,
  'model_output_message': ChatMessage(role='assistant',
                                      content="Calling tools:\n[{'id': '5a2d0f28-dc6a-4234-a274-ba925a7bddd1', 'type': 'function', 'function': {'name': 'count', 'arguments': {'value': 1}}}]\n\n",
                                      tool_calls=None,
                                      raw=ModelResponse(id='chatcmpl-912b93ed-fdf4-4860-b0d4-213c9f1fad9a', created=1741821158, model='ollama_chat/qwen2.5:7b', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Calling tools:\n[{'id': '5a2d0f28-dc6a-4234-a274-ba925a7bddd1', 'type': 'function', 'function': {'name': 'count', 'arguments': {'value': 1}}}]\n\n", role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=68, prompt_tokens=1212, total_tokens=1280, completion_tokens_details=None, prompt_tokens_details=None))),
  'observations': None,
  'start_time': 1741821156.6277719,
  'step': 2,
  'tool_calls': []}

@sysradium
Contributor

Interesting. Maybe that is indeed ollama-specific, so I ran:

import pprint

from smolagents import (
    LiteLLMModel,
    ToolCallingAgent,
    tool,
)


model = LiteLLMModel(
    model_id="ollama_chat/llama3.2:latest",
    api_base="http://localhost:11434",
    num_ctx=32768,
)

counter = 0


@tool
def count() -> str:
    """Increment the counter by one and return the new value."""
    global counter
    counter += 1
    return str(counter)


agent = ToolCallingAgent(
    tools=[count],
    model=model,
)

try:
    agent.run("Using the given tool, count to three.", max_steps=5)
finally:
    with open("agent_succinct.log", "w") as f:
        f.write(
            pprint.pformat(
                agent.memory.get_succinct_steps(),
                width=200,
            )
        )

The result is:

[Image: screenshot of the agent run output]

@sysradium
Contributor

sysradium commented Mar 13, 2025

It works worse with an argument, i.e. when:

@tool
def count(value: int) -> int:
    # ...

Specifically, it was having a hard time and kept passing a string instead of a number, but it eventually got there.

There is a claim that it all works better with CodeAgent and a model that is good at producing code. For example:

[Image: screenshot of a CodeAgent run]

@strtgbb
Author

strtgbb commented Mar 13, 2025

I see the example working with llama3.2:3b on my end.
For my use case of prompting LLMs to solve puzzles, 3B models aren't smart enough, and CodeAgent only finds a solution with planning steps enabled.
It seems strange that both ToolCallingAgent and CodeAgent can make a good plan, but only one can execute it.

I tried tool calling using ollama-python for comparison; it has no issue with sequential tool calls out of the box. I formatted the tool calls in the message history as

 {'content': 1, 'name': 'count', 'role': 'tool'}
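
That comparison looked roughly like the sketch below. It assumes a recent ollama Python client where plain functions can be passed as tools and tool calls are read off response.message.tool_calls; older client versions return plain dicts instead.

import ollama

counter = 0

def count() -> str:
    """Increment the counter by one and return the new value."""
    global counter
    counter += 1
    return str(counter)

messages = [{"role": "user", "content": "Using the given tool, count to three."}]

for _ in range(10):
    response = ollama.chat(model="qwen2.5:7b", messages=messages, tools=[count])
    messages.append(response.message)
    if not response.message.tool_calls:
        break
    for tool_call in response.message.tool_calls:
        result = count()  # only one tool here, so no dispatch needed
        # Tool results go back into the history in the format shown above.
        messages.append({"role": "tool", "name": tool_call.function.name, "content": result})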

I did accidentally find a way to prompt the model through ollama-python that caused it to write tool calls with the following format, which did not get recognized as a valid tool call.

Let's try ...

<tool_call>
{"name": "my_tool", "arguments": {"arg_1": "1}}
</tool_call>

It must be something this library is doing that confuses the agent.
The content="Calling tools:\n... line in the log suggests to me that the issue is with how the chat history is read by the agent.
