
Conversation

@njbrake
Contributor

@njbrake njbrake commented Oct 21, 2025

Feature: before_llm_call can edit the messages.

The pros:

  • This feature works on all frameworks. I feel good that the integration test I added sufficiently exercises the code: if any underlying logic changes for the situation we are testing, the test will fail and we'll know one of our assumptions is broken.

The cons:

  • Because we have to inject ourselves into specific parts of the agent loop, we're now pretty tightly tied to the agent used inside each framework. If the user decides to use a different agent model from the default_agent_model in a framework, our callbacks will probably not work for them.

I don't see a good solution for the con here: how callbacks can modify message behavior varies wildly from framework to framework and is heavily implementation-specific. I lean towards merging this PR and fielding any bug reports if they come in. To be clear, I'm not confident this is going to be a robust long-term solution, but I also don't see any other reasonable way to do this across all the agent frameworks.
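For illustration, a minimal sketch of the kind of hook this PR enables. The `Context` dataclass and the bare `before_llm_call` function here are illustrative stand-ins, not any_agent's real API:

```python
# Hypothetical sketch of a before_llm_call callback that edits the
# message list before it reaches the model. Context and the hook
# signature are assumptions for illustration, not any_agent's real API.
from dataclasses import dataclass, field


@dataclass
class Context:
    messages: list = field(default_factory=list)


def before_llm_call(ctx: Context) -> Context:
    # Rewrite the latest user turn in place before the LLM sees it.
    if ctx.messages and ctx.messages[-1]["role"] == "user":
        ctx.messages[-1]["content"] = "Say hello and goodbye"
    return ctx


ctx = Context(messages=[{"role": "user", "content": "Say goodbye"}])
before_llm_call(ctx)
print(ctx.messages[-1]["content"])  # Say hello and goodbye
```

The hard part, as noted above, is not the hook itself but finding the one place in each framework's agent loop where the message list can be intercepted before the model call.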

@njbrake njbrake marked this pull request as draft October 21, 2025 16:25
@codecov

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 51.07914% with 204 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/any_agent/frameworks/google.py 24.69% 58 Missing and 3 partials ⚠️
src/any_agent/callbacks/wrappers/google.py 49.46% 36 Missing and 11 partials ⚠️
src/any_agent/callbacks/wrappers/smolagents.py 39.62% 30 Missing and 2 partials ⚠️
src/any_agent/callbacks/wrappers/agno.py 63.15% 13 Missing and 1 partial ⚠️
src/any_agent/callbacks/span_generation/agno.py 50.00% 7 Missing and 4 partials ⚠️
src/any_agent/callbacks/context.py 61.90% 8 Missing ⚠️
src/any_agent/frameworks/openai.py 55.55% 7 Missing and 1 partial ⚠️
src/any_agent/callbacks/wrappers/langchain.py 80.00% 6 Missing and 1 partial ⚠️
src/any_agent/callbacks/wrappers/openai.py 64.70% 5 Missing and 1 partial ⚠️
src/any_agent/callbacks/wrappers/llama_index.py 70.58% 4 Missing and 1 partial ⚠️
... and 2 more
Files with missing lines Coverage Δ
src/any_agent/callbacks/__init__.py 100.00% <100.00%> (ø)
src/any_agent/frameworks/any_agent.py 90.62% <100.00%> (+3.12%) ⬆️
src/any_agent/frameworks/langchain.py 57.62% <100.00%> (+3.91%) ⬆️
src/any_agent/frameworks/smolagents.py 71.31% <ø> (+2.45%) ⬆️
src/any_agent/frameworks/agno.py 66.90% <0.00%> (+2.15%) ⬆️
src/any_agent/callbacks/wrappers/tinyagent.py 93.25% <75.00%> (-2.30%) ⬇️
src/any_agent/callbacks/wrappers/llama_index.py 93.42% <70.58%> (-3.19%) ⬇️
src/any_agent/callbacks/wrappers/openai.py 90.27% <64.70%> (-4.27%) ⬇️
src/any_agent/callbacks/wrappers/langchain.py 85.85% <80.00%> (-0.08%) ⬇️
src/any_agent/callbacks/context.py 74.19% <61.90%> (-25.81%) ⬇️
... and 6 more

... and 46 files with indirect coverage changes


@daavoo
Contributor

daavoo commented Oct 21, 2025

> @daavoo before I get too deep into this PR, would you be able to give a quick look to make sure it aligns with what you were thinking too? I'm 90% sure it does, but want to check in before I go through the trouble of implementing it for all the frameworks

You're cooking, looks good to me

@njbrake
Contributor Author

njbrake commented Oct 27, 2025

Smolagents implementation pending improvement based on any responses from the HF team on huggingface/smolagents#1834

"model": model_id,
"api_key": api_key,
"api_base": api_base,
"allow_running_loop": True, # Because smolagents uses sync api
Contributor Author

This was a bug in our code that was never caught because we weren't exercising callbacks in our integration tests.

Comment on lines +91 to +95
# Only invoke callbacks on the first LLM call, not on retry attempts
# Retries can be detected by checking if there are error messages in the history
is_retry = any(
    "Error:" in str(msg.content) and "Now let's retry" in str(msg.content)
    for msg in messages
)
Contributor Author

The smolagents implementation is certainly quirky: they append error messages as tool responses and handle retries a little differently from the other libraries. Posting this walkthrough here, which Claude helped me generate (with a bunch of guidance):

The Complete Flow with Message Modification in the Smolagents Library

First Call (Step 1):

  1. _run_stream() line 576: calls _step_stream()
  2. _step_stream() line 1259: calls write_memory_to_messages()
  3. write_memory_to_messages() line 764: calls TaskStep.to_messages()
  4. TaskStep.to_messages() line 192: returns "New task:\nSay goodbye"
  5. Wrapper modifies: Changes last message to "Say hello and goodbye" ✓
  6. _step_stream() line 1284: calls model.generate() with modified messages
  7. LLM tries to call final_answer twice → raises AgentExecutionError
  8. _run_stream() line 597: stores error in action_step.error
  9. _run_stream() line 600: appends action_step (with error) to memory

Second Call (Step 2 - Retry):

  1. _run_stream() line 576: calls _step_stream() AGAIN
  2. _step_stream() line 1259: calls write_memory_to_messages() AGAIN
  3. write_memory_to_messages() rebuilds from memory:
    - TaskStep.to_messages() → "New task:\nSay goodbye" (original task!)
    - ActionStep.to_messages() → Error message: "Error:\n...Now let's retry..."
  4. Messages are rebuilt from memory → modification is LOST ✗
  5. Wrapper modifies the last message (which is now the error message, not the task!)
  6. LLM receives original unmodified task + error message

Why This Happens

The design assumes messages are stateless and reproducible from memory structures. Smolagents never expects the messages parameter to be modified at runtime. The architecture is:

Memory Structures (TaskStep, ActionStep) → to_messages() → Fresh Message List

This happens before every LLM call, so any modifications to the message list itself are discarded.
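The rebuild behaviour described above can be reproduced with a toy model. `TaskStep`, `ActionStep`, and `write_memory_to_messages` here mimic (but are not) the real smolagents classes:

```python
# Toy sketch of the smolagents pattern described above: messages are
# rebuilt from memory steps before every LLM call, so in-place edits
# to the returned list do not persist. All names are illustrative.
class TaskStep:
    def __init__(self, task):
        self.task = task

    def to_messages(self):
        return [{"role": "user", "content": f"New task:\n{self.task}"}]


class ActionStep:
    def __init__(self, error):
        self.error = error

    def to_messages(self):
        return [{"role": "user", "content": f"Error:\n{self.error}\nNow let's retry"}]


memory = [TaskStep("Say goodbye")]


def write_memory_to_messages():
    # A fresh message list is built from memory before every LLM call.
    msgs = []
    for step in memory:
        msgs.extend(step.to_messages())
    return msgs


# Step 1: the wrapper edits the freshly built list in place.
messages = write_memory_to_messages()
messages[-1]["content"] = "New task:\nSay hello and goodbye"

# The step errors; the error is appended to memory, not to `messages`.
memory.append(ActionStep("called final_answer twice"))

# Step 2 (retry): the list is rebuilt from memory, so the edit is gone
# and the last message is now the error, not the task.
retry_messages = write_memory_to_messages()
print(retry_messages[0]["content"])  # the original, unmodified task
```

Running this shows the retry seeing `"New task:\nSay goodbye"` again, exactly the "modification is LOST" step in the flow above.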

Contributor Author

Because of this, we don't want to re-run the callbacks on each retry; we only want the callback to intervene on the initial call, not the retries (since rewriting messages mid-retry would interfere with smolagents' internal logic).

@njbrake njbrake marked this pull request as ready for review October 27, 2025 19:21
@njbrake
Contributor Author

njbrake commented Oct 27, 2025

@github-actions
Contributor

github-actions bot commented Nov 4, 2025

This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added the Stale label Nov 4, 2025
@njbrake njbrake removed the Stale label Nov 4, 2025
@njbrake
Contributor Author

njbrake commented Nov 4, 2025

@github-actions
Contributor

This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added Stale and removed Stale labels Nov 12, 2025
@github-actions
Contributor

This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added the Stale label Nov 23, 2025
@github-actions
Contributor

This PR was closed because it has been stalled for 3 days with no activity.

@github-actions github-actions bot closed this Nov 28, 2025
@github-actions github-actions bot removed the Stale label Nov 29, 2025


Development

Successfully merging this pull request may close these issues.

Allow to access and modify the actual inputs before the underlying LLM call

4 participants