
feat(agents): Add on_llm_start and on_llm_end Lifecycle Hooks #987


Open

wants to merge 7 commits into main

Conversation

@uzair330 commented Jul 1, 2025

Motivation

Currently, AgentHooks provides valuable lifecycle events for the start and end of an agent run and for tool execution (on_tool_start/on_tool_end). However, developers lack a way to observe the agent's execution at the language-model level.

This PR introduces two new hooks, on_llm_start and on_llm_end, to provide this deeper level of observability. This change enables several key use cases:

  • Performance Monitoring: Precisely measure the latency of LLM calls.
  • Debugging & Logging: Log the exact prompts sent to and raw responses received from the model.
  • Implementing Custom Logic: Trigger actions (e.g., updating a UI, saving state) immediately before or after the agent "thinks."

Summary of Changes

  • src/agents/lifecycle.py
    Added two new async methods, on_llm_start and on_llm_end, to the AgentHooks base class, matching the existing on_*_start/on_*_end pattern (their shape is sketched after this list).

  • src/agents/run.py
    Wrapped the call to model.get_response(...) in _get_new_response with invocations of the new hooks so that they fire immediately before and after each LLM call.

  • tests/test_agent_llm_hooks.py
    Added unit tests (using a mock model and spy hooks) to validate:

    1. The correct sequence of on_start → on_llm_start → on_llm_end → on_end in a chat‑only run.
    2. The correct wrapping of tool execution in a tool‑using run:
      on_start → on_llm_start → on_llm_end → on_tool_start → on_tool_end → on_llm_start → on_llm_end → on_end.
    3. That the agent still runs without error when agent.hooks is None.
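
For quick reference, a minimal sketch of the shape these hooks take on a subclass. The signatures are lifted from the usage examples below; the no-op bodies stand in for whatever the base-class defaults are, assumed here to match the existing hooks.

# Sketch only: the shape of the two new hooks on an AgentHooks subclass.
# Signatures mirror the usage examples below; the no-op defaults are an
# assumption, matching the existing on_*_start/on_*_end hooks.
from typing import Any, Optional

from agents.agent import Agent
from agents.items import ModelResponse, TResponseInputItem
from agents.lifecycle import AgentHooks, RunContextWrapper


class MyHooks(AgentHooks[Any]):
    async def on_llm_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
        system_prompt: Optional[str],
        input_items: list[TResponseInputItem],
    ) -> None:
        # Fires immediately before the model call in each turn.
        ...

    async def on_llm_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        response: ModelResponse,
    ) -> None:
        # Fires as soon as the ModelResponse for that call is available.
        ...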

Usage Examples

1. Async Example (awaitable via run)

import asyncio
from typing import Any, Optional

from dotenv import load_dotenv

from agents.agent import Agent
from agents.items import ModelResponse, TResponseInputItem
from agents.lifecycle import AgentHooks, RunContextWrapper
from agents.run import Runner

# Load any OPENAI_API_KEY or other env vars
load_dotenv()


# --- 1. Define a custom hooks class to track LLM calls ---
class LLMTrackerHooks(AgentHooks[Any]):
    async def on_llm_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
        system_prompt: Optional[str],
        input_items: list[TResponseInputItem],
    ) -> None:
        print(
            f">>> [HOOK] Agent '{agent.name}' is calling the LLM with system prompt: {system_prompt or '[none]'}"
        )

    async def on_llm_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        response: ModelResponse,
    ) -> None:
        if response.usage:
            print(f">>> [HOOK] LLM call finished. Tokens used: {response.usage.total_tokens}")


# --- 2. Create your agent with these hooks ---
my_agent = Agent(
    name="MyMonitoredAgent",
    instructions="Tell me a joke.",
    hooks=LLMTrackerHooks(),
)


# --- 3. Drive it via an async main() ---
async def main():
    result = await Runner.run(my_agent, "Tell me a joke.")
    print(f"\nAgent output:\n{result.final_output}")


if __name__ == "__main__":
    asyncio.run(main())

2. Sync Example (blocking via run_sync)

from typing import Any, Optional

from dotenv import load_dotenv

from agents.agent import Agent
from agents.items import ModelResponse, TResponseInputItem
from agents.lifecycle import AgentHooks, RunContextWrapper
from agents.run import Runner

# Load any OPENAI_API_KEY or other env vars
load_dotenv()


# --- 1. Define a custom hooks class to track LLM calls ---
class LLMTrackerHooks(AgentHooks[Any]):
    async def on_llm_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
        system_prompt: Optional[str],
        input_items: list[TResponseInputItem],
    ) -> None:
        print(
            f">>> [HOOK] Agent '{agent.name}' is calling the LLM with system prompt: {system_prompt or '[none]'}"
        )

    async def on_llm_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        response: ModelResponse,
    ) -> None:
        if response.usage:
            print(f">>> [HOOK] LLM call finished. Tokens used: {response.usage.total_tokens}")


# --- 2. Create your agent with these hooks ---
my_agent = Agent(
    name="MyMonitoredAgent",
    instructions="Tell me a joke.",
    hooks=LLMTrackerHooks(),
)


# --- 3. Drive it via a blocking main() ---
def main():
    result = Runner.run_sync(my_agent, "Tell me a joke.")
    print(f"\nAgent output:\n{result.final_output}")


if __name__ == "__main__":
    main()

Note

Streaming support for on_llm_start and on_llm_end is not yet implemented. These hooks currently fire only on non‑streamed (batch) LLM calls. Support for streaming invocations will be added in a future release.

Checklist

  • My code follows the style guidelines of this project (checked with ruff).
  • I have added tests that prove my feature works.
  • All new and existing tests passed locally with my changes.

@seratch added the enhancement (New feature or request) and feature:core labels Jul 8, 2025
@seratch requested review from seratch and rm-openai July 10, 2025 07:45
@seratch (Member) commented Jul 10, 2025

When execute_tools_and_side_effects takes long, these hooks would indeed be helpful for understanding the exact time spent in the LLM. One quick thing I noticed: this implementation does not yet support streaming patterns.

@rm-openai do you think having these two hooks is good to go?

@uzair330 (Author) commented Jul 12, 2025

Thanks for the feedback—here’s a quick run‑through:

  1. Why these hooks matter for timing
    By firing on_llm_start immediately before model.get_response(...) and on_llm_end the moment that call returns, we capture a clean window around the pure LLM round-trip. Anything outside that window (tool calls, side effects, post-processing) is clearly separated, so long-running tools no longer obscure where time is spent. You can record timestamps in the hooks (or push spans, metrics, logs, etc.) to drill into LLM latency vs. other work; a minimal timing sketch follows below.

  2. Streaming support
    @seratch You’re correct that right now we only wrap the batch (get_response) path. The streaming code (run_streamed / model.stream_response) will need a similar hook invocation before the stream starts and another once the final ResponseCompletedEvent arrives. I’ll follow up shortly with a small PR to add those calls.
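
For example, here is a minimal timing sketch built only on the hook signatures introduced in this PR; the timestamp bookkeeping itself is illustrative, not part of the change.

# Minimal latency sketch: stamp the clock in on_llm_start and report elapsed
# wall-clock time in on_llm_end. Only the hook signatures come from this PR;
# the timing logic is illustrative.
import time
from typing import Any, Optional

from agents.agent import Agent
from agents.items import ModelResponse, TResponseInputItem
from agents.lifecycle import AgentHooks, RunContextWrapper


class LLMLatencyHooks(AgentHooks[Any]):
    def __init__(self) -> None:
        super().__init__()
        self._started_at: Optional[float] = None

    async def on_llm_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
        system_prompt: Optional[str],
        input_items: list[TResponseInputItem],
    ) -> None:
        self._started_at = time.monotonic()

    async def on_llm_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        response: ModelResponse,
    ) -> None:
        if self._started_at is not None:
            elapsed = time.monotonic() - self._started_at
            print(f">>> [HOOK] LLM round-trip took {elapsed:.2f}s")
            self._started_at = None

Attached via Agent(..., hooks=LLMLatencyHooks()), each LLM call in a turn gets its own start/end pair, so tool execution time never leaks into the measurement.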

@uzair330 (Author) commented

New: Streaming Support for on_llm_start and on_llm_end Hooks

This update adds full support for the on_llm_start and on_llm_end hooks in streaming mode, ensuring feature parity with non-streaming execution.


1. Implementation (src/agents/run.py)

  • Hook calls are now moved inside _run_single_turn_streamed to ensure they receive the correct turn-specific data.
  • on_llm_start is triggered just before the model.stream_response call, with the resolved system_prompt.
  • on_llm_end is triggered after the ResponseCompletedEvent and once the final ModelResponse is assembled.

2. Unit Tests (tests/test_agent_llm_hooks.py)

  • Added a new test: test_streamed_hooks_for_chat_scenario, which verifies that the hooks fire in the correct sequence using Runner.run_streamed.
  • Confirms streaming and non-streaming execution paths are now consistent (an end-to-end ordering check using the public API is sketched below).
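
As a complement to the mock-model unit test, the same ordering can be observed end to end with the public API. The sketch below records lifecycle events in a list during a streamed run; the recording scheme is illustrative, on_start/on_end are the pre-existing AgentHooks methods, and an OPENAI_API_KEY is required to actually run it.

# Illustrative end-to-end ordering check using only the public API (this is
# not the PR's unit test, which uses a mock model). Requires OPENAI_API_KEY.
import asyncio
from typing import Any, Optional

from agents.agent import Agent
from agents.items import ModelResponse, TResponseInputItem
from agents.lifecycle import AgentHooks, RunContextWrapper
from agents.run import Runner


class OrderRecordingHooks(AgentHooks[Any]):
    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    async def on_start(self, context: RunContextWrapper, agent: Agent) -> None:
        self.events.append("on_start")

    async def on_llm_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
        system_prompt: Optional[str],
        input_items: list[TResponseInputItem],
    ) -> None:
        self.events.append("on_llm_start")

    async def on_llm_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        response: ModelResponse,
    ) -> None:
        self.events.append("on_llm_end")

    async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None:
        self.events.append("on_end")


async def main() -> None:
    hooks = OrderRecordingHooks()
    agent = Agent(name="OrderCheckAgent", instructions="Reply briefly.", hooks=hooks)

    # Drain the streamed run, then inspect the recorded hook order.
    stream = Runner.run_streamed(agent, input="Say hello.")
    async for _ in stream.stream_events():
        pass

    print(hooks.events)
    # Expected for a chat-only run: ['on_start', 'on_llm_start', 'on_llm_end', 'on_end']


if __name__ == "__main__":
    asyncio.run(main())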

✅ Streaming Example

The hooks now work seamlessly with Runner.run_streamed, enabling real-time insight into LLM usage:

import asyncio
from typing import Any

from dotenv import load_dotenv

from agents.agent import Agent
from agents.items import ItemHelpers, ModelResponse
from agents.lifecycle import AgentHooks, RunContextWrapper
from agents.run import Runner

load_dotenv()


class LLMTrackerHooks(AgentHooks[Any]):
    async def on_llm_start(
        self,
        context: RunContextWrapper,
        agent: Agent,
        system_prompt: str,
        input_items: Any,
    ) -> None:
        print(f"[HOOK] on_llm_start: LLM is starting, system_prompt={system_prompt}")

    async def on_llm_end(
        self,
        context: RunContextWrapper,
        agent: Agent,
        response: ModelResponse,
    ) -> None:
        total = response.usage.total_tokens if response.usage else None
        print(f"[HOOK] on_llm_end: LLM returned, total_tokens={total}")


async def main():
    agent = Agent(
        name="MyStreamingAgent",
        instructions="You are a helpful assistant that can answer questions and perform tasks.",
        hooks=LLMTrackerHooks(),
    )

    stream = Runner.run_streamed(agent, input="Create a Python script that prints 'Hello, World!'")
    async for event in stream.stream_events():
        if event.type == "raw_response_event":
            continue
        if event.type == "agent_updated_stream_event":
            print(f"[EVENT] agent_updated → {event.new_agent.name}")
        elif event.type == "run_item_stream_event":
            item = event.item
            if item.type == "tool_call_item":
                print("[EVENT] tool_call_item")
            elif item.type == "tool_call_output_item":
                print(f"[EVENT] tool_call_output_item → {item.output}")
            elif item.type == "message_output_item":
                text = ItemHelpers.text_message_output(item)
                print(f"[EVENT] message_output_item → {text}")

    print(f"\n[RUN COMPLETE] final_output → {stream.final_output}")


if __name__ == "__main__":
    asyncio.run(main())

@uzair330 (Author) commented

Thanks again for the valuable feedback, @seratch.
Just a gentle ping for you and @rm-openai to let you know that I've pushed the update to add full streaming support for the LLM hooks, and all CI checks are now passing.
Ready for another look when you have a moment. Thanks!
