Python: Improve orchestration exception handling #12716

Merged
TaoChenOSU merged 6 commits into main from taochen/orchestration-exception-handling
Jul 15, 2025
Conversation

@TaoChenOSU

@TaoChenOSU TaoChenOSU commented Jul 15, 2025

Contributor

Motivation and Context

Currently, exceptions that occur inside the orchestration actors are not properly surfaced to the caller when they happen.

Fixes: #12719

Description

This PR addresses the issue by introducing a new exception_callback that is hidden from the user but raises exceptions that occur inside the orchestration actors to the caller.
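
As a rough illustration of the callback idea described above, here is a minimal, self-contained asyncio sketch. It is not the code from this PR: `SketchActor`, `invoke`, and `result_callback` are hypothetical names, and the real orchestration classes and runtime are replaced by a plain background task. It only shows how an internally wired `exception_callback` can fail the future the caller is awaiting.

```python
import asyncio
from collections.abc import Callable


class SketchActor:
    """Hypothetical stand-in for an orchestration actor (not the SK class)."""

    def __init__(
        self,
        result_callback: Callable[[str], None],
        exception_callback: Callable[[BaseException], None],
    ) -> None:
        self._result_callback = result_callback
        self._exception_callback = exception_callback

    async def handle(self, message: str) -> None:
        try:
            if message == "boom":
                raise ValueError("failure inside the actor")
            self._result_callback(message.upper())
        except Exception as exc:
            # Report the failure instead of letting it vanish in the background task.
            self._exception_callback(exc)


async def invoke(message: str) -> str:
    """Hypothetical entry point; the callbacks are wired here, hidden from the caller."""
    result = asyncio.get_running_loop().create_future()

    def result_callback(value: str) -> None:
        if not result.done():
            result.set_result(value)

    def exception_callback(exc: BaseException) -> None:
        # The first reported exception fails the future the caller awaits.
        if not result.done():
            result.set_exception(exc)

    actor = SketchActor(result_callback, exception_callback)
    task = asyncio.create_task(actor.handle(message))
    try:
        return await result  # re-raises the actor's exception in the caller
    finally:
        await task
```

With this wiring, `asyncio.run(invoke("boom"))` raises the `ValueError` at the call site, while a normal message returns its result.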

Contribution Checklist

@TaoChenOSU TaoChenOSU self-assigned this Jul 15, 2025
@TaoChenOSU TaoChenOSU added the python and agents labels Jul 15, 2025
@TaoChenOSU TaoChenOSU requested a review from a team as a code owner July 15, 2025 04:45
@github-actions github-actions Bot changed the title Improve orchestration exception handling Python: Improve orchestration exception handling Jul 15, 2025
@moonbox3

moonbox3 commented Jul 15, 2025

Collaborator

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| agents/orchestration | | | | |
| agent_actor_base.py | 96 | 15 | 84% | 51, 84–91, 93, 139, 156, 193, 219, 236 |
| concurrent.py | 72 | 0 | 100% | |
| group_chat.py | 180 | 9 | 95% | 111–113, 115, 286–288, 290, 356 |
| handoffs.py | 187 | 13 | 93% | 256–258, 260, 272, 287, 298–299, 304, 316–319 |
| magentic.py | 294 | 56 | 80% | 122–124, 256, 379, 447, 450, 457, 461, 463, 521–522, 524–525, 531, 533–534, 538, 566, 570, 580–582, 585, 590–595, 617, 629–630, 632, 637, 641–642, 644, 646–647, 652, 662–663, 666, 670–671, 677–678, 680, 708, 710, 719–723 |
| orchestration_base.py | 128 | 5 | 96% | 66, 169, 174, 179, 202 |
| sequential.py | 64 | 0 | 100% | |
| TOTAL | 26558 | 4499 | 83% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 3654 | 22 💤 | 0 ❌ | 0 🔥 | 1m 57s ⏱️ |

@moonbox3 moonbox3 left a comment

Collaborator

LGTM. A few points for consideration either in this iteration or the next:

  1. Should we add some exception metadata? Timestamp, agent_id, operation context to help the caller? Looks like we'll lose some context about where or why the exception occurred.
  2. Should we add some logging around the exception points?
  3. Are we tracking a docs update to explain to users how exceptions are now surfaced?
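
To make the first point concrete, one possible shape for such metadata is sketched below. This is an assumption for illustration only, not something the PR adds: `ActorErrorContext`, `ActorError`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActorErrorContext:
    """Hypothetical metadata describing where and when an actor failed."""

    agent_id: str
    operation: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ActorError(Exception):
    """Hypothetical wrapper that keeps the original exception as its cause."""

    def __init__(self, context: ActorErrorContext, cause: BaseException) -> None:
        super().__init__(
            f"{context.operation} failed in agent {context.agent_id!r}: {cause}"
        )
        self.context = context
        # Preserve the original exception so its traceback is still reachable.
        self.__cause__ = cause


def wrap_actor_exception(agent_id: str, operation: str, cause: BaseException) -> ActorError:
    """Attach agent and operation context to an exception before surfacing it."""
    return ActorError(ActorErrorContext(agent_id, operation), cause)
```

A caller could then inspect `err.context.agent_id` or `err.__cause__` after catching `ActorError`.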

@ekzhu ekzhu left a comment


Looks good to me. Callback is simple to reason about.

Just a reminder that we will need to think about a different way to surface exceptions if the actors are remote in a distributed runtime -- we don't have to worry about that now.

@TaoChenOSU

Contributor Author

> LGTM. A few points for consideration either in this iteration or the next:
>
>   1. Should we add some exception metadata? Timestamp, agent_id, operation context to help the caller? Looks like we'll lose some context about where or why the exception occurred.
>   2. Should we add some logging around the exception points?
>   3. Are we tracking a docs update to explain to users how exceptions are now surfaced?

  1. We don't get the agent id in the original exception as it could happen in the LLM layer. I see the benefit of having the agent ID so I added an error log that captures it.
  2. Error log added.
  3. Good call out. I will work on that.
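
The error log mentioned in this reply might look roughly like the sketch below. The logger name, the helper `report_actor_exception`, and the message text are assumptions for illustration and are not taken from the merged code; only the standard-library `logging` calls are real.

```python
import logging
from collections.abc import Callable

# Hypothetical logger name; the real module may use a different one.
logger = logging.getLogger("orchestration_sketch")


def report_actor_exception(
    agent_id: str,
    exc: BaseException,
    exception_callback: Callable[[BaseException], None],
) -> None:
    """Log which agent failed, then hand the original exception to the callback."""
    # exc_info accepts an exception instance, so the traceback is included in the log.
    logger.error("Exception occurred in agent %s", agent_id, exc_info=exc)
    exception_callback(exc)
```

Logging the agent ID at the point of failure keeps the original exception unchanged while still recording which actor raised it.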

@crickman crickman moved this to Sprint: In Review in Semantic Kernel Jul 15, 2025
@crickman crickman added the multi-agent label Jul 15, 2025
@TaoChenOSU TaoChenOSU added this pull request to the merge queue Jul 15, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025
@TaoChenOSU TaoChenOSU added this pull request to the merge queue Jul 15, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025
@TaoChenOSU TaoChenOSU added this pull request to the merge queue Jul 15, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025
@moonbox3 moonbox3 added this pull request to the merge queue Jul 15, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025
@TaoChenOSU TaoChenOSU enabled auto-merge July 15, 2025 22:44
@TaoChenOSU TaoChenOSU added this pull request to the merge queue Jul 15, 2025
Merged via the queue into main with commit b74ea05 Jul 15, 2025
28 checks passed
@TaoChenOSU TaoChenOSU deleted the taochen/orchestration-exception-handling branch July 15, 2025 23:11
@github-project-automation github-project-automation Bot moved this from Sprint: In Review to Sprint: Done in Semantic Kernel Jul 15, 2025
jcruzmot-te pushed a commit to thousandeyes/aia-semantic-kernel that referenced this pull request Sep 15, 2025
### Motivation and Context

Currently, exceptions that occur inside the orchestration actors are not
properly surfaced to the caller when they happen.

Fixes: microsoft#12719

### Description

This PR addresses the issue by introducing a new exception_callback that
is hidden from the user but raises exceptions that occur inside the
orchestration actors to the caller.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

Labels

agents, multi-agent (Issues for multi-agent orchestration), python (Pull requests for the Python Semantic Kernel)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Bug: Exceptions that occur in actors are not surfaced to the caller of orchestrations

4 participants