Skip to content

fix(framework): Report clientapp appio communication failures as errors#7061

Merged
danieljanes merged 51 commits into
mainfrom
report-clientapp-appio-communication-failures-as-errors
May 6, 2026
Merged

fix(framework): Report clientapp appio communication failures as errors#7061
danieljanes merged 51 commits into
mainfrom
report-clientapp-appio-communication-failures-as-errors

Conversation

@msheller
Copy link
Copy Markdown
Member

@msheller msheller commented Apr 29, 2026

Minor fix for error checking corner case that could cause silent failures (success reported despite error).

This addresses the codex comment remaining in already merged PR 6986

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Address LLM-reviewer comments, if applicable (e.g., GitHub Copilot)
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

@github-actions github-actions Bot added the Maintainer Used to determine what PRs (mainly) come from Flower maintainers. label Apr 29, 2026
Base automatically changed from tls-for-appio-servicers to main April 30, 2026 15:28
@msheller msheller changed the title Report clientapp appio communication failures as errors fix(framework): Report clientapp appio communication failures as errors Apr 30, 2026
@msheller msheller marked this pull request as ready for review April 30, 2026 16:59
Copilot AI review requested due to automatic review settings April 30, 2026 16:59
@msheller msheller requested a review from danieljanes as a code owner April 30, 2026 16:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a corner case in the SuperNode ClientApp runtime where AppIO gRPC failures could still result in a success (ExitCode.SUCCESS) being reported, and introduces a dedicated non-zero exit code for such communication failures.

Changes:

  • Update run_clientapp to return a non-zero exit code when a grpc.RpcError occurs during AppIO communication.
  • Add ExitCode.CLIENTAPP_COMMUNICATION_ERROR = 250 and document it.
  • Add a unit test ensuring gRPC failures don’t report success.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
framework/py/flwr/supernode/runtime/run_clientapp.py Tracks an exit_code and switches it to a ClientApp communication error on grpc.RpcError.
framework/py/flwr/supernode/runtime/run_clientapp_test.py Adds a test asserting flwr_exit is called with the new non-zero exit code on gRPC failure.
framework/py/flwr/common/exit/exit_code.py Introduces the new ClientApp-specific exit code and help text; adjusts documented ServerApp range.
framework/docs/source/ref-exit-codes/250.rst Documents exit code 250 and remediation guidance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread framework/py/flwr/supernode/runtime/run_clientapp.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me, but I think the new exit-code docs are a little too user-action-oriented for an internal process. In normal runtime usage, users do not start flwr-clientapp directly; it is launched by SuperExec with the AppIO address/token/TLS settings already derived from the SuperNode/SuperExec setup.

Could we adjust this page to say that this means the internal ClientApp process (flwr-clientapp) could not communicate with the SuperNode ClientAppIo API, and that users should check the surrounding SuperNode/ClientApp logs for the underlying gRPC error? A likely cause is that the ClientApp process took too long to start, for example on a very slow or overloaded system, causing the short-lived token/heartbeat window to expire. If the log message is not enough to diagnose it, users should contact the Flower team with the relevant logs. Wdyt?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! LMK if you want further changes!

Copy link
Copy Markdown
Member

@panh99 panh99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@danieljanes danieljanes enabled auto-merge (squash) May 6, 2026 15:21
@danieljanes danieljanes merged commit 442472a into main May 6, 2026
70 checks passed
@danieljanes danieljanes deleted the report-clientapp-appio-communication-failures-as-errors branch May 6, 2026 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Maintainer Used to determine what PRs (mainly) come from Flower maintainers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants