Workflow API is not working with TLS enabled in distributed system #1428

Open
payalcha opened this issue Mar 6, 2025 · 1 comment

@payalcha
Collaborator

payalcha commented Mar 6, 2025

Describe the bug
When starting a federated experiment in a distributed setup with TLS enabled, the aggregator and collaborators fail to connect.

Issue
Branch - 1.7.1, develop

To Reproduce
Steps to reproduce the behavior:

  1. Get 3 systems that can communicate with each other. Clone openfl on all 3 systems.
  2. Designate 1 system as the director, 1 system to run the envoys, and 1 system as the manager.
  3. Create certificates for the director, the Bangalore and Chandler envoys, and the manager on their respective systems, following https://openfl.readthedocs.io/en/latest/developer_guide/utilities/pki.html#semi-automatic-certification
  4. Go to openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermarking
  5. Start the director on the director system by following README.md.
  6. On the envoy system, start the Bangalore and Chandler envoys by following README.md.
  7. Chandler and Bangalore are able to connect to the director.
  8. Start the experiment on the manager machine.

Expected behavior
The experiment should run successfully.

Actual behavior
The experiment gets stuck after retrieving the envoys.
The aggregator and collaborators are not able to connect.
director.log is attached for more context.

director.log

Additional context
Workaround: modifying the experiment as follows worked.

  1. In the cell where FederatedRuntime is instantiated, remove the #| export directive.
  2. Add a new cell just below and paste the following:

```python
#| export
from openfl.experimental.workflow.runtime import FederatedRuntime
authorized_collaborators = ['Bangalore', 'Chandler']
federated_runtime = FederatedRuntime(
    collaborators=authorized_collaborators,
    director=None,
    notebook_path='./MNIST_Watermarking.ipynb'
)
```

  3. When running the Jupyter notebook, do not run the cell added in step 2.
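
For anyone who wants to apply steps 1 and 2 without editing the notebook by hand, here is a minimal sketch that does it on the parsed .ipynb JSON. It is not part of OpenFL: the helper name `apply_workaround` and the way it finds the runtime cell (the first code cell containing `FederatedRuntime(`) are assumptions made for illustration, and it only matches the directive written exactly as `#| export`. The last step (not running the added cell) still has to be done manually.

```python
import copy

EXPORT = "#| export"

# Body of the new exported cell, copied from the workaround above.
EXPORTED_RUNTIME_CELL = """#| export
from openfl.experimental.workflow.runtime import FederatedRuntime
authorized_collaborators = ['Bangalore', 'Chandler']
federated_runtime = FederatedRuntime(
    collaborators=authorized_collaborators,
    director=None,
    notebook_path='./MNIST_Watermarking.ipynb'
)
"""


def _source_text(cell):
    # In .ipynb files, "source" is either a single string or a list of lines.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def apply_workaround(notebook):
    """Return a modified copy of a parsed .ipynb dict (hypothetical helper)."""
    nb = copy.deepcopy(notebook)
    for index, cell in enumerate(nb["cells"]):
        text = _source_text(cell)
        # Assumption: the runtime cell is the first code cell that calls
        # FederatedRuntime(...); adjust if the notebook is laid out differently.
        if cell.get("cell_type") != "code" or "FederatedRuntime(" not in text:
            continue
        # Step 1: remove the export directive from the existing cell.
        cell["source"] = [
            line for line in text.splitlines(keepends=True)
            if line.strip() != EXPORT
        ]
        # Step 2: insert the new exported cell directly below it.
        nb["cells"].insert(index + 1, {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": EXPORTED_RUNTIME_CELL.splitlines(keepends=True),
        })
        break
    return nb


# Example usage (paths are illustrative):
#   import json
#   with open("MNIST_Watermarking.ipynb") as f:
#       patched = apply_workaround(json.load(f))
#   with open("MNIST_Watermarking_patched.ipynb", "w") as f:
#       json.dump(patched, f, indent=1)
```

The function works on a copy and leaves the input untouched, so the original notebook can be kept alongside the patched one. It is not idempotent: running it twice would add a second exported cell.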
@teoparvanov
Collaborator

Thanks for documenting the observed issue, which we should aim to fix in OpenFL 1.9. In the meantime, it's great that we have a workaround that has been tested with 1.7.1.
