A distributed platform for executing federated learning tasks across multiple client nodes and a central server. This platform enables collaborative machine learning while keeping data localized on client hardware.
- **Parameter Server (PS)**
  - Central server that coordinates the federated learning process
  - Handles model distribution and aggregation
  - Manages connections with client nodes
  - Runs on port 8200 by default
- **Client Nodes**
  - Process local data and perform model training
  - Communicate with the PS via WebSocket connections

**Prerequisites**

- Docker
- Python 3.10
```bash
# Navigate to Server directory
cd Server

# Build the server image
docker build -t websocket-server .

# Run the server with the number of expected clients (e.g., 3 clients)
docker run -p 8200:8200 websocket-server --num_ues 3
```

```bash
# Navigate to Client directory
cd Client

# Build the client image
docker build -t websocket-client .

# Run client instances on their assigned ports
docker run -p 5000:5000 websocket-client 5000
docker run -p 5001:5001 websocket-client 5001
```
The system uses WebSocket connections for bidirectional communication:

- Server endpoint: `/job_receive`
- Client endpoint: `/process`
1. **Task Initiation**
   - Send a WebSocket request to the server endpoint `/job_receive` with a JSON payload as below, changing the parameters as needed.

```jsonc
{
  "jobData": {
    "general": {
      "task": "satellite_fl",   // Task identifier (any unique name)
      "algo": "regression",     // Algorithm type; supports regression/classification
      "host": "host_ip",        // Host IP address
      "clients": [              // List of clients connected to the PS
        { "client_ip": "172.17.0.1:5000", "client_id": "client1" },
        { "client_ip": "172.17.0.1:5001", "client_id": "client2" },
        { "client_ip": "172.17.0.1:5002", "client_id": "client3" }
      ]
    },
    "scheme": {
      "minibatch": "64",        // Training minibatch size
      "epoch": "1",             // Number of local epochs
      "lr": "0.001",            // Learning rate for model training
      "scheduler": "random",    // Client scheduling method; supports random/full/round_robin/latency
      "clientFraction": "0.7",  // Fraction of clients participating per communication round
      "minibatchtest": "4096",  // Test minibatch size
      "comRounds": "20"         // Total number of communication rounds
    },
    "modelParam": {
      "optimizer": "Adam",      // Optimizer for training the model
      "loss": "Huber",          // Loss function used for training
      "compress": "quantize",   // Model compression method; set to false for no compression
      "z_point": 0.0,           // Quantization parameter 1 (zero point)
      "scale": 0.1,             // Quantization parameter 2 (scale)
      "num_bits": 16            // Quantization parameter 3 (bit width)
    },
    "preprocessing": {
      "dtype": "regression",    // Data type; supports regression/img
      "folder": "satellite",    // Training-data folder; must be inside the data folder on the clients
      "testfolder": "satellite",// Test-data folder; must be inside the data folder on the server
      "normalize": false        // Set normalize to false
    }
  }
}
```
Alternatively, run the `start_fl.py` file to send a request to the PS, after changing the parameters inside the file and the network parameters in the `network_config.yml` file.
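The task-initiation request can also be sent programmatically. Below is a minimal sketch using the third-party `websockets` package; the host and port are assumptions matching the Docker example above, and the payload is abbreviated (see the full JSON example for all supported fields):

```python
import asyncio
import json

# Abbreviated job payload; see the full payload example above for all fields.
JOB = {
    "jobData": {
        "general": {
            "task": "satellite_fl",
            "algo": "regression",
            "host": "host_ip",
            "clients": [
                {"client_ip": "172.17.0.1:5000", "client_id": "client1"},
            ],
        },
        "scheme": {"minibatch": "64", "epoch": "1", "lr": "0.001",
                   "scheduler": "random", "clientFraction": "0.7",
                   "minibatchtest": "4096", "comRounds": "20"},
    }
}

async def submit_job(server_uri: str, job: dict) -> None:
    # Third-party dependency (pip install websockets); imported lazily so
    # the payload above can be inspected without the package installed.
    import websockets

    # Open a WebSocket connection to the PS and send the job as JSON.
    async with websockets.connect(server_uri) as ws:
        await ws.send(json.dumps(job))

# Usage (assumes the PS is running locally on the default port):
# asyncio.run(submit_job("ws://localhost:8200/job_receive", JOB))
```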
2. **Client Selection**
   - The PS selects a fraction of the available clients based on `clientFraction`
   - Selected clients receive the initial model parameters
3. **Training Rounds**
   - For each communication round:
     - The PS sends the model to the selected clients via the `/process` endpoint
     - Clients perform local training using the specified configuration
     - Clients send their updated models back to the PS
     - The PS aggregates the updates using FedAvg; if the model was quantized before sending, it is dequantized before aggregation
4. **Termination**
   - The process continues until the specified number of communication rounds is completed
   - The final model is saved at the PS
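The aggregation step in the workflow above (dequantize, then FedAvg) can be sketched as follows. This is an illustrative sketch, not the platform's exact implementation: the function names and the data-size weighting are assumptions, and `scale`/`z_point` correspond to the fields of the same name in the job payload.

```python
import numpy as np

def dequantize(q, scale, z_point):
    # Inverse of affine quantization: recover floats from quantized ints
    # using the scale and zero-point parameters.
    return scale * (q.astype(np.float32) - z_point)

def fedavg(client_weights, client_sizes):
    # FedAvg: per-layer weighted average of client weight tensors,
    # weighted by each client's local data size.
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(num_layers)
    ]

# Example: two clients, one layer each; client 2 has three times the data.
c1 = [np.array([0.0, 2.0])]
c2 = [np.array([2.0, 4.0])]
global_layer = fedavg([c1, c2], [1, 3])[0]  # -> [1.5, 3.5]
```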
At the end of training, the communication energy is computed from the achievable bit rate for a set of users in a wireless communication system. The key parameters and calculation steps are explained below.
- **Number of Users** (`num_users = 20`)
- **Area Size** (`A = 10000 m^2`): defines a 100 x 100 m^2 area in which the users are randomly located.
- **Transmission Power** (`Pt = 100e-3 W`)
- **Bandwidth** (`B = 2e6 Hz`): the available bandwidth for the communication is 2 MHz.
- **Noise Spectral Density** (`N0 = 1e-9 W/Hz`)
- **Model Size** (`32 * 1e6 bits`): assumes a model with parameters represented using 32 bits.
- **Server Position**: fixed at the center of the area (500, 500), ensuring a centralized setup for communication.
The achievable bit rate is calculated when the PS starts, based on the number of user equipments provided as the command-line argument. This achievable bit rate is then used to calculate the energy consumption per communication round as follows.
The total energy for the training is calculated as:
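As a sketch of how this calculation might look, assuming a Shannon-capacity bit rate with a simple inverse-square path-loss model and upload energy equal to transmit power times transmission time (the platform's exact channel model may differ; the server position here is taken as the area's geometric center):

```python
import math
import random

# Parameters from the section above.
NUM_USERS = 20
A = 10000.0            # area, m^2
PT = 100e-3            # transmission power, W
B = 2e6                # bandwidth, Hz
N0 = 1e-9              # noise spectral density, W/Hz
MODEL_SIZE = 32 * 1e6  # model size, bits

SIDE = math.sqrt(A)            # side length of the square area
SERVER = (SIDE / 2, SIDE / 2)  # server at the center (assumption)

def achievable_rate(user, server=SERVER):
    # Assumed channel: inverse-square path loss, AWGN, Shannon capacity.
    d = math.dist(user, server)
    snr = PT * d ** -2 / (N0 * B)
    return B * math.log2(1 + snr)  # bits/s

def round_energy(users):
    # Per-round energy: transmit power times model upload time, summed
    # over the participating users.
    return sum(PT * MODEL_SIZE / achievable_rate(u) for u in users)

# Users placed uniformly at random in the area (seeded for repeatability).
random.seed(0)
users = [(random.uniform(0, SIDE), random.uniform(0, SIDE))
         for _ in range(NUM_USERS)]
total_energy = 20 * round_energy(users)  # e.g., 20 communication rounds
```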