RL Swarm

This fork is a battle-tested Gensyn RL-Swarm node with a built-in monitor and bulletproof auto-restart. It includes better memory management, improvements to avoid OOM errors, better handling of DHT errors caused by peer poisoning, and fixes for crashes due to socket conflicts. Intended for advanced node runners.

Improvements/changes

File             Feature                            Benefit
run_rl_swarm.sh  auto-restart loop                  restarts the node automatically after a crash
run_rl_swarm.sh  restart counter & logging          crashes are easy to check in logs/restarts.log
run_rl_swarm.sh  VRAM management                    reduces memory fragmentation
manager.py       DHT reconnect                      better stability and resilience during peer poisoning or network/p2pd drops
manager.py       bootnodes reinjection              skips the round if unrecoverable; better node stability
manager.py       silenced hivemind noise            prevents tokenizer deadlocks; better node resilience
rl-swarm.yaml    bfloat16 + gradient checkpointing  memory optimisation
rl-swarm.yaml    smaller beam (50 -> 30)            memory optimisation
rl-swarm.yaml    minimal sampling                   memory optimisation

(Feel free to use only selected parts/files for your own node setup)
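The auto-restart pattern from the table above can be sketched roughly as follows. This is a minimal illustration, not the fork's actual script: TRAIN_CMD, MAX_RESTARTS, RESTART_DELAY, and the log path are placeholders (here `false` stands in for a command that always crashes), and run_rl_swarm.sh itself is more involved.

```shell
#!/usr/bin/env bash
# Sketch of an auto-restart loop with a restart counter and log.
# TRAIN_CMD stands in for the real training command; 'false' always
# exits non-zero, so the loop demonstrates the crash path.
TRAIN_CMD=${TRAIN_CMD:-false}
MAX_RESTARTS=${MAX_RESTARTS:-3}
RESTART_DELAY=${RESTART_DELAY:-0}

mkdir -p logs
restarts=0
while true; do
    $TRAIN_CMD && break                               # clean exit: stop looping
    restarts=$((restarts + 1))
    echo "$(date -u +%FT%TZ) restart #$restarts" >> logs/restarts.log
    if [ "$restarts" -ge "$MAX_RESTARTS" ]; then
        echo "giving up after $restarts restarts" >&2  # avoid infinite crash loops
        break
    fi
    sleep "$RESTART_DELAY"
done
```

A real setup would use a longer RESTART_DELAY (or backoff) so a persistent failure does not hammer the network.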

IMPORTANT

Before running, export these variables (adjust the values as needed); the auto-restart loop uses them to answer the interactive prompts:

export HUGGINGFACE_ACCESS_TOKEN="None"
export MODEL_NAME="Gensyn/Qwen2.5-1.5B-Instruct"
export PRG_GAME=true

RL Swarm

RL Swarm is a peer-to-peer system for reinforcement learning. It allows you to train models collaboratively with others in the swarm, leveraging their collective intelligence. It is open source and permissionless, meaning you can run it on a consumer laptop at home or on a powerful GPU in the cloud. You can also connect your model to the Gensyn Testnet to receive an on-chain identity that tracks your progress over time.

Currently, we are running the reasoning-gym swarm on the Testnet. This swarm is designed to train models to solve a diverse set of reasoning tasks using the reasoning-gym dataset. The current list of default models includes:

  • Gensyn/Qwen2.5-0.5B-Instruct
  • Qwen/Qwen3-0.6B
  • nvidia/AceInstruct-1.5B
  • dnotitia/Smoothie-Qwen3-1.7B
  • Gensyn/Qwen2.5-1.5B-Instruct

This iteration of rl-swarm is powered by the GenRL library. It is a fully composable framework for decentralized reinforcement learning which enables users to create and customize their own swarms for reinforcement learning with multi-agent multi-stage environments.

Requirements

Your hardware requirements will vary depending on a number of factors, including model size and the accelerator platform you use. Users running large NVIDIA GPUs will be assigned a model from the large-model pool, while users running less powerful hardware will be assigned a model from the small-model pool. This design is intended to let users advance at a similar rate regardless of the hardware they use, maximizing their utility to the swarm.

Supported Hardware

  • arm64 or x86 CPU with a minimum of 32 GB RAM (note that running other applications during training may crash it).

OR

  • CUDA devices (officially supported):
    • RTX 3090
    • RTX 4090
    • RTX 5090
    • A100
    • H100

With either configuration, you will need Python >=3.10 (for Mac, you will likely need to upgrade).

⚠️ Please read before continuing ⚠️

This software is experimental and provided as-is for users who are interested in using (or helping to develop) an early version of the Gensyn Protocol for training models.

If you care about on-chain participation, you must read the Identity Management section below.

If you encounter issues, please first check Troubleshooting. If you cannot find a solution there, please check if there is an open (or closed) Issue. If there is no relevant issue, please file one and include 1) all relevant logs, 2) information about your device (e.g. which GPU, if relevant), and 3) your operating system information.

Instructions

Run the Swarm

The easiest way to run RL Swarm is using Docker. This ensures a consistent setup across all operating systems with minimal dependencies.

1. Clone this repo

git clone https://github.com/oxngon/rl-swarm && cd rl-swarm

Experimental (advanced) mode

If you want to experiment with the GenRL library or the configurable parameters, we recommend running RL Swarm via the shell script:

python3 -m venv .venv
source .venv/bin/activate
./run_rl_swarm.sh

To learn more about experimental mode, check out our getting started guide.

Login

  1. A browser window will pop open (you'll need to manually navigate to http://localhost:3000/ if you're on a VM).
  2. Click 'login'.
  3. Login with your preferred method.

Hugging Face (recommended: 'None')

If you would like to upload your model to Hugging Face, enter your Hugging Face access token when prompted. You can generate one from your Hugging Face account, under Access Tokens.

Initial peering and training

From this stage onward, your device will begin training. You should see your peer register and vote on-chain.

You can also track your training progress in real time.
