Updated OpenEnv docs #4418
Merged · +411 −271

**Commits** (21)
- `a57711b` Updated OpenEnv docs (sergiopaniego)
- `a76419c` Merge branch 'main' into openenv-docs-improvement (sergiopaniego)
- `956ce8d` Merge branch 'main' into openenv-docs-improvement (sergiopaniego)
- `e66400b` Update echo with args (sergiopaniego)
- `736d254` Merge branch 'openenv-docs-improvement' of github.com:huggingface/trl… (sergiopaniego)
- `bbdeb06` Update args to use Space, Docker or Local (sergiopaniego)
- `3f6ee82` Updated catch (sergiopaniego)
- `bf9f806` Update catch example (sergiopaniego)
- `812a453` Merge branch 'main' into openenv-docs-improvement (sergiopaniego)
- `00d8475` Added how to run to catch example (sergiopaniego)
- `ff180ad` Updated wordle launch instructions (sergiopaniego)
- `abbc5a5` Default is running from docker (sergiopaniego)
- `b62bc3b` Updated docs (sergiopaniego)
- `1fe72fb` Added image (sergiopaniego)
- `892fdd2` Code quality (sergiopaniego)
- `bc4d1ea` Update (sergiopaniego)
- `9bf71cc` Small nit (sergiopaniego)
- `bad5802` Merge branch 'main' into openenv-docs-improvement (sergiopaniego)
- `12a0e51` Update based on feedback (sergiopaniego)
- `38a4cec` Merge branch 'openenv-docs-improvement' of github.com:huggingface/trl… (sergiopaniego)
- `65ff61b` Merge branch 'main' of github.com:huggingface/trl into openenv-docs-i… (sergiopaniego)
````diff
@@ -11,7 +11,7 @@ In this guide, we’ll focus on **how to integrate OpenEnv with TRL**, but feel
 To use OpenEnv with TRL, install the framework:
 
 ```bash
-pip install openenv-core
+pip install git+https://github.com/meta-pytorch/OpenEnv.git
 ```
 
 ## Using `rollout_func` with OpenEnv environments
````
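After installing, a quick sanity check that the package is importable (a sketch; the import path is assumed from the Uvicorn module path used later in this diff, not stated in the install instructions):

```python
# Sketch: verify the OpenEnv install. The import path is assumed from the
# module path used in the Uvicorn command below (envs.echo_env.server.app).
from envs.echo_env import EchoEnv

print(EchoEnv)  # should print the client class without raising ImportError
```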
````diff
@@ -65,6 +65,33 @@ By using OpenEnv in this loop, you can:
 * Plug in custom simulators, web APIs, or evaluators as environments.
 * Pass structured reward signals back into RL training seamlessly.
 
+## Running the Environments
+
+You can run OpenEnv environments in three different ways:
+
+1. **Local Docker container** *(recommended)*
+
+   To start a Docker container:
+   * Open the environment on the Hugging Face Hub.
+   * Click the **⋮ (three dots)** menu.
+   * Select **“Run locally.”**
+   * Copy and execute the provided command in your terminal.
+
+   Example:
+   ```bash
+   docker run -d -p 8001:8001 registry.hf.space/openenv-echo-env:latest
+   ```
+
+   
+
+2. **Local Python process**: Launch the environment directly using Uvicorn.
````
> **Review comment:** The user will need to know to be in the OpenEnv repo for this to work.
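Per that comment, a sketch of the full sequence, assuming the module path resolves from the repo's `src/` directory (the `cd` target is an inference from the `envs.*` module path, not something the diff states):

```bash
# Sketch: run the Echo server from a local checkout of OpenEnv.
# The cd into src/ is assumed from the envs.echo_env.server.app module path.
git clone https://github.com/meta-pytorch/OpenEnv.git
cd OpenEnv/src
python -m uvicorn envs.echo_env.server.app:app --host 0.0.0.0 --port 8001
```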
````diff
+   You can start the server manually as a local process. For more details about the available environments, refer to the [OpenEnv repository](https://github.com/meta-pytorch/OpenEnv/tree/main/src/envs).
+
+   ```bash
+   python -m uvicorn envs.echo_env.server.app:app --host 0.0.0.0 --port 8001
+   ```
+
+3. **Hugging Face Spaces**: Connect to a hosted environment running on the Hugging Face Hub.
+   To find the connection URL, open the Space page, click the **⋮ (three dots)** menu, and select **“Embed this Space.”**
+   You can then use that URL to connect directly from your client.
+   Keep in mind that public Spaces may have rate limits or temporarily go offline if inactive.
+
 ## A simple example
 
 The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the Echo environment rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:
````
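For option 3 above, connecting a client to a hosted Space might look like the following sketch. The Space URL is hypothetical (copy the real one from the Space's “Embed this Space” dialog), and the import path is assumed from the OpenEnv repo:

```python
from envs.echo_env import EchoEnv  # import path assumed from the OpenEnv repo

# Hypothetical URL; replace with the one from “Embed this Space”
client = EchoEnv(base_url="https://openenv-echo-env.hf.space")
```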
````diff
@@ -75,6 +102,15 @@ from trl import GRPOConfig, GRPOTrainer
 
 # Create HTTP client for Echo Environment
 client = EchoEnv.from_docker_image("echo-env:latest")
+"""
+Alternatively, you can start the environment manually with Docker and connect to it:
+
+# Step 1: Start the Echo environment
+docker run -d -p 8001:8001 registry.hf.space/openenv-echo-env:latest
+
+# Step 2: Connect the client to the running container
+client = EchoEnv(base_url="http://0.0.0.0:8001")
+"""
 
 def rollout_func(prompts, args, processing_class):
     # 1. Generate completions via vLLM inference server (running on port 8000)
````
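The diff only shows the first lines of `rollout_func`. As a rough sketch of the shape such a function can take: the generation call is stubbed with a hypothetical helper, and the `EchoAction` field name and return-dict keys are assumptions rather than the actual echo.py code:

```python
from envs.echo_env import EchoAction  # assumed import; see the OpenEnv repo

def rollout_func(prompts, args, processing_class):
    # 1. Generate completions via the vLLM inference server on port 8000
    #    (stubbed here; the real script issues an HTTP request to that server)
    prompt_ids, completion_ids, logprobs = generate_with_vllm(prompts)  # hypothetical helper

    # 2. Step the environment once per completion to collect a reward
    env_rewards = []
    for ids in completion_ids:
        text = processing_class.decode(ids, skip_special_tokens=True)
        result = client.step(EchoAction(message=text))  # field name assumed
        env_rewards.append(result.reward)

    # 3. Return token data plus an extra column that a reward function can
    #    read back; the "env_reward" key name is illustrative only
    return {
        "prompt_ids": prompt_ids,
        "completion_ids": completion_ids,
        "logprobs": logprobs,
        "env_reward": env_rewards,
    }
```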
````diff
@@ -151,6 +187,21 @@ CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-0.5B-Instruct --host
 CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/echo.py
 ```
 
+Alternatively, you can manually start the Echo environment in a Docker container before running the training:
+
+```bash
+# Launch the Echo environment
+docker run -d -p 8001:8001 registry.hf.space/openenv-echo-env:latest
+```
````
> **Review comment:** As above, I think we should double check this port mapping.
````diff
+
+Then, initialize the client using:
+
+`client = EchoEnv(base_url="http://0.0.0.0:8001")`
+
+instead of:
+
+`client = EchoEnv.from_docker_image("echo-env:latest")`.
+
 Below is the reward curve from training:
 
 <iframe src="https://trl-lib-trackio.hf.space?project=openenv&metrics=train/rewards/reward_from_env/mean&runs=qgallouedec-1761202871&sidebar=hidden&navbar=hidden" style="width:600px; height:500px; border:0;"></iframe>
````
````diff
@@ -352,7 +403,7 @@ trainer = GRPOTrainer(
 trainer.train()
 ```
 
-### Running the Example
+### Running the Advanced Example
 
 The example requires two GPUs:
````
````diff
@@ -364,6 +415,17 @@ CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen3-1.7B --host 0.0.0.0 --p
 CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/wordle.py
 ```
 
+Again, you can manually start the TextArena environment in a Docker container before running the training.
+In this case, initialize the client with
+`client = TextArenaEnv(base_url="http://0.0.0.0:8001")`
+instead of
+`client = TextArenaEnv.from_docker_image("registry.hf.space/burtenshaw-textarena:latest")`:
+
+```bash
+# Launch the TextArena environment
+docker run -d -p 8001:8001 registry.hf.space/burtenshaw-textarena:latest
+```
+
 ### Results
 
 The resulting model improves its performance on the game, both by reducing the number of repetitions and by increasing the number of correct guesses. However, the Qwen3-1.7B model we trained is not able to consistently win the game. The following reward curve shows the coverage of the model's guesses and the coverage of correct Y and G letters.
````
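For context on the Y and G terminology in that Results paragraph, here is a toy helper illustrating Wordle's yellow/green feedback. It is an illustration only, not the TextArena environment's actual reward, and it handles duplicate letters naively:

```python
def wordle_feedback(guess: str, answer: str) -> str:
    # 'G' = correct letter in the correct spot, 'Y' = correct letter in the
    # wrong spot, '_' = letter not in the answer. Naive duplicate handling.
    marks = []
    for i, ch in enumerate(guess):
        if answer[i] == ch:
            marks.append("G")
        elif ch in answer:
            marks.append("Y")
        else:
            marks.append("_")
    return "".join(marks)

print(wordle_feedback("stare", "crate"))  # -> "_YGYG"
```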
> **Review comment:** I'm not sure that this port mapping will work because the env doesn't use 8001, it uses 8000.

> **Reply:** 8000 is for vLLM in the snippets we provide 🤔

> **Reply:** Sorry for the late reply. I meant that the internal port in the mapping needs to match what the container is using (8000). The external port can be changed to match the host network.
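If the container does listen on 8000 internally, the mapping the reviewer describes would look like the sketch below; whether this image listens on 8000 or 8001 is exactly the open question in this thread, so verify before relying on it:

```bash
# Sketch: map host port 8001 to the container's assumed internal port 8000
docker run -d -p 8001:8000 registry.hf.space/openenv-echo-env:latest
```

The client would still connect to the host-side port in that case, e.g. `EchoEnv(base_url="http://0.0.0.0:8001")`.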