-
Notifications
You must be signed in to change notification settings - Fork 7k
Migrate rl-skyrl from templates repo to Ray repo #58014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Gang Zhao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request migrates the rl-skyrl example from the templates repository by adding a Jupyter Notebook and a Markdown file. The migration is a good step. My review focuses on ensuring the documentation is correct and maintainable. I've identified a few issues: a broken image link in the notebook due to a relative path, a malformed URL in both the notebook and markdown file, and an image asset hosted in the repository that is being deprecated, which poses a future risk. I've provided suggestions to fix these. As a general note, for this example to appear in the documentation, it needs to be added to doc/source/ray-overview/examples/index.rst.
| "```\n", | ||
| "\n", | ||
| "If using W&B, you should see logs like the ones shown below, with detailed metric tracking and timing breakdowns for each stage of the RL pipeline.\n", | ||
| "<img src=\"assets/gsm8k_wandb.png\" width=1500px />\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image assets/gsm8k_wandb.png is referenced with a relative path. This will result in a broken image when the notebook is rendered, as the assets directory is not included in this pull request. Please use a full URL to the image. I'd also recommend hosting this image within the Ray repository to avoid it breaking if the source repository is changed or removed in the future.
| "<img src=\"assets/gsm8k_wandb.png\" width=1500px />\n" | |
| "<img src=\"https://raw.githubusercontent.com/anyscale/templates/main/templates/rl-skyrl/assets/gsm8k_wandb.png\" width=1500px />\n" |
| "- Explore more advanced algorithms, like [PPO](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-train/examples/ppo) or [DAPO](https://skyrl.readthedocs.io/en/latest/algorithms/dapo.html)\n", | ||
| "- Explore more advanced tasks like [SWE-Bench](https://skyrl.readthedocs.io/en/latest/examples/mini_swe_agent.html), or [agentic search (Search-R1)](https://skyrl.readthedocs.io/en/latest/examples/search.html).\n", | ||
| "- Optimize your training pipeline using [Async Training](https://skyrl.readthedocs.io/en/latest/tutorials/async.html)\n", | ||
| "- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This URL contains duplicated query parameters, which makes it malformed. The second set of UTM parameters, starting from the second ?, should be removed.
| "- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm)." | |
| "- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm)." |
| ``` | ||
|
|
||
| If using W&B, you should see logs like the ones shown below, with detailed metric tracking and timing breakdowns for each stage of the RL pipeline. | ||
| <img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/rl-skyrl/assets/gsm8k_wandb.png" width=1500px /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Explore more advanced algorithms, like [PPO](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-train/examples/ppo) or [DAPO](https://skyrl.readthedocs.io/en/latest/algorithms/dapo.html) | ||
| - Explore more advanced tasks like [SWE-Bench](https://skyrl.readthedocs.io/en/latest/examples/mini_swe_agent.html), or [agentic search (Search-R1)](https://skyrl.readthedocs.io/en/latest/examples/search.html). | ||
| - Optimize your training pipeline using [Async Training](https://skyrl.readthedocs.io/en/latest/tutorials/async.html) | ||
| - Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This URL is malformed due to duplicated query parameters. The second ? and the parameters following it should be removed.
| - Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm). | |
| - Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm). |
erictang000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
- Spell out acronyms on first use (GRPO, DAPO, SWE) - Fix spelling errors (mins -> minutes, depedencies -> dependencies) - Convert passive voice to active voice - Remove exclamation points per style guide - Fix 'agentic' to 'agent' and 'Async' to 'async' - Fix 'walkthrough' to 'walk-through' - Fix 'HuggingFace' to 'Hugging Face' - Remove 'will' in favor of present tense Signed-off-by: Gang Zhao <[email protected]>
- Use 'or' instead of parentheses for acronym definitions per Google.Parens - Change GRPO from (GRPO) to 'or GRPO' format - Change DAPO from (DAPO) to 'also known as DAPO' format - Change SWE-Bench from (SWE-Bench) to 'also known as SWE-Bench' format - Change GSM8K from (GSM8K) to 'on GSM8K' in heading - Replace parenthetical code reference with 'with' for cleaner flow Signed-off-by: Gang Zhao <[email protected]>
- Replace GRPO with 'Group Relative Policy Optimization' throughout - Replace DAPO with 'Direct Alignment from Preference Optimization' - Remove 'also known as' phrases with standalone acronyms - Keep full algorithm names for clarity and to pass Google.Acronyms checks Signed-off-by: Gang Zhao <[email protected]>
Signed-off-by: Gang Zhao <[email protected]>
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
This pull request has been automatically closed because there has been no more activity in the 14 days Please feel free to reopen or open a new pull request if you'd still like this to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for your contribution! |
Description
Migrate rl-skyrl template from templates repo to Ray repo since templates repo will be deprecated.