Conversation

Collaborator

@gangsf gangsf commented Oct 22, 2025

Description

Migrate the rl-skyrl template from the templates repo to the Ray repo, because the templates repo will be deprecated.

@gangsf gangsf requested a review from a team as a code owner October 22, 2025 20:54

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request migrates the rl-skyrl example from the templates repository by adding a Jupyter Notebook and a Markdown file. The migration is a good step. My review focuses on ensuring the documentation is correct and maintainable. I've identified a few issues: a broken image link in the notebook due to a relative path, a malformed URL in both the notebook and markdown file, and an image asset hosted in the repository that is being deprecated, which poses a future risk. I've provided suggestions to fix these. As a general note, for this example to appear in the documentation, it needs to be added to doc/source/ray-overview/examples/index.rst.

"```\n",
"\n",
"If using W&B, you should see logs like the ones shown below, with detailed metric tracking and timing breakdowns for each stage of the RL pipeline.\n",
"<img src=\"assets/gsm8k_wandb.png\" width=1500px />\n"

high

The image assets/gsm8k_wandb.png is referenced with a relative path. This will result in a broken image when the notebook is rendered, as the assets directory is not included in this pull request. Please use a full URL to the image. I'd also recommend hosting this image within the Ray repository to avoid it breaking if the source repository is changed or removed in the future.

Suggested change
"<img src=\"assets/gsm8k_wandb.png\" width=1500px />\n"
"<img src=\"https://raw.githubusercontent.com/anyscale/templates/main/templates/rl-skyrl/assets/gsm8k_wandb.png\" width=1500px />\n"

"- Explore more advanced algorithms, like [PPO](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-train/examples/ppo) or [DAPO](https://skyrl.readthedocs.io/en/latest/algorithms/dapo.html)\n",
"- Explore more advanced tasks like [SWE-Bench](https://skyrl.readthedocs.io/en/latest/examples/mini_swe_agent.html), or [agentic search (Search-R1)](https://skyrl.readthedocs.io/en/latest/examples/search.html).\n",
"- Optimize your training pipeline using [Async Training](https://skyrl.readthedocs.io/en/latest/tutorials/async.html)\n",
"- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm)."

medium

This URL contains duplicated query parameters, which makes it malformed. The second set of UTM parameters, starting from the second ?, should be removed.

Suggested change
"- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm)."
"- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm)."

```

If using W&B, you should see logs like the ones shown below, with detailed metric tracking and timing breakdowns for each stage of the RL pipeline.
<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/rl-skyrl/assets/gsm8k_wandb.png" width=1500px />

medium

This image source points to the anyscale/templates repository. According to the pull request description, this repository is being deprecated. To prevent the image from breaking in the future, it should be moved to a more permanent location, such as within this repository's assets.
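A check like the one described in the image comments can be sketched in a few lines of Python. This is only an illustration, not part of the PR: the function name `flag_image_sources` and the `deprecated_prefixes` parameter are hypothetical, and the regex only handles HTML `<img>` tags, not Markdown `![alt](path)` images. For a notebook, the Markdown cell sources would need to be joined into one string first.

```python
import re

# Captures the src value of an HTML <img> tag (single- or double-quoted).
IMG_SRC = re.compile(r'<img[^>]*\bsrc=["\']([^"\']+)["\']')


def flag_image_sources(text, deprecated_prefixes=()):
    """Return (src, reason) pairs for image sources worth a second look.

    Hypothetical helper: flags relative paths (which break when the asset
    is not shipped alongside the file) and absolute URLs that start with
    any caller-supplied prefix, such as a repository slated for deprecation.
    """
    flagged = []
    for src in IMG_SRC.findall(text):
        if not src.startswith(("http://", "https://", "data:")):
            flagged.append((src, "relative path"))
        elif src.startswith(tuple(deprecated_prefixes)):
            flagged.append((src, "deprecated host"))
    return flagged
```

Passing `["https://raw.githubusercontent.com/anyscale/templates/"]` as `deprecated_prefixes` would flag the image URL used in this file, while a relative `assets/gsm8k_wandb.png` would be flagged regardless of the prefixes given.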

- Explore more advanced algorithms, like [PPO](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-train/examples/ppo) or [DAPO](https://skyrl.readthedocs.io/en/latest/algorithms/dapo.html)
- Explore more advanced tasks like [SWE-Bench](https://skyrl.readthedocs.io/en/latest/examples/mini_swe_agent.html), or [agentic search (Search-R1)](https://skyrl.readthedocs.io/en/latest/examples/search.html).
- Optimize your training pipeline using [Async Training](https://skyrl.readthedocs.io/en/latest/tutorials/async.html)
- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm).

medium

This URL is malformed due to duplicated query parameters. The second ? and the parameters following it should be removed.

Suggested change
- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm?utm_source=anyscale&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm).
- Deploy your trained LLM using [Ray Serve LLM on Anyscale](https://console.anyscale.com/template-preview/deployment-serve-llm?utm_source=anyscale_docs&utm_medium=docs&utm_campaign=examples_page&utm_content=deployment-serve-llm).
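The fix suggested above (dropping everything from the second `?` onward) can be sketched with Python's standard `urllib.parse`. This is a heuristic for links like this one, not a general URL validator: a literal `?` is technically permitted inside a query string, so the helper name `drop_repeated_query` and its cut-at-the-second-`?` rule are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_repeated_query(url: str) -> str:
    """Keep only the query text before a second '?' (hypothetical helper)."""
    parts = urlsplit(url)
    # urlsplit puts everything after the first '?' into .query, so a second
    # '?' shows up inside it; keep only the text that precedes it.
    query = parts.query.split("?", 1)[0]
    return urlunsplit(parts._replace(query=query))
```

Applied to the link in this file, the helper returns the URL with a single set of UTM parameters, matching the suggested change. A URL with no second `?` comes back unchanged.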

Contributor

@erictang000 erictang000 left a comment

lgtm!

Gang Zhao added 3 commits October 22, 2025 16:18
- Spell out acronyms on first use (GRPO, DAPO, SWE)
- Fix spelling errors (mins -> minutes, depedencies -> dependencies)
- Convert passive voice to active voice
- Remove exclamation points per style guide
- Fix 'agentic' to 'agent' and 'Async' to 'async'
- Fix 'walkthrough' to 'walk-through'
- Fix 'HuggingFace' to 'Hugging Face'
- Remove 'will' in favor of present tense

Signed-off-by: Gang Zhao <[email protected]>
- Use 'or' instead of parentheses for acronym definitions per Google.Parens
- Change GRPO from (GRPO) to 'or GRPO' format
- Change DAPO from (DAPO) to 'also known as DAPO' format
- Change SWE-Bench from (SWE-Bench) to 'also known as SWE-Bench' format
- Change GSM8K from (GSM8K) to 'on GSM8K' in heading
- Replace parenthetical code reference with 'with' for cleaner flow

Signed-off-by: Gang Zhao <[email protected]>
- Replace GRPO with 'Group Relative Policy Optimization' throughout
- Replace DAPO with 'Direct Alignment from Preference Optimization'
- Remove 'also known as' phrases with standalone acronyms
- Keep full algorithm names for clarity and to pass Google.Acronyms checks

Signed-off-by: Gang Zhao <[email protected]>
@ray-gardener ray-gardener bot added labels "rllib" (RLlib related issues) and "docs" (An issue or change related to documentation) on Oct 23, 2025
Signed-off-by: Gang Zhao <[email protected]>

github-actions bot commented Nov 7, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the "stale" label (The issue is stale. It will be closed within 7 days unless there are further conversation) on Nov 7, 2025
@github-actions

This pull request has been automatically closed because there has been no more activity in the 14 days
since being marked stale.

Please feel free to reopen or open a new pull request if you'd still like this to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public Slack channel.

Thanks again for your contribution!

@github-actions github-actions bot closed this Nov 21, 2025
@gangsf gangsf reopened this Dec 8, 2025
@github-actions github-actions bot added the "unstale" label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the "stale" label (The issue is stale. It will be closed within 7 days unless there are further conversation) on Dec 9, 2025
Labels

docs: An issue or change related to documentation
rllib: RLlib related issues
unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.

3 participants