[Doc] Better doc for Transform class #2797

Merged: vmoens merged 1 commit into gh/vmoens/91/base from gh/vmoens/91/head on Feb 20, 2025

[ghstack-poisoned]
pytorch-bot bot commented Feb 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2797

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures, 1 Unrelated Failure

As of commit ebd707e with merge base 76aa9bc:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Feb 20, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.6258s 0.5240s 1.9082 Ops/s 1.9115 Ops/s $\color{#d91a1a}-0.17\%$
test_transformed 1.1417s 1.0432s 0.9586 Ops/s 0.9943 Ops/s $\color{#d91a1a}-3.59\%$
test_serial 1.6104s 1.5255s 0.6555 Ops/s 0.6607 Ops/s $\color{#d91a1a}-0.79\%$
test_parallel 1.4021s 1.3155s 0.7602 Ops/s 0.7551 Ops/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[True-True-True-True-True] 0.2225ms 30.0927μs 33.2306 KOps/s 32.8293 KOps/s $\color{#35bf28}+1.22\%$
test_step_mdp_speed[True-True-True-True-False] 46.7080μs 18.1650μs 55.0510 KOps/s 55.2957 KOps/s $\color{#d91a1a}-0.44\%$
test_step_mdp_speed[True-True-True-False-True] 0.5244ms 17.1769μs 58.2178 KOps/s 58.7196 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[True-True-True-False-False] 39.3630μs 10.0590μs 99.4131 KOps/s 99.4070 KOps/s $+0.01\%$
test_step_mdp_speed[True-True-False-True-True] 64.7110μs 32.5873μs 30.6868 KOps/s 30.7450 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-True-False-True-False] 47.1180μs 19.8203μs 50.4534 KOps/s 50.6931 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[True-True-False-False-True] 55.0220μs 19.0799μs 52.4111 KOps/s 53.5882 KOps/s $\color{#d91a1a}-2.20\%$
test_step_mdp_speed[True-True-False-False-False] 39.0630μs 12.0785μs 82.7920 KOps/s 83.9744 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[True-False-True-True-True] 67.4560μs 34.2371μs 29.2081 KOps/s 29.2694 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[True-False-True-True-False] 56.4050μs 21.8751μs 45.7141 KOps/s 46.0966 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[True-False-True-False-True] 48.4710μs 19.1354μs 52.2591 KOps/s 53.1762 KOps/s $\color{#d91a1a}-1.72\%$
test_step_mdp_speed[True-False-True-False-False] 51.6360μs 12.0338μs 83.0992 KOps/s 83.9868 KOps/s $\color{#d91a1a}-1.06\%$
test_step_mdp_speed[True-False-False-True-True] 76.2230μs 35.9449μs 27.8204 KOps/s 27.8383 KOps/s $\color{#d91a1a}-0.06\%$
test_step_mdp_speed[True-False-False-True-False] 63.8590μs 23.8355μs 41.9543 KOps/s 42.2405 KOps/s $\color{#d91a1a}-0.68\%$
test_step_mdp_speed[True-False-False-False-True] 51.5060μs 20.6327μs 48.4667 KOps/s 47.9490 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[True-False-False-False-False] 36.4280μs 13.7967μs 72.4810 KOps/s 73.2248 KOps/s $\color{#d91a1a}-1.02\%$
test_step_mdp_speed[False-True-True-True-True] 75.6510μs 34.0282μs 29.3874 KOps/s 29.3539 KOps/s $\color{#35bf28}+0.11\%$
test_step_mdp_speed[False-True-True-True-False] 52.7680μs 21.8199μs 45.8297 KOps/s 46.4090 KOps/s $\color{#d91a1a}-1.25\%$
test_step_mdp_speed[False-True-True-False-True] 53.7300μs 21.6918μs 46.1004 KOps/s 46.6258 KOps/s $\color{#d91a1a}-1.13\%$
test_step_mdp_speed[False-True-True-False-False] 31.2490μs 13.5302μs 73.9087 KOps/s 75.0159 KOps/s $\color{#d91a1a}-1.48\%$
test_step_mdp_speed[False-True-False-True-True] 0.6718ms 36.1898μs 27.6321 KOps/s 28.1625 KOps/s $\color{#d91a1a}-1.88\%$
test_step_mdp_speed[False-True-False-True-False] 57.1560μs 23.9305μs 41.7877 KOps/s 42.8404 KOps/s $\color{#d91a1a}-2.46\%$
test_step_mdp_speed[False-True-False-False-True] 2.7709ms 23.7572μs 42.0924 KOps/s 42.4470 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[False-True-False-False-False] 49.8230μs 15.2731μs 65.4746 KOps/s 65.8631 KOps/s $\color{#d91a1a}-0.59\%$
test_step_mdp_speed[False-False-True-True-True] 79.9290μs 38.1082μs 26.2410 KOps/s 26.4450 KOps/s $\color{#d91a1a}-0.77\%$
test_step_mdp_speed[False-False-True-True-False] 58.5790μs 25.7274μs 38.8691 KOps/s 38.7900 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-False-True-False-True] 60.5130μs 23.5107μs 42.5339 KOps/s 42.7084 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[False-False-True-False-False] 41.2070μs 15.0231μs 66.5643 KOps/s 67.1633 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[False-False-False-True-True] 0.1210ms 38.8710μs 25.7261 KOps/s 25.4766 KOps/s $\color{#35bf28}+0.98\%$
test_step_mdp_speed[False-False-False-True-False] 61.8160μs 27.1143μs 36.8809 KOps/s 37.0125 KOps/s $\color{#d91a1a}-0.36\%$
test_step_mdp_speed[False-False-False-False-True] 62.2360μs 24.8892μs 40.1781 KOps/s 40.1989 KOps/s $\color{#d91a1a}-0.05\%$
test_step_mdp_speed[False-False-False-False-False] 45.9360μs 16.6987μs 59.8849 KOps/s 59.4260 KOps/s $\color{#35bf28}+0.77\%$
test_values[generalized_advantage_estimate-True-True] 11.5882ms 10.0262ms 99.7388 Ops/s 103.9375 Ops/s $\color{#d91a1a}-4.04\%$
test_values[vec_generalized_advantage_estimate-True-True] 28.8016ms 26.1689ms 38.2133 Ops/s 40.8930 Ops/s $\textbf{\color{#d91a1a}-6.55\%}$
test_values[td0_return_estimate-False-False] 0.2283ms 0.1843ms 5.4250 KOps/s 5.4592 KOps/s $\color{#d91a1a}-0.63\%$
test_values[td1_return_estimate-False-False] 28.1899ms 25.2133ms 39.6616 Ops/s 41.0006 Ops/s $\color{#d91a1a}-3.27\%$
test_values[vec_td1_return_estimate-False-False] 28.8400ms 26.2407ms 38.1088 Ops/s 40.5238 Ops/s $\textbf{\color{#d91a1a}-5.96\%}$
test_values[td_lambda_return_estimate-True-False] 39.5087ms 36.0124ms 27.7682 Ops/s 28.5048 Ops/s $\color{#d91a1a}-2.58\%$
test_values[vec_td_lambda_return_estimate-True-False] 28.0031ms 26.2305ms 38.1235 Ops/s 40.7469 Ops/s $\textbf{\color{#d91a1a}-6.44\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.7545ms 8.6083ms 116.1676 Ops/s 116.1387 Ops/s $\color{#35bf28}+0.02\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.7598ms 1.9701ms 507.5987 Ops/s 517.2638 Ops/s $\color{#d91a1a}-1.87\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6056ms 0.3743ms 2.6719 KOps/s 2.7799 KOps/s $\color{#d91a1a}-3.88\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 47.0307ms 44.6646ms 22.3891 Ops/s 22.1415 Ops/s $\color{#35bf28}+1.12\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.2311ms 3.4569ms 289.2780 Ops/s 290.7244 Ops/s $\color{#d91a1a}-0.50\%$
test_dqn_speed[False-None] 1.8859ms 1.4114ms 708.5025 Ops/s 724.0938 Ops/s $\color{#d91a1a}-2.15\%$
test_dqn_speed[False-backward] 1.9685ms 1.8804ms 531.7969 Ops/s 535.9327 Ops/s $\color{#d91a1a}-0.77\%$
test_dqn_speed[True-None] 0.7146ms 0.4972ms 2.0112 KOps/s 1.9518 KOps/s $\color{#35bf28}+3.05\%$
test_dqn_speed[True-backward] 0.9870ms 0.9218ms 1.0848 KOps/s 1.0683 KOps/s $\color{#35bf28}+1.55\%$
test_dqn_speed[reduce-overhead-None] 0.7046ms 0.4937ms 2.0254 KOps/s 1.9954 KOps/s $\color{#35bf28}+1.50\%$
test_dqn_speed[reduce-overhead-backward] 0.9671ms 0.9271ms 1.0786 KOps/s 1.0744 KOps/s $\color{#35bf28}+0.40\%$
test_ddpg_speed[False-None] 4.3339ms 3.0474ms 328.1494 Ops/s 350.8944 Ops/s $\textbf{\color{#d91a1a}-6.48\%}$
test_ddpg_speed[False-backward] 4.2338ms 4.0662ms 245.9328 Ops/s 252.0696 Ops/s $\color{#d91a1a}-2.43\%$
test_ddpg_speed[True-None] 1.9078ms 1.2625ms 792.0730 Ops/s 794.6697 Ops/s $\color{#d91a1a}-0.33\%$
test_ddpg_speed[True-backward] 2.2337ms 2.1756ms 459.6397 Ops/s 456.3722 Ops/s $\color{#35bf28}+0.72\%$
test_ddpg_speed[reduce-overhead-None] 1.5735ms 1.2587ms 794.4542 Ops/s 788.4779 Ops/s $\color{#35bf28}+0.76\%$
test_ddpg_speed[reduce-overhead-backward] 2.3441ms 2.1656ms 461.7597 Ops/s 460.4824 Ops/s $\color{#35bf28}+0.28\%$
test_sac_speed[False-None] 9.3031ms 8.1693ms 122.4099 Ops/s 123.9832 Ops/s $\color{#d91a1a}-1.27\%$
test_sac_speed[False-backward] 11.4129ms 10.9515ms 91.3114 Ops/s 92.4305 Ops/s $\color{#d91a1a}-1.21\%$
test_sac_speed[True-None] 2.4524ms 2.1110ms 473.7110 Ops/s 464.0849 Ops/s $\color{#35bf28}+2.07\%$
test_sac_speed[True-backward] 4.1553ms 3.7996ms 263.1844 Ops/s 243.8726 Ops/s $\textbf{\color{#35bf28}+7.92\%}$
test_sac_speed[reduce-overhead-None] 3.0859ms 2.1257ms 470.4371 Ops/s 452.9540 Ops/s $\color{#35bf28}+3.86\%$
test_sac_speed[reduce-overhead-backward] 4.5296ms 3.8456ms 260.0364 Ops/s 243.1116 Ops/s $\textbf{\color{#35bf28}+6.96\%}$
test_redq_speed[False-None] 15.2277ms 13.1269ms 76.1794 Ops/s 71.6403 Ops/s $\textbf{\color{#35bf28}+6.34\%}$
test_redq_speed[False-backward] 28.1957ms 22.7494ms 43.9573 Ops/s 43.3880 Ops/s $\color{#35bf28}+1.31\%$
test_redq_speed[True-None] 6.3288ms 5.3012ms 188.6359 Ops/s 163.7280 Ops/s $\textbf{\color{#35bf28}+15.21\%}$
test_redq_speed[True-backward] 13.9836ms 12.5450ms 79.7132 Ops/s 76.4170 Ops/s $\color{#35bf28}+4.31\%$
test_redq_speed[reduce-overhead-None] 5.6037ms 4.9839ms 200.6444 Ops/s 187.5798 Ops/s $\textbf{\color{#35bf28}+6.96\%}$
test_redq_speed[reduce-overhead-backward] 13.7667ms 13.1185ms 76.2283 Ops/s 75.9505 Ops/s $\color{#35bf28}+0.37\%$
test_redq_deprec_speed[False-None] 14.0623ms 13.1659ms 75.9540 Ops/s 76.0545 Ops/s $\color{#d91a1a}-0.13\%$
test_redq_deprec_speed[False-backward] 20.3628ms 18.9740ms 52.7038 Ops/s 51.9948 Ops/s $\color{#35bf28}+1.36\%$
test_redq_deprec_speed[True-None] 4.8128ms 3.9198ms 255.1159 Ops/s 255.7068 Ops/s $\color{#d91a1a}-0.23\%$
test_redq_deprec_speed[True-backward] 8.7950ms 8.3512ms 119.7438 Ops/s 119.7806 Ops/s $\color{#d91a1a}-0.03\%$
test_redq_deprec_speed[reduce-overhead-None] 4.9348ms 4.0228ms 248.5861 Ops/s 255.5828 Ops/s $\color{#d91a1a}-2.74\%$
test_redq_deprec_speed[reduce-overhead-backward] 8.7889ms 8.3393ms 119.9140 Ops/s 118.0244 Ops/s $\color{#35bf28}+1.60\%$
test_td3_speed[False-None] 8.4156ms 8.0389ms 124.3951 Ops/s 125.7990 Ops/s $\color{#d91a1a}-1.12\%$
test_td3_speed[False-backward] 11.2842ms 10.5234ms 95.0261 Ops/s 96.1685 Ops/s $\color{#d91a1a}-1.19\%$
test_td3_speed[True-None] 1.9962ms 1.8526ms 539.7876 Ops/s 531.8814 Ops/s $\color{#35bf28}+1.49\%$
test_td3_speed[True-backward] 3.5185ms 3.4464ms 290.1586 Ops/s 282.2256 Ops/s $\color{#35bf28}+2.81\%$
test_td3_speed[reduce-overhead-None] 2.0470ms 1.8583ms 538.1395 Ops/s 535.3177 Ops/s $\color{#35bf28}+0.53\%$
test_td3_speed[reduce-overhead-backward] 4.6442ms 3.6814ms 271.6344 Ops/s 286.3330 Ops/s $\textbf{\color{#d91a1a}-5.13\%}$
test_cql_speed[False-None] 40.2305ms 37.8076ms 26.4497 Ops/s 26.8357 Ops/s $\color{#d91a1a}-1.44\%$
test_cql_speed[False-backward] 49.8743ms 48.1177ms 20.7824 Ops/s 20.9110 Ops/s $\color{#d91a1a}-0.62\%$
test_cql_speed[True-None] 17.7264ms 16.5729ms 60.3395 Ops/s 60.0440 Ops/s $\color{#35bf28}+0.49\%$
test_cql_speed[True-backward] 37.2597ms 24.0719ms 41.5422 Ops/s 43.2135 Ops/s $\color{#d91a1a}-3.87\%$
test_cql_speed[reduce-overhead-None] 17.7856ms 16.7023ms 59.8722 Ops/s 60.1499 Ops/s $\color{#d91a1a}-0.46\%$
test_cql_speed[reduce-overhead-backward] 24.2609ms 23.0564ms 43.3720 Ops/s 42.5814 Ops/s $\color{#35bf28}+1.86\%$
test_a2c_speed[False-None] 8.0985ms 7.2730ms 137.4946 Ops/s 135.5347 Ops/s $\color{#35bf28}+1.45\%$
test_a2c_speed[False-backward] 16.0373ms 14.7383ms 67.8504 Ops/s 67.1425 Ops/s $\color{#35bf28}+1.05\%$
test_a2c_speed[True-None] 4.8259ms 3.8135ms 262.2275 Ops/s 263.6839 Ops/s $\color{#d91a1a}-0.55\%$
test_a2c_speed[True-backward] 11.2590ms 10.4907ms 95.3226 Ops/s 98.5452 Ops/s $\color{#d91a1a}-3.27\%$
test_a2c_speed[reduce-overhead-None] 4.2755ms 3.7478ms 266.8229 Ops/s 264.5959 Ops/s $\color{#35bf28}+0.84\%$
test_a2c_speed[reduce-overhead-backward] 11.5897ms 10.7104ms 93.3674 Ops/s 93.2185 Ops/s $\color{#35bf28}+0.16\%$
test_ppo_speed[False-None] 8.5686ms 7.8574ms 127.2679 Ops/s 131.9301 Ops/s $\color{#d91a1a}-3.53\%$
test_ppo_speed[False-backward] 17.4399ms 15.5843ms 64.1673 Ops/s 67.3466 Ops/s $\color{#d91a1a}-4.72\%$
test_ppo_speed[True-None] 4.4854ms 4.1657ms 240.0577 Ops/s 244.3362 Ops/s $\color{#d91a1a}-1.75\%$
test_ppo_speed[True-backward] 10.7181ms 10.3066ms 97.0253 Ops/s 99.8072 Ops/s $\color{#d91a1a}-2.79\%$
test_ppo_speed[reduce-overhead-None] 6.1241ms 4.2002ms 238.0836 Ops/s 243.1142 Ops/s $\color{#d91a1a}-2.07\%$
test_ppo_speed[reduce-overhead-backward] 11.3468ms 10.4614ms 95.5893 Ops/s 97.9919 Ops/s $\color{#d91a1a}-2.45\%$
test_reinforce_speed[False-None] 9.3718ms 6.8114ms 146.8117 Ops/s 154.2847 Ops/s $\color{#d91a1a}-4.84\%$
test_reinforce_speed[False-backward] 10.4586ms 10.0870ms 99.1372 Ops/s 101.0224 Ops/s $\color{#d91a1a}-1.87\%$
test_reinforce_speed[True-None] 3.8966ms 3.1967ms 312.8257 Ops/s 322.1640 Ops/s $\color{#d91a1a}-2.90\%$
test_reinforce_speed[True-backward] 10.6710ms 9.4744ms 105.5480 Ops/s 106.3841 Ops/s $\color{#d91a1a}-0.79\%$
test_reinforce_speed[reduce-overhead-None] 3.5998ms 3.1158ms 320.9400 Ops/s 313.6717 Ops/s $\color{#35bf28}+2.32\%$
test_reinforce_speed[reduce-overhead-backward] 10.3813ms 9.9186ms 100.8210 Ops/s 106.0123 Ops/s $\color{#d91a1a}-4.90\%$
test_iql_speed[False-None] 34.1756ms 33.1973ms 30.1229 Ops/s 28.8516 Ops/s $\color{#35bf28}+4.41\%$
test_iql_speed[False-backward] 56.5403ms 47.0763ms 21.2421 Ops/s 21.5061 Ops/s $\color{#d91a1a}-1.23\%$
test_iql_speed[True-None] 13.1193ms 11.8632ms 84.2943 Ops/s 83.7549 Ops/s $\color{#35bf28}+0.64\%$
test_iql_speed[True-backward] 24.3130ms 23.1214ms 43.2500 Ops/s 41.1318 Ops/s $\textbf{\color{#35bf28}+5.15\%}$
test_iql_speed[reduce-overhead-None] 13.9561ms 11.9966ms 83.3570 Ops/s 84.4397 Ops/s $\color{#d91a1a}-1.28\%$
test_iql_speed[reduce-overhead-backward] 24.4452ms 23.3129ms 42.8947 Ops/s 42.3027 Ops/s $\color{#35bf28}+1.40\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.6073ms 5.1117ms 195.6304 Ops/s 192.4482 Ops/s $\color{#35bf28}+1.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9167ms 0.5584ms 1.7908 KOps/s 1.8178 KOps/s $\color{#d91a1a}-1.49\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 1.1226ms 0.5278ms 1.8945 KOps/s 1.8996 KOps/s $\color{#d91a1a}-0.27\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.3840ms 4.9213ms 203.1997 Ops/s 204.9911 Ops/s $\color{#d91a1a}-0.87\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3.1212ms 0.5569ms 1.7957 KOps/s 1.8885 KOps/s $\color{#d91a1a}-4.91\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9175ms 0.5286ms 1.8917 KOps/s 1.9753 KOps/s $\color{#d91a1a}-4.23\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.4703ms 1.7450ms 573.0740 Ops/s 595.0987 Ops/s $\color{#d91a1a}-3.70\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.9437ms 1.6505ms 605.8758 Ops/s 624.7030 Ops/s $\color{#d91a1a}-3.01\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3927ms 5.0636ms 197.4873 Ops/s 194.1160 Ops/s $\color{#35bf28}+1.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.4430ms 0.6960ms 1.4367 KOps/s 1.4703 KOps/s $\color{#d91a1a}-2.28\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.1908ms 0.6743ms 1.4829 KOps/s 1.5176 KOps/s $\color{#d91a1a}-2.28\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8695ms 5.0147ms 199.4118 Ops/s 197.5998 Ops/s $\color{#35bf28}+0.92\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9376ms 0.5659ms 1.7672 KOps/s 1.7985 KOps/s $\color{#d91a1a}-1.74\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8070ms 0.5369ms 1.8625 KOps/s 1.9389 KOps/s $\color{#d91a1a}-3.94\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.5568ms 4.9079ms 203.7511 Ops/s 199.2346 Ops/s $\color{#35bf28}+2.27\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.8868ms 0.5542ms 1.8045 KOps/s 1.8251 KOps/s $\color{#d91a1a}-1.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 1.0607ms 0.5898ms 1.6954 KOps/s 1.8914 KOps/s $\textbf{\color{#d91a1a}-10.36\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.4204ms 4.9980ms 200.0781 Ops/s 194.8416 Ops/s $\color{#35bf28}+2.69\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3184ms 0.6942ms 1.4406 KOps/s 1.4577 KOps/s $\color{#d91a1a}-1.17\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9068ms 0.6591ms 1.5172 KOps/s 1.3837 KOps/s $\textbf{\color{#35bf28}+9.65\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.4962ms 4.2619ms 234.6371 Ops/s 227.5310 Ops/s $\color{#35bf28}+3.12\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.4941ms 2.4461ms 408.8109 Ops/s 425.4474 Ops/s $\color{#d91a1a}-3.91\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 4.6273ms 1.4053ms 711.5917 Ops/s 735.0275 Ops/s $\color{#d91a1a}-3.19\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5252s 14.8713ms 67.2436 Ops/s 232.4728 Ops/s $\textbf{\color{#d91a1a}-71.07\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.3220ms 2.1912ms 456.3698 Ops/s 410.6819 Ops/s $\textbf{\color{#35bf28}+11.12\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.3660ms 1.4267ms 700.9081 Ops/s 844.4499 Ops/s $\textbf{\color{#d91a1a}-17.00\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 5.9780ms 4.6142ms 216.7239 Ops/s 231.7775 Ops/s $\textbf{\color{#d91a1a}-6.49\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.4253ms 2.5605ms 390.5546 Ops/s 392.0249 Ops/s $\color{#d91a1a}-0.38\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.1746ms 1.6508ms 605.7564 Ops/s 667.4055 Ops/s $\textbf{\color{#d91a1a}-9.24\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 12.5753ms 12.2063ms 81.9249 Ops/s 78.5617 Ops/s $\color{#35bf28}+4.28\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 16.1035ms 14.7373ms 67.8550 Ops/s 68.2384 Ops/s $\color{#d91a1a}-0.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 23.0188ms 21.0687ms 47.4637 Ops/s 46.3551 Ops/s $\color{#35bf28}+2.39\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.2799ms 14.9705ms 66.7979 Ops/s 66.6731 Ops/s $\color{#35bf28}+0.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 23.2085ms 20.9429ms 47.7489 Ops/s 46.9766 Ops/s $\color{#35bf28}+1.64\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.1575ms 16.1010ms 62.1081 Ops/s 61.0792 Ops/s $\color{#35bf28}+1.68\%$
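
For readers checking the numbers, the Change column appears consistent with the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below reproduces three rows from the table above. The function names are hypothetical, and the 5% significance threshold is an assumption inferred from which rows are bolded (8 positive and 10 negative, matching the Improved and Worsened counts in the summary); it is not taken from the benchmark tooling itself.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative difference of the PR throughput against the baseline, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not a documented value."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
# Both values in a row share the same unit, so the ratio is unit-free.
rows = {
    "test_simple": (1.9082, 1.9115),
    "test_redq_speed[True-None]": (188.6359, 163.7280),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]": (
        67.2436,
        232.4728,
    ),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Single-run differences of a few percent on shared CI runners are often noise, so a threshold of this kind mainly serves to flag rows worth re-running.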

Result of GPU Benchmark Tests

Name Max Mean Ops
test_simple 0.9038s 0.8181s 1.2223 Ops/s
test_transformed 1.5102s 1.4268s 0.7009 Ops/s
test_serial 2.3917s 2.3031s 0.4342 Ops/s
test_parallel 2.2151s 1.9322s 0.5175 Ops/s
test_step_mdp_speed[True-True-True-True-True] 0.1987ms 40.4863μs 24.6997 KOps/s
test_step_mdp_speed[True-True-True-True-False] 53.6010μs 23.6457μs 42.2910 KOps/s
test_step_mdp_speed[True-True-True-False-True] 48.1910μs 22.5172μs 44.4104 KOps/s
test_step_mdp_speed[True-True-True-False-False] 40.3810μs 13.0209μs 76.7997 KOps/s
test_step_mdp_speed[True-True-False-True-True] 0.1217ms 42.8606μs 23.3314 KOps/s
test_step_mdp_speed[True-True-False-True-False] 55.5810μs 25.6597μs 38.9716 KOps/s
test_step_mdp_speed[True-True-False-False-True] 64.7010μs 24.8824μs 40.1891 KOps/s
test_step_mdp_speed[True-True-False-False-False] 40.3710μs 15.3979μs 64.9439 KOps/s
test_step_mdp_speed[True-False-True-True-True] 82.7610μs 45.5424μs 21.9576 KOps/s
test_step_mdp_speed[True-False-True-True-False] 57.0710μs 28.1257μs 35.5546 KOps/s
test_step_mdp_speed[True-False-True-False-True] 47.5310μs 24.6305μs 40.6001 KOps/s
test_step_mdp_speed[True-False-True-False-False] 68.4210μs 15.0243μs 66.5586 KOps/s
test_step_mdp_speed[True-False-False-True-True] 80.2010μs 47.7040μs 20.9626 KOps/s
test_step_mdp_speed[True-False-False-True-False] 55.7910μs 30.0983μs 33.2244 KOps/s
test_step_mdp_speed[True-False-False-False-True] 73.2610μs 26.6624μs 37.5059 KOps/s
test_step_mdp_speed[True-False-False-False-False] 42.7010μs 17.5769μs 56.8927 KOps/s
test_step_mdp_speed[False-True-True-True-True] 78.3510μs 45.2770μs 22.0862 KOps/s
test_step_mdp_speed[False-True-True-True-False] 64.2410μs 27.8117μs 35.9561 KOps/s
test_step_mdp_speed[False-True-True-False-True] 59.1500μs 28.5308μs 35.0499 KOps/s
test_step_mdp_speed[False-True-True-False-False] 47.8110μs 17.0178μs 58.7620 KOps/s
test_step_mdp_speed[False-True-False-True-True] 82.9410μs 47.4989μs 21.0531 KOps/s
test_step_mdp_speed[False-True-False-True-False] 56.4010μs 30.1071μs 33.2147 KOps/s
test_step_mdp_speed[False-True-False-False-True] 3.2929ms 31.0158μs 32.2417 KOps/s
test_step_mdp_speed[False-True-False-False-False] 48.1900μs 19.4355μs 51.4523 KOps/s
test_step_mdp_speed[False-False-True-True-True] 75.8510μs 49.5251μs 20.1918 KOps/s
test_step_mdp_speed[False-False-True-True-False] 0.1091ms 32.5703μs 30.7028 KOps/s
test_step_mdp_speed[False-False-True-False-True] 68.2810μs 30.5040μs 32.7826 KOps/s
test_step_mdp_speed[False-False-True-False-False] 45.7900μs 19.2380μs 51.9805 KOps/s
test_step_mdp_speed[False-False-False-True-True] 85.4510μs 51.1097μs 19.5658 KOps/s
test_step_mdp_speed[False-False-False-True-False] 72.2610μs 34.6549μs 28.8560 KOps/s
test_step_mdp_speed[False-False-False-False-True] 0.1090ms 31.8676μs 31.3798 KOps/s
test_step_mdp_speed[False-False-False-False-False] 56.6910μs 21.5757μs 46.3484 KOps/s
test_values[generalized_advantage_estimate-True-True] 25.3285ms 24.8941ms 40.1701 Ops/s
test_values[vec_generalized_advantage_estimate-True-True] 0.1188s 3.2748ms 305.3640 Ops/s
test_values[td0_return_estimate-False-False] 0.1074ms 80.7064μs 12.3906 KOps/s
test_values[td1_return_estimate-False-False] 58.8761ms 56.0932ms 17.8275 Ops/s
test_values[vec_td1_return_estimate-False-False] 1.3262ms 1.0818ms 924.3733 Ops/s
test_values[td_lambda_return_estimate-True-False] 88.6516ms 88.2218ms 11.3351 Ops/s
test_values[vec_td_lambda_return_estimate-True-False] 1.2714ms 1.0826ms 923.7295 Ops/s
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.9949ms 24.7704ms 40.3707 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0420ms 0.7555ms 1.3237 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7709ms 0.6727ms 1.4866 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5396ms 1.4894ms 671.4178 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8072ms 0.6883ms 1.4528 KOps/s
test_dqn_speed[False-None] 7.0078ms 1.5482ms 645.9163 Ops/s
test_dqn_speed[False-backward] 2.2076ms 2.1639ms 462.1213 Ops/s
test_dqn_speed[True-None] 0.6730ms 0.5786ms 1.7282 KOps/s
test_dqn_speed[True-backward] 1.3201ms 1.2609ms 793.0970 Ops/s
test_dqn_speed[reduce-overhead-None] 0.6840ms 0.5965ms 1.6765 KOps/s
test_dqn_speed[reduce-overhead-backward] 1.1956ms 1.0971ms 911.5016 Ops/s
test_ddpg_speed[False-None] 3.2533ms 2.9248ms 341.9067 Ops/s
test_ddpg_speed[False-backward] 4.7989ms 4.3593ms 229.3940 Ops/s
test_ddpg_speed[True-None] 1.5505ms 1.3891ms 719.8673 Ops/s
test_ddpg_speed[True-backward] 2.6707ms 2.6188ms 381.8590 Ops/s
test_ddpg_speed[reduce-overhead-None] 1.4742ms 1.3972ms 715.7132 Ops/s
test_ddpg_speed[reduce-overhead-backward] 2.1723ms 2.0852ms 479.5590 Ops/s
test_sac_speed[False-None] 8.5965ms 8.1069ms 123.3516 Ops/s
test_sac_speed[False-backward] 11.8278ms 11.3328ms 88.2397 Ops/s
test_sac_speed[True-None] 2.0540ms 1.9095ms 523.7041 Ops/s
test_sac_speed[True-backward] 3.9014ms 3.8302ms 261.0843 Ops/s
test_sac_speed[reduce-overhead-None] 20.6400ms 11.8661ms 84.2738 Ops/s
test_sac_speed[reduce-overhead-backward] 1.8925ms 1.8436ms 542.4302 Ops/s
test_redq_speed[False-None] 8.0194ms 7.5461ms 132.5180 Ops/s
test_redq_speed[False-backward] 12.3634ms 11.8074ms 84.6927 Ops/s
test_redq_speed[True-None] 2.4701ms 2.3738ms 421.2727 Ops/s
test_redq_speed[True-backward] 4.1372ms 4.0741ms 245.4533 Ops/s
test_redq_speed[reduce-overhead-None] 2.5404ms 2.3972ms 417.1482 Ops/s
test_redq_speed[reduce-overhead-backward] 4.4856ms 4.0995ms 243.9344 Ops/s
test_redq_deprec_speed[False-None] 9.3764ms 9.0950ms 109.9505 Ops/s
test_redq_deprec_speed[False-backward] 12.5915ms 12.0904ms 82.7102 Ops/s
test_redq_deprec_speed[True-None] 2.8166ms 2.7165ms 368.1240 Ops/s
test_redq_deprec_speed[True-backward] 4.7863ms 4.4045ms 227.0420 Ops/s
test_redq_deprec_speed[reduce-overhead-None] 2.7653ms 2.6966ms 370.8372 Ops/s
test_redq_deprec_speed[reduce-overhead-backward] 4.6686ms 4.3844ms 228.0800 Ops/s
test_td3_speed[False-None] 8.0944ms 8.0364ms 124.4341 Ops/s
test_td3_speed[False-backward] 10.8679ms 10.3857ms 96.2862 Ops/s
test_td3_speed[True-None] 1.7649ms 1.7292ms 578.3153 Ops/s
test_td3_speed[True-backward] 3.3475ms 3.2777ms 305.0966 Ops/s
test_td3_speed[reduce-overhead-None] 51.4240ms 26.3758ms 37.9136 Ops/s
test_td3_speed[reduce-overhead-backward] 1.4530ms 1.3997ms 714.4298 Ops/s
test_cql_speed[False-None] 17.4709ms 16.9517ms 58.9911 Ops/s
test_cql_speed[False-backward] 22.6338ms 22.1327ms 45.1820 Ops/s
test_cql_speed[True-None] 3.4575ms 3.3716ms 296.5949 Ops/s
test_cql_speed[True-backward] 6.2156ms 5.7856ms 172.8423 Ops/s
test_cql_speed[reduce-overhead-None] 20.5226ms 13.0363ms 76.7090 Ops/s
test_cql_speed[reduce-overhead-backward] 2.2105ms 2.0527ms 487.1520 Ops/s
test_a2c_speed[False-None] 3.6216ms 3.2092ms 311.6001 Ops/s
test_a2c_speed[False-backward] 6.9647ms 6.3953ms 156.3654 Ops/s
test_a2c_speed[True-None] 1.5622ms 1.3877ms 720.6138 Ops/s
test_a2c_speed[True-backward] 3.1926ms 3.0982ms 322.7660 Ops/s
test_a2c_speed[reduce-overhead-None] 15.7111ms 8.9286ms 111.9993 Ops/s
test_a2c_speed[reduce-overhead-backward] 1.7652ms 1.6447ms 608.0285 Ops/s
test_ppo_speed[False-None] 3.8469ms 3.7234ms 268.5697 Ops/s
test_ppo_speed[False-backward] 7.5150ms 7.1212ms 140.4250 Ops/s
test_ppo_speed[True-None] 1.5528ms 1.4449ms 692.0929 Ops/s
test_ppo_speed[True-backward] 3.6228ms 3.2605ms 306.6975 Ops/s
test_ppo_speed[reduce-overhead-None] 1.1214ms 0.9936ms 1.0065 KOps/s
test_ppo_speed[reduce-overhead-backward] 1.7433ms 1.5938ms 627.4130 Ops/s
test_reinforce_speed[False-None] 2.3928ms 2.3036ms 434.1122 Ops/s
test_reinforce_speed[False-backward] 3.8556ms 3.4240ms 292.0571 Ops/s
test_reinforce_speed[True-None] 1.3958ms 1.3315ms 751.0252 Ops/s
test_reinforce_speed[True-backward] 3.1654ms 3.1148ms 321.0473 Ops/s
test_reinforce_speed[reduce-overhead-None] 17.9637ms 9.9693ms 100.3081 Ops/s
test_reinforce_speed[reduce-overhead-backward] 1.7726ms 1.6742ms 597.3006 Ops/s
test_iql_speed[False-None] 9.7415ms 9.2890ms 107.6544 Ops/s
test_iql_speed[False-backward] 13.7795ms 13.2989ms 75.1939 Ops/s
test_iql_speed[True-None] 2.4713ms 2.3045ms 433.9391 Ops/s
test_iql_speed[True-backward] 5.4655ms 5.0298ms 198.8137 Ops/s
test_iql_speed[reduce-overhead-None] 0.4763s 12.8720ms 77.6883 Ops/s
test_iql_speed[reduce-overhead-backward] 2.2251ms 2.1347ms 468.4565 Ops/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8138ms 6.3729ms 156.9134 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5864ms 0.3361ms 2.9751 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6800ms 0.3169ms 3.1560 KOps/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3307ms 6.0848ms 164.3448 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2525ms 0.3040ms 3.2898 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6773ms 0.2685ms 3.7240 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5473ms 1.2695ms 787.7135 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4780ms 1.1978ms 834.8766 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4401ms 6.2607ms 159.7255 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0242ms 0.4576ms 2.1855 KOps/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6943ms 0.4175ms 2.3951 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 13.1930ms 6.2213ms 160.7379 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.1663ms 0.3578ms 2.7950 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5627ms 0.3023ms 3.3075 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3066ms 6.0361ms 165.6704 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7287ms 0.3231ms 3.0947 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5754ms 0.3117ms 3.2079 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4129ms 6.2896ms 158.9916 Ops/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0190ms 0.4176ms 2.3944 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5745ms 0.3859ms 2.5913 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.0751ms 5.4927ms 182.0602 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.8562ms 2.0937ms 477.6197 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.9977ms 1.1735ms 852.1512 Ops/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4576s 14.5908ms 68.5364 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 6.7536ms 1.9954ms 501.1581 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.8529ms 1.2617ms 792.5733 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.0398ms 5.7608ms 173.5869 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.9662ms 2.2125ms 451.9702 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.3510ms 1.3726ms 728.5208 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 14.3202ms 13.5190ms 73.9702 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.0035ms 16.8876ms 59.2149 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.9447ms 18.3251ms 54.5700 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 18.9718ms 17.3096ms 57.7714 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.9809ms 18.2818ms 54.6992 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.2219ms 18.9126ms 52.8747 Ops/s

vmoens merged commit ebd707e into gh/vmoens/91/base on Feb 20, 2025
62 of 74 checks passed
vmoens added a commit that referenced this pull request on Feb 20, 2025
ghstack-source-id: 16e563bc810586d31772b58f9923439b632985c7
Pull Request resolved: #2797
vmoens deleted the gh/vmoens/91/head branch on February 20, 2025 21:24
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
2 participants