[Feature] ParallelEnv.consolidate #2792

Merged 5 commits on Feb 20, 2025

Conversation

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 18, 2025
ghstack-source-id: b120c3d3264efa27f2b81751a26af1faaa3f089b
Pull Request resolved: #2792

pytorch-bot bot commented Feb 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2792

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 1 Unrelated Failure

As of commit a79bcd8 with merge base 76aa9bc (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Feb 18, 2025

github-actions bot commented Feb 18, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.6354s 0.5286s 1.8918 Ops/s 1.9272 Ops/s $\color{#d91a1a}-1.83\%$
test_transformed 1.1350s 1.0231s 0.9775 Ops/s 0.9850 Ops/s $\color{#d91a1a}-0.76\%$
test_serial 1.6443s 1.5301s 0.6536 Ops/s 0.6544 Ops/s $\color{#d91a1a}-0.13\%$
test_parallel 1.4692s 1.3176s 0.7589 Ops/s 0.7607 Ops/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-True-True-True-True] 0.2019ms 30.3501μs 32.9488 KOps/s 33.2237 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[True-True-True-True-False] 68.9570μs 17.4839μs 57.1953 KOps/s 56.1068 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[True-True-True-False-True] 76.9840μs 17.0865μs 58.5257 KOps/s 57.9672 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[True-True-True-False-False] 38.1420μs 9.8111μs 101.9252 KOps/s 99.5016 KOps/s $\color{#35bf28}+2.44\%$
test_step_mdp_speed[True-True-False-True-True] 0.1039ms 31.9685μs 31.2808 KOps/s 30.8749 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[True-True-False-True-False] 72.8770μs 19.1119μs 52.3234 KOps/s 50.6544 KOps/s $\color{#35bf28}+3.29\%$
test_step_mdp_speed[True-True-False-False-True] 81.1700μs 18.8709μs 52.9916 KOps/s 52.2263 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[True-True-False-False-False] 39.4740μs 11.7332μs 85.2281 KOps/s 83.8374 KOps/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[True-False-True-True-True] 0.1043ms 33.5647μs 29.7932 KOps/s 29.1637 KOps/s $\color{#35bf28}+2.16\%$
test_step_mdp_speed[True-False-True-True-False] 81.7230μs 21.2600μs 47.0366 KOps/s 46.0237 KOps/s $\color{#35bf28}+2.20\%$
test_step_mdp_speed[True-False-True-False-True] 49.5840μs 18.7912μs 53.2163 KOps/s 52.3207 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[True-False-True-False-False] 59.5620μs 11.6126μs 86.1137 KOps/s 83.3731 KOps/s $\color{#35bf28}+3.29\%$
test_step_mdp_speed[True-False-False-True-True] 83.4360μs 35.5580μs 28.1231 KOps/s 27.6445 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[True-False-False-True-False] 81.2620μs 22.8793μs 43.7077 KOps/s 42.2574 KOps/s $\color{#35bf28}+3.43\%$
test_step_mdp_speed[True-False-False-False-True] 0.1234ms 21.3001μs 46.9481 KOps/s 47.9092 KOps/s $\color{#d91a1a}-2.01\%$
test_step_mdp_speed[True-False-False-False-False] 44.2030μs 13.4119μs 74.5609 KOps/s 72.7263 KOps/s $\color{#35bf28}+2.52\%$
test_step_mdp_speed[False-True-True-True-True] 87.9340μs 34.0272μs 29.3882 KOps/s 28.7408 KOps/s $\color{#35bf28}+2.25\%$
test_step_mdp_speed[False-True-True-True-False] 66.3540μs 21.2415μs 47.0776 KOps/s 46.4687 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[False-True-True-False-True] 76.1530μs 22.3633μs 44.7162 KOps/s 45.7765 KOps/s $\color{#d91a1a}-2.32\%$
test_step_mdp_speed[False-True-True-False-False] 71.5020μs 12.9643μs 77.1347 KOps/s 74.8289 KOps/s $\color{#35bf28}+3.08\%$
test_step_mdp_speed[False-True-False-True-True] 84.9210μs 35.5402μs 28.1371 KOps/s 27.6164 KOps/s $\color{#35bf28}+1.89\%$
test_step_mdp_speed[False-True-False-True-False] 50.5250μs 22.9279μs 43.6150 KOps/s 42.1883 KOps/s $\color{#35bf28}+3.38\%$
test_step_mdp_speed[False-True-False-False-True] 2.9703ms 23.2477μs 43.0151 KOps/s 42.4584 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[False-True-False-False-False] 46.9380μs 14.6900μs 68.0736 KOps/s 66.1534 KOps/s $\color{#35bf28}+2.90\%$
test_step_mdp_speed[False-False-True-True-True] 94.6470μs 37.3898μs 26.7453 KOps/s 26.5052 KOps/s $\color{#35bf28}+0.91\%$
test_step_mdp_speed[False-False-True-True-False] 76.5430μs 24.7913μs 40.3367 KOps/s 39.4410 KOps/s $\color{#35bf28}+2.27\%$
test_step_mdp_speed[False-False-True-False-True] 62.6170μs 23.5674μs 42.4314 KOps/s 42.6623 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[False-False-True-False-False] 65.3410μs 14.6573μs 68.2253 KOps/s 66.3699 KOps/s $\color{#35bf28}+2.80\%$
test_step_mdp_speed[False-False-False-True-True] 87.9040μs 38.8996μs 25.7072 KOps/s 25.5248 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[False-False-False-True-False] 72.1750μs 26.0640μs 38.3671 KOps/s 37.0592 KOps/s $\color{#35bf28}+3.53\%$
test_step_mdp_speed[False-False-False-False-True] 90.7500μs 25.4921μs 39.2278 KOps/s 39.7143 KOps/s $\color{#d91a1a}-1.22\%$
test_step_mdp_speed[False-False-False-False-False] 43.8930μs 16.4273μs 60.8744 KOps/s 58.9349 KOps/s $\color{#35bf28}+3.29\%$
test_values[generalized_advantage_estimate-True-True] 10.1374ms 9.8305ms 101.7244 Ops/s 102.0236 Ops/s $\color{#d91a1a}-0.29\%$
test_values[vec_generalized_advantage_estimate-True-True] 29.2742ms 24.4744ms 40.8590 Ops/s 37.9996 Ops/s $\textbf{\color{#35bf28}+7.53\%}$
test_values[td0_return_estimate-False-False] 0.2465ms 0.1785ms 5.6016 KOps/s 5.5545 KOps/s $\color{#35bf28}+0.85\%$
test_values[td1_return_estimate-False-False] 28.6286ms 24.3780ms 41.0206 Ops/s 41.9022 Ops/s $\color{#d91a1a}-2.10\%$
test_values[vec_td1_return_estimate-False-False] 26.3891ms 24.6217ms 40.6146 Ops/s 37.6239 Ops/s $\textbf{\color{#35bf28}+7.95\%}$
test_values[td_lambda_return_estimate-True-False] 39.1079ms 35.1422ms 28.4558 Ops/s 29.1557 Ops/s $\color{#d91a1a}-2.40\%$
test_values[vec_td_lambda_return_estimate-True-False] 26.9538ms 24.8751ms 40.2008 Ops/s 37.6711 Ops/s $\textbf{\color{#35bf28}+6.72\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.6639ms 8.5510ms 116.9450 Ops/s 119.9185 Ops/s $\color{#d91a1a}-2.48\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.7984ms 1.9237ms 519.8387 Ops/s 501.4973 Ops/s $\color{#35bf28}+3.66\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6030ms 0.3819ms 2.6185 KOps/s 2.7058 KOps/s $\color{#d91a1a}-3.22\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.5062ms 45.7028ms 21.8805 Ops/s 22.0933 Ops/s $\color{#d91a1a}-0.96\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.5163ms 3.4484ms 289.9906 Ops/s 290.4891 Ops/s $\color{#d91a1a}-0.17\%$
test_dqn_speed[False-None] 1.8775ms 1.4351ms 696.7968 Ops/s 694.2283 Ops/s $\color{#35bf28}+0.37\%$
test_dqn_speed[False-backward] 2.0053ms 1.9469ms 513.6345 Ops/s 510.7996 Ops/s $\color{#35bf28}+0.55\%$
test_dqn_speed[True-None] 0.7378ms 0.5052ms 1.9794 KOps/s 1.9749 KOps/s $\color{#35bf28}+0.23\%$
test_dqn_speed[True-backward] 1.0421ms 0.9745ms 1.0262 KOps/s 1.0403 KOps/s $\color{#d91a1a}-1.36\%$
test_dqn_speed[reduce-overhead-None] 0.7038ms 0.4979ms 2.0086 KOps/s 1.9918 KOps/s $\color{#35bf28}+0.84\%$
test_dqn_speed[reduce-overhead-backward] 1.3312ms 1.0199ms 980.5340 Ops/s 1.0437 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_ddpg_speed[False-None] 4.4724ms 2.9279ms 341.5400 Ops/s 337.9396 Ops/s $\color{#35bf28}+1.07\%$
test_ddpg_speed[False-backward] 5.3228ms 4.1223ms 242.5859 Ops/s 241.7842 Ops/s $\color{#35bf28}+0.33\%$
test_ddpg_speed[True-None] 3.1695ms 1.2959ms 771.6515 Ops/s 789.5567 Ops/s $\color{#d91a1a}-2.27\%$
test_ddpg_speed[True-backward] 2.2722ms 2.2054ms 453.4366 Ops/s 458.4047 Ops/s $\color{#d91a1a}-1.08\%$
test_ddpg_speed[reduce-overhead-None] 0.2568s 1.6101ms 621.0615 Ops/s 779.1350 Ops/s $\textbf{\color{#d91a1a}-20.29\%}$
test_ddpg_speed[reduce-overhead-backward] 2.2172ms 2.1448ms 466.2389 Ops/s 449.2892 Ops/s $\color{#35bf28}+3.77\%$
test_sac_speed[False-None] 10.0159ms 8.1352ms 122.9223 Ops/s 115.6898 Ops/s $\textbf{\color{#35bf28}+6.25\%}$
test_sac_speed[False-backward] 12.3398ms 10.8727ms 91.9734 Ops/s 86.8379 Ops/s $\textbf{\color{#35bf28}+5.91\%}$
test_sac_speed[True-None] 2.4025ms 2.1404ms 467.2116 Ops/s 452.3943 Ops/s $\color{#35bf28}+3.28\%$
test_sac_speed[True-backward] 3.9280ms 3.8457ms 260.0308 Ops/s 252.5071 Ops/s $\color{#35bf28}+2.98\%$
test_sac_speed[reduce-overhead-None] 3.2804ms 2.1635ms 462.2045 Ops/s 462.7916 Ops/s $\color{#d91a1a}-0.13\%$
test_sac_speed[reduce-overhead-backward] 4.2016ms 3.8736ms 258.1591 Ops/s 259.9464 Ops/s $\color{#d91a1a}-0.69\%$
test_redq_speed[False-None] 15.5136ms 13.1746ms 75.9035 Ops/s 74.4653 Ops/s $\color{#35bf28}+1.93\%$
test_redq_speed[False-backward] 25.5559ms 23.3587ms 42.8105 Ops/s 43.1438 Ops/s $\color{#d91a1a}-0.77\%$
test_redq_speed[True-None] 6.1466ms 5.3592ms 186.5935 Ops/s 181.4381 Ops/s $\color{#35bf28}+2.84\%$
test_redq_speed[True-backward] 14.2698ms 12.8696ms 77.7024 Ops/s 76.5711 Ops/s $\color{#35bf28}+1.48\%$
test_redq_speed[reduce-overhead-None] 6.8029ms 5.1760ms 193.1991 Ops/s 184.9291 Ops/s $\color{#35bf28}+4.47\%$
test_redq_speed[reduce-overhead-backward] 13.6930ms 12.8438ms 77.8585 Ops/s 75.9324 Ops/s $\color{#35bf28}+2.54\%$
test_redq_deprec_speed[False-None] 14.6485ms 13.0575ms 76.5844 Ops/s 74.9463 Ops/s $\color{#35bf28}+2.19\%$
test_redq_deprec_speed[False-backward] 20.6390ms 18.8968ms 52.9189 Ops/s 52.3034 Ops/s $\color{#35bf28}+1.18\%$
test_redq_deprec_speed[True-None] 4.7664ms 3.9218ms 254.9875 Ops/s 248.8948 Ops/s $\color{#35bf28}+2.45\%$
test_redq_deprec_speed[True-backward] 9.3503ms 8.4868ms 117.8296 Ops/s 116.7059 Ops/s $\color{#35bf28}+0.96\%$
test_redq_deprec_speed[reduce-overhead-None] 4.4217ms 3.9221ms 254.9679 Ops/s 249.9740 Ops/s $\color{#35bf28}+2.00\%$
test_redq_deprec_speed[reduce-overhead-backward] 9.7875ms 8.9577ms 111.6356 Ops/s 113.0048 Ops/s $\color{#d91a1a}-1.21\%$
test_td3_speed[False-None] 8.6803ms 8.1880ms 122.1304 Ops/s 119.6733 Ops/s $\color{#35bf28}+2.05\%$
test_td3_speed[False-backward] 11.4383ms 10.6107ms 94.2449 Ops/s 91.8945 Ops/s $\color{#35bf28}+2.56\%$
test_td3_speed[True-None] 2.1376ms 1.8448ms 542.0758 Ops/s 527.7951 Ops/s $\color{#35bf28}+2.71\%$
test_td3_speed[True-backward] 3.5809ms 3.4591ms 289.0959 Ops/s 286.5829 Ops/s $\color{#35bf28}+0.88\%$
test_td3_speed[reduce-overhead-None] 2.0791ms 1.8400ms 543.4845 Ops/s 531.9622 Ops/s $\color{#35bf28}+2.17\%$
test_td3_speed[reduce-overhead-backward] 3.6170ms 3.4776ms 287.5580 Ops/s 280.1961 Ops/s $\color{#35bf28}+2.63\%$
test_cql_speed[False-None] 40.3446ms 37.2271ms 26.8622 Ops/s 26.7861 Ops/s $\color{#35bf28}+0.28\%$
test_cql_speed[False-backward] 50.4079ms 47.5295ms 21.0396 Ops/s 20.7376 Ops/s $\color{#35bf28}+1.46\%$
test_cql_speed[True-None] 17.8783ms 16.7981ms 59.5307 Ops/s 57.4996 Ops/s $\color{#35bf28}+3.53\%$
test_cql_speed[True-backward] 29.3876ms 24.0000ms 41.6666 Ops/s 40.8956 Ops/s $\color{#35bf28}+1.89\%$
test_cql_speed[reduce-overhead-None] 18.2914ms 16.2810ms 61.4214 Ops/s 59.5062 Ops/s $\color{#35bf28}+3.22\%$
test_cql_speed[reduce-overhead-backward] 27.7242ms 23.5343ms 42.4912 Ops/s 42.5661 Ops/s $\color{#d91a1a}-0.18\%$
test_a2c_speed[False-None] 8.3316ms 7.2391ms 138.1379 Ops/s 136.9124 Ops/s $\color{#35bf28}+0.90\%$
test_a2c_speed[False-backward] 14.7835ms 14.4422ms 69.2417 Ops/s 68.3810 Ops/s $\color{#35bf28}+1.26\%$
test_a2c_speed[True-None] 4.5708ms 3.7684ms 265.3653 Ops/s 266.6352 Ops/s $\color{#d91a1a}-0.48\%$
test_a2c_speed[True-backward] 11.1362ms 10.3599ms 96.5259 Ops/s 96.7416 Ops/s $\color{#d91a1a}-0.22\%$
test_a2c_speed[reduce-overhead-None] 4.2969ms 3.7591ms 266.0193 Ops/s 265.3445 Ops/s $\color{#35bf28}+0.25\%$
test_a2c_speed[reduce-overhead-backward] 10.5466ms 10.2524ms 97.5380 Ops/s 96.7048 Ops/s $\color{#35bf28}+0.86\%$
test_ppo_speed[False-None] 9.1274ms 7.6033ms 131.5214 Ops/s 131.7328 Ops/s $\color{#d91a1a}-0.16\%$
test_ppo_speed[False-backward] 16.4906ms 15.1500ms 66.0066 Ops/s 66.7932 Ops/s $\color{#d91a1a}-1.18\%$
test_ppo_speed[True-None] 5.0461ms 4.1508ms 240.9159 Ops/s 236.5848 Ops/s $\color{#35bf28}+1.83\%$
test_ppo_speed[True-backward] 10.5196ms 10.1061ms 98.9504 Ops/s 99.2626 Ops/s $\color{#d91a1a}-0.31\%$
test_ppo_speed[reduce-overhead-None] 4.9063ms 4.1344ms 241.8706 Ops/s 241.3513 Ops/s $\color{#35bf28}+0.22\%$
test_ppo_speed[reduce-overhead-backward] 11.1313ms 10.2713ms 97.3584 Ops/s 96.2094 Ops/s $\color{#35bf28}+1.19\%$
test_reinforce_speed[False-None] 8.0853ms 6.6849ms 149.5907 Ops/s 151.0145 Ops/s $\color{#d91a1a}-0.94\%$
test_reinforce_speed[False-backward] 10.5363ms 10.0657ms 99.3470 Ops/s 99.8408 Ops/s $\color{#d91a1a}-0.49\%$
test_reinforce_speed[True-None] 3.5194ms 3.1342ms 319.0630 Ops/s 322.4546 Ops/s $\color{#d91a1a}-1.05\%$
test_reinforce_speed[True-backward] 10.9770ms 9.8481ms 101.5423 Ops/s 109.7276 Ops/s $\textbf{\color{#d91a1a}-7.46\%}$
test_reinforce_speed[reduce-overhead-None] 3.6358ms 3.1417ms 318.2988 Ops/s 322.4658 Ops/s $\color{#d91a1a}-1.29\%$
test_reinforce_speed[reduce-overhead-backward] 9.9376ms 9.3109ms 107.4012 Ops/s 107.4648 Ops/s $\color{#d91a1a}-0.06\%$
test_iql_speed[False-None] 35.4159ms 33.3487ms 29.9862 Ops/s 30.2730 Ops/s $\color{#d91a1a}-0.95\%$
test_iql_speed[False-backward] 61.9782ms 47.5507ms 21.0302 Ops/s 21.6784 Ops/s $\color{#d91a1a}-2.99\%$
test_iql_speed[True-None] 12.9874ms 11.9227ms 83.8739 Ops/s 87.9487 Ops/s $\color{#d91a1a}-4.63\%$
test_iql_speed[True-backward] 24.9248ms 23.7079ms 42.1801 Ops/s 44.7468 Ops/s $\textbf{\color{#d91a1a}-5.74\%}$
test_iql_speed[reduce-overhead-None] 13.1162ms 12.0619ms 82.9057 Ops/s 84.4400 Ops/s $\color{#d91a1a}-1.82\%$
test_iql_speed[reduce-overhead-backward] 25.7636ms 23.6801ms 42.2295 Ops/s 42.2061 Ops/s $\color{#35bf28}+0.06\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 8.0997ms 5.2805ms 189.3769 Ops/s 198.6571 Ops/s $\color{#d91a1a}-4.67\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.5906ms 0.5783ms 1.7292 KOps/s 1.7649 KOps/s $\color{#d91a1a}-2.02\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8775ms 0.5458ms 1.8322 KOps/s 1.8757 KOps/s $\color{#d91a1a}-2.32\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.8813ms 5.1244ms 195.1429 Ops/s 208.4206 Ops/s $\textbf{\color{#d91a1a}-6.37\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0257ms 0.5527ms 1.8092 KOps/s 1.7803 KOps/s $\color{#35bf28}+1.62\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8260ms 0.5246ms 1.9063 KOps/s 1.8987 KOps/s $\color{#35bf28}+0.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.1440ms 1.7407ms 574.4682 Ops/s 561.7000 Ops/s $\color{#35bf28}+2.27\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.4981ms 1.6648ms 600.6582 Ops/s 587.3734 Ops/s $\color{#35bf28}+2.26\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9336ms 5.1125ms 195.5978 Ops/s 201.3979 Ops/s $\color{#d91a1a}-2.88\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0343ms 0.7009ms 1.4268 KOps/s 1.4086 KOps/s $\color{#35bf28}+1.29\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0013ms 0.6750ms 1.4814 KOps/s 1.4693 KOps/s $\color{#35bf28}+0.82\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.6509ms 4.8678ms 205.4318 Ops/s 205.3150 Ops/s $\color{#35bf28}+0.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7242s 1.5594ms 641.2731 Ops/s 1.7537 KOps/s $\textbf{\color{#d91a1a}-63.43\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7639ms 0.5337ms 1.8736 KOps/s 1.8307 KOps/s $\color{#35bf28}+2.34\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.7873ms 5.0663ms 197.3824 Ops/s 207.3030 Ops/s $\color{#d91a1a}-4.79\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3.1666ms 0.5733ms 1.7444 KOps/s 1.7946 KOps/s $\color{#d91a1a}-2.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7836ms 0.5238ms 1.9090 KOps/s 1.9147 KOps/s $\color{#d91a1a}-0.30\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.6972ms 5.1730ms 193.3131 Ops/s 201.8715 Ops/s $\color{#d91a1a}-4.24\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 3.0442ms 0.7034ms 1.4218 KOps/s 1.3965 KOps/s $\color{#35bf28}+1.81\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0371ms 0.6833ms 1.4635 KOps/s 1.4717 KOps/s $\color{#d91a1a}-0.56\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.7585ms 4.3330ms 230.7847 Ops/s 241.3572 Ops/s $\color{#d91a1a}-4.38\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.7654ms 2.5367ms 394.2106 Ops/s 383.9686 Ops/s $\color{#35bf28}+2.67\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.4863ms 1.5146ms 660.2507 Ops/s 732.0999 Ops/s $\textbf{\color{#d91a1a}-9.81\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4889s 14.1510ms 70.6663 Ops/s 243.4810 Ops/s $\textbf{\color{#d91a1a}-70.98\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.7769ms 2.5177ms 397.1923 Ops/s 406.6157 Ops/s $\color{#d91a1a}-2.32\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.8635ms 1.3783ms 725.5253 Ops/s 666.5885 Ops/s $\textbf{\color{#35bf28}+8.84\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.5559ms 4.5577ms 219.4080 Ops/s 30.0571 Ops/s $\textbf{\color{#35bf28}+629.97\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 6.7770ms 2.7297ms 366.3467 Ops/s 376.7379 Ops/s $\color{#d91a1a}-2.76\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.9137ms 1.6496ms 606.2083 Ops/s 619.2544 Ops/s $\color{#d91a1a}-2.11\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 21.7152ms 13.1126ms 76.2626 Ops/s 83.7343 Ops/s $\textbf{\color{#d91a1a}-8.92\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 17.8559ms 15.3631ms 65.0912 Ops/s 67.7367 Ops/s $\color{#d91a1a}-3.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 24.3726ms 21.5764ms 46.3470 Ops/s 47.6227 Ops/s $\color{#d91a1a}-2.68\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 16.5256ms 15.3060ms 65.3338 Ops/s 66.3560 Ops/s $\color{#d91a1a}-1.54\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 23.5641ms 21.3637ms 46.8083 Ops/s 47.6247 Ops/s $\color{#d91a1a}-1.71\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 18.4516ms 16.6512ms 60.0556 Ops/s 60.8933 Ops/s $\color{#d91a1a}-1.38\%$


github-actions bot commented Feb 18, 2025

Result of GPU Benchmark Tests

Name Max Mean Ops
test_simple 0.9179s 0.8235s 1.2144 Ops/s
test_transformed 1.4896s 1.4027s 0.7129 Ops/s
test_serial 2.2917s 2.2770s 0.4392 Ops/s
test_parallel 1.8858s 1.8552s 0.5390 Ops/s
test_step_mdp_speed[True-True-True-True-True] 80.6710μs 39.9709μs 25.0182 KOps/s
test_step_mdp_speed[True-True-True-True-False] 67.1220μs 23.4636μs 42.6192 KOps/s
test_step_mdp_speed[True-True-True-False-True] 63.3710μs 21.9011μs 45.6598 KOps/s
test_step_mdp_speed[True-True-True-False-False] 41.5310μs 13.0056μs 76.8898 KOps/s
test_step_mdp_speed[True-True-False-True-True] 69.4910μs 42.5136μs 23.5219 KOps/s
test_step_mdp_speed[True-True-False-True-False] 61.4320μs 25.9539μs 38.5298 KOps/s
test_step_mdp_speed[True-True-False-False-True] 54.6910μs 24.7375μs 40.4244 KOps/s
test_step_mdp_speed[True-True-False-False-False] 44.3710μs 15.3824μs 65.0096 KOps/s
test_step_mdp_speed[True-False-True-True-True] 78.7120μs 45.5646μs 21.9469 KOps/s
test_step_mdp_speed[True-False-True-True-False] 57.7310μs 28.4845μs 35.1068 KOps/s
test_step_mdp_speed[True-False-True-False-True] 54.8310μs 24.4904μs 40.8324 KOps/s
test_step_mdp_speed[True-False-True-False-False] 50.3710μs 15.4298μs 64.8095 KOps/s
test_step_mdp_speed[True-False-False-True-True] 87.5410μs 47.1177μs 21.2234 KOps/s
test_step_mdp_speed[True-False-False-True-False] 88.0310μs 30.1323μs 33.1869 KOps/s
test_step_mdp_speed[True-False-False-False-True] 55.4010μs 26.5323μs 37.6899 KOps/s
test_step_mdp_speed[True-False-False-False-False] 48.9710μs 17.6604μs 56.6238 KOps/s
test_step_mdp_speed[False-True-True-True-True] 79.7110μs 44.6154μs 22.4138 KOps/s
test_step_mdp_speed[False-True-True-True-False] 55.7110μs 28.1833μs 35.4820 KOps/s
test_step_mdp_speed[False-True-True-False-True] 2.5694ms 28.7152μs 34.8248 KOps/s
test_step_mdp_speed[False-True-True-False-False] 49.2810μs 17.2482μs 57.9770 KOps/s
test_step_mdp_speed[False-True-False-True-True] 77.3220μs 46.8579μs 21.3411 KOps/s
test_step_mdp_speed[False-True-False-True-False] 72.9810μs 30.5925μs 32.6877 KOps/s
test_step_mdp_speed[False-True-False-False-True] 59.7410μs 30.8312μs 32.4346 KOps/s
test_step_mdp_speed[False-True-False-False-False] 49.0810μs 19.4762μs 51.3447 KOps/s
test_step_mdp_speed[False-False-True-True-True] 95.0720μs 49.7059μs 20.1183 KOps/s
test_step_mdp_speed[False-False-True-True-False] 62.0610μs 33.2760μs 30.0517 KOps/s
test_step_mdp_speed[False-False-True-False-True] 63.8520μs 30.7243μs 32.5476 KOps/s
test_step_mdp_speed[False-False-True-False-False] 48.6310μs 19.6844μs 50.8018 KOps/s
test_step_mdp_speed[False-False-False-True-True] 83.1220μs 51.7946μs 19.3070 KOps/s
test_step_mdp_speed[False-False-False-True-False] 63.1910μs 35.2919μs 28.3351 KOps/s
test_step_mdp_speed[False-False-False-False-True] 59.2810μs 32.6646μs 30.6142 KOps/s
test_step_mdp_speed[False-False-False-False-False] 50.6510μs 21.6988μs 46.0855 KOps/s
test_values[generalized_advantage_estimate-True-True] 25.6014ms 25.0669ms 39.8933 Ops/s
test_values[vec_generalized_advantage_estimate-True-True] 0.1136s 3.1726ms 315.2035 Ops/s
test_values[td0_return_estimate-False-False] 0.1076ms 80.1803μs 12.4719 KOps/s
test_values[td1_return_estimate-False-False] 60.2584ms 56.3491ms 17.7465 Ops/s
test_values[vec_td1_return_estimate-False-False] 1.4008ms 1.0920ms 915.7515 Ops/s
test_values[td_lambda_return_estimate-True-False] 94.7301ms 89.7323ms 11.1443 Ops/s
test_values[vec_td_lambda_return_estimate-True-False] 1.3648ms 1.0904ms 917.0719 Ops/s
test_gae_speed[generalized_advantage_estimate-False-1-512] 25.2209ms 24.9022ms 40.1570 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0473ms 0.7606ms 1.3148 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7641ms 0.6701ms 1.4924 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5230ms 1.4876ms 672.2453 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8769ms 0.7118ms 1.4050 KOps/s
test_dqn_speed[False-None] 7.0902ms 1.5127ms 661.0490 Ops/s
test_dqn_speed[False-backward] 2.1733ms 2.1243ms 470.7327 Ops/s
test_dqn_speed[True-None] 0.1547s 0.6563ms 1.5238 KOps/s
test_dqn_speed[True-backward] 1.2912ms 1.2335ms 810.6965 Ops/s
test_dqn_speed[reduce-overhead-None] 0.6474ms 0.5859ms 1.7068 KOps/s
test_dqn_speed[reduce-overhead-backward] 1.1413ms 1.0772ms 928.3303 Ops/s
test_ddpg_speed[False-None] 3.3053ms 2.8710ms 348.3059 Ops/s
test_ddpg_speed[False-backward] 4.3218ms 4.2242ms 236.7300 Ops/s
test_ddpg_speed[True-None] 1.4430ms 1.3632ms 733.5833 Ops/s
test_ddpg_speed[True-backward] 2.6270ms 2.5835ms 387.0652 Ops/s
test_ddpg_speed[reduce-overhead-None] 1.4435ms 1.3694ms 730.2310 Ops/s
test_ddpg_speed[reduce-overhead-backward] 2.0779ms 2.0504ms 487.7031 Ops/s
test_sac_speed[False-None] 8.4196ms 8.0114ms 124.8219 Ops/s
test_sac_speed[False-backward] 11.7087ms 11.2303ms 89.0449 Ops/s
test_sac_speed[True-None] 2.1904ms 1.9050ms 524.9417 Ops/s
test_sac_speed[True-backward] 3.8115ms 3.7680ms 265.3936 Ops/s
test_sac_speed[reduce-overhead-None] 20.6977ms 11.8862ms 84.1308 Ops/s
test_sac_speed[reduce-overhead-backward] 1.8611ms 1.8255ms 547.8098 Ops/s
test_redq_speed[False-None] 8.2686ms 7.5077ms 133.1968 Ops/s
test_redq_speed[False-backward] 12.3039ms 11.6941ms 85.5133 Ops/s
test_redq_speed[True-None] 2.5716ms 2.3562ms 424.4048 Ops/s
test_redq_speed[True-backward] 4.2798ms 4.2241ms 236.7375 Ops/s
test_redq_speed[reduce-overhead-None] 2.7342ms 2.3880ms 418.7538 Ops/s
test_redq_speed[reduce-overhead-backward] 4.7071ms 4.2549ms 235.0257 Ops/s
test_redq_deprec_speed[False-None] 9.5097ms 9.0342ms 110.6904 Ops/s
test_redq_deprec_speed[False-backward] 12.7829ms 12.2688ms 81.5077 Ops/s
test_redq_deprec_speed[True-None] 2.8687ms 2.6829ms 372.7351 Ops/s
test_redq_deprec_speed[True-backward] 4.5817ms 4.5064ms 221.9064 Ops/s
test_redq_deprec_speed[reduce-overhead-None] 2.8725ms 2.6630ms 375.5195 Ops/s
test_redq_deprec_speed[reduce-overhead-backward] 4.5693ms 4.5023ms 222.1071 Ops/s
test_td3_speed[False-None] 8.2025ms 7.9493ms 125.7967 Ops/s
test_td3_speed[False-backward] 11.1540ms 10.5910ms 94.4194 Ops/s
test_td3_speed[True-None] 1.8864ms 1.7240ms 580.0431 Ops/s
test_td3_speed[True-backward] 3.4684ms 3.4123ms 293.0605 Ops/s
test_td3_speed[reduce-overhead-None] 52.4346ms 26.7531ms 37.3788 Ops/s
test_td3_speed[reduce-overhead-backward] 1.6086ms 1.5594ms 641.2609 Ops/s
test_cql_speed[False-None] 17.1326ms 16.7231ms 59.7974 Ops/s
test_cql_speed[False-backward] 22.9790ms 22.3618ms 44.7192 Ops/s
test_cql_speed[True-None] 3.4398ms 3.2967ms 303.3368 Ops/s
test_cql_speed[True-backward] 5.8610ms 5.5865ms 179.0016 Ops/s
test_cql_speed[reduce-overhead-None] 21.1959ms 13.2517ms 75.4618 Ops/s
test_cql_speed[reduce-overhead-backward] 2.1881ms 2.0535ms 486.9657 Ops/s
test_a2c_speed[False-None] 3.3122ms 3.1837ms 314.1014 Ops/s
test_a2c_speed[False-backward] 6.8923ms 6.3468ms 157.5586 Ops/s
test_a2c_speed[True-None] 1.4364ms 1.3638ms 733.2365 Ops/s
test_a2c_speed[True-backward] 3.0833ms 3.0426ms 328.6635 Ops/s
test_a2c_speed[reduce-overhead-None] 15.8961ms 9.0824ms 110.1036 Ops/s
test_a2c_speed[reduce-overhead-backward] 1.7538ms 1.6272ms 614.5479 Ops/s
test_ppo_speed[False-None] 4.1987ms 3.6980ms 270.4150 Ops/s
test_ppo_speed[False-backward] 7.4666ms 7.0717ms 141.4077 Ops/s
test_ppo_speed[True-None] 1.5312ms 1.4269ms 700.7996 Ops/s
test_ppo_speed[True-backward] 3.2785ms 3.2135ms 311.1909 Ops/s
test_ppo_speed[reduce-overhead-None] 1.1498ms 0.9948ms 1.0052 KOps/s
test_ppo_speed[reduce-overhead-backward] 1.7213ms 1.5803ms 632.8054 Ops/s
test_reinforce_speed[False-None] 2.5152ms 2.2737ms 439.8136 Ops/s
test_reinforce_speed[False-backward] 3.9000ms 3.4060ms 293.5986 Ops/s
test_reinforce_speed[True-None] 1.3764ms 1.3141ms 761.0059 Ops/s
test_reinforce_speed[True-backward] 3.1168ms 3.0741ms 325.2961 Ops/s
test_reinforce_speed[reduce-overhead-None] 18.3401ms 10.1246ms 98.7689 Ops/s
test_reinforce_speed[reduce-overhead-backward] 1.7584ms 1.6678ms 599.5784 Ops/s
test_iql_speed[False-None] 9.7174ms 9.2047ms 108.6405 Ops/s
test_iql_speed[False-backward] 13.6882ms 13.1856ms 75.8401 Ops/s
test_iql_speed[True-None] 2.4939ms 2.2609ms 442.3108 Ops/s
test_iql_speed[True-backward] 4.9729ms 4.9339ms 202.6802 Ops/s
test_iql_speed[reduce-overhead-None] 0.4823s 12.8440ms 77.8573 Ops/s
test_iql_speed[reduce-overhead-backward] 2.2061ms 2.1099ms 473.9470 Ops/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8635ms 6.2599ms 159.7457 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.4941ms 0.2652ms 3.7712 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4606ms 0.2488ms 4.0188 KOps/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2226ms 5.9681ms 167.5581 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8893ms 0.3020ms 3.3109 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5723ms 0.3086ms 3.2401 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6523ms 1.3999ms 714.3587 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5313ms 1.3059ms 765.7314 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2871ms 6.1602ms 162.3311 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2282ms 0.4421ms 2.2621 KOps/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7619ms 0.4053ms 2.4674 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1719ms 6.0493ms 165.3090 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9749ms 0.3323ms 3.0092 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7492ms 0.3707ms 2.6974 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6486ms 5.9176ms 168.9871 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6725ms 0.2814ms 3.5542 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4538ms 0.2403ms 4.1622 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6209ms 6.1718ms 162.0279 Ops/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0054ms 0.4725ms 2.1166 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6377ms 0.4582ms 2.1822 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.9795ms 5.4135ms 184.7229 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.7457ms 2.0807ms 480.5977 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.6841ms 1.1527ms 867.5525 Ops/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4599s 14.5584ms 68.6891 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 6.3361ms 2.0459ms 488.7749 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.9913ms 1.2429ms 804.5574 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.2309ms 5.6442ms 177.1741 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.9188ms 2.1843ms 457.8185 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.3155ms 1.4294ms 699.5802 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.4516ms 13.2098ms 75.7016 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.7972ms 16.9642ms 58.9478 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.9535ms 18.2728ms 54.7261 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 18.5158ms 16.9516ms 58.9916 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.6232ms 17.9985ms 55.5602 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 19.8732ms 18.3556ms 54.4793 Ops/s

@vmoens added the bug label Feb 18, 2025
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 19, 2025
ghstack-source-id: 91c745405f7bf0ace83c2567e39d49229a660609
Pull Request resolved: #2792
@vmoens changed the title from "[BugFix] Fix update shape mismatch in _skip_tensordict" to "[Feature] ParallelEnv.consolidate" Feb 19, 2025
@vmoens added the enhancement label and removed the bug label Feb 19, 2025
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@vmoens merged commit a79bcd8 into gh/vmoens/89/base Feb 20, 2025
64 of 74 checks passed
vmoens added a commit that referenced this pull request Feb 20, 2025
ghstack-source-id: 27e7d444c126e48fdb70d951a0cc7beaee1db3a8
Pull Request resolved: #2792
@vmoens deleted the gh/vmoens/89/head branch February 20, 2025 21:24
Labels: CLA Signed, enhancement
2 participants