Commit 87cb26c
[PyTorch] Add max_logit support for MuonClip (#2195)
* add max_score for fused/unfused F16 non-CP
Signed-off-by: Charlene Yang <[email protected]>
* calculate max per head instead of max over all heads
Signed-off-by: Charlene Yang <[email protected]>
* fix fused attn max_score shape
Signed-off-by: Charlene Yang <[email protected]>
* revert FE to github
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update FE to 1.15.0-rc
Signed-off-by: Charlene Yang <[email protected]>
* fix merge
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* reduce ew kernels; fix causal masks; add more tests
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor fix to tests
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove logic for flash-attn
Signed-off-by: Charlene Yang <[email protected]>
* WIP: add CP support for p2p/a2a/all_gather
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor improvements of implementation/tests
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* WIP: add thd support
Signed-off-by: Charlene Yang <[email protected]>
* add thd to UnfusedDPA
Signed-off-by: Charlene Yang <[email protected]>
* fix lint
Signed-off-by: Charlene Yang <[email protected]>
* more fixes for lint
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update to FE 1.15
Signed-off-by: Charlene Yang <[email protected]>
* remove unneeded changes
Signed-off-by: Charlene Yang <[email protected]>
* disable unfused for thd + pad_between_seqs
Signed-off-by: Charlene Yang <[email protected]>
* minor fixes
Signed-off-by: Charlene Yang <[email protected]>
* disable thd for unfused until bug is fixed
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix all_gather
Signed-off-by: Charlene Yang <[email protected]>
* fix all gather
Signed-off-by: Charlene Yang <[email protected]>
* rename max_score to max_logit
Signed-off-by: Charlene Yang <[email protected]>
* fix all_gather
Signed-off-by: Charlene Yang <[email protected]>
* fix all_gather
Signed-off-by: Charlene Yang <[email protected]>
* disable fused attn + thd
Signed-off-by: Charlene Yang <[email protected]>
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

1 parent 060811c commit 87cb26c
File tree: 19 files changed, +748 -305 lines changed
- 3rdparty
- tests/pytorch/attention
- transformer_engine/common/fused_attn
- transformer_engine/common/include/transformer_engine
- transformer_engine/jax/csrc/extensions
- transformer_engine/pytorch/attention/dot_product_attention
- transformer_engine/pytorch/cpp_extensions
- transformer_engine/pytorch/csrc/extensions

Submodule cudnn-frontend updated 108 files
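The core quantity this commit adds is a per-head maximum of the attention logits (`max_logit`, originally named `max_score`). Per the commit log, it is computed per head rather than over all heads and respects causal masks. The plain-Python sketch below is only a reference for what that number means, under these assumptions: logits are `scale * <q, k>` before softmax, with `scale` defaulting to `1/sqrt(head_dim)`; masked positions are skipped; and the max is reduced over batch, query and key positions. The function names `max_logit_per_head` and `combine_shards` and the nested-list BSHD layout are hypothetical and are not Transformer Engine APIs or kernels.

```python
import math
from typing import List, Optional

# Hypothetical BSHD layout as nested lists: x[batch][seq][head][dim].
Tensor4 = List[List[List[List[float]]]]


def max_logit_per_head(
    q: Tensor4,
    k: Tensor4,
    scale: Optional[float] = None,
    causal: bool = False,
) -> List[float]:
    """Return one max pre-softmax logit per head, reduced over batch, queries and keys.

    Sketch only: assumes logits are scale * <q, k> and that masked-out
    positions are excluded from the max.
    """
    num_heads = len(q[0][0])
    head_dim = len(q[0][0][0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_dim)
    maxes = [-math.inf] * num_heads
    for b in range(len(q)):
        for i in range(len(q[b])):
            for j in range(len(k[b])):
                # Top-left aligned causal mask: query i sees only keys j <= i.
                if causal and j > i:
                    continue
                for h in range(num_heads):
                    dot = sum(a * c for a, c in zip(q[b][i][h], k[b][j][h]))
                    maxes[h] = max(maxes[h], scale * dot)
    return maxes


def combine_shards(partial_maxes: List[List[float]]) -> List[float]:
    """Merge per-shard results (e.g. one per key chunk) with an elementwise max."""
    return [max(vals) for vals in zip(*partial_maxes)]


if __name__ == "__main__":
    # batch=1, seq=2, heads=2, head_dim=2
    q = [[[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 1.0]]]]
    k = [[[[2.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 5.0]]]]
    print(max_logit_per_head(q, k, scale=1.0))               # [2.0, 10.0]
    print(max_logit_per_head(q, k, scale=1.0, causal=True))  # [2.0, 5.0]
```

Because max is associative, maxima computed over disjoint key chunks combine exactly with an elementwise max. This is the property that makes the statistic compatible with context parallelism. The sketch's causal check uses local indices, so sharded causal masking would also need global positions. How the p2p, a2a and all_gather paths in this commit actually exchange results is not shown here.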
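For context on the commit title, MuonClip consumes this per-head statistic in a QK-Clip step. The sketch below assumes that rule as publicly described for MuonClip; this is our reading and is not taken from this commit. For each head h whose max logit m_h exceeds a threshold tau, set gamma_h = tau / m_h and multiply that head's query and key projection weights by sqrt(gamma_h). A logit is bilinear in the two projections, so this scales the head's logits by gamma_h. The names `qk_clip_factors`, `apply_qk_clip` and `matvec` are hypothetical, and real implementations may split the factor differently between W_q and W_k.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one head's projection weight, stored as rows


def qk_clip_factors(max_logit: List[float], tau: float) -> List[float]:
    """gamma_h = min(1, tau / max_logit_h); heads at or below tau keep factor 1."""
    return [1.0 if m <= tau else tau / m for m in max_logit]


def apply_qk_clip(
    w_q: List[Matrix], w_k: List[Matrix], gammas: List[float]
) -> Tuple[List[Matrix], List[Matrix]]:
    """Scale each head's W_q and W_k by sqrt(gamma_h).

    With q = W_q x and k = W_k y, scaling both weights by sqrt(gamma_h)
    scales every logit q . k of that head by gamma_h.
    """
    new_q: List[Matrix] = []
    new_k: List[Matrix] = []
    for wq_h, wk_h, gamma in zip(w_q, w_k, gammas):
        s = math.sqrt(gamma)
        new_q.append([[s * x for x in row] for row in wq_h])
        new_k.append([[s * x for x in row] for row in wk_h])
    return new_q, new_k


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product for the toy example."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


if __name__ == "__main__":
    gammas = qk_clip_factors([50.0, 200.0], tau=100.0)
    print(gammas)  # [1.0, 0.5]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    new_q, new_k = apply_qk_clip([eye, eye], [eye, eye], gammas)
    x = [3.0, 4.0]
    q1, k1 = matvec(new_q[1], x), matvec(new_k[1], x)
    # Head 1's logit for this token pair drops from 25.0 to about 12.5.
    print(round(sum(a * b for a, b in zip(q1, k1)), 6))  # 12.5
```

Splitting the factor evenly as sqrt(gamma_h) on both projections is one symmetric choice. Putting all of gamma_h on W_q would clip the logits equally well but change the two weight norms unevenly.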