Commit 21f0713

svtr_tiny performance optimization and change of the multi-card launch method to msrun (#810)
* update rec models Readme and fix master_resnet bug
* svtr_tiny GradSampler2D operator optimization
1 parent 020120f commit 21f0713

31 files changed: +259 −45 lines changed

configs/det/dbnet/README.md

+8 −2

@@ -282,9 +282,15 @@ python tools/train.py -c=configs/det/dbnet/db_r50_icdar15.yaml
Please set `distribute` in yaml config file to be True.

```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=2 --local_worker_num=2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=2 --local_worker_num=2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_det`.
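The comments added above distinguish the total worker count from the per-node worker count; the two values only differ once training spans more than one machine. As an illustrative sketch, a two-node, eight-NPU launch of the same config might look like the commands below. The master address, port, and log directory are placeholder values, and the extra flags are standard msrun launcher options documented in the tutorial linked in the note above.

```shell
# Illustrative two-node launch: 8 workers in total, 4 NPUs per node.
# 192.168.0.10 and port 8118 are placeholders for the master node address.

# Run on node 0 (the master node):
msrun --worker_num=8 --local_worker_num=4 --master_addr=192.168.0.10 --master_port=8118 \
      --node_rank=0 --log_dir=./msrun_log --join=True \
      python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml

# Run on node 1:
msrun --worker_num=8 --local_worker_num=4 --master_addr=192.168.0.10 --master_port=8118 \
      --node_rank=1 --log_dir=./msrun_log --join=True \
      python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
```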

configs/det/dbnet/README_CN.md

+8 −2

@@ -263,9 +263,15 @@ python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
Please make sure the `distribute` parameter in the yaml config file is set to True.

```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=2 --local_worker_num=2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=2 --local_worker_num=2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory specified by the `ckpt_save_dir` argument in the yaml config file. The default directory is `./tmp_det`.

configs/det/dbnet/README_CN_PP-OCRv3.md

+9 −1

@@ -330,8 +330,16 @@ model:

```shell
# Distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).
+

* Standalone Training

configs/det/east/README.md

+8 −2

@@ -120,9 +120,15 @@ python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
Please set `distribute` in yaml config file to be True.

```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_det`.
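Setting `distribute` to True, as the instruction above says, is a one-line change in the yaml config passed to the command. A minimal sketch is shown below; it assumes the flag sits under the `system` section, as in the MindOCR configs, so check the actual file for the exact location.

```yaml
system:
  distribute: True  # assumed location of the flag; enables data-parallel training across the msrun workers
```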

configs/det/east/README_CN.md

+8 −2

@@ -124,9 +124,15 @@ python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
Please make sure the `distribute` parameter in the yaml config file is set to True.

```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/east/east_r50_icdar15.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory specified by the `ckpt_save_dir` argument in the yaml config file. The default directory is `./tmp_det`.

configs/det/psenet/README.md

+10 −2

@@ -147,9 +147,17 @@ python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
Please set `distribute` in yaml config file to be True.

```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
+
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).
+

The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_det`.

configs/det/psenet/README_CN.md

+8 −2

@@ -147,9 +147,15 @@ python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
Please make sure the `distribute` parameter in the yaml config file is set to True.

```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).

The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory specified by the `ckpt_save_dir` argument in the yaml config file. The default directory is `./tmp_det`.

configs/layout/yolov8/README.md

+8 −1

@@ -99,8 +99,15 @@ It is easy to reproduce the reported results with the pre-defined training recip

```shell
# distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/layout/yolov8/yolov8n.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/layout/yolov8/yolov8n.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/layout/yolov8/yolov8n.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

* Standalone Training

configs/layout/yolov8/README_CN.md

+8 −1

@@ -113,8 +113,15 @@ eval:

```shell
# Distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/layout/yolov8/yolov8n.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/layout/yolov8/yolov8n.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/layout/yolov8/yolov8n.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).


* Standalone Training

configs/rec/abinet/README.md

+9 −1

@@ -221,8 +221,16 @@ It is easy to reproduce the reported results with the pre-defined training recip

```shell
# distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).
+
The pre-trained model needs to be loaded during ABINet model training; the weight comes from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt). The path of the pre-trained weight needs to be added to the `pretrained` field of `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".
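Concretely, the `pretrained` field sits under the `model` section of configs/rec/abinet/abinet_resnet45_en.yaml; a minimal sketch, with a placeholder local path for the downloaded checkpoint:

```yaml
model:
  # placeholder path to the downloaded abinet_pretrain_en-821ca20b.ckpt
  pretrained: /path/to/abinet_pretrain_en-821ca20b.ckpt
```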

configs/rec/abinet/README_CN.md

+10 −1

@@ -239,8 +239,17 @@ eval:

```shell
# Distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).
+
+
The pre-trained model needs to be loaded during ABINet model training. The weight of the pre-trained model comes from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt), and its path needs to be added to the `pretrained` field of `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".

configs/rec/crnn/README.md

+9 −1

@@ -252,8 +252,16 @@ It is easy to reproduce the reported results with the pre-defined training recip

```shell
# distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).
+

* Standalone Training

configs/rec/crnn/README_CN.md

+8 −1

@@ -252,8 +252,15 @@ eval:

```shell
# Distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=8 --local_worker_num=8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).


* Standalone Training

configs/rec/master/README.md

+9 −1

@@ -280,8 +280,16 @@ It is easy to reproduce the reported results with the pre-defined training recip

```shell
# distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/master/master_resnet31.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/master/master_resnet31.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/master/master_resnet31.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).
+

* Standalone Training

configs/rec/master/README_CN.md

+10 −1

@@ -281,8 +281,17 @@ eval:

```shell
# Distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/master/master_resnet31.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/master/master_resnet31.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/master/master_resnet31.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).
+
+

* Standalone Training

configs/rec/rare/README.md

+8 −1

@@ -243,8 +243,15 @@ It is easy to reproduce the reported results with the pre-defined training recip

```shell
# distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
+# worker_num is the total number of Worker processes participating in the distributed task.
+# local_worker_num is the number of Worker processes pulled up on the current node.
+# The number of processes is equal to the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
+
+# Based on verification, binding cores usually results in performance acceleration. Please configure the parameters and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/msrun_launcher.html).

* Standalone Training

configs/rec/rare/README_CN.md

+8 −1

@@ -243,8 +243,15 @@ eval:

```shell
# Distributed training on multiple Ascend devices
-mpirun --allow-run-as-root -n 4 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
+# worker_num is the total number of processes in the distributed task.
+# local_worker_num is the number of processes on the current node.
+# The number of processes equals the number of NPUs used for training. In the single-machine multi-card case, worker_num and local_worker_num must be the same.
+msrun --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
+
+# Based on verification, core binding brings a performance speedup in most cases. Please configure the parameter and run.
+msrun --bind_core=True --worker_num=4 --local_worker_num=4 python tools/train.py --config configs/rec/rare/rare_resnet34.yaml
```
+**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html).


* Standalone Training
