The code for flexible model generation
Use Colossal-AI for training:
colossalai run --nproc_per_node 4 train.py
colossalai run --nproc_per_node 4 train.py --model_ema --world_size 4
In config.py, gradient clipping is not working, so I commented it out:
#clip_grad_norm = 1.0
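
For reference, here is a minimal sketch of what norm-based gradient clipping does, assuming clip_grad_norm = 1.0 is meant as a cap on the global L2 norm of the gradients. This is plain Python on nested lists, not the Colossal-AI implementation; the function name clip_by_global_norm and the eps value are hypothetical. In PyTorch the same idea is available as torch.nn.utils.clip_grad_norm_.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale gradients in place so their global L2 norm is at most max_norm.

    grads is a list of lists of floats (one inner list per parameter).
    Returns the norm measured before clipping.
    """
    # Global norm: L2 norm over every gradient value of every parameter.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Scale factor is < 1 only when the norm exceeds max_norm.
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        for vec in grads:
            for i in range(len(vec)):
                vec[i] *= scale
    return total_norm


# Illustrative values only (assumption, not from the training run).
grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in grads for g in vec))
```

Gradients whose global norm is already below max_norm are left unchanged, so with clipping disabled the only difference is that large gradients are no longer scaled down.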