Tips for running in google colab & resuming from checkpoint #8

@Luke2642

Thanks for this repo, it's great!

To get it working in Colab, I copied the bare minimum from the Dockerfile:

!pip install jsonnet
!apt install -y -q ninja-build
!pip install tensorfn rich

!pip install setuptools
!pip install numpy scipy nltk lmdb cython pydantic pyhocon

!apt install -y -q libsm6 libxext6 libxrender1
!pip install opencv-python-headless

It then works despite throwing two compatibility errors:

ERROR: requests 2.23.0 has requirement urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you'll have urllib3 1.26.6 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.

I then made some manual edits to config/config-t.jsonnet so that it runs on Colab:

Under training:{}, set the image size to 128
Under training:{}, set the batch size to 12 (about 650 MB each, so under 8 GB in total, I guess)
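
The batch-size estimate above is simple arithmetic. A rough sketch, taking the figure of roughly 650 MB per sample at face value and assuming an 8 GB budget counted in decimal megabytes; `max_batch_size` is a hypothetical helper for illustration, not part of the repo:

```python
def max_batch_size(budget_mb, per_sample_mb):
    """Largest batch whose estimated memory use fits within the budget."""
    if per_sample_mb <= 0:
        raise ValueError("per_sample_mb must be positive")
    return int(budget_mb // per_sample_mb)


PER_SAMPLE_MB = 650    # rough per-sample figure quoted above (an estimate)
BUDGET_MB = 8 * 1000   # assumed 8 GB budget, in decimal megabytes

batch = 12
estimated_mb = batch * PER_SAMPLE_MB  # 12 * 650 = 7800 MB
print(f"batch {batch}: about {estimated_mb} MB")
print("largest batch under budget:", max_batch_size(BUDGET_MB, PER_SAMPLE_MB))
```

Under these assumptions a batch of 12 is the largest that fits; actual usage depends on the model, the image size, and what else is resident on the GPU.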

In prepare_data.py I commented out line 14 so that images are only cropped, not resized. This could be a useful config option for some datasets.
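
For reference, cropping without resizing amounts to cutting a fixed-size square out of the original pixels. A minimal sketch of the box computation follows. `center_crop_box` is a hypothetical helper for illustration, and it is an assumption that the crop is centered, since the code in prepare_data.py is not shown here.

```python
def center_crop_box(width, height, size):
    """Return (left, upper, right, lower) for a centered size x size crop."""
    if size > width or size > height:
        raise ValueError("crop size is larger than the image")
    left = (width - size) // 2
    upper = (height - size) // 2
    return left, upper, left + size, upper + size


# A 640x480 image cropped to 128x128 without any resizing.
print(center_crop_box(640, 480, 128))  # (256, 176, 384, 304)
```

The tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts. Note that without resizing, a 128-pixel crop of a large image keeps only a small central patch.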

In train.py, go to the main function at line 322 and comment out the 5 "logger" lines. The logger info didn't work out of the box in Colab: it just hangs, then falls over without an error. I didn't investigate further.

I also couldn't get --ckpt=checkpoint/010000.pt to resume properly. I tried editing the start iteration in the config too, but no luck: it just seemed to start from zero again.
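
When a resume restarts from zero, one thing worth checking is whether the iteration counter is restored along with the weights. The sketch below shows one possible workaround: recovering the counter from a checkpoint filename such as checkpoint/010000.pt. `iteration_from_checkpoint` is a hypothetical helper, and it is an assumption that checkpoints are named after their iteration number; none of this is confirmed behavior of the repo's train.py.

```python
import os


def iteration_from_checkpoint(path):
    """Parse the iteration number from a name like 'checkpoint/010000.pt'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    if not stem.isdigit():
        raise ValueError(f"no iteration number in checkpoint name: {path!r}")
    return int(stem)


# Hypothetical usage: feed this value to wherever the training loop starts counting.
start_iter = iteration_from_checkpoint("checkpoint/010000.pt")
print(start_iter)  # 10000
```

Whether this helps depends on how the training loop picks its starting iteration, which would need to be confirmed in train.py.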

Also, it may be worth editing train.py to use autocast() for half-precision float16 instead of float32, to improve speed and reduce memory use? Or even porting to TPU? https://github.com/pytorch/xla

So, first clone the repo:

!git clone https://github.com/rosinality/alias-free-gan-pytorch.git

After making the edits described above, prepare the data and start training:

# Upload your zip file, or import it from Google Drive
!unzip /content/dataraw.zip -d /content/dataraw

%cd /content/alias-free-gan-pytorch
!python prepare_data.py --out /content/dataset --n_worker 8 --size=128 /content/dataraw

%cd /content/alias-free-gan-pytorch
!python train.py --n_gpu 1 --conf config/config-t.jsonnet path=/content/dataset/

Thanks again!
