Colab Link: https://colab.research.google.com/drive/1Fg1O_joen0CY56hTO5UBNuoyjFewPEh2
Instructions for use of VQGAN+CLIP+ESRGAN_Implementation.ipynb:
- The file is meant to be run on Google Colab.
- Enter a text prompt in the parameter section. The default prompt is 'Hogwarts Castle of Witchcraft and Wizardry Pencil Sketch'.
- An image is given as output every 50 iterations when VQGAN+CLIP is run.
- Reaching 500 iterations takes roughly 20 minutes, depending on the GPU allotted by Google Colab.
- After 500 iterations, ESRGAN processes the image automatically. The resulting super-resolution image is saved at content/ESRGAN/results/500.png
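
The schedule described above can be sketched in a few lines of Python, which may help when checking a run from a separate Colab cell. This is only an illustration and not code from the notebook: the names `SAVE_EVERY`, `MAX_ITERATIONS`, `checkpoint_iterations`, and `esrgan_result_path` are hypothetical, the sketch assumes the first image appears at iteration 50, and it assumes the results path sits under Colab's `/content` directory.

```python
from pathlib import Path

# Values taken from the instructions above; the constant names are hypothetical.
SAVE_EVERY = 50        # an image is output every 50 iterations
MAX_ITERATIONS = 500   # ESRGAN runs after the final iteration


def checkpoint_iterations(save_every=SAVE_EVERY, max_iterations=MAX_ITERATIONS):
    """Return the iteration numbers at which an output image is expected."""
    return list(range(save_every, max_iterations + 1, save_every))


def esrgan_result_path(root="/content", iteration=MAX_ITERATIONS):
    """Build the path where the super-resolution image is expected.

    The "/content" root is an assumption about the Colab file system.
    """
    return Path(root) / "ESRGAN" / "results" / f"{iteration}.png"


result = esrgan_result_path()
print("Images expected at iterations:", checkpoint_iterations())
print(result, "exists" if result.exists() else "not found yet")
```

Running the cell after the notebook finishes should report whether the final image is where the instructions say it will be. If it is reported as not found, the `root` argument may need adjusting to match the actual working directory.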
Thank you! If you have any questions, please feel free to reach out to [email protected]
References:
- Esser, Patrick, Robin Rombach, and Björn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
- Li, Yangguang, et al. "Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm." arXiv preprint arXiv:2110.05208 (2021).
- Wang, Xintao, et al. "ESRGAN: Enhanced super-resolution generative adversarial networks." Proceedings of the European Conference on Computer Vision (ECCV) Workshops. 2018.