GitHub - SiddhantBikram/ImageGeneration: A system that generates an image based on a given text prompt. The system uses VQGAN+CLIP coupled with ESRGAN to produce high-resolution images.

Image Generation from a Text Prompt (VQGAN + CLIP + ESRGAN Implementation)
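For readers new to the technique: VQGAN+CLIP generally works by repeatedly nudging VQGAN's latent code so that CLIP's embedding of the decoded image becomes more similar to CLIP's embedding of the text prompt. ESRGAN then upscales the result. The toy below is a minimal sketch of only that guidance loop, under heavy assumptions. The VQGAN decoder and CLIP image encoder are replaced by one fixed 3x3 linear map `A`, and the text embedding by a fixed vector `TEXT_EMBEDDING`. The loss is plain cosine similarity, and there is no vector quantization, neural network, or ESRGAN step. All names are hypothetical, and this is not the notebook's code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# TOY STAND-INS (assumptions, not the notebook's code):
# A plays the role of "VQGAN decoder followed by CLIP image encoder",
# collapsed into one fixed linear map from latent z to image embedding e = A z.
A: Matrix = [[1.0, 0.2, 0.0],
             [0.0, 1.0, 0.1],
             [0.1, 0.0, 1.0]]
# Plays the role of the CLIP embedding of the text prompt.
TEXT_EMBEDDING: Vector = [1.0, 2.0, -1.0]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: Vector) -> float:
    return math.sqrt(dot(v, v))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def cosine_grad_wrt_latent(z: Vector, t: Vector) -> Vector:
    """Gradient of cos(A z, t) with respect to the latent z."""
    e = matvec(A, z)
    ne, nt, et = norm(e), norm(t), dot(e, t)
    # d cos / d e = t / (|e||t|) - (e.t) e / (|e|^3 |t|)
    g_e = [ti / (ne * nt) - et * ei / (ne ** 3 * nt) for ei, ti in zip(e, t)]
    # Chain rule through e = A z: d cos / d z = A^T (d cos / d e)
    return matvec(transpose(A), g_e)


def guide_latent(z0: Vector, t: Vector, steps: int = 300, lr: float = 0.05) -> Vector:
    """Gradient ascent on the image-text similarity, updating only the latent."""
    z = list(z0)
    for _ in range(steps):
        g = cosine_grad_wrt_latent(z, t)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    z0 = [1.0, -1.0, 0.5]
    before = cosine(matvec(A, z0), TEXT_EMBEDDING)
    after = cosine(matvec(A, guide_latent(z0, TEXT_EMBEDDING)), TEXT_EMBEDDING)
    print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

Only the latent is updated while the "networks" stay frozen, which mirrors how VQGAN+CLIP steers a pretrained generator instead of training it.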

Instructions for using VQGAN+CLIP+ESRGAN_Implementation.ipynb:

The notebook is meant to be run on Google Colab.
Enter a text prompt in the parameter section. The default text prompt is 'Hogwarts Castle of Witchcraft and Wizardry Pencil Sketch'.
While VQGAN+CLIP runs, an intermediate image is output every 50 iterations.
Reaching 500 iterations takes roughly 20 minutes, depending on the GPU that Google Colab allots.
After 500 iterations, ESRGAN processes the image automatically and saves the resulting super-resolution image to content/ESRGAN/results/500.png
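The run schedule described above can be sanity-checked with a little arithmetic. The sketch below is a hypothetical illustration, not code from the notebook. The names `TEXT_PROMPT`, `DISPLAY_EVERY`, `MAX_ITERATIONS`, `checkpoint_iterations`, `seconds_per_iteration` and `find_esrgan_result` are invented here. It also assumes the first intermediate image appears at iteration 50; some VQGAN+CLIP notebooks also show one at iteration 0.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical names: the notebook's own parameter variables may differ.
TEXT_PROMPT = "Hogwarts Castle of Witchcraft and Wizardry Pencil Sketch"  # default prompt
DISPLAY_EVERY = 50     # an intermediate image every 50 iterations
MAX_ITERATIONS = 500   # ESRGAN runs after this many iterations
ESRGAN_RESULT = "content/ESRGAN/results/500.png"  # path as given in this README


def checkpoint_iterations(max_iterations: int, display_every: int) -> List[int]:
    """Iterations at which an intermediate image is produced.

    Assumes the first image appears at iteration `display_every`, not at 0.
    """
    return list(range(display_every, max_iterations + 1, display_every))


def seconds_per_iteration(total_minutes: float, iterations: int) -> float:
    """Average wall-clock time per iteration for a run of the given length."""
    return total_minutes * 60 / iterations


def find_esrgan_result(path: str = ESRGAN_RESULT) -> Optional[Path]:
    """Return the upscaled image's path if the file exists, otherwise None."""
    candidate = Path(path)
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    print(len(checkpoint_iterations(MAX_ITERATIONS, DISPLAY_EVERY)))  # 10 intermediate images
    print(seconds_per_iteration(20, MAX_ITERATIONS))  # average seconds per iteration for a ~20 min run
    print(find_esrgan_result())  # None if the file is not at this path relative to the working directory
```

With the stated numbers this works out to 10 intermediate images, about 2 minutes apart, at roughly 2.4 seconds per iteration. Actual speed depends on the Colab GPU.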

Thank you! If you have any questions, please feel free to reach out to [email protected]

References:

Esser, Patrick, Robin Rombach, and Björn Ommer. "Taming Transformers for High-Resolution Image Synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
Li, Yangguang, et al. "Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm." arXiv preprint arXiv:2110.05208 (2021).
Wang, Xintao, et al. "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks." Proceedings of the European Conference on Computer Vision (ECCV) Workshops. 2018.
