Definition of prompt_image #227

Open
qiqigit opened this issue Jan 13, 2025 · 0 comments

qiqigit commented Jan 13, 2025

Thank you very much for sharing this project! I have two small questions:

prompt_embeds = self.text_encoder(self.tokenize_captions([""], 2).to(self.gpu_id))[0]

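1. We noticed that the text behind prompt_embeds, which is passed to the UNet as its condition, is actually empty. Instead of using prompt_image alone as the condition, a length-2 sequence prompt_embeds is used. Is this purely to make the cross-attention computation convenient (that is, to keep the key length greater than 1)?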

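2. In Stable Diffusion, the default number of text-prompt tokens appears to be 77, meaning a length-77 sequence is built as the condition for cross-attention with the UNet. This project does not pad the tokens up to 77 and instead uses a length-2 sequence directly. Is there a reason for this choice?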
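
To make both questions concrete, here is a minimal sketch of the mechanics I have in mind. It is not this project's code: it is a single-head scaled dot-product attention in plain Python, with the learned Q/K/V projections and multiple heads of a real UNet left out, and `cross_attention` and `random_vectors` are hypothetical helper names. It only illustrates that the output shape follows the number of queries regardless of the context length (1, 2, or 77), and that with a single key the softmax weight is always 1, so every query receives the same output.

```python
import math
import random


def cross_attention(queries, context):
    """Single-head scaled dot-product attention (illustrative sketch only).

    queries: list of Nq vectors (stand-in for flattened UNet features)
    context: list of Nk vectors (stand-in for text-encoder hidden states),
             used here as both keys and values, with no learned projections.
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for q in queries:
        # Scaled dot product between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        # Numerically stable softmax over the key axis.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, context))
               for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def random_vectors(n, d, rng):
    """Hypothetical helper: n random d-dimensional vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
d = 8
queries = random_vectors(4, d, rng)

results = {}
for n_ctx in (1, 2, 77):
    context = random_vectors(n_ctx, d, rng)
    out, weights = cross_attention(queries, context)
    results[n_ctx] = (out, weights, context)
    # Output shape is (number of queries, d) for every context length.
    print(n_ctx, (len(out), len(out[0])), len(weights[0]))
```

In this sketch nothing in the attention arithmetic requires 77 tokens, and a context of length 1 reduces the layer to a query-independent output. Whether either point is the actual motivation for the length-2 prompt_embeds here is exactly what I am asking.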
