Question about the pretraining data format #5

Open

Wheat12345 opened this issue Jan 13, 2024 · 1 comment

Wheat12345 commented Jan 13, 2024

For the ViT module, what should the input data format look like? Could you give a concrete example? Thanks a lot!

Owner

sunzeyeah commented Jan 15, 2024

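The input to ViT is simply an image. It is converted into patches, which are fed into the transformer, and pooling over the output gives the vector representation of that image. E-commerce images do tend to contain a lot of irrelevant content, so an object detection preprocessing step is usually needed to extract the relevant item, which is then used as the ViT input.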
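To make the image-to-patch step concrete, here is a minimal sketch in plain Python. It only illustrates the shapes involved and is not this repository's code: the 224 x 224 image size, the patch size of 16, and the helper names `image_to_patches` and `mean_pool` are all assumptions chosen for illustration.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # append the C channel values of this pixel
            patches.append(patch)
    return patches


def mean_pool(tokens):
    """Average a list of equal-length vectors into a single vector."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


# Hypothetical 224 x 224 RGB image filled with a constant value (assumed size).
image = [[[0.5, 0.5, 0.5] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image, patch_size=16)
print(len(patches), len(patches[0]))  # 14 * 14 = 196 patches, each 16 * 16 * 3 = 768 values

# A real ViT would first pass the patches through a learned projection and
# the transformer layers; pooling is applied to the raw patches here only
# to show that the result is one fixed-length vector per image.
image_vector = mean_pool(patches)
print(len(image_vector))
```

In practice the image would first be cropped to the detected item, as described above, and then resized to the resolution the model expects.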

To check whether two images are similar, compute the similarity between their two vectors and use that as the metric.
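The comparison step could look like the sketch below. Cosine similarity is used here as one common choice; the comment above does not say which similarity measure is intended, so treat that choice, the function name `cosine_similarity`, and the toy three-dimensional vectors as assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for the pooled ViT vectors of three images (made-up values).
vec_a = [0.2, 0.1, 0.7]
vec_b = [0.4, 0.2, 1.4]   # same direction as vec_a, so similarity is close to 1
vec_c = [0.7, -0.2, 0.0]  # points elsewhere, so similarity is lower

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

A threshold on this score can then decide whether two images count as similar; the right threshold depends on the data and would need to be tuned.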
