Thanks for your great work! I've tried to match `Recap-DataComp-1B` with `DataComp-1B` by the `key` field (reference: #7). However, when I generated 50M samples, I found ~5M mismatches: I built a `key2recaption` mapping from the Hugging Face annotations, and ~5M `KeyError`s were raised during lookup. Could you give me some advice? Sincerely looking forward to your reply!
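
To make the mismatch easier to reproduce, here is a minimal sketch of the kind of lookup described above. It is not the actual script: the function name `match_recaptions`, the variable `datacomp_keys`, and the toy data are all hypothetical, and it assumes the mapping is a plain Python dict from key to recaption. It collects missing keys instead of raising `KeyError`, so they can be counted and inspected.

```python
from typing import Dict, Iterable, List, Tuple


def match_recaptions(
    keys: Iterable[str], key2recaption: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split keys into those that have a recaption and those that do not."""
    matched: Dict[str, str] = {}
    missing: List[str] = []
    for key in keys:
        caption = key2recaption.get(key)  # .get returns None instead of raising KeyError
        if caption is None:
            missing.append(key)
        else:
            matched[key] = caption
    return matched, missing


# Toy data only (hypothetical); real keys would come from the dataset metadata.
key2recaption = {"a1": "a dog on grass", "b2": "a red car"}
datacomp_keys = ["a1", "b2", "c3"]

matched, missing = match_recaptions(datacomp_keys, key2recaption)
print(f"matched={len(matched)} missing={len(missing)}")
```

Saving the `missing` list would make it possible to check whether those keys are absent from the annotations altogether or only differ in format.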