FiGMaQ

Each sample in this dataset is a quintuple of the form <reference image, reference caption, modification text, target image, target caption>. Unlike traditional LLM-generated multimodal triplet datasets, it has three distinguishing characteristics:

Detailed Image Captions: Each image comes with a detailed caption that describes its attributes. These captions are typically longer than 100 tokens, giving a rich, fine-grained description of the visual content to support multimodal understanding.
Rich Modification Text: The modification text describes the changes between the reference and target images in detail. It is written in a natural, human-like style and deliberately includes the vague and imprecise terms people use when describing edits in everyday language.
Quintuple Structure: Unlike typical triplets, each sample consists of five parts. This expanded format supports a wide range of fine-tuning tasks, including multimodal generation and retrieval, for applications that need a joint understanding of images and text.
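
As an illustration of how one quintuple can serve several tasks, here is a minimal Python sketch. The data is not released yet, so everything here is an assumption: the class name `QuintupleSample`, its field names, the two helper functions, and the example values are all hypothetical and do not reflect the actual file format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuintupleSample:
    """One sample. Field names are hypothetical, not the released schema."""
    reference_image: str    # path or ID of the reference image (assumed to be a string)
    reference_caption: str  # detailed caption of the reference image
    modification_text: str  # human-like description of the change
    target_image: str       # path or ID of the target image (assumed to be a string)
    target_caption: str     # detailed caption of the target image


def to_retrieval_triplet(s: QuintupleSample) -> Tuple[str, str, str]:
    """Retrieval view: query is (reference image, modification text), answer is the target image."""
    return (s.reference_image, s.modification_text, s.target_image)


def to_caption_editing_pair(s: QuintupleSample) -> Tuple[str, str]:
    """Text-only view: reference caption plus modification text maps to the target caption."""
    prompt = f"{s.reference_caption}\nEdit: {s.modification_text}"
    return (prompt, s.target_caption)


# Invented example values, for illustration only.
sample = QuintupleSample(
    reference_image="ref_0001.jpg",
    reference_caption="A small brown dog sitting on a wooden porch.",
    modification_text="make the dog a bit fluffier",
    target_image="tgt_0001.jpg",
    target_caption="A small, fluffy brown dog sitting on a wooden porch.",
)

triplet = to_retrieval_triplet(sample)
prompt, expected_caption = to_caption_editing_pair(sample)
```

The retrieval view drops both captions and recovers the usual triplet, while the caption-editing view uses only the three text fields. Other views (for example, image generation conditioned on the reference image and modification text) could be derived from the same five fields in the same way.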

We will release our data and code soon!

hbhalpha/FiGMaQ

Folders and files

Latest commit

History

Repository files navigation

FiGMaQ

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages