Clarification on JSON Lines Dataset for Multi-Task Fine-Tuning of Florence-2 #323

mariaalfaroc · 2024-10-14T16:50:24Z

Hi everyone,

I came across the notebook discussing how to fine-tune Florence-2 for Object Detection, and I have a question regarding the structure of the JSON Lines dataset when fine-tuning for multiple tasks.

Specifically, how should the dataset be formatted if I want to fine-tune for more than one task?

Should the prefix field be a list of task string IDs, while the suffix field contains a list of strings that represent the answers for each task? For example, would the following structure be correct?

{
  "prefix": ["<OD>", "<OCR>"],
  "suffix": [
    "ace of hearts<loc_345><loc_315><loc_582><loc_721>2 of hearts<loc_709><loc_115><loc_888><loc_509>3 of hearts<loc_529><loc_228><loc_735><loc_613>4 of hearts<loc_98><loc_421><loc_415><loc_845>",
    "answer_for_ocr"
  ]
}

Additionally, is there a guide available on how to format datasets for each task?

I appreciate any guidance on this!

Thank you!

LinasKo · 2024-10-15T07:01:15Z

Hi @mariaalfaroc 👋

I don't have an answer, but I suggest looking at maestro. That's our newest project, aimed explicitly at fine-tuning multimodal models. Note that the next two weeks are intense, so the maestro maintainers might not respond.

Here's where @SkalskiP talks about the data format for Florence 2: YouTube.

mariaalfaroc · 2024-10-15T07:44:43Z

Hi @LinasKo,

Thanks for your response!

I've reviewed the maestro documentation and the YouTube tutorial. However, in both of them, the fine-tuning process for Florence-2 is focused on a single task at a time—Object Detection (OD) or Visual Question Answering (VQA).

For OD, a sample annotation from the dataset looks like this:

{
  "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
  "prefix": "<OD>",
  "suffix": "9 of diamonds<loc_141><loc_18><loc_404><loc_465>jack of diamonds<loc_589><loc_120><loc_789><loc_454>queen of diamonds<loc_308><loc_482><loc_570><loc_966>king of diamonds<loc_549><loc_477><loc_777><loc_904>10 of diamonds<loc_396><loc_75><loc_613><loc_458>"
}
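
To make the structure of the OD suffix explicit, here is a minimal sketch that splits such a string back into labels and location tokens. The pattern is inferred only from the sample above (a text label followed by four `<loc_N>` tokens); `parse_od_suffix` is a hypothetical helper, not part of maestro or Florence-2, and the meaning and order of the four numbers is not stated in this thread.

```python
import re

# Pattern inferred from the sample annotation: label text, then four <loc_N> tokens.
OD_ITEM = re.compile(r"([^<]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def parse_od_suffix(suffix: str) -> list[tuple[str, tuple[int, ...]]]:
    """Split an <OD> suffix into (label, four location values) pairs.

    Hypothetical helper for inspecting annotations; the interpretation of
    the four values is left to the caller.
    """
    items = []
    for match in OD_ITEM.finditer(suffix):
        label = match.group(1).strip()
        values = tuple(int(match.group(i)) for i in range(2, 6))
        items.append((label, values))
    return items


sample = (
    "9 of diamonds<loc_141><loc_18><loc_404><loc_465>"
    "jack of diamonds<loc_589><loc_120><loc_789><loc_454>"
)
parsed = parse_od_suffix(sample)
for label, values in parsed:
    print(label, values)
```

A check like this can catch malformed suffixes (for example, an object with only three location tokens) before training starts.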

For VQA, it appears as:

{
  "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
  "prefix": "<VQA> How many cards are in the image?",
  "suffix": "5"
}

What I'd like to know is: how should the annotations be structured if I want to fine-tune Florence-2 for both OD and VQA simultaneously? Would this structure be valid? Is this even possible?

{
  "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
  "prefix": ["<OD>", "<VQA> How many cards are in the image?"],
  "suffix": [
    "9 of diamonds<loc_141><loc_18><loc_404><loc_465>jack of diamonds<loc_589><loc_120><loc_789><loc_454>queen of diamonds<loc_308><loc_482><loc_570><loc_966>king of diamonds<loc_549><loc_477><loc_777><loc_904>10 of diamonds<loc_396><loc_75><loc_613><loc_458>",
    "5"
  ]
}
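
Nothing in this thread confirms whether list-valued `prefix` and `suffix` fields are accepted. One alternative that stays within the single-task format shown above is to write one JSON line per (image, task) pair, so the same image appears once for OD and once for VQA. The sketch below only performs that conversion; it assumes the loader treats every line as an independent sample, which is an assumption and not something verified here. `split_multitask_record` and `to_jsonl` are hypothetical names.

```python
import json


def split_multitask_record(record: dict) -> list[dict]:
    """Turn one record with list-valued prefix/suffix into one record per task.

    Records that already use plain strings are returned unchanged.
    """
    prefixes = record["prefix"]
    suffixes = record["suffix"]
    if isinstance(prefixes, str):
        return [record]
    if len(prefixes) != len(suffixes):
        raise ValueError("prefix and suffix lists must have the same length")
    return [
        {"image": record["image"], "prefix": prefix, "suffix": suffix}
        for prefix, suffix in zip(prefixes, suffixes)
    ]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    lines = []
    for record in records:
        for flat in split_multitask_record(record):
            lines.append(json.dumps(flat))
    return "\n".join(lines) + "\n"


combined = {
    "image": "IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg",
    "prefix": ["<OD>", "<VQA> How many cards are in the image?"],
    "suffix": ["9 of diamonds<loc_141><loc_18><loc_404><loc_465>", "5"],
}
jsonl_text = to_jsonl([combined])
print(jsonl_text)
```

Whether mixing tasks this way actually trains well for Florence-2 is a separate question that this conversion does not answer.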

Thank you so much again! :)

Amirhosein2c · 2025-01-28T02:57:56Z

Did you find an answer to this?
@mariaalfaroc, did you try the format you proposed to see whether it works for multi-task fine-tuning?
I'd appreciate it if anyone could help with this.
