Captioning off-the-shelf

Hello,

I am using the pre-trained VL-T5 to generate captions for Flickr30K images off-the-shelf, i.e. without any fine-tuning. I modified the captioning scripts to predict directly. However, the captions I get are very short, almost like noun phrases; I am including some examples below. I have played with the '--gen_max_length' and '--num_beams' parameters, but I still get very short outputs. Do you have any idea why this may be happening, or any suggestions for how to generate longer captions?

Thank you in advance!
Shruti

```
purple shirt
cutting cake
smiling
large group of people
skier
```
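
To put a number on how short these outputs are, here is a minimal sketch that counts the words in the five example captions above. The variable names are only illustrative and are not part of the VL-T5 scripts, and splitting on whitespace is an assumption rather than the model's own tokenization.

```python
from statistics import mean

# The five example captions quoted above.
captions = [
    "purple shirt",
    "cutting cake",
    "smiling",
    "large group of people",
    "skier",
]

# Whitespace splitting is an assumption; it is not the model's tokenizer.
lengths = [len(caption.split()) for caption in captions]

print("words per caption:", lengths)
print("mean length:", mean(lengths))
print("max length:", max(lengths))
```

For these five examples the longest caption has four words and the mean is two words, so the outputs are well short of a full-sentence caption.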

Captioning off-the-shelf #5
