unable to reproduce the results for S3 on StreamingBench

I followed the instructions strictly but got the results below, which are noticeably lower than those reported in the paper. Any idea why this happens? Thanks.

**ViSpeak_real_stats.csv**
```
task_type,total,correct,accuracy
Clips Summarize,317,240,0.7570977917981072
total,2500,1667,0.6668
Object Recognition,367,279,0.7602179836512262
Attribute Recognition,306,227,0.7418300653594772
Prospective Reasoning,108,65,0.6018518518518519
Action Recognition,353,239,0.6770538243626062
Spatial Understanding,246,147,0.5975609756097561
Event Understanding,161,119,0.7391304347826086
Counting,193,45,0.23316062176165803
Text-Rich Understanding,321,223,0.6947040498442367
Causal Reasoning,128,83,0.6484375
```

**ViSpeak_sqa_stats.csv**
```
task_type,total,correct,accuracy
Sequential Question Answering,250,96,0.384
```


**ViSpeak_proactive_stats.csv**
```
task_type,total,time_correct,time_accuracy,answer_correct,answer_accuracy
Proactive Output,250,115,0.46,109,0.436
```


**ViSpeak_omni_stats.csv**
```
task_type,total,correct,accuracy
Misleading Context Understanding,250,84,0.336
total,1500,790,0.5266666666666666
Source Discrimination,250,141,0.564
Emotion Recognition,250,112,0.448
Anomaly Context Understanding,250,114,0.456
Scene Understanding,250,143,0.572
Multimodal Alignment,250,196,0.784
```
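
As a sanity check, the per-task counts add up to the reported `total` rows, so the gap does not appear to come from the aggregation step. A minimal sketch of that check, assuming the stats files have the column layout shown above (the helper name `check_stats` is hypothetical and not part of the repository; the CSV text is copied from `ViSpeak_omni_stats.csv`):

```python
import csv
import io

def check_stats(csv_text):
    """Compare summed per-task counts with the reported 'total' row.

    Returns (sum_total, sum_correct, reported_total, reported_correct).
    Hypothetical helper for illustration only.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Per-task rows are everything except the aggregate row.
    tasks = [r for r in rows if r["task_type"] != "total"]
    overall = next(r for r in rows if r["task_type"] == "total")
    sum_total = sum(int(r["total"]) for r in tasks)
    sum_correct = sum(int(r["correct"]) for r in tasks)
    return sum_total, sum_correct, int(overall["total"]), int(overall["correct"])

# Contents of ViSpeak_omni_stats.csv as pasted above.
OMNI_STATS = """task_type,total,correct,accuracy
Misleading Context Understanding,250,84,0.336
total,1500,790,0.5266666666666666
Source Discrimination,250,141,0.564
Emotion Recognition,250,112,0.448
Anomaly Context Understanding,250,114,0.456
Scene Understanding,250,143,0.572
Multimodal Alignment,250,196,0.784
"""

print(check_stats(OMNI_STATS))  # (1500, 790, 1500, 790)
```

The same check on `ViSpeak_real_stats.csv` gives matching sums (2500 questions, 1667 correct), so the lower numbers seem to come from the model predictions themselves rather than from how the scores are summed.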

unable to reproduce the results for S3 on StreamingBench #12
