The implementation of “MR2V: Customized Storytelling”

Hi, thanks for releasing StoryMem.
  I am trying to reproduce the MR2V customized storytelling examples shown on the [project page](https://kevin-thu.github.io/StoryMem/), such as the clumsy man / young lovers / cafe examples. From the paper/project page, it seems that MR2V uses one or more reference images as memory and then generates a longer story video while preserving the referenced subject identity. I would like to understand how the long project-page videos were generated.

  Could you please clarify the intended reproduction workflow?

  1. Were the long MR2V project-page videos generated shot-by-shot using multiple text prompts, with memory updated after each generated shot? Or were they generated from a single long story-level prompt?
  2. If they were generated shot-by-shot, what is the expected input format for the story script? For example, do you use a JSON file containing `story_overview`, `video_prompts`, `first_frame_prompt`, and scene-cut indicators?
  3. How are the reference images provided as the initial memory for MR2V? Are they saved as initial keyframes before running the pipeline?
  4. Are the exact prompts used for the project-page MR2V demos available, or could you share an example configuration?
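
To make question 2 concrete, here is a rough sketch of the kind of story script I have in mind. The key names `story_overview`, `video_prompts`, and `first_frame_prompt` are the ones mentioned above; the `scenes` wrapper, the `cut` key, the helper function names, and every value are my own guesses for illustration, not something I found confirmed in the repository.

```python
import json


def build_story_script():
    """Return a hypothetical shot-by-shot story script (layout is a guess)."""
    return {
        "story_overview": "A clumsy man's day goes wrong in small ways.",
        # "scenes" is an assumed wrapper, not a confirmed StoryMem key.
        "scenes": [
            {
                "first_frame_prompt": "A man in a grey coat stands at a cafe counter.",
                "video_prompts": [
                    "He reaches for a cup and knocks it over.",
                    "He wipes the counter, looking embarrassed.",
                ],
                # "cut" is an assumed name for the scene-cut indicators:
                # one boolean per shot prompt.
                "cut": [True, False],
            },
        ],
    }


def validate(script):
    """Check that every scene has one cut flag per shot prompt."""
    for scene in script["scenes"]:
        if len(scene["video_prompts"]) != len(scene["cut"]):
            raise ValueError("need one cut flag per shot prompt")
    return script


if __name__ == "__main__":
    print(json.dumps(validate(build_story_script()), indent=2))
```

If the real format differs (for example, if cuts are stored per scene rather than per shot), a pointer to the correct schema would be enough for me to adapt this.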

  I am currently able to generate a short single M2V clip, but I am not sure how to reproduce the longer MR2V story videos shown on the project page.

  Any guidance or example command would be very helpful. Thanks!
