https://github.com/google/storybench

Host: GitHub
URL: https://github.com/google/storybench
Owner: google
License: apache-2.0
Archived: true
Created: 2023-08-17T14:16:40.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-10-16T16:28:49.000Z (over 1 year ago)
Last Synced: 2025-03-22T00:42:42.953Z (3 months ago)
Language: Python
Size: 16.3 MB
Stars: 48
Watchers: 3
Forks: 3
Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

This is the implementation of the approaches described in the paper:

> Emanuele Bugliarello, Hernan Moraldo, Ruben Villegas, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Han Zhang, Dumitru Erhan, Vittorio Ferrari, Pieter-Jan Kindermans, Paul Voigtlaender. [StoryBench: A Multifaceted Benchmark for Continuous Story Visualization](https://arxiv.org/abs/2308.11606). _Advances in Neural Information Processing Systems 37 (NeurIPS 2023)_.

We provide our text annotations, guidelines for human evaluation, and the code for computing automatic metrics.

Leaderboards are available on [Papers With Code](https://paperswithcode.com/dataset/storybench).

## Data

[`data/`](data/) contains the evaluation data for StoryBench.

- [`data/llm_outputs/`](data/llm_outputs/) contains the captions split by our instruction-tuned LLM

- [`data/tasks/`](data/tasks/) contains the evaluation data formatted for the StoryBench tasks of `action_exe`, `story_cont` and `story_gen`

Training data can be downloaded from the following links:

- [didemo-train](https://storage.googleapis.com/storybench/didemo-train.json): original DiDeMo data

- [oops-train_pipeline](https://storage.googleapis.com/storybench/oops-train_pipeline.json): original VidLN caption as well as algorithmically generated stories

- [oops-train_pipeline+traces](https://storage.googleapis.com/storybench/oops-train_pipeline+traces.json): same as above plus mouse traces from VidLN

- [uvo_dense-train_pipeline](https://storage.googleapis.com/storybench/uvo_dense-train_pipeline.json): original VidLN caption as well as algorithmically generated stories

- [uvo_dense-train_pipeline+traces](https://storage.googleapis.com/storybench/uvo_dense-train_pipeline+traces.json): same as above plus mouse traces from VidLN

- [uvo_sparse-train_pipeline](https://storage.googleapis.com/storybench/uvo_sparse-train_pipeline.json): original VidLN caption as well as algorithmically generated stories

- [uvo_sparse-train_pipeline+traces](https://storage.googleapis.com/storybench/uvo_sparse-train_pipeline+traces.json): same as above plus mouse traces from VidLN

While human-annotated evaluation files are recommended (see [`metrics/data/`](metrics/data/)), we also share our automatically generated Oops validation data, which we used to assess the robustness of our data transformation pipeline:

- [oops-valid_pipeline](https://storage.googleapis.com/storybench/oops-valid_pipeline.json): original VidLN caption as well as algorithmically generated stories

- [oops-valid_pipeline+traces](https://storage.googleapis.com/storybench/oops-valid_pipeline+traces.json): same as above plus mouse traces from VidLN
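
The training and validation files listed above share one URL pattern, so they can be fetched with a short script. The following is a minimal sketch using only the Python standard library; the helper names `storybench_url` and `download` and the local `storybench_data/` directory are illustrative assumptions, not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the download links listed above.
BASE_URL = "https://storage.googleapis.com/storybench"

# File stems exactly as they appear in the links above.
TRAIN_FILES = [
    "didemo-train",
    "oops-train_pipeline",
    "oops-train_pipeline+traces",
    "uvo_dense-train_pipeline",
    "uvo_dense-train_pipeline+traces",
    "uvo_sparse-train_pipeline",
    "uvo_sparse-train_pipeline+traces",
]
VALID_FILES = ["oops-valid_pipeline", "oops-valid_pipeline+traces"]


def storybench_url(name: str) -> str:
    """Return the download URL for a file stem (hypothetical helper)."""
    return f"{BASE_URL}/{name}.json"


def download(name: str, target_dir: str = "storybench_data") -> Path:
    """Download one file into target_dir, skipping files already present."""
    target = Path(target_dir) / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(storybench_url(name), str(target))
    return target
```

For example, `download("didemo-train")` would save the DiDeMo training file to `storybench_data/didemo-train.json`, and looping over `TRAIN_FILES + VALID_FILES` would fetch every file listed above.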

## Metrics

[`metrics/`](metrics/) contains the source code to perform automatic evaluation of generated videos.

To install the dependencies into your Python virtual environment, run:

```bash
pip install -r metrics/requirements.txt
```

To compute a given metric (e.g., FID with InceptionV3), run the following:

```bash
MODEL_NAME="phenaki"
TASK="action_exe"  # [action_exe, story_cont, story_gen]
DATA_SPLIT="oops_test"  # [{oops,uvo,didemo}_{val,test}]
DATA_DIR="/tmp/datadir/"
OUT_DIR="/tmp/out/"

python3 -m metrics.fid_inception --batch_size=256 --model="ground_truth" --task=${TASK} --dataset=${DATA_SPLIT} --data_dir=${DATA_DIR} --output_dir=${OUT_DIR} --num_videos=1
python3 -m metrics.fid_inception --batch_size=256 --model=${MODEL_NAME} --task=${TASK} --dataset=${DATA_SPLIT} --data_dir=${DATA_DIR} --output_dir=${OUT_DIR} --num_videos=4
```

In this example, we run the same script twice, first to extract the features from the ground-truth videos, and then to extract the features from the videos generated by a text-to-video model (`phenaki` here).

Note that we set `--num_videos=4` in the latter case as we sample four videos per text prompt when we generate videos with our models.

If you do not use our extracted features (see above), you only need to run the first script (to extract ground-truth features) once.
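
The features extracted in these two runs are what a Fréchet distance such as FID compares. For reference, here is a minimal NumPy sketch of the standard Fréchet distance between two sets of feature vectors. It is not the implementation used in `metrics/`; the function name `frechet_distance` and the `(N, D)` input layout are assumptions made for illustration.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative for
    # covariance matrices; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)
```

Two identical feature sets give a distance of (numerically) zero, and shifting every feature vector by a constant offset `d` changes only the mean term, so the distance becomes the squared norm of `d`.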

The input data to the scripts are `npz` files with the (ground-truth or generated) `video` as a NumPy array.
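
A minimal sketch of writing and reading such a file follows. Only the `video` key comes from the description above; the `(frames, height, width, channels)` shape, the `uint8` dtype, and the helper names are assumptions, so check the scripts in `metrics/` for the layout they actually expect.

```python
import os
import tempfile

import numpy as np


def save_video(path: str, video: np.ndarray) -> None:
    """Store a video array under the `video` key of an npz file."""
    np.savez_compressed(path, video=video)


def load_video(path: str) -> np.ndarray:
    """Read the `video` array back from an npz file."""
    with np.load(path) as data:
        return data["video"]


# Assumed layout: 8 RGB frames of 64x64 pixels with values in [0, 255].
dummy = np.zeros((8, 64, 64, 3), dtype=np.uint8)
npz_path = os.path.join(tempfile.mkdtemp(), "fn0.npz")
save_video(npz_path, dummy)
restored = load_video(npz_path)
print(restored.shape, restored.dtype)
```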

We rely on publicly available models and code to compute our automatic metrics.

For reference, our working directory is structured as follows.

```bash
checkpoints/
    | DOVER.pth
    | InternVideo-MM-L-14.ckpt
    | ViT-L-14-336px.pt
    | convnext_tiny_1k_224_ema.pth
    | i3d_torchscript.pt
    | pt_inception-2015-12-05-6726825d.pth
data/
    | ground_truth/
    |   | action_exe/
    |   |   | oops_test/
    |   |   |   | raw/
    |   |   |   |   | fn0.npz
    |   |   |   |   | ...
    |   |   |   | features/
    |   |   |   |   | fid_clip/
    |   |   |   |   |   | embeddings_0.npz
    |   |   |   |   | fid_inception/
    |   |   |   |   |   | embeddings_0.npz
    |   |   |   |   | ...
    |   |   |   |   | vtm_internvideo/
    |   |   |   |   |   | embeddings_0.npz
    |   |   | ...
    |   | ...
    | phenaki/
    |   | action_exe/
    |   |   | oops_test/
    |   |   |   | raw/
    |   |   |   |   | fn0.npz
    |   |   |   |   | ...
    |   |   | ...
    |   | ...
outputs/
    | phenaki/
    |   | action_exe/
    |   |   | oops_test/
    |   |   |   | features/
    |   |   |   |   | embeddings_0.npz
    |   |   |   |   | embeddings_1.npz
    |   |   |   |   | embeddings_2.npz
    |   |   |   |   | embeddings_3.npz
    |   |   | ...
    |   | ...
```

Note that:

- checkpoints can be downloaded from the corresponding repositories (see [`metrics/third_party/`](metrics/third_party/)):

    - [DOVER](https://github.com/VQAssessment/DOVER)

    - [InternVideo](https://github.com/OpenGVLab/InternVideo)

    - [pytorch-fid](https://github.com/mseitzer/pytorch-fid)

- after extracting the features for the ground-truth data, we move them from their `${OUT_DIR}` to the `features/` directory under `${DATA_DIR}`
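
The move described in the last note can be scripted. The sketch below assumes that ground-truth features are written to `<out_dir>/ground_truth/<task>/<split>/features/` (mirroring the `outputs/phenaki/...` layout shown above) and belong in `<data_dir>/ground_truth/<task>/<split>/features/<metric>/`. The function name and both path conventions are assumptions inferred from the directory tree, not code from this repository.

```python
import shutil
import tempfile
from pathlib import Path


def move_ground_truth_features(out_dir, data_dir, task, split, metric):
    """Move embeddings_*.npz files from the output tree into the data tree."""
    src = Path(out_dir) / "ground_truth" / task / split / "features"
    dst = Path(data_dir) / "ground_truth" / task / split / "features" / metric
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(src.glob("embeddings_*.npz")):
        target = dst / f.name
        shutil.move(str(f), str(target))
        moved.append(target)
    return moved


# Demo on a throwaway directory tree with a placeholder file.
demo_root = Path(tempfile.mkdtemp())
demo_src = demo_root / "out" / "ground_truth" / "action_exe" / "oops_test" / "features"
demo_src.mkdir(parents=True)
(demo_src / "embeddings_0.npz").write_bytes(b"placeholder")
moved = move_ground_truth_features(
    demo_root / "out", demo_root / "data", "action_exe", "oops_test", "fid_inception"
)
print(moved[0].relative_to(demo_root))
```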

## License

This work is licensed under the Apache License. See [`LICENSE`](LICENSE) for details.

We rely on third-party software and models to compute automatic evaluation metrics, released under MIT and Apache licenses.

The annotations are licensed by Google LLC under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

If you find our code/data/models or ideas useful in your research, please consider citing the paper:

```
@inproceedings{bugliarello-etal-2023-storybench,
    author = {Bugliarello, Emanuele and Moraldo, Hernan and Villegas, Ruben and Babaeizadeh, Mohammad and Taghi Saffar, Mohammad and Zhang, Han and Erhan, Dumitru and Ferrari, Vittorio and Kindermans, Pieter-Jan and Voigtlaender, Paul},
    title = "{{StoryBench}: {A} Multifaceted Benchmark for Continuous Story Visualization}",
    booktitle = {Advances in Neural Information Processing Systems},
    publisher = {Curran Associates, Inc.},
    url = {https://arxiv.org/pdf/2308.11606.pdf},
    volume = {37},
    year = {2023}
}
```