Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
- Host: GitHub
- URL: https://github.com/Ego4DSounds/Ego4DSounds
- Owner: Ego4DSounds
- Created: 2024-06-14T08:26:42.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-14T18:49:02.000Z (6 months ago)
- Last Synced: 2024-08-22T03:02:00.470Z (4 months ago)
- Language: Python
- Size: 3.95 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Video-Robotic-Papers - Code
README
# Ego4DSounds
Ego4DSounds is a subset of Ego4D, an existing large-scale egocentric video dataset. Its clips have high action-audio correspondence, making it a high-quality dataset for action-to-sound generation.

[Explore the dataset](https://ego4dsounds.github.io/)
## Action2Sound
Dataset introduced in _"Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos"_.
Action2Sound is an ambient-aware approach that disentangles the action sound from the ambient sound, enabling successful generation after training on diverse in-the-wild data, as well as controllable conditioning on the ambient sound level.
![action2sound](https://github.com/Ego4DSounds/Ego4DSounds/assets/59634524/40a9d037-9134-4edc-82a5-1d81c6bbb40c)
[Explore the project](https://vision.cs.utexas.edu/projects/action2sound/)
## Contents
This repository contains scripts for processing the Ego4DSounds dataset. It includes functionality for loading video and audio data and extracting clips using metadata.
- `extract_ego4d_clips.py`: Extracts clips from the Ego4D dataset
- `dataset.py`: Defines the Ego4DSounds dataset class for loading and processing video and audio clips
- Metadata files: `train_clips_1.2m.csv`, `test_clips_11k.csv`, `ego4d.json`

Each row in the CSV files has the following columns (a loading sketch follows the listing):
```
video_uid, video_dur, narration_source, narration_ind, narration_time, clip_start, clip_end, clip_text, tag_verb, tag_noun, positive, clip_file, speech, background_music, traffic_noise, wind_noise
```
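To make the metadata concrete, here is a minimal sketch of loading the clip metadata with pandas and filtering on the tag columns. The column names come from the schema above, but the 0/1 encoding of the tag columns and the filter itself are assumptions for illustration, not the repository's actual API; `dataset.py` defines the actual dataset class for consuming these clips.

```
# Minimal sketch: pick clean action-audio clips from the test split metadata.
# Assumes the tag columns (positive, speech, background_music, ...) are 0/1
# flags; verify against the actual CSVs before relying on this.
import pandas as pd

meta = pd.read_csv("test_clips_11k.csv")

# Hypothetical filter: positive action-audio pairs with no competing
# speech or background music.
clean = meta[
    (meta["positive"] == 1)
    & (meta["speech"] == 0)
    & (meta["background_music"] == 0)
]

for _, row in clean.head(5).iterrows():
    print(row["clip_file"], row["clip_start"], row["clip_end"], row["clip_text"])
```

From there, the clips referenced by `clip_file` can be handed to the dataset class defined in `dataset.py` (its exact constructor arguments are not shown here).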
## BibTeX

```
@article{chen2024action2sound,
  title   = {Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos},
  author  = {Changan Chen and Puyuan Peng and Ami Baid and Sherry Xue and Wei-Ning Hsu and David Harwath and Kristen Grauman},
  year    = {2024},
  journal = {arXiv},
}
```