Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
- Host: GitHub
- URL: https://github.com/Ego4DSounds/Ego4DSounds
- Owner: Ego4DSounds
- Created: 2024-06-14T08:26:42.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-14T18:49:02.000Z (6 months ago)
- Last Synced: 2024-08-22T03:02:00.470Z (4 months ago)
- Language: Python
- Size: 3.95 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Video-Robotic-Papers - Code
README
# Ego4DSounds
Ego4DSounds is a subset of Ego4D, an existing large-scale egocentric video dataset. Its clips have high action-audio correspondence, making it a high-quality dataset for action-to-sound generation.

[Explore the dataset](https://ego4dsounds.github.io/)
## Action2Sound
Dataset introduced in _"Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos"_.
Action2Sound is an ambient-aware approach that disentangles the action sound from the ambient sound, enabling successful generation after training on diverse in-the-wild data, as well as controllable conditioning on the ambient sound level.
![action2sound](https://github.com/Ego4DSounds/Ego4DSounds/assets/59634524/40a9d037-9134-4edc-82a5-1d81c6bbb40c)
[Explore the project](https://vision.cs.utexas.edu/projects/action2sound/)
## Contents
This repository contains scripts for processing the Ego4DSounds dataset. It includes functionality for loading video and audio data and extracting clips using metadata.
- `extract_ego4d_clips.py`: Extracts clips from the Ego4D dataset
- `dataset.py`: Defines the Ego4DSounds dataset class for loading and processing video and audio clips
- Metadata files: `train_clips_1.2m.csv`, `test_clips_11k.csv`, `ego4d.json`

Each row in the CSV files has the following columns (a loading sketch follows the listing):
```
video_uid, video_dur, narration_source, narration_ind, narration_time, clip_start, clip_end, clip_text, tag_verb, tag_noun, positive, clip_file, speech, background_music, traffic_noise, wind_noise
```
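To make the metadata concrete, here is a minimal sketch of loading the clip metadata with pandas and filtering on the tag columns. The column names come from the schema above, but the 0/1 encoding of the tag columns and the filter itself are assumptions for illustration, not the repository's actual API; `dataset.py` defines the actual dataset class for consuming these clips.

```
# Minimal sketch: pick clean action-audio clips from the test split metadata.
# Assumes the tag columns (positive, speech, background_music, ...) are 0/1
# flags; verify against the actual CSVs before relying on this.
import pandas as pd

meta = pd.read_csv("test_clips_11k.csv")

# Hypothetical filter: positive action-audio pairs with no competing
# speech or background music.
clean = meta[
    (meta["positive"] == 1)
    & (meta["speech"] == 0)
    & (meta["background_music"] == 0)
]

for _, row in clean.head(5).iterrows():
    print(row["clip_file"], row["clip_start"], row["clip_end"], row["clip_text"])
```

From there, the clips referenced by `clip_file` can be handed to the dataset class defined in `dataset.py` (its exact constructor arguments are not shown here).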
## BibTeX

```
@article{chen2024action2sound,
  title   = {Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos},
  author  = {Changan Chen and Puyuan Peng and Ami Baid and Sherry Xue and Wei-Ning Hsu and David Harwath and Kristen Grauman},
  year    = {2024},
  journal = {arXiv},
}
```