Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing
https://github.com/krantiparida/awesome-audio-visual

Last synced: 3 days ago

Datasets
- MUSIC - Audio-Visual Source Separation
- AudioSetZSL - Audio-Visual Zero-shot Learning
- AudioSet - Audio-Visual Classification
- Visually Engaged and Grounded AudioSet (VEGAS) - Sound generation from video
- SoundNet-Flickr - Image-Audio pair for cross-modal learning
- Audio-Visual Event (AVE) - Audio-Visual Event Localization
- Kinetics-Sounds - Subset of Kinetics dataset
- EPIC-Kitchens - Egocentric Audio-Visual Action Recognition
- Audio-Visually Indicated Actions Dataset - Multimodal dataset (RGB, acoustic data as raw audio) acquired using the acoustic-optical camera
- IMSDb dataset - Movie scripts downloaded from the [Internet Movie Script Database](https://www.imsdb.com)
- auDIoviSual Crowd cOunting dataset (DISCO) - 1,935 images and audio clips from various typical scenes, with a total of 170,270 instances annotated with head locations.
- MUSIC-Synthetic dataset - Category-balanced multi-source videos created by artificially synthesizing solo videos from the [MUSIC](https://github.com/roudimit/MUSIC_dataset) dataset, to facilitate the learning and evaluation of multiple-sounding-source localization in the cocktail-party scenario.
- ACAV100M - A dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence, produced by processing 140 million full-length videos (total duration 1,030 years).
- AVSBench - A dataset for the audio-visual pixel-wise segmentation task.
- UnAV-100 - The dataset consists of more than 10K untrimmed videos with over 30K audio-visual events covering 100 different event categories. There are often multiple audio-visual events that might be very short or long, and occur concurrently in each video as in real-life audio-visual scenes.
- EmoVoxCeleb
- Speech2Gesture - Gesture prediction from speech
- AVSpeech
Licenses
- Kranti Kumar Parida
- CC0

Programming Languages

Python 1