Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing
https://github.com/krantiparida/awesome-audio-visual


  • Datasets

    • MUSIC - Audio-Visual Source Separation
    • AudioSetZSL - Audio-Visual Zero-shot Learning
    • AudioSet - Audio-Visual Classification
    • Visually Engaged and Grounded AudioSet (VEGAS) - Sound generation from video
    • SoundNet-Flickr - Image-audio pairs for cross-modal learning
    • Audio-Visual Event (AVE) - Audio-Visual Event Localization
    • Kinetics-Sounds - Subset of Kinetics dataset
    • EPIC-Kitchens - Egocentric Audio-Visual Action Recognition
    • Audio-Visually Indicated Actions Dataset - Multimodal dataset (RGB, acoustic data as raw audio) acquired using an acoustic-optical camera
    • IMSDb dataset - Movie scripts downloaded from the [Internet Movie Script Database](https://www.imsdb.com)
    • auDIoviSual Crowd cOunting dataset (DISCO) - 1,935 images with audio from various typical scenes, with a total of 170,270 instances annotated with head locations.
    • MUSIC-Synthetic dataset - Category-balanced multi-source videos artificially synthesized from solo videos in the [MUSIC](https://github.com/roudimit/MUSIC_dataset) dataset, to facilitate the learning and evaluation of multiple-sounding-source localization in the cocktail-party scenario.
    • ACAV100M - A dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence, produced by processing 140 million full-length videos (total duration 1,030 years).
    • AVSBench - A dataset for the audio-visual pixel-wise segmentation task.
    • UnAV-100 - More than 10K untrimmed videos with over 30K audio-visual events covering 100 different event categories. Each video often contains multiple audio-visual events that may be very short or long and may occur concurrently, as in real-life audio-visual scenes.
    • EmoVoxCeleb
    • Speech2Gesture - Gesture prediction from speech
    • AVSpeech