{"id":30762888,"url":"https://github.com/interactiveaudiolab/voogle","last_synced_at":"2025-09-04T15:50:19.747Z","repository":{"id":46894323,"uuid":"120645043","full_name":"interactiveaudiolab/voogle","owner":"interactiveaudiolab","description":"This is code for an audio search engine that uses vocal imitations of the desired sound","archived":false,"fork":false,"pushed_at":"2023-05-16T19:38:45.000Z","size":9145,"stargazers_count":37,"open_issues_count":4,"forks_count":2,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-07-26T03:35:09.768Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/interactiveaudiolab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-02-07T17:04:13.000Z","updated_at":"2024-07-26T03:35:09.769Z","dependencies_parsed_at":"2022-09-23T08:12:40.803Z","dependency_job_id":null,"html_url":"https://github.com/interactiveaudiolab/voogle","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/interactiveaudiolab/voogle","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fvoogle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fvoogle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fvoogle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fvoogle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/interactiveaudiolab","dow
nload_url":"https://codeload.github.com/interactiveaudiolab/voogle/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fvoogle/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273633458,"owners_count":25140775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-04T15:50:03.733Z","updated_at":"2025-09-04T15:50:19.731Z","avatar_url":"https://github.com/interactiveaudiolab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Voogle\nVoogle is an audio search engine that uses vocal imitations of the desired sound as the search query.\n\nVoogle is built in Python 3.6 and JavaScript, using Node.js. 
Voogle runs best in Google Chrome.\n\n![](https://github.com/interactiveaudiolab/voogle/raw/master/static/images/voogle.png \"voogle\")\n\n## Installation\n### Server\nVoogle backend dependencies are installed with `pip install -r requirements.txt`.\n\n**Note:** Windows and Linux users must have [FFmpeg](https://www.ffmpeg.org/) installed.\n\n### Interface\nVoogle frontend dependencies are installed with `npm install`.\n\n**Note:** You must have [Node.js](https://nodejs.org/en/) installed before you can run `npm install`.\n\n## Available Datasets\nAny collection of audio files can be used as sounds returned by Voogle in response to a vocal query. The Interactive Audio Lab has released 2 datasets specifically for the training of query-by-vocal-imitation models: [Vocal Imitation Set](https://zenodo.org/record/1340763#.XAap0mhKiM8) and [VocalSketch](https://zenodo.org/record/1251982#.XAap1WhKiM8) [1, 2]. A small test dataset for demos can be downloaded [here](https://www.dropbox.com/s/lkj55uvz4z26i8d/test_dataset.zip?dl=1).\n\nAudio files should be placed in [`data/audio/\u003cdataset_name\u003e`](data/audio/). The dataset used during execution can be specified in [`config.yaml`](config.yaml).\n\n## Available Models\nInteractive Audio Lab has released the following models for query-by-vocal-imitation:\n - `siamese-style`: a siamese-style neural network [3]\n    - [weight file](https://www.dropbox.com/s/234i2ft9sfcdpty/siamese_style.h5?dl=1)\n - `VGGish-embedding`: cosine similarity of VGGish embeddings [4]\n    - [weight file](https://www.dropbox.com/s/5x5ceczislmyk0y/vggish_pretrained_convs.pth?dl=1)\n - `mcft`: multi-resolution common-fate transform [5]\n\nWeight files should be placed in [`model/weights`](model/weights/). The model used during execution can be specified in [`config.yaml`](config.yaml).\n\n## Setup\nAfter installing the dependencies, a dataset, and a model, the Voogle app can be deployed.\n\n### Deploying Locally\n1. 
Start the server by running `npm run production`.\n2. Navigate to `localhost:5000` in your browser.\n\nFrom there, please follow the directions found under \"Show Instructions\". Enjoy!\n\n**Note:** There are currently two frontend interfaces available for Voogle. If you would like to use the alternate interface, use the command `npm run old-interface` instead during step 1.\n\n## Testing\nUnit tests can be run with `npm run test`.\n\n## Extending\nVoogle can be extended to incorporate additional models and datasets. If you would like to make your model or dataset available to all users of Voogle, contact interactiveaudiolab@gmail.com.\n\n### Adding a model\n- Define your model as a subclass of [`QueryByVoiceModel`](model/QueryByVoiceModel.py) with all abstract methods implemented as described.\n- Add the model constructor to [`factory.py`](factory.py).\n- Place your model's weights in [`model/weights`](model/weights/).\n- Update the model name and filepath in [`config.yaml`](config.yaml).\n\nAn example model can be found [here](model/SiameseStyle.py).\n\n### Adding a dataset\n- Define your dataset as a subclass of [`QueryByVoiceDataset`](data/QueryByVoiceDataset.py) with all abstract methods implemented as described.\n- Add the dataset constructor to [`factory.py`](factory.py).\n- Place the audio files in [`data/audio/\u003cdataset_name\u003e`](data/audio/).\n- Update the dataset name in [`config.yaml`](config.yaml).\n\nAn example dataset can be found [here](data/TestDataset.py).\n\n## References\n- [1] Bongjun Kim, Madhav Ghei, Bryan Pardo, and Zhiyao Duan, \"Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology,\" Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Surrey, UK, Nov. 2018. 
[[paper link](http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Kim_135.pdf)]\n- [2] Mark Cartwright and Bryan Pardo, \"VocalSketch: Vocally imitating audio concepts,\" Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI), 2015. [[paper link](http://music.cs.northwestern.edu/publications/cartwright_pardo_chi2015.pdf)]\n- [3] Yichi Zhang, Bryan Pardo, and Zhiyao Duan, \"Siamese Style Convolutional Neural Networks for Sound Search by Vocal Imitation,\" IEEE/ACM Transactions on Audio, Speech, and Language Processing. [[paper link](https://ieeexplore.ieee.org/document/8453811)]\n- [4] Bongjun Kim and Bryan Pardo, \"Improving Content-based Audio Retrieval by Vocal Imitation Feedback,\" IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.\n- [5] Fatemeh Pishdadian and Bryan Pardo, \"Multi-resolution Common Fate Transform,\" IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018. [[paper link](http://music.eecs.northwestern.edu/publications/pishdadian_pardo_mcft_journal_2018.pdf)]\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractiveaudiolab%2Fvoogle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finteractiveaudiolab%2Fvoogle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractiveaudiolab%2Fvoogle/lists"}