https://github.com/jacobmarks/audio-retrieval-plugin
FiftyOne Plugin for searching images by audio clip using ImageBind and Qdrant
https://github.com/jacobmarks/audio-retrieval-plugin
fiftyone imagebind javascript machine-learning mui multimodal plugins python qdrant react replicate vector-search
Last synced: 2 months ago
JSON representation
FiftyOne Plugin for searching images by audio clip using ImageBind and Qdrant
- Host: GitHub
- URL: https://github.com/jacobmarks/audio-retrieval-plugin
- Owner: jacobmarks
- Created: 2023-10-15T22:45:30.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-01T20:06:37.000Z (about 2 years ago)
- Last Synced: 2024-04-16T07:21:07.616Z (over 1 year ago)
- Topics: fiftyone, imagebind, javascript, machine-learning, mui, multimodal, plugins, python, qdrant, react, replicate, vector-search
- Language: TypeScript
- Homepage:
- Size: 109 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## Audio-to-Image Search Plugin 🔉 👉 🖼️
https://github.com/jacobmarks/audio-retrieval-plugin/assets/12500356/5365716f-5d65-4215-b6c4-889ee1d16f65
This plugin allows you to search your dataset for images that are similar to a
given audio clip.
How does it work?
- ImageBind embedding model embeds images and audio clips into a shared space (1024 dim)
- Qdrant similarity index stores the embeddings and allows for fast similarity search
- FiftyOne provides a UI for uploading the audio clip, pre-filtering, and searching the similarity index.
It demonstrates how to work with custom media types in FiftyOne, and how to create custom vector similarity indices.
Note: This plugin is a proof of concept and is not intended for production use.
It works with `ogg` and `wav` audio files, but not `mp3` files, and makes an API
call to replicate rather than running the embedding model locally, to avoid
potential installation issues.
## Watch On Youtube
[](https://www.youtube.com/watch?v=dn5DA4H9b-o&list=PLuREAXoPgT0RZrUaT0UpX_HzwKkoB-S9j&index=12)
## Installation
```shell
fiftyone plugins download https://github.com/jacobmarks/audio-retrieval-plugin
```
You will also need to install `replicate` and `qdrant-client`:
```shell
pip install replicate qdrant-client
```
## Operators
### `open_audio_retrieval_panel`
- Opens the audio retrieval panel on click
### `create_imagebind_index`
- Creates an index for the dataset using the ImageBind embedding model. This
operation can take a little while to run, so it is recommended to run it in
delegated execution mode. To do so, check the `Delegated` box in the operator's
modal, and then in a terminal run:
```shell
fiftyone delegated launch
```
### `search_images_from_audio`
- Searches the index for images that are similar to the given audio clip. This
should be relatively fast, although it may take a minute for the replicate
server to start up.
## Usage
Before you can use the plugin, you will need to create an account on
[Replicate.com](https://replicate.com/). Once you have created an account, you
can create an API token, and then add this token as an environment variable:
```shell
export REPLICATE_API_TOKEN=
```
You will also need to start a Qdrant server locally. To do so, start up your
Docker daemon, and then run:
```shell
docker run -p "6333:6333" -p "6334:6334" -d qdrant/qdrant
```
Then, you can run the `create_imagebind_index` operator, and the
`open_audio_retrieval_panel` operator. The latter will open a panel that allows
you to upload an audio clip, and then search for similar images.