https://github.com/hookdeck/index-all-the-things
A Python web application built on Flask that allows an asset with a URL to be analyzed and a textual and embedding representation stored in MongoDB Atlas. Uses Replicate and Hookdeck.
ai hookdeck llm mongodb mongodb-atlas replicate webhooks
- Host: GitHub
- URL: https://github.com/hookdeck/index-all-the-things
- Owner: hookdeck
- License: mit
- Created: 2024-09-04T16:33:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-22T10:55:57.000Z (about 1 year ago)
- Last Synced: 2025-03-17T03:25:08.876Z (10 months ago)
- Topics: ai, hookdeck, llm, mongodb, mongodb-atlas, replicate, webhooks
- Language: Python
- Homepage:
- Size: 9.38 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# Index All The Things!
A Python web application built on Flask that allows an asset with a URL to be analyzed and a textual and embedding representation stored in [MongoDB Atlas](https://www.mongodb.com/atlas).
A vector search can then be performed on the embeddings.
The application uses [Replicate](https://replicate.com) to run AI models and [Hookdeck](https://hookdeck.com?ref=github-iatt) to reliably receive asynchronous results from Replicate.
At present, the application supports analyzing audio assets and retrieving their transcribed contents. However, there is a framework in place to support other asset types such as text, HTML, images, and video.
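As a rough illustration of the vector search step, the sketch below builds a MongoDB Atlas `$vectorSearch` aggregation pipeline. The index name, the `embedding`, `url`, and `text` field names, and the helper function itself are assumptions made for illustration, not names taken from this repository.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical field holding the embedding
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Build an Atlas Vector Search aggregation pipeline for a query embedding."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results to improve recall.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "url": 1,   # hypothetical field: the asset URL
                "text": 1,  # hypothetical field: the textual representation
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`, where the query vector comes from the same embedding model used to index the assets.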
## How it works
The following diagram shows the sequence of how assets are submitted within the Flask application and processed by Replicate, and the results sent via webhooks through Hookdeck back to the Flask application and stored in MongoDB.
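The two ends of that sequence can be sketched in plain Python. This is a minimal sketch, not the application's actual code: the function names are hypothetical, and the `audio` input key is an assumption because input names depend on the Replicate model being run. The request body follows the shape of Replicate's `POST /v1/predictions` HTTP endpoint, and the webhook payload is assumed to be the prediction object that Replicate sends on completion.

```python
from typing import Any, Optional


def build_prediction_request(
    model_version: str, asset_url: str, webhook_url: str
) -> dict[str, Any]:
    """Build a JSON body for creating a Replicate prediction.

    webhook_url would be the Hookdeck Source URL, so that Hookdeck
    receives the result and forwards it to the Flask application.
    """
    return {
        "version": model_version,
        "input": {"audio": asset_url},  # assumption: the model takes an "audio" input
        "webhook": webhook_url,
        "webhook_events_filter": ["completed"],  # only notify on a terminal state
    }


def extract_result(webhook_payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the fields to store, or None if the prediction did not succeed."""
    if webhook_payload.get("status") != "succeeded":
        return None
    return {
        "prediction_id": webhook_payload["id"],
        "output": webhook_payload["output"],
    }
```

Keeping these steps as pure functions makes them easy to test without calling Replicate or MongoDB. A Flask route would pass the parsed JSON body of the incoming webhook to `extract_result` and store the returned fields.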

## Prerequisites
- A free [Hookdeck account](https://dashboard.hookdeck.com/signup?ref=github-iatt)
- The [Hookdeck CLI installed](https://hookdeck.com/docs/cli?ref=github-iatt)
- A trial [MongoDB Atlas account](https://www.mongodb.com/cloud/atlas/register)
- A [Replicate account](https://replicate.com/signin)
- [Python 3](https://www.python.org/downloads/)
- [Poetry](https://python-poetry.org/docs/#installation) for package management
## Development setup
### Dependencies
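Activate the virtual environment (with Poetry 2.x, the `shell` command is provided by the `poetry-plugin-shell` plugin rather than being built in):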
```sh
poetry shell
```
Install dependencies:
```sh
poetry install
```
### Configuration
Create a `.env` file with the following configuration, filling in the values as indicated:
```
# A secret used for signing session cookies
# https://flask.palletsprojects.com/en/2.3.x/config/#SECRET_KEY
SECRET_KEY=""
# MongoDB Atlas connection string
MONGODB_CONNECTION_URI=""
# Hookdeck Project API Key
# Hookdeck Dashboard -> Settings -> Secrets
HOOKDECK_PROJECT_API_KEY=""
# Replicate API Token
REPLICATE_API_TOKEN=""
# Hookdeck Source URLs
# These will be automatically populated for you in the next step
AUDIO_WEBHOOK_URL=""
EMBEDDINGS_WEBHOOK_URL=""
```
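To catch a missing value early, the application could validate these settings at startup. The sketch below assumes the `.env` values have already been loaded into the process environment (for example with `python-dotenv`); `load_config` and `REQUIRED_KEYS` are hypothetical names, not part of this repository.

```python
import os

# The keys listed in the .env template above.
REQUIRED_KEYS = (
    "SECRET_KEY",
    "MONGODB_CONNECTION_URI",
    "HOOKDECK_PROJECT_API_KEY",
    "REPLICATE_API_TOKEN",
    "AUDIO_WEBHOOK_URL",
    "EMBEDDINGS_WEBHOOK_URL",
)


def load_config(environ=None):
    """Return the required settings, raising if any is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing configuration values: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function testable without touching the real environment.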
Run the following to create Hookdeck connections to receive webhooks from Replicate:
```sh
poetry run python create-hookdeck-connections.py
```
Run the following to create the search indexes within MongoDB:
> [!WARNING]
> You may need some data within MongoDB before you can create the indexes.
```sh
poetry run python create-indexes.py
```
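For orientation, an Atlas Vector Search index is described by a definition document like the one built below. This is a sketch under assumptions rather than the contents of `create-indexes.py`: the `embedding` field path and the 384 dimensions are placeholders that must match the embedding model actually used.

```python
def build_vector_index_definition(
    path="embedding",    # hypothetical field holding the embedding
    num_dimensions=384,  # assumption: must equal the embedding model's output size
    similarity="cosine",
):
    """Build an Atlas Vector Search index definition for one vector field."""
    if similarity not in ("cosine", "euclidean", "dotProduct"):
        raise ValueError(f"Unsupported similarity function: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }
```

In recent PyMongo versions, a definition like this can be wrapped in a `SearchIndexModel` with `type="vectorSearch"` and passed to `Collection.create_search_index`.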
### Run the app
Run the app:
```sh
poetry run python -m flask --app app --debug run
```
Create a local tunnel to receive webhooks from the two Hookdeck Connections:
```sh
hookdeck listen 5000 '*'
```
Navigate to `localhost:5000` within your web browser.


## Learn more
- [Hookdeck docs](https://hookdeck.com/docs?ref=github-iatt)
- [MongoDB Atlas docs](https://www.mongodb.com/docs/atlas/)
- [Replicate docs](https://replicate.com/docs/)