{"id":26351611,"url":"https://github.com/unmonoqueteclea/voilib","last_synced_at":"2025-04-10T05:08:48.702Z","repository":{"id":177821216,"uuid":"627793180","full_name":"unmonoqueteclea/voilib","owner":"unmonoqueteclea","description":"🎧 Podcast Search Engine. Try it now for free or run your own instance.","archived":false,"fork":false,"pushed_at":"2025-03-01T11:28:21.000Z","size":6934,"stargazers_count":71,"open_issues_count":5,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-10T05:08:41.556Z","etag":null,"topics":["fastapi","podcast","search-engine","semantic-search","svelte"],"latest_commit_sha":null,"homepage":"https://voilib.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unmonoqueteclea.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"ko_fi":"unmonoqueteclea"}},"created_at":"2023-04-14T08:02:31.000Z","updated_at":"2025-03-24T01:58:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"d41bf12a-63ad-4f25-b4c7-b469d4f17b00","html_url":"https://github.com/unmonoqueteclea/voilib","commit_stats":null,"previous_names":["unmonoqueteclea/voilib"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unmonoqueteclea%2Fvoilib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unmonoqueteclea%2Fvoilib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unmonoqueteclea%2Fvoilib/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/unmonoqueteclea%2Fvoilib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unmonoqueteclea","download_url":"https://codeload.github.com/unmonoqueteclea/voilib/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248161269,"owners_count":21057555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","podcast","search-engine","semantic-search","svelte"],"created_at":"2025-03-16T10:33:31.624Z","updated_at":"2025-04-10T05:08:48.674Z","avatar_url":"https://github.com/unmonoqueteclea.png","language":"Python","funding_links":["https://ko-fi.com/unmonoqueteclea"],"categories":[],"sub_categories":[],"readme":"# Voilib: Open Source Podcast Search Engine 🔍\n\nVoilib offers **semantic search** in thousands of minutes of\nhigh-quality transcriptions of podcasts. Just type your query and it\nwill find related content in thousands of episodes. 
Voilib also allows\nusers to index their own audio files.\n\n![](https://github.com/unmonoqueteclea/voilib/actions/workflows/backend.yml/badge.svg)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n\n\n![Voilib](./docs/voilib.gif)\n\n## ▶️ Run your own instance now!\n\nYou can run **your own instance** of Voilib on your server; it\ndoesn't depend on any external paid service.\n\n```\nmkdir voilib \u0026\u0026 cd \"voilib\"\ncurl https://raw.githubusercontent.com/unmonoqueteclea/voilib/main/compose.yml -o compose.yml\ndocker compose up\n```\n\nYou will need an admin user and password. By default, the user\n`voilib-admin` with password `*audio*search*engine` will be created.\n\nYou can change default ports with environment variables:\n\n- `VOILIB_MANAGEMENT_PORT` (for management page: default `8501`)\n- `VOILIB_FRONTEND_PORT` (for frontend: default `80`)\n- `VOILIB_API_PORT` (for backend: default `81`)\n\nAfter all services are up, jump to\n[http://localhost:8501](http://localhost:8501) and follow the\ninstructions on the [Tasks page](http://localhost:8501/Tasks) to populate\nVoilib with content. You can also check the [first run tasks\nsection](./infra/readme.md#first-run-tasks).\n\n![Management](./docs/management.png)\n\nMore information about deployments is available in [infra/readme](./infra/readme.md).\n\n\n## ❓ How it works\nVoilib performs 4 main tasks: **collecting**, **transcribing**,\n**indexing** and **querying** podcast episodes to find the most\ninteresting fragments for every user prompt.\n\n- **Collection**: Almost all public podcasts have an associated `RSS\n  feed` that contains **metadata** about every episode and a link to\n  the **audio file**. Voilib uses those feeds to **collect and store**\n  that metadata from the list of podcasts configured by the\n  application admin. 
Additionally, Voilib can index your own\n  audio files.\n\n- **Transcription**: The collected episodes are then transcribed using\n  [Whisper: OpenAI's Open Source Transcription\n  Model](https://openai.com/research/whisper).\n\n- **Index**: Episode transcripts are divided into **fragments of\n  approximately 40 words** (check the `DEFAULT_FRAGMENT_WORDS` constant to\n  see the value currently used). Then, Voilib calculates the\n  [embedding](https://en.wikipedia.org/wiki/Sentence_embedding) of\n  each fragment. That way, every fragment is converted into a\n  vector of 384 floating-point numbers (check the `EMBEDDINGS_SIZE`\n  constant to see the embedding size currently used). Those vectors\n  are stored in a [vector database: Qdrant](https://qdrant.tech/).\n\n- **Queries**: For each new user prompt, Voilib just needs to\n  calculate its embedding and find the closest vectors in the\n  vector database, returning the most relevant episode fragments to\n  the user.\n\n## License\nVoilib is licensed under the GNU GPLv3 license. See [COPYING](./COPYING).\n\nPermissions of this strong copyleft license are conditioned on making\navailable complete source code of licensed works and modifications,\nwhich include larger works using a licensed work, under the same\nlicense. Copyright and license notices must be preserved. Contributors\nprovide an express grant of patent rights.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funmonoqueteclea%2Fvoilib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funmonoqueteclea%2Fvoilib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funmonoqueteclea%2Fvoilib/lists"}