Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/qniksefat/lexitalk
🤖🎙️ Explore Lex Fridman Podcast Transcripts with a smart chatbot!
cohere llama-index mongodb openai-chatgpt retrieval-augmented-generation streamlit weaviate
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/qniksefat/lexitalk
- Owner: qniksefat
- Created: 2024-01-20T06:59:54.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-03-13T06:39:02.000Z (8 months ago)
- Last Synced: 2024-10-06T09:27:23.199Z (about 1 month ago)
- Topics: cohere, llama-index, mongodb, openai-chatgpt, retrieval-augmented-generation, streamlit, weaviate
- Language: Python
- Homepage: https://lexchat.streamlit.app
- Size: 90.7 MB
- Stars: 10
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# LexChat
LexChat is a chatbot that guides you through discussions and analyzes the various perspectives found in conversation transcripts of the [Lex Fridman Podcast](https://lexfridman.com/podcast). It shows the moments where a topic is discussed and lets you start watching from that timestamp. The chatbot aims to provide balanced responses by considering all sides of an argument. The project is built using Python and integrates with Streamlit for the web interface, Weaviate for vector search, and OpenAI for natural language processing.
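As a rough illustration of the "watch it from that timestamp" idea, the sketch below turns a transcript timestamp into a link that opens a video at that moment. It is not taken from the project's code: the function names are hypothetical, and it assumes timestamps look like `HH:MM:SS` (optionally with fractional seconds) and that the episode is hosted on YouTube, whose watch URLs accept a `t` parameter in seconds.

```python
from urllib.parse import urlencode


def timestamp_to_seconds(timestamp: str) -> int:
    """Convert 'HH:MM:SS', 'MM:SS' or 'SS' (fractions allowed) to whole seconds."""
    parts = timestamp.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unsupported timestamp format: {timestamp!r}")
    seconds = 0.0
    for part in parts:
        # Each step shifts the running total by one base-60 position.
        seconds = seconds * 60 + float(part)
    return int(seconds)


def youtube_link_at(video_id: str, timestamp: str) -> str:
    """Build a YouTube watch URL that starts playback at the given timestamp.

    `video_id` is a placeholder here; the real ID would come from episode metadata.
    """
    query = urlencode({"v": video_id, "t": f"{timestamp_to_seconds(timestamp)}s"})
    return f"https://www.youtube.com/watch?{query}"


if __name__ == "__main__":
    print(youtube_link_at("VIDEO_ID", "01:02:03.500"))
```

Truncating to whole seconds starts playback slightly before the matched segment rather than after it, which is usually the more forgiving choice for a listener.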
## Motivation
I often find myself diving deep (a true DFS) into the internet to research my questions. But the noise is louder than the signal, and it drowns out the nuggets of wisdom I seek. Finding highly influential individuals or profound books on a particular subject feels like hunting for a golden needle in a haystack. This chatbot searches through thousands of hours of talks with brilliant minds. It surfaces not just an idea, but also the opposing ideas discussed in other episodes; not just transcripts, but the exact timestamps at which an idea was discussed, for convenient listening or watching. With it, I can tap into the collective wisdom of the internet with ease.
## Quick Start
You can quickly start using the chatbot by visiting the [lexchat.streamlit.app](https://lexchat.streamlit.app) website. The chatbot will be available for interaction, and you can ask your questions.
## Usage
You can also run the chatbot locally by following the setup and execution instructions below.
```bash
git clone https://github.com/qniksefat/lexitalk.git
cd lexitalk
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### Web Interface
From this point, you can run the chatbot through the web interface with the following command, which launches it on your localhost:
```bash
streamlit run streamlit_app.py
```
### Command Line Interface
You can also run the chatbot through the command line interface using the following command:
```bash
python cli_app.py
```
## Dependencies
The project relies on a variety of external libraries and APIs to implement its features:
- **llama-index:** The main library used to build a Retrieval Augmented Generation (RAG) system, enhancing the chatbot's capabilities.
- **pymongo:** Used to store the transcripts and metadata of the podcast episodes.
- **openai and anthropic:** Utilized for Large Language Model (LLM) integration behind the chatbot, enhancing natural language processing.
- **cohere:** Used for reranking the retrieved context to effectively answer user questions.
- **streamlit:** Facilitates the creation of the web interface, enabling users to interact seamlessly with the chatbot.
- **pandas:** Simple data manipulation tasks within the project.
## Data
### Raw Transcripts
Please note that the episodes are not up to date: the data covers episodes up to #325, excluding episodes #84 and #100. The transcripts are available in the `data/raw/all` directory. They were generated with Whisper and published by Andrej Karpathy as [Lexicap](https://karpathy.ai/lexicap/).
### Metadata
### Vector Store Index
The vector store index is built in MongoDB Atlas with the following schema:
```json
{
  "mappings": {
    "dynamic": true,
    "fields": [
      {
        "numDimensions": 1536,
        "path": "embedding",
        "similarity": "cosine",
        "type": "vector"
      },
      {
        "path": "metadata.views",
        "type": "filter"
      }
    ]
  }
}
```
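To show how an index like this might be queried, here is a minimal sketch that builds a MongoDB Atlas `$vectorSearch` aggregation pipeline over the `embedding` path, with an optional filter on `metadata.views`. It is an illustration under assumptions, not the project's actual query code: the index name `vector_index`, the `text` field in the projection, and the helper name are hypothetical. The function only constructs the pipeline, so it runs without a database connection.

```python
from typing import Any, Optional, Sequence

EMBEDDING_DIMENSIONS = 1536  # matches "numDimensions" in the index definition


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    *,
    index_name: str = "vector_index",  # hypothetical index name
    min_views: Optional[int] = None,
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline for an Atlas Vector Search query."""
    if len(query_vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(f"Expected {EMBEDDING_DIMENSIONS} dimensions, got {len(query_vector)}")
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")

    stage: dict[str, Any] = {
        "index": index_name,
        "path": "embedding",  # the vector field from the index definition
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if min_views is not None:
        # Pre-filter on the field indexed with type "filter".
        stage["filter"] = {"metadata.views": {"$gte": min_views}}

    return [
        {"$vectorSearch": stage},
        # "text" is an assumed field name for the transcript chunk.
        {"$project": {"_id": 0, "text": 1, "metadata": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


if __name__ == "__main__":
    pipeline = build_vector_search_pipeline([0.0] * EMBEDDING_DIMENSIONS, min_views=1000)
    print(pipeline[0]["$vectorSearch"]["filter"])
```

With a `pymongo` collection in hand, the result could be passed to `collection.aggregate(pipeline)`. In practice the query vector would come from the same 1536-dimensional embedding model used to index the transcripts.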
## Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue if you have any suggestions or ideas for improvement.