https://github.com/datastaxdevs/workshop-wikipedia-qa
Real-time document Q&A using Pulsar, Cassandra, LangChain, and open-source language models.
- Host: GitHub
- URL: https://github.com/datastaxdevs/workshop-wikipedia-qa
- Owner: datastaxdevs
- License: apache-2.0
- Created: 2023-10-31T06:51:45.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-03T03:44:07.000Z (over 2 years ago)
- Last Synced: 2023-11-03T23:22:51.657Z (over 2 years ago)
- Language: Python
- Size: 31.3 KB
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# wikipedia_demo
Real-time document Q&A using Pulsar, Cassandra, LangChain, and open-source language models.
Don't want to complete the exercises? The complete working code is available on the `complete` branch.
## Project overview
This workshop code runs a Retrieval Augmented Generation (RAG) application stack that takes data from Wikipedia, stores it in a vector database (Astra DB), and provides a chat interface for asking questions about the Wikipedia documents.
The project uses Astra Streaming (serverless Apache Pulsar), Astra DB (serverless Apache Cassandra), and four microservices built with:
- Python
- LangChain as the LLM framework
- The open-source Instructor embedding model
- The open-source Mistral 7B LLM
- Gradio for a simple chat web UI
- FastAPI to provide the document embedding service
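The retrieval step at the heart of RAG boils down to a nearest-neighbour search over embedding vectors. The toy sketch below shows the idea in plain Python; the `search` helper and the 3-dimensional vectors are illustrative only and are not the workshop's actual code, which relies on Astra DB's built-in vector search and real Instructor embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, store, top_k=1):
    """Return the top_k stored texts most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Toy "vector store": article chunks paired with made-up 3-d embeddings.
store = [
    ("Cassandra is a distributed database.", [0.9, 0.1, 0.0]),
    ("Pulsar is a messaging platform.", [0.1, 0.9, 0.0]),
    ("Gradio builds simple web UIs.", [0.0, 0.1, 0.9]),
]

# A query embedding close to the first chunk retrieves that chunk.
print(search([0.8, 0.2, 0.1], store))  # → ['Cassandra is a distributed database.']
```

In the real application the LLM then answers the user's question using the retrieved chunks as context.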
## Running the project
The project consists of four microservices:
- `docstream`: fetches random English Wikipedia articles and publishes them to a Pulsar topic for processing.
- `embeddings`: a RESTful API service that turns text into embeddings.
- `procstream`: consumes articles from the Pulsar topic, scrapes each page for the full text, generates embeddings, and stores them in Astra DB.
- `chatbot`: provides both the chatbot UI and the agent code that runs the chatbot.
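The flow between `docstream` and `procstream` is a classic produce/consume pattern. The sketch below imitates it with a stdlib `queue.Queue` standing in for the Pulsar topic; `fetch_full_text` and `embed` are hypothetical stubs (the real services use the Pulsar client library, a web scraper, and the Instructor model):

```python
import json
import queue

topic = queue.Queue()  # stands in for the Pulsar topic

def docstream_produce(titles):
    """docstream: publish article references to the topic as JSON messages."""
    for title in titles:
        topic.put(json.dumps({"title": title}))

def fetch_full_text(title):
    """Stub for scraping the article page (hypothetical)."""
    return f"Full text of {title}"

def embed(text):
    """Stub for the embeddings service (hypothetical)."""
    return [float(len(text))]

def procstream_consume(store):
    """procstream: drain the topic, scrape, embed, and store each article."""
    while not topic.empty():
        msg = json.loads(topic.get())
        text = fetch_full_text(msg["title"])
        store.append({"title": msg["title"], "text": text, "vector": embed(text)})

db = []  # stands in for the Astra DB table
docstream_produce(["Apache Cassandra", "Apache Pulsar"])
procstream_consume(db)
print([row["title"] for row in db])  # → ['Apache Cassandra', 'Apache Pulsar']
```

Decoupling the two services through a topic is what lets each one be started, stopped, and scaled independently.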
### With Docker
`docker compose up --build`
Individual services can also be started directly. Note that `procstream` and `chatbot` require that the `embeddings` microservice is running.
- `docker compose up --build docstream`
- `docker compose up --build embeddings`
- `docker compose up --build procstream`
- `docker compose up --build chatbot`
### Without Docker
If you do not wish to use Docker, you can run each of the four microservices separately: use pip to install the requirements for each microservice, then run it directly with Python.
```shell
cd docstream
pip install -r requirements.txt
python app.py
```
```shell
cd embeddings
pip install -r requirements.txt
gunicorn --workers 1 -k uvicorn.workers.UvicornWorker app:app --bind 0.0.0.0:8000
```
```shell
cd procstream
pip install -r requirements.txt
python app.py
```
```shell
cd chatbot
pip install -r requirements.txt
python app.py
```
## Using the services
You can explore the embeddings API through its interactive docs (Swagger UI) at http://127.0.0.1:8000/docs.
The chatbot UI is served at http://127.0.0.1:7860.
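To call the embeddings API from code rather than the interactive docs, a request might look like the sketch below. The `/embed` path and the `{"texts": [...]}` payload are assumptions, not the service's documented schema; check http://127.0.0.1:8000/docs for the real endpoint and request shape:

```python
import json
import urllib.request

def build_embedding_request(texts, url="http://127.0.0.1:8000/embed"):
    """Build a POST request for the embeddings service.

    The /embed path and the {"texts": [...]} payload are assumptions;
    consult the service's /docs page for the actual endpoint and schema.
    """
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request(["What is Apache Pulsar?"])
# Send it only when the embeddings service is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```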