https://github.com/datastaxdevs/workshop-wikipedia-qa
Real-time document Q&A using Pulsar, Cassandra, LangChain, and open-source language models.
- Host: GitHub
- URL: https://github.com/datastaxdevs/workshop-wikipedia-qa
- Owner: datastaxdevs
- License: apache-2.0
- Created: 2023-10-31T06:51:45.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-03T03:44:07.000Z (over 2 years ago)
- Last Synced: 2023-11-03T23:22:51.657Z (over 2 years ago)
- Language: Python
- Size: 31.3 KB
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# wikipedia_demo
Real-time document Q&A using Pulsar, Cassandra, LangChain, and open-source language models.
Don't want to complete the exercises? The complete working code is available on the `complete` branch.
## Project overview
This workshop code runs a Retrieval Augmented Generation (RAG) application stack that takes data from Wikipedia, stores it in a vector database (Astra DB), and provides a chat interface for asking questions about the Wikipedia documents.
The project uses Astra Streaming (serverless Apache Pulsar), Astra DB (serverless Apache Cassandra), and four microservices built with:
- Python
- LangChain as the LLM framework
- The open-source Instructor embedding model
- The open-source Mistral 7B LLM
- Gradio for a simple chat web UI
- FastAPI to provide the document embedding service
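The retrieval step at the heart of RAG boils down to a nearest-neighbour search over embedding vectors. The toy sketch below shows the idea in plain Python; the `search` helper and the 3-dimensional vectors are illustrative only and are not the workshop's actual code, which relies on Astra DB's built-in vector search and real Instructor embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, store, top_k=1):
    """Return the top_k stored texts most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Toy "vector store": article chunks paired with made-up 3-d embeddings.
store = [
    ("Cassandra is a distributed database.", [0.9, 0.1, 0.0]),
    ("Pulsar is a messaging platform.", [0.1, 0.9, 0.0]),
    ("Gradio builds simple web UIs.", [0.0, 0.1, 0.9]),
]

# A query embedding close to the first chunk retrieves that chunk.
print(search([0.8, 0.2, 0.1], store))  # → ['Cassandra is a distributed database.']
```

In the real application the LLM then answers the user's question using the retrieved chunks as context.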
## Running the project
The project consists of four microservices:
- `docstream`: fetches random English Wikipedia articles and publishes them to a Pulsar topic for processing.
- `embeddings`: a RESTful API service that turns text into embeddings.
- `procstream`: consumes articles from the Pulsar topic, scrapes each page for the full text, generates embeddings, and stores them in Astra DB.
- `chatbot`: provides both the chatbot UI and the agent code that runs the chatbot.
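The flow between `docstream` and `procstream` is a classic produce/consume pattern. The sketch below imitates it with a stdlib `queue.Queue` standing in for the Pulsar topic; `fetch_full_text` and `embed` are hypothetical stubs (the real services use the Pulsar client library, a web scraper, and the Instructor model):

```python
import json
import queue

topic = queue.Queue()  # stands in for the Pulsar topic

def docstream_produce(titles):
    """docstream: publish article references to the topic as JSON messages."""
    for title in titles:
        topic.put(json.dumps({"title": title}))

def fetch_full_text(title):
    """Stub for scraping the article page (hypothetical)."""
    return f"Full text of {title}"

def embed(text):
    """Stub for the embeddings service (hypothetical)."""
    return [float(len(text))]

def procstream_consume(store):
    """procstream: drain the topic, scrape, embed, and store each article."""
    while not topic.empty():
        msg = json.loads(topic.get())
        text = fetch_full_text(msg["title"])
        store.append({"title": msg["title"], "text": text, "vector": embed(text)})

db = []  # stands in for the Astra DB table
docstream_produce(["Apache Cassandra", "Apache Pulsar"])
procstream_consume(db)
print([row["title"] for row in db])  # → ['Apache Cassandra', 'Apache Pulsar']
```

Decoupling the two services through a topic is what lets each one be started, stopped, and scaled independently.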
### With Docker
`docker compose up --build`
Individual services can also be started directly. Note that `procstream` and `chatbot` require that the `embeddings` microservice is running.
- `docker compose up --build docstream`
- `docker compose up --build embeddings`
- `docker compose up --build procstream`
- `docker compose up --build chatbot`
### Without Docker
If you do not wish to use Docker, you can run each of the four microservices separately: use pip to install the requirements for each microservice, then run it directly with Python.
```shell
cd docstream
pip install -r requirements.txt
python app.py
```
```shell
cd embeddings
pip install -r requirements.txt
gunicorn --workers 1 -k uvicorn.workers.UvicornWorker app:app --bind 0.0.0.0:8000
```
```shell
cd procstream
pip install -r requirements.txt
python app.py
```
```shell
cd chatbot
pip install -r requirements.txt
python app.py
```
## Using the services
You can explore the embeddings API through its interactive docs (Swagger UI) at http://127.0.0.1:8000/docs.
The chatbot UI is served at http://127.0.0.1:7860.
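To call the embeddings API from code rather than the interactive docs, a request might look like the sketch below. The `/embed` path and the `{"texts": [...]}` payload are assumptions, not the service's documented schema; check http://127.0.0.1:8000/docs for the real endpoint and request shape:

```python
import json
import urllib.request

def build_embedding_request(texts, url="http://127.0.0.1:8000/embed"):
    """Build a POST request for the embeddings service.

    The /embed path and the {"texts": [...]} payload are assumptions;
    consult the service's /docs page for the actual endpoint and schema.
    """
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request(["What is Apache Pulsar?"])
# Send it only when the embeddings service is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```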