Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/spenceypantsy1/rag-sql-langchain-chat-bot
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/spenceypantsy1/rag-sql-langchain-chat-bot
- Owner: spenceypantsy1
- Created: 2024-11-28T01:46:31.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-17T18:32:13.000Z (12 days ago)
- Last Synced: 2024-12-17T19:27:35.399Z (12 days ago)
- Language: Jupyter Notebook
- Size: 6.48 MB
- Stars: 0
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# SQL Data Exploration with Langchain and RAG 💬
This project involves three main parts:
1. **Database creation and EDA**: The first notebook (`eda-data-exploration.ipynb`) cleans the dataset and combines both `authors.csv` and `papers.csv` into a single table. Exploratory Data Analysis is done here to understand the dataset better.
2. **NLP hot topics generation**: The second notebook (`nlp-hot-topics-generation.ipynb`) creates word clouds and finds hot topics for each year with TF-IDF NLP to upload into the SQLite database.
3. **RAG SQL Langchain Chatbot**: The third notebook (`rags-sql-langchain-chat-bot.ipynb`) leverages the database and Langchain to create an interactive chatbot for SQL-based queries.

The dataset used comes from https://www.kaggle.com/datasets/rowhitswami/nips-papers-1987-2019-updated/data
## Overview
This was a group effort as part of our SIM Data Analytics Club - Data Science Academy internal projects.
The system uses **Langchain** for structured query generation, **RAG** (Retrieval-Augmented Generation) to retrieve relevant documents from the database, and **OpenAI's GPT** to generate responses based on SQL query results. The three notebooks must be run in order to set up and use the database and chatbot functionality.
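As a rough illustration of the retrieval step, the sketch below pulls table definitions out of SQLite and keeps the ones that overlap most with the question, so they can be placed in the prompt. It is only one simple reading of "retrieval" and uses the standard library alone; the notebook itself relies on Langchain. The function name `schema_context`, the keyword-overlap scoring, and the example table layouts are all assumptions made for this example.

```python
import sqlite3


def schema_context(conn: sqlite3.Connection, question: str, max_tables: int = 3) -> str:
    """Return CREATE TABLE statements for the tables most relevant to the question.

    Hypothetical helper: relevance is a naive count of question words that
    appear in each table's DDL (table name plus column names).
    """
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    words = {w.strip("?,.").lower() for w in question.split()}

    def score(row):
        ddl_lower = row[1].lower()
        # Ignore very short words such as "in" or "of".
        return sum(1 for w in words if len(w) > 2 and w in ddl_lower)

    ranked = sorted(rows, key=score, reverse=True)
    return "\n\n".join(ddl for _, ddl in ranked[:max_tables])


# In-memory stand-in for main.db; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (source_id INTEGER, year INTEGER, title TEXT)")
conn.execute("CREATE TABLE hot_topics (year INTEGER, term TEXT, score REAL)")

context = schema_context(conn, "Which title was published in 1987?", max_tables=1)
print(context)
```

Passing only the relevant table definitions keeps the prompt short, which tends to matter more as the number of tables grows.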
---
## Prerequisites
1. **Python 3.x** (Recommended: Python 3.7+ 🐍)
2. **Install required Python packages**: Install the dependencies listed in the `requirements.txt` file with the following command:
```bash
pip install -r requirements.txt
```
3. **OpenAI API Key**: You'll need an OpenAI API key to interact with GPT models. You can get one from OpenAI's API platform 🌐. Create a `.env` file in the root of the project and add your key as follows:
```bash
OPENAI_API_KEY=your_openai_api_key
```
4. **SQLite Database**: The first notebook creates the SQLite database (`main.db`) that the later notebooks use. This file must exist before running the chatbot notebook 🗃️.
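A quick way to confirm the last two prerequisites before opening the chatbot notebook is sketched below. It uses only the standard library and a deliberately small `.env` parser; a package such as `python-dotenv` is the more common choice, and whether the notebooks use it is not stated here. The helper names `load_env_file` and `check_setup` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Values already present in the environment are left untouched.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def check_setup(db_path: str = "main.db") -> list:
    """Return a list of human-readable setup problems (empty if all is well)."""
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not Path(db_path).exists():
        problems.append(f"{db_path} not found; run the first notebook")
    return problems


load_env_file()
for problem in check_setup():
    print("Setup issue:", problem)
```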
# Steps
### Step 1: Run the Database Creation and EDA Notebook
The first notebook, `eda-data-exploration.ipynb`, generates the SQLite database (`main.db`) and helps us understand the intricacies of the data.
**To run the database creation notebook:**
1. Open Jupyter Notebook or JupyterLab 🖥️.
2. Open the `eda-data-exploration.ipynb` notebook.
3. Run the cells sequentially to:
- Create the SQLite database (`main.db`) 🗃️.
- Clean the data, removing NA fields and combining both data files into a single table.
- Plot the emerging trends in NIPS research papers.

Once the notebook finishes, the `main.db` database and word clouds will be ready 🎉.
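The cleaning and combining step above might look roughly like the following. This is a minimal sketch with the standard library only, using tiny inline stand-ins for `papers.csv` and `authors.csv`; the column names (`source_id`, `year`, `title`, `abstract`, `first_name`, `last_name`) and the staging table names are assumptions, and the notebook may well do this differently.

```python
import csv
import io
import sqlite3

# Tiny stand-ins for papers.csv and authors.csv; column names are assumptions.
PAPERS_CSV = """source_id,year,title,abstract
1,1987,Paper A,An abstract
2,1988,Paper B,
"""
AUTHORS_CSV = """source_id,first_name,last_name
1,Ada,Lovelace
1,Alan,Turing
2,Grace,Hopper
"""


def load_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_database(conn: sqlite3.Connection, papers: list, authors: list) -> None:
    """Load both files into staging tables, then join them into one table."""
    conn.execute(
        "CREATE TABLE papers_raw (source_id INTEGER, year INTEGER, title TEXT, abstract TEXT)"
    )
    conn.execute(
        "CREATE TABLE authors_raw (source_id INTEGER, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO papers_raw VALUES (:source_id, :year, :title, :abstract)", papers
    )
    conn.executemany(
        "INSERT INTO authors_raw VALUES (:source_id, :first_name, :last_name)", authors
    )
    # One row per (paper, author); papers with a missing abstract are dropped.
    conn.execute(
        """
        CREATE TABLE papers AS
        SELECT p.source_id, p.year, p.title, p.abstract,
               a.first_name || ' ' || a.last_name AS author
        FROM papers_raw AS p
        JOIN authors_raw AS a ON a.source_id = p.source_id
        WHERE p.abstract IS NOT NULL AND p.abstract <> ''
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # the notebook writes to main.db instead
build_database(conn, load_rows(PAPERS_CSV), load_rows(AUTHORS_CSV))
combined = conn.execute("SELECT title, author FROM papers ORDER BY author").fetchall()
print(combined)
```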
---
### Step 2: Run the NLP Hot Topics Generation Notebook
The second notebook, `nlp-hot-topics-generation.ipynb`, uses NLP techniques like TF-IDF to find hot-topic trends for each year and updates our SQLite database.
**To run the hot topics notebook:**
1. Open Jupyter Notebook or JupyterLab 🖥️.
2. Open the `nlp-hot-topics-generation.ipynb` notebook.
3. Run the cells sequentially to:
- Populate the database with your data (you can modify the notebook to customize the data source).
- Generate word clouds based on the content of the database to visualize the data.

---
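To make the TF-IDF idea from Step 2 concrete, here is a small hand-rolled sketch that treats each year's text as one document and ranks terms that are frequent in that year but rare across years. A library vectorizer would be the usual choice in practice; this version uses the standard library so the arithmetic is visible. The stopword list, the `hot_topics` table layout, and the function names are assumptions for illustration.

```python
import math
import re
import sqlite3
from collections import Counter

# A deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to", "with", "on", "we"}


def tokenize(text: str) -> list:
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def hot_topics_by_year(texts_by_year: dict, top_n: int = 3) -> dict:
    """Rank terms per year by tf * idf, where each year is one document."""
    counts = {
        year: Counter(tokenize(" ".join(texts)))
        for year, texts in texts_by_year.items()
    }
    n_docs = len(counts)
    doc_freq = Counter()
    for year_counts in counts.values():
        doc_freq.update(year_counts.keys())

    topics = {}
    for year, year_counts in counts.items():
        total = sum(year_counts.values())
        scores = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in year_counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        topics[year] = [term for term, s in ranked[:top_n] if s > 0]
    return topics


titles = {
    1987: ["Neural networks for associative memory", "Hopfield networks and memory capacity"],
    2017: ["Attention networks for translation", "Generative adversarial networks"],
}
topics = hot_topics_by_year(titles)

# Store the result; the table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")  # the notebooks use main.db
conn.execute("CREATE TABLE hot_topics (year INTEGER, position INTEGER, term TEXT)")
conn.executemany(
    "INSERT INTO hot_topics VALUES (?, ?, ?)",
    [(y, i, t) for y, terms in topics.items() for i, t in enumerate(terms, 1)],
)
print(topics)
```

Terms that appear in every year, such as "networks" in this toy input, get an inverse document frequency of zero and drop out, which is what makes the remaining terms year-specific.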
### Step 3: Run the RAG SQL Langchain Chatbot Notebook
Once the database is created, the third notebook, `rags-sql-langchain-chat-bot.ipynb`, leverages Langchain to build a chatbot capable of interacting with the SQLite database using natural language queries 💬.
**To run the chatbot notebook:**
1. Open Jupyter Notebook or JupyterLab 🖥️.
2. Open the `rags-sql-langchain-chat-bot.ipynb` notebook.
3. Run the cells sequentially to:
- Use Langchain to generate SQL queries based on user input 📝.
- Execute the generated SQL queries on the `main.db` database 🗃️.
- Use OpenAI's GPT to generate human-readable answers based on the query results 💬.
- Present the results interactively, powered by Langchain and RAG 🔄.

---
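The flow in the steps above (generate SQL, execute it, turn the rows into an answer) can be sketched without any LLM dependency by passing the model calls in as plain functions. In the notebook those roles are filled by Langchain and OpenAI's GPT; here `generate_sql` and `summarize` are hypothetical stand-ins, the stubs return canned text, and the `papers` table layout is an assumption.

```python
import sqlite3
from typing import Callable


def answer_question(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str], str],
    summarize: Callable[[str, list], str],
) -> str:
    """Question -> SQL (via model) -> rows -> natural-language answer (via model)."""
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    prompt = (
        f"Schema:\n{schema}\n\n"
        f"Write one SQLite SELECT statement answering: {question}"
    )
    sql = generate_sql(prompt).strip().rstrip(";")
    # Only read-only statements are executed; anything else is rejected.
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    rows = conn.execute(sql).fetchall()
    return summarize(question, rows)


# Stubs standing in for the LLM calls, so the sketch runs offline.
def fake_generate_sql(prompt: str) -> str:
    return "SELECT COUNT(*) FROM papers WHERE year = 1987;"


def fake_summarize(question: str, rows: list) -> str:
    return f"There were {rows[0][0]} papers."


conn = sqlite3.connect(":memory:")  # the notebook connects to main.db
conn.execute("CREATE TABLE papers (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [("Paper A", 1987), ("Paper B", 1987), ("Paper C", 1988)],
)

answer = answer_question(
    conn, "How many papers were published in 1987?", fake_generate_sql, fake_summarize
)
print(answer)
```

Keeping the model calls behind plain callables makes the pipeline easy to test, and the SELECT-only check is a cheap guard against a generated statement that would modify the database.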
# Example Usage
Once the chatbot is running, you can interact with it via the notebook interface. Example interactions:
- **User**: "How many customers purchased product X last month?"
**Response**: The chatbot will generate an SQL query, execute it on the `main.db`, and return a response like: "Product X was purchased by 150 customers last month."
- **User**: "Show me the top 5 products sold in the last quarter."
**Response**: The chatbot will execute a query to fetch the top 5 products from the database and display them.

---
# Future Improvements 🛠️
- **Support for Additional Databases**: Enhance the tool to support other databases like MySQL, PostgreSQL, etc.
- **Error Handling**: Improve error handling for invalid SQL queries or database issues.
- **User Interface**: Consider adding a web interface with Gradio or a more interactive UI for ease of use.