https://github.com/gauravooo2/text-to-sql-llm-ai-chatbot
- Host: GitHub
- URL: https://github.com/gauravooo2/text-to-sql-llm-ai-chatbot
- Owner: GauravOOO2
- Created: 2024-10-22T06:33:44.000Z (28 days ago)
- Default Branch: main
- Last Pushed: 2024-10-22T07:43:08.000Z (28 days ago)
- Last Synced: 2024-10-23T10:59:54.648Z (27 days ago)
- Language: Python
- Homepage: https://text-to-sql-llm-ai-chatbot.onrender.com/
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI Chatbot with Gemini LLM, LangChain, SQL, and Streamlit
This project is an AI-powered chatbot application built using **Google's Gemini LLM**, **LangChain**, **ChromaDB**, and **Streamlit**. The chatbot is specifically designed to retrieve answers based solely on a local SQL database of students' records. This ensures that the bot doesn't generate random or hallucinated answers but provides SQL query-based responses strictly from the dataset.
## Project Overview
This chatbot is a data retrieval tool that leverages **Google Gemini's Language Model (LLM)**, allowing it to convert natural language questions into SQL queries. The queries are then run against a local SQLite database, and the results are returned to the user via a simple and intuitive **Streamlit** UI. **LangChain** is used as part of the backend to help integrate the LLM and the SQL query generation process.
The core idea behind this project is to have a reliable, data-constrained chatbot that will **only answer based on the data available in the database**; it does not generate responses that are not tied to the underlying database. This makes it particularly useful for scenarios where data accuracy and consistency are paramount.
## Key Features
1. **Gemini LLM Integration**: The app uses Google's Gemini LLM to process user inputs and convert them into SQL queries dynamically.
2. **SQL Query Generation**: The bot is capable of transforming user questions into precise SQL queries that retrieve the necessary information from the `STUDENT` table in an SQLite database.
3. **Data Accuracy**: The chatbot restricts its answers to the information available in the `student.db` SQLite database, preventing random responses.
4. **Streamlit UI**: A user-friendly interface built with **Streamlit** allows users to input queries and get responses directly from the chatbot.
5. **LangChain**: Leveraged to build a seamless pipeline between the language model and the database query execution.
6. **SQLite Database**: The data is stored in a local `SQLite` database (`student.db`), containing information about students such as their name, class, section, and marks.
7. **Prompt-Based AI**: The chatbot uses a well-defined prompt to guide its behavior, ensuring SQL query generation adheres to the defined schema of the database (see the sketch below).
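The exact prompt lives in the application code; the following is a minimal sketch, assuming a single `STUDENT` table with `NAME`, `CLASS`, `SECTION`, and `MARKS` columns, of the kind of schema-constrained, few-shot prompt this approach relies on:

```python
# Hypothetical prompt sketch; the real prompt in app.py may differ.
PROMPT = """
You translate English questions into SQL for a SQLite database containing a
table STUDENT with the columns NAME, CLASS, SECTION and MARKS.

Example: How many students are there in total?
SQL: SELECT COUNT(*) FROM STUDENT;

Example: Tell me all students studying in the AI & ML class.
SQL: SELECT * FROM STUDENT WHERE CLASS = 'AI & ML';

Return only the SQL query, with no code fences and no explanation.
"""
```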
## Technologies Used

- **Google Gemini LLM**: A language model that processes user queries and generates responses.
- **LangChain**: Used to build robust LLM-powered applications that connect the language model to the database.
- **ChromaDB**: An optional component that helps manage and store embeddings or vectorized data (currently not fully utilized in this version but can be expanded).
- **SQLite**: A lightweight SQL database to store the student data.
- **Streamlit**: A Python framework to build interactive and user-friendly web apps.
- **Python**: Backend and core logic.

## Setup Instructions
To run this project locally or on a cloud platform like Render, follow the steps below.
### 1. Clone the Repository
```bash
git clone https://github.com/GauravOOO2/Text-To-SQL-LLM-AI-ChatBot.git
cd Text-To-SQL-LLM-AI-ChatBot
```

### 2. Install Dependencies
Make sure you have **Python 3.8+** installed. Then, install the required dependencies using `pip`:
```bash
pip install -r requirements.txt
```
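The authoritative dependency list is the `requirements.txt` in the repository; based on the technologies listed above, it should contain roughly the following packages (versions omitted, and `python-dotenv` is an assumption for the `.env` handling described in the next step):

```text
# Hypothetical requirements.txt sketch; see the file in the repository
# for the actual packages and pinned versions.
streamlit
google-generativeai
langchain
chromadb
python-dotenv
```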
### 3. Set Up Environment Variables

Create a `.env` file in the project directory and add your **Google API key** for the Gemini LLM:
```bash
GOOGLE_API_KEY=your-google-api-key
```
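As a point of reference, here is a minimal sketch of how this key could be loaded at runtime, assuming the `python-dotenv` and `google-generativeai` packages; the actual app.py may organize this differently:

```python
# Hypothetical sketch of loading the Gemini API key; app.py may differ.
import os

from dotenv import load_dotenv
import google.generativeai as genai

load_dotenv()  # reads GOOGLE_API_KEY from the local .env file
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
```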
### 4. Initialize the SQLite Database

You can either create the SQLite database (`student.db`) manually, or you can run the `sql.py` script to set up the database with sample student records.
To create the database with predefined student data:
```bash
python sql.py
```
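The repository's `sql.py` is not reproduced here; the snippet below is a minimal sketch of what such an initialization script typically does, using the `STUDENT` schema described in this README and invented sample rows:

```python
# Hypothetical sql.py sketch: the table name and columns follow this README,
# but the sample rows are invented for illustration.
import sqlite3

connection = sqlite3.connect("student.db")
cursor = connection.cursor()

cursor.execute(
    """
    CREATE TABLE IF NOT EXISTS STUDENT (
        NAME    VARCHAR(50),
        CLASS   VARCHAR(50),
        SECTION VARCHAR(10),
        MARKS   INTEGER
    )
    """
)

sample_rows = [
    ("Emily Davis", "AI & ML", "A", 91),
    ("Michael Brown", "AI & ML", "B", 84),
    ("Priya Sharma", "Data Science", "A", 78),
]
cursor.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", sample_rows)

connection.commit()
connection.close()
```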
### 5. Run the Streamlit Application

Run the Streamlit app locally using the following command:
```bash
streamlit run app.py
```

### 6. Deploy on Render
If deploying on Render, ensure you:
1. Set the start command as `streamlit run app.py`.
2. Include the `Procfile` in the root of your project with the following content:

```bash
web: streamlit run app.py
```

3. Configure the environment variable `GOOGLE_API_KEY` in the Render dashboard.
## Project Structure
```
.
├── app.py            # Main Streamlit app
├── sql.py            # Script to initialize SQLite database with sample student data
├── student.db        # SQLite database (optional, can be generated dynamically)
├── requirements.txt  # Project dependencies
├── .env              # Environment variables (not included in repo)
├── Procfile          # Start command for Render deployment
└── README.md         # Project documentation (this file)
```

### File Descriptions
- **`app.py`**: Contains the main logic for the Streamlit web app, including user input handling, invoking the Gemini LLM, and displaying the SQL query results.
- **`sql.py`**: Initializes an SQLite database with student records, creating the table and populating it with data.
- **`student.db`**: The SQLite database where student data is stored. It contains the table `STUDENT` with columns for name, class, section, and marks.
- **`requirements.txt`**: Lists the Python packages required to run the project (Streamlit, google-generativeai, LangChain, etc.).
## How It Works

1. **User Query Input**: The user enters a natural language question in the Streamlit app.
2. **LLM Processing**: The Gemini LLM takes the user question and converts it into an SQL query based on the pre-defined prompt.
3. **SQL Query Execution**: The SQL query is executed against the local SQLite database (`student.db`).
4. **Response**: The chatbot displays the query result as the response to the user. The bot **only retrieves and responds with data present in the `STUDENT` table** and does not generate any other information.
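Put together, the four steps above can be sketched roughly as follows. This is a simplified illustration that calls Gemini directly through `google-generativeai` and assumes the `gemini-pro` model name; the actual app.py routes the model through LangChain and may structure the code differently.

```python
# Hypothetical end-to-end sketch of the question -> SQL -> answer flow.
import os
import sqlite3

import streamlit as st
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Compact stand-in for the schema-constrained prompt sketched earlier.
PROMPT = ("Translate English questions into SQL for a SQLite table "
          "STUDENT(NAME, CLASS, SECTION, MARKS); return only the SQL.")

def question_to_sql(question: str) -> str:
    """Ask Gemini to convert a natural-language question into a SQL query."""
    model = genai.GenerativeModel("gemini-pro")  # model name assumed
    response = model.generate_content([PROMPT, question])
    return response.text.strip()

def run_query(sql: str, db_path: str = "student.db"):
    """Execute the generated SQL against the local SQLite database."""
    with sqlite3.connect(db_path) as connection:
        return connection.execute(sql).fetchall()

st.title("Text-to-SQL Chatbot")
question = st.text_input("Ask a question about the STUDENT table:")
if st.button("Ask") and question:
    sql = question_to_sql(question)
    st.code(sql, language="sql")
    for row in run_query(sql):
        st.write(row)
```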
### Example Queries

- **User**: *"Tell me all students studying in AI & ML class?"*
- **Bot**: `SELECT * FROM STUDENT WHERE CLASS='AI & ML';`
- **Response**: `Emily Davis, Michael Brown`

- **User**: *"How many students are there in total?"*
- **Bot**: `SELECT COUNT(*) FROM STUDENT;`
- **Response**: `10`

## Future Enhancements
- **Expand Dataset**: Include more detailed data fields (e.g., grades, attendance, etc.) to make the chatbot more informative.
- **ChromaDB Integration**: Fully integrate ChromaDB for better vector search and embedding management, expanding the bot's ability to understand queries beyond simple SQL.
- **Enhanced UI**: Improve the frontend design with additional features like charts and visual data representations.
- **Error Handling**: Add more robust error handling for invalid or out-of-scope queries.

## Contributing
Contributions are welcome! If you'd like to improve this project or add new features, feel free to open a pull request.
1. Fork the project.
2. Create a new feature branch.
3. Submit a pull request detailing your changes.