https://github.com/deliciousboy/llm-chatbot-backend
https://github.com/deliciousboy/llm-chatbot-backend
Last synced: 11 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/deliciousboy/llm-chatbot-backend
- Owner: DeliciousBoy
- Created: 2025-04-05T05:53:35.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-14T07:00:58.000Z (about 1 year ago)
- Last Synced: 2025-05-14T08:21:43.108Z (about 1 year ago)
- Language: Python
- Size: 1.32 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# LLM Chatbot
[](https://kedro.org)
## Overview
A Retrieval-Augmented Generation (RAG) system for scraping website data, embedding text, and answering questions via LLM
## How to install dependencies
Declare any dependencies in `requirements.txt` and `pyproject.toml` for `pip` installation.
### clone the repository
```bash
git clone https://github.com/DeliciousBoy/llm-chatbot-backend.git
cd llm-chatbot-backend
```
### Installing `uv`
this project uses `uv` to manage virtual environments and dependencies for different Python versions. You can install `uv` run:
```bash
curl -Ls https://astral.sh/uv/install.sh | sh
```
Or follow the instructions from the official GitHub repository: https://github.com/astral-sh/uv
Once installed, you can set up the environment with:
### Install with `uv` (Recommended) `This project requires Python 3.11.11`
```bash
uv venv
source .venv/bin/activate # Or .venv/Scripts/activate for Windows
uv pip install -r requirements.txt
uv pip install -e .[dev, docs]
```
If you prefer not to use uv, you can fall back to pip (see below).
### Install with `pip` (Not recommended)
This is not recommended as it may lead to dependency conflicts, especially if you are using different Python versions.
```bash
python -m venv .venv
source .venv/bin/activate # Or .venv/Scripts/activate for Windows
pip install -r requirements.txt
pip install -e .[dev,docs]
```
## How to run Kedro pipeline
This project uses [Kedro](https://kedro.org) to organize data workflows into modular pipelines.
### Avaliable pipelines
| Pipeline Name | Description |
|--------------------|--------------------------------------|
| `data_processing` | Cleans and embeds text data into vectors |
| `web_scraping` | Asynchronously scrapes web content and stores it as raw data |
Each pipeline is defined in `src/llm_chatbot_backend/pipelines/` and can be run individually or as a group. You can also run specific nodes within a pipeline.
```bash
kedro run # Run all pipelines
kedro run --pipeline=web_scraping # Run web scraping pipeline
kedro run --pipeline=data_processing # Run data processing pipeline
```
## Visualize Kedro pipeline
You can visualize the pipeline using Kedro's built-in visualization tool. This will generate a graph of the pipeline nodes and their dependencies.
```bash
kedro viz run --autoreload
```
## Running Scheduled Jobs
This project includes a scheduler using `APScheduler` to automate periodic tasks such as scraping data, generating embeddings, or updating indexes.
To start the scheduler, run:
```bash
python scheduler.py
```
## How to test your Kedro project
this project uses `pytest` to run test cases. You can run your tests with:
```bash
pytest
```
## How to run chat interface
This project includes a Streamlit app for interacting with the chatbot. You can run the app with:
```
streamlit run main.py
```
To run the app locally, make sure the virtual environment is activated and dependencies are installed
## Proejct Structure
This project follows the [Kedro](https://kedro.org) project layout with additional components for web scraping, vector embeddings, and an LLM chatbot interface via Streamlit.
```
πllm-chatbot-backend/
βββ πconf/ # Kedro configuration files
β βββ πbase/
β βββπcatalog.yml # Dataset definitions (inputs/outputs for pipelines)
β βββπparameters.yml # Project-level parameters for nodes/pipelines
βββ πdata/ # raw/cleaned/embedded/chromadb
βββ πsrc/ # Source code (Kedro pipelines, modules)
β βββ πllm_chatbot_backend/
β βββ πdatasets/ # Custom Kedro dataset classes
β | βββ πutf8_json.py # Custom JSON
β βββ πpipelines/ # All Kedro pipelines
β βββ πdata_processing/
β | βββπnodes.py # Data cleaning / embedding logic
β | βββπpipeline.py # Defines the data_processing pipeline
β βββ πweb_scraping/
β βββπnodes.py # Async scraping logic
β βββπpipeline.py # Defines the web_scraping pipeline
βββ πtests/ # Pytest test cases
β βββ πpipelines/
β βββ πdata_processing/
β | βββπtest_pipeline.py
β βββ πweb_scraping/
| βββπtest_pipeline.py
βββπmain.py # Streamlit chat interface\
βββπscheduler.py # Automate Web Scraping Task
βββπpyproject.toml # Project config & dependencies
βββπrequirements.txt # Pip requirements
βββπuv.lock # uv dependency lockfile
βββπ.env # Environment variables
```