{"id":50412336,"url":"https://github.com/cskwork/rag-mysql","last_synced_at":"2026-05-31T04:04:56.198Z","repository":{"id":305089266,"uuid":"1021817772","full_name":"cskwork/rag-mysql","owner":"cskwork","description":null,"archived":false,"fork":false,"pushed_at":"2025-10-19T23:54:09.000Z","size":50,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-20T05:18:59.017Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cskwork.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-18T02:10:53.000Z","updated_at":"2025-07-18T05:22:16.000Z","dependencies_parsed_at":"2025-07-18T09:35:10.251Z","dependency_job_id":"881b207e-09c5-400b-82a0-b9137a54f681","html_url":"https://github.com/cskwork/rag-mysql","commit_stats":null,"previous_names":["cskwork/rag-mysql"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cskwork/rag-mysql","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Frag-mysql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Frag-mysql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Frag-mysql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Frag-mysql/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/cskwork","download_url":"https://codeload.github.com/cskwork/rag-mysql/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Frag-mysql/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33718496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-31T04:04:55.296Z","updated_at":"2026-05-31T04:04:56.192Z","avatar_url":"https://github.com/cskwork.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAG with MySQL, Ollama, and ChromaDB\n\nThis project demonstrates how to use the `vanna` library to generate SQL queries for a MySQL database using a Retrieval-Augmented Generation (RAG) approach. 
It leverages a local Large Language Model (LLM) via Ollama and uses ChromaDB as the vector store for the training data.\n\nThe application provides a web interface using Flask, allowing users to ask questions in natural language, which are then converted into SQL queries and executed against the database.\n\nFor a detailed explanation of the project's components and workflow, please see the [**Project Architecture Documentation**](.docs/architecture.md).\n\n## Features\n\n- **Natural Language to SQL**: Ask questions in plain English and get SQL queries in return.\n- **Local LLM**: Uses a locally running Ollama instance, ensuring data privacy and cost-effectiveness.\n- **Vector Store**: Employs ChromaDB to store and retrieve training data (DDL, documentation, and sample queries).\n- **Web Interface**: A user-friendly web UI built with Flask for easy interaction.\n- **Multiple Training Options**: Train from database schema or DDL files in `input/` folder.\n- **Configuration-driven**: Easy to set up and configure using environment variables.\n\n## Prerequisites\n\n- Python 3.8+\n- [Ollama](https://ollama.com/) installed and running.\n- A MySQL database.\n\n## Setup\n\nFollow these steps to get the project up and running:\n\n### 1. Clone the Repository\n\n```bash\ngit clone \u003crepository-url\u003e\ncd rag-mysql\n```\n\n### 2. Create and Activate a Virtual Environment\n\nIt is recommended to use a virtual environment to manage project dependencies.\n\n```bash\npython3 -m venv venv\nsource venv/bin/activate\n```\n\n### 3. Install Dependencies\n\nInstall the required Python packages from the `requirements.txt` file.\n\n```bash\npip install -r requirements.txt\n```\n\n### 4. 
Configure Environment Variables\n\nCreate a `.env` file by copying the example file:\n\n```bash\ncp .env.example .env\n```\n\nNow, open the `.env` file and fill in your configuration details, especially your MySQL database credentials.\n\n```ini\n# --- Database Configuration ---\nDB_HOST=your-mysql-host\nDB_PORT=3306\nDB_USER=your-mysql-user\nDB_PASSWORD=your-mysql-password\nDB_NAME=your-mysql-database\n\n# --- Ollama Configuration ---\n# See https://ollama.com/library\nOLLAMA_MODEL=\"gemma3n:latest\"\n\n# --- Vanna.ai Configuration ---\n# You can get a free API key from https://vanna.ai\n# This is optional and only used if you want to use the Vanna.ai hosted vector store.\n# VANNA_API_KEY=\"\" \n# You can create a model name at https://vanna.ai/models\n# VANNA_MODEL=\"my-model\" # Your model name from Vanna.ai\n\n# --- Application Configuration ---\nFLASK_PORT=8084\n```\n\n### 5. Train the Model\n\nBefore you can ask questions, you need to \"train\" the Vanna instance on your database schema. Choose one of these training methods:\n\n**Option A: Train from Database Schema (Recommended)**\n```bash\npython app.py --train\n```\n\n**Option B: Train from DDL Files**\n1. Place your `.sql` DDL files in the `input/` folder\n2. Run:\n```bash\npython app.py --train-ddl\n```\n\nBoth methods store the schema information in ChromaDB for the LLM to use as context.\n\n## Usage\n\nAfter the training is complete, you can start the Flask web application:\n\n```bash\npython app.py\n```\n\nThe application will be running on `http://localhost:8084` by default. Open this URL in your web browser to access the Vanna UI and start asking questions.\n\n### Production Usage\n\nFor a production environment, it is recommended to use a more robust web server like Gunicorn. 
A `run.sh` script is provided to make this easy.\n\nFirst, make sure the script is executable:\n```bash\nchmod +x run.sh\n```\n\nThen, run the script to start the application with Gunicorn:\n```bash\n./run.sh\n```\n\nThis will start the server on the configured host and port with a default of 4 worker processes. You can adjust the number of workers by setting the `GUNICORN_WORKERS` environment variable in your `.env` file.\n\n## Glossary\n\n- **RAG (Retrieval-Augmented Generation)**: A technique that combines a retrieval system (like a vector database) with a generative model (like an LLM). It first retrieves relevant information and then uses that information as context to generate a more accurate and informed response.\n- **LLM (Large Language Model)**: A type of artificial intelligence model trained on vast amounts of text data to understand and generate human-like text.\n- **Ollama**: A tool that allows you to run open-source LLMs, such as Llama 2, locally on your own machine.\n- **ChromaDB**: An open-source vector database that makes it easy to store and search embeddings (numerical representations) of your data.\n- **Vanna**: A Python library that helps you build AI-powered SQL generation applications. It connects to your database, \"learns\" from your schema and data, and allows you to ask questions that it translates into SQL.\n- **Flask**: A lightweight web framework for Python used here to create the user interface for the application.\n- **Gunicorn**: A Python WSGI HTTP server for UNIX. It uses a pre-fork worker model, in which a master process manages a pool of worker processes, and it is better suited to production than the default Flask development server.\n\n---\n*This file was generated by an AI assistant.* ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcskwork%2Frag-mysql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcskwork%2Frag-mysql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcskwork%2Frag-mysql/lists"}