https://github.com/codestrate/vigilius_analyst

AI agent that ingests datasets and lets you query them by translating natural language to SQL. Connected via a Streamlit UI. More features TBD.

ai-agents langchain-python langgraph-python ollama openai-api python


# 📊 Vigilius Analyst

[![Python 3.12.9+](https://img.shields.io/badge/python-≥3.12.9-blue.svg)](https://www.python.org/downloads/release/python-3129/)
[![Conda](https://img.shields.io/badge/conda-ready-green.svg)](https://docs.conda.io/en/latest/)

An intelligent **data analysis assistant** built with **Streamlit** and **LangGraph**/**LangChain**.
Vigilius lets you query datasets in **natural language**, automatically generating SQL, visualizations, and insights. It works with multiple AI providers.

---

## 🚀 Key Features

- 🔗 **Multi-Provider AI Support**: OpenAI, Groq, Gemini, and Ollama
- 🧠 **Smart SQL Agent**: Generates, validates, and executes SQL from natural language
- 💬 **Data Assistant**: Handles small talk & intent classification
- 📂 **Multiple File Formats**: CSV, Excel, and SQLite database support
- ⚡ **Streaming Responses**: Real-time answers with clean formatting
- 💾 **Session Management**: Persistent chat history (Graph CheckPointer) & model configs

---

## 📁 Project Structure

Based on the [`sql-agent`](https://github.com/CodeStrate/Vigilius_Analyst/tree/sql-agent) branch:

```
Vigilius_Analyst/
├── agent/
│   ├── agent_handler.py            # Core SQL agent logic
│   ├── data_assistant_handler.py   # Intent classification + small talk
│   ├── llm_factory.py              # Multi-provider LLM factory
│   └── prompts.py                  # Agent system prompts
│
├── assets/
│   └── chat_icons/                 # User & bot avatars
│
├── backend/                        # FastAPI backend (future scope)
│
├── datasets/                       # Uploaded and processed datasets
│
├── debug/
│   └── check_agent.py              # CLI testing tool for agents
│
├── prebuilt/
│   └── react_sql_agent.py          # LangGraph ReAct SQL agent template
│
├── utils/
│   ├── ai_providers.py             # Provider configs + available models
│   ├── app_utils.py                # Streamlit utilities
│   └── misc_utils.py               # General helper functions
│
├── .env                            # env file for API Keys (More in future)
├── agent_graph.png                 # Mermaid Image for Agent Graph Architecture
├── app.py                          # Streamlit frontend entrypoint
├── requirements.txt                # Python dependencies
└── README.md
```

---

## ⚙️ Setup Instructions

### Requirements
- **Python** ≥ 3.12.9
- Works on macOS, Linux, Windows

### Option 1: Virtualenv
```bash
git clone https://github.com/CodeStrate/Vigilius_Analyst.git
cd Vigilius_Analyst
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

### Option 2: Conda
```bash
git clone https://github.com/CodeStrate/Vigilius_Analyst.git
cd Vigilius_Analyst
conda create -n vigilius python=3.12.9
conda activate vigilius
pip install -r requirements.txt
```

### Configure Environment
Create a `.env` file with your keys:

```ini
# AI Provider Keys
OPENAI_API_KEY=your_openai_key
GROQ_API_KEY=your_groq_key
GEMINI_API_KEY=your_gemini_key

# Ollama requires no API key (runs locally)
# Install from: https://ollama.com/download
```
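
Before launching the app, it can help to confirm which provider keys Python can actually see. The sketch below is not part of the repository: it parses simple `KEY=value` lines using only the standard library and reports which of the three keys named above are set. The names `parse_env_text` and `configured_providers` are hypothetical, and the project itself may load `.env` differently (for example through `python-dotenv`).

```python
from pathlib import Path

# Key names come from the .env example above; the mapping itself is illustrative.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the providers whose key is present and non-empty."""
    return [name for name, key in PROVIDER_KEYS.items() if env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    found = configured_providers(env)
    print("Configured providers:", found or "none (Ollama needs no key)")
```

Run it from the project root, next to `.env`. Ollama is deliberately absent from the mapping because it runs locally without a key.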

### (Optional) Install Ollama Models
```bash
ollama pull llama3:8b
```

---

## ▢️ Running Vigilius

**Web App (Streamlit)**
```bash
streamlit run app.py
```

**Terminal Debugging**
```bash
python -m debug.check_agent
```

---

## 🎯 Usage Guide

1. **Upload Your Dataset**
- Supported: CSV, Excel (.xlsx), SQLite (.db)
- Files are converted to SQLite + schema analyzed

2. **Select Models**
- Choose AI providers + models for SQL Agent & Data Assistant
- Confirm selection to initialize

3. **Chat with Your Data**
- Example queries:
- β€œTop 10 customers by sales”
- β€œRevenue trends by month”
- β€œMost popular products”

4. **Get Results**
- Auto-generated SQL β†’ executed on database
- Outputs as tables (whenever available, pandas WIP)
- Streaming responses with formatting
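
Step 1 says uploaded files are converted to SQLite and their schema is analyzed. As a rough illustration of that idea only, the sketch below loads CSV text into an in-memory SQLite database and reads the schema back with standard-library calls. The helpers `load_csv_to_sqlite` and `describe_schema` are hypothetical names; the project's actual conversion code may work quite differently (for instance with pandas).

```python
import csv
import io
import sqlite3


def load_csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str) -> None:
    """Create `table` from CSV text; every column is stored as TEXT for simplicity."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Quote identifiers so headers containing spaces still work.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()


def describe_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each table name to its column names using SQLite's own metadata."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return {
        name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
        for (name,) in tables
    }


# Made-up sample data, used only to exercise the helpers.
conn = sqlite3.connect(":memory:")
load_csv_to_sqlite("customer,sales\nAda,120\nLin,95\n", conn, "orders")
schema = describe_schema(conn)
top = conn.execute(
    "SELECT customer FROM orders ORDER BY CAST(sales AS INTEGER) DESC LIMIT 1"
).fetchone()
print(schema, top)
```

A schema summary like this is the kind of context a SQL agent typically needs in its prompt before it can write a query such as “Top 10 customers by sales”.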

---

## 🧩 AI Agent Architecture

### SQL Agent
- Schema discovery
- Query generation + validation
- Results formatting
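
To make "query generation + validation" concrete, here is one possible validation step, sketched under assumptions: it accepts a single `SELECT` statement and asks SQLite to compile it with `EXPLAIN`, which surfaces syntax errors and unknown tables or columns without running the query. `validate_select` is a hypothetical helper, not the agent's actual logic.

```python
import sqlite3


def validate_select(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Return (ok, message) without executing the query itself."""
    statement = sql.strip().rstrip(";").strip()
    # Naive guards: a ';' inside a string literal would also be rejected.
    if ";" in statement:
        return False, "multiple statements are not allowed"
    if not statement.lower().startswith("select"):
        return False, "only SELECT queries are allowed"
    try:
        # EXPLAIN compiles the statement, so bad syntax or names fail here.
        conn.execute(f"EXPLAIN {statement}")
    except sqlite3.Error as exc:
        return False, str(exc)
    return True, "ok"


# Made-up table, used only to exercise the check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, sales INTEGER)")
print(validate_select(conn, "SELECT customer FROM orders;"))
print(validate_select(conn, "SELECT missing FROM orders"))
```

When validation fails, the error message can be fed back to the model so it can repair the query before anything is executed.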

### Data Assistant
- Handles non-data queries
- Classifies and validates intent (SQL vs. small talk)
- Maintains conversation flow
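
The routing idea (SQL vs. small talk) can be sketched as a classifier plus a dispatch table. Everything below is an assumption made for illustration: the keyword classifier merely stands in for the LLM-based intent classification, and `route`, `keyword_classifier`, and `DATA_HINTS` are hypothetical names.

```python
from typing import Callable, Literal

Intent = Literal["sql", "small_talk"]

# Illustrative hints only; a real classifier would be an LLM call.
DATA_HINTS = ("top", "sum", "average", "count", "trend", "revenue", "sales", "most")


def keyword_classifier(message: str) -> Intent:
    """Crude substring match that stands in for a model-based classifier."""
    text = message.lower()
    return "sql" if any(hint in text for hint in DATA_HINTS) else "small_talk"


def route(
    message: str,
    handlers: dict[Intent, Callable[[str], str]],
    classify: Callable[[str], Intent] = keyword_classifier,
) -> str:
    """Classify the message, then hand it to the matching handler."""
    return handlers[classify(message)](message)


# Stub handlers; in practice these would call the SQL agent or a chat model.
handlers: dict[Intent, Callable[[str], str]] = {
    "sql": lambda m: f"[sql agent] {m}",
    "small_talk": lambda m: f"[assistant] {m}",
}
print(route("Top 10 customers by sales", handlers))
print(route("hello there", handlers))
```

Passing `classify` as a parameter keeps the dispatch logic testable without a model and lets an LLM-backed classifier be swapped in later.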

### LLM Factory
- Unified interface for all providers
- Dynamic model switching
- Provider-specific optimizations
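
A unified interface across providers is often built as a small registry that maps a provider name to a chat-model class. The sketch below shows that pattern with LangChain's integration packages. Which packages `agent/llm_factory.py` actually uses is an assumption, and `resolve_provider` and `create_llm` are hypothetical names.

```python
import importlib
from typing import Any

# (module, class) pairs for LangChain chat-model integrations.
# Assumed here for illustration; the project may use different packages.
PROVIDERS: dict[str, tuple[str, str]] = {
    "openai": ("langchain_openai", "ChatOpenAI"),
    "groq": ("langchain_groq", "ChatGroq"),
    "gemini": ("langchain_google_genai", "ChatGoogleGenerativeAI"),
    "ollama": ("langchain_ollama", "ChatOllama"),
}


def resolve_provider(provider: str) -> tuple[str, str]:
    """Look up the module and class name for a provider (case-insensitive)."""
    try:
        return PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown provider {provider!r}; choose from {sorted(PROVIDERS)}"
        ) from None


def create_llm(provider: str, model: str, **kwargs: Any) -> Any:
    """Import the provider package lazily and build a chat model."""
    module_name, class_name = resolve_provider(provider)
    module = importlib.import_module(module_name)  # package must be installed
    return getattr(module, class_name)(model=model, **kwargs)


print(resolve_provider("Ollama"))
```

The lazy import means only the packages for the providers you actually select need to be installed. A call such as `create_llm("ollama", "llama3:8b")` would then build the local model pulled earlier, assuming `langchain-ollama` is available.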

---

## 🔧 Configuration

| Provider | Models | Best For |
|-----------|--------|----------|
| **OpenAI** | GPT-4, GPT-3.5 | High accuracy, complex queries |
| **Groq** | Llama-3, Mixtral | Ultra-fast inference |
| **Gemini** | Gemini-Pro | Google’s latest models |
| **Ollama** | Llama3, Mistral, CodeLlama | Local, private, free |

- Edit prompts → `agent/prompts.py`
- Adjust model configs → `agent/llm_factory.py`
- UI tweaks → `app.py`

---

## 🧪 Testing

**CLI Debugging**
```bash
python -m debug.check_agent
```

---

## 🛠️ Future Roadmap

- ✅ FastAPI backend (multi-user, sessions, API access)
- ✅ Persistent chat history
- ✅ Export results (CSV, Excel, PDF)
- ✅ Advanced visualizations + customization
- ✅ Scheduled reports + notifications

---

## 🀝 Contributing

1. Fork this repo
2. Create a branch (`git checkout -b feature/your-feature`)
3. Commit (`git commit -m "Add your feature"`)
4. Push (`git push origin feature/your-feature`)
5. Open a Pull Request

---

## 📞 Support

For help or feature requests, please [open an issue](../../issues).