https://github.com/flyersworder/analytics-env
A standardized analytics environment template with pre-configured tooling for data science, LLM integration, and document processing
- Host: GitHub
- URL: https://github.com/flyersworder/analytics-env
- Owner: flyersworder
- License: MIT
- Created: 2024-10-05T08:01:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-03-26T10:09:19.000Z (3 months ago)
- Last Synced: 2026-03-27T03:05:30.695Z (3 months ago)
- Topics: analytics, data-science, jupyter, langchain, pandas, plotly, python, quarto, template, uv
- Language: Jupyter Notebook
- Size: 2.29 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# analytics-env
A standardized analytics environment template with pre-configured tooling for data science, LLM integration, and document processing.
## Features
- **Dependency management** with [uv](https://docs.astral.sh/uv/) and modular dependency groups (core, llm, pdf, dev)
- **Code quality** via [prek](https://github.com/j178/prek) git hooks — ruff, black, sqlfluff, nbqa, nbstripout
- **SQL formatting** in Jupyter notebooks with sqlnbfmt
- **Testing** with pytest and automated notebook validation
- **CI/CD** with GitHub Actions (lint + test on every PR)
- **LLM workflows** — example notebooks for code generation, RAG data Q&A, and automated pipelines with multi-provider support (OpenAI, Anthropic, Google, Ollama)
- **Notebook rendering** with Quarto (HTML and PDF with Plotly support)
## Quick Start
### Prerequisites
- Python 3.12+
- [uv](https://docs.astral.sh/uv/) — `curl -LsSf https://astral.sh/uv/install.sh | sh`
- [prek](https://github.com/j178/prek) — `brew install prek` or `uv tool install prek`
### Setup
```bash
git clone https://github.com/flyersworder/analytics-env.git
cd analytics-env
make setup
```
This installs all dependencies and configures git hooks.
### Verify
```bash
make lint && make test
```
## Project Structure
```
analytics-env/
├── notebooks/ # Jupyter notebooks and Quarto documents
│ └── _quarto.yml # Quarto rendering config (HTML + PDF)
├── scripts/ # Standalone Python scripts
├── docs/ # Documentation and presentations
├── tests/ # pytest test suite
├── models/ # Trained model storage
├── pyproject.toml # Project config, dependencies, tool settings
├── .pre-commit-config.yaml # Hook definitions (used by prek)
├── .sqlfluff # SQL linting rules
├── config.yaml # SQL formatting config for sqlnbfmt
├── Makefile # Common task runner
└── .env.example # Required environment variables (copy to .env)
```
## Dependency Groups
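Install only what you need. The optional groups are uv extras layered on top of the core set, and the `--extra` flag can be repeated to combine them (for example, `uv sync --extra llm --extra dev`):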
| Group | Install command | What's included |
|-------|----------------|-----------------|
| **Core** | `uv sync` | pandas, scipy, plotly, duckdb, matplotlib, seaborn, statsmodels, notebook, marimo |
| **LLM** | `uv sync --extra llm` | LangChain, langchain-openai, langchain-anthropic, langchain-ollama, langchain-chroma, ChromaDB |
| **PDF** | `uv sync --extra pdf` | docling, pdfplumber, pymupdf4llm, pdf2docx, reportlab |
| **Dev** | `uv sync --extra dev` | black, ruff, pytest, sqlfluff, sqlglot, nbqa, nbstripout |
| **All** | `uv sync --all-extras` | Everything above |
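To see which of the groups in the table above are present in the active environment, a small probe along these lines may help. It is a minimal sketch: `GROUP_PROBES` and `installed_groups` are hypothetical names, and picking one representative import per group is an assumption made for illustration, not something the template provides.

```python
from importlib.util import find_spec

# Hypothetical mapping: one representative top-level import per group.
# The choice of probe module is an assumption for this sketch.
GROUP_PROBES = {
    "core": "pandas",
    "llm": "langchain",
    "pdf": "pdfplumber",
    "dev": "pytest",
}


def installed_groups(probes=GROUP_PROBES):
    """Return {group: bool}; True when the probe module can be located."""
    return {
        group: find_spec(module) is not None
        for group, module in probes.items()
    }


if __name__ == "__main__":
    for group, present in installed_groups().items():
        print(f"{group:5} {'installed' if present else 'missing'}")
```

Because `find_spec` only locates a module without importing it, the check stays fast even for heavy packages.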
## LLM Workflows
The template includes example notebooks for three LLM-driven analytics patterns:
| Notebook | Pattern | What it demonstrates |
|----------|---------|---------------------|
| [llm_code_generation.ipynb](notebooks/llm_code_generation.ipynb) | Code generation | Pandas DataFrame agent — ask questions in English, get code + results |
| [llm_data_qa_rag.ipynb](notebooks/llm_data_qa_rag.ipynb) | Data Q&A with RAG | ChromaDB vector store + data dictionary retrieval for grounded answers |
| [llm_automated_pipeline.ipynb](notebooks/llm_automated_pipeline.ipynb) | Automated pipeline | LCEL chain with Pydantic structured output for batch text processing |
### Provider Setup
Each notebook supports multiple LLM providers. Set the relevant API key in `.env`:
| Provider | Env variable | Notes |
|----------|-------------|-------|
| OpenAI | `OPENAI_API_KEY` | GPT-4o, text-embedding-3-small |
| Anthropic | `ANTHROPIC_API_KEY` | Claude Sonnet |
| Google | `GOOGLE_API_KEY` | Gemini 2.5 Flash |
| Ollama | (none — local) | [Install Ollama](https://ollama.com), then `ollama pull llama3.2` |
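One way to choose a provider at runtime is to look at which of the keys from the table above is set. The sketch below is illustrative only: `pick_provider`, the preference order, and the fallback to Ollama are assumptions, and the notebooks may select providers differently.

```python
import os

# Env variable names come from the provider table; the preference
# order and the "ollama" fallback are assumptions for this sketch.
PROVIDER_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
]


def pick_provider(env=None):
    """Return the first provider whose API key is non-empty, else 'ollama'."""
    env = os.environ if env is None else env
    for name, key in PROVIDER_KEYS:
        if env.get(key, "").strip():
            return name
    return "ollama"  # the local provider needs no key


print(pick_provider({"ANTHROPIC_API_KEY": "sk-example"}))  # anthropic
print(pick_provider({}))  # ollama
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching real credentials.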
### Local Models with Ollama
For fully offline operation, use [Ollama](https://ollama.com) as both LLM and embedding provider:
- **Embeddings** (`nomic-embed-text`): ~2 GB RAM, runs on CPU
- **Generation** (`llama3.2`, `mistral`): 8 GB+ RAM recommended, GPU significantly improves speed
- Cloud providers are recommended for production workloads
## Development Workflow
### Environment variables
```bash
cp .env.example .env
# Fill in your API keys
```
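For scripts that need the values from `.env` without extra dependencies, a simple loader can be sketched with the standard library. `load_env_file` is a hypothetical helper, not part of the template, and it handles only plain `KEY=VALUE` lines; a dedicated library such as python-dotenv covers more cases.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Parse simple KEY=VALUE lines from path into os.environ.

    Blank lines, comment lines starting with '#', and lines without
    '=' are skipped. Returns a dict of the parsed pairs.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.exists():
        return parsed
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        parsed[key] = value
        # Existing environment values win unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Leaving existing environment variables untouched by default means values exported in the shell or set by CI take precedence over the file.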
### Git hooks
Hooks run automatically on commit via prek. To run manually:
```bash
prek run --all-files
```
### Linting and formatting
```bash
make lint # Check style (ruff + black)
make format # Auto-fix style
```
### Testing
```bash
make test
```
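The feature list mentions automated notebook validation but does not show how it works. As a hedged illustration, one possible check is that every notebook parses as JSON and carries no stored outputs, which matches what nbstripout removes. `notebook_problems` and `test_notebooks_are_clean` are hypothetical names; the repository's actual tests may do something different.

```python
import json
from pathlib import Path


def notebook_problems(path):
    """Return a list of problems found in one .ipynb file (empty if clean)."""
    try:
        nb = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(nb, dict):
        return ["top level is not a JSON object"]
    problems = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # only code cells carry outputs
        if cell.get("outputs"):
            problems.append(f"cell {index} has stored outputs")
        if cell.get("execution_count") is not None:
            problems.append(f"cell {index} has an execution count")
    return problems


def test_notebooks_are_clean():
    # "notebooks" is the directory named in the project structure;
    # the test passes trivially when it contains no .ipynb files.
    for path in sorted(Path("notebooks").glob("*.ipynb")):
        assert notebook_problems(path) == [], path
```

Placed in a file under `tests/`, the `test_` function would be collected by pytest along with the rest of the suite.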
### Rendering notebooks
```bash
make docs # Render via Quarto
```
## Makefile Targets
| Target | Description |
|--------|-------------|
| `make setup` | Install all deps + git hooks |
| `make lint` | Check code style |
| `make format` | Auto-fix code style |
| `make test` | Run pytest |
| `make docs` | Render Quarto notebooks |
| `make clean` | Remove caches and build artifacts |