https://github.com/flyersworder/analytics-env

A standardized analytics environment template with pre-configured tooling for data science, LLM integration, and document processing

analytics data-science jupyter langchain pandas plotly python quarto template uv

Host: GitHub
URL: https://github.com/flyersworder/analytics-env
Owner: flyersworder
License: mit
Created: 2024-10-05T08:01:44.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2026-03-26T10:09:19.000Z (4 months ago)
Last Synced: 2026-03-27T03:05:30.695Z (3 months ago)
Topics: analytics, data-science, jupyter, langchain, pandas, plotly, python, quarto, template, uv
Language: Jupyter Notebook
Size: 2.29 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# analytics-env

[![CI](https://github.com/flyersworder/analytics-env/actions/workflows/ci.yml/badge.svg)](https://github.com/flyersworder/analytics-env/actions/workflows/ci.yml)

[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)

[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)

[![prek](https://img.shields.io/badge/pre--commit-prek-brightgreen?logo=pre-commit)](https://github.com/j178/prek)

[![Template](https://img.shields.io/badge/use%20this-template-blue?logo=github)](https://github.com/flyersworder/analytics-env/generate)

A standardized analytics environment template with pre-configured tooling for data science, LLM integration, and document processing.

## Features

- **Dependency management** with [uv](https://docs.astral.sh/uv/) and modular dependency groups (core, llm, pdf, dev)

- **Code quality** via [prek](https://github.com/j178/prek) git hooks — ruff, black, sqlfluff, nbqa, nbstripout

- **SQL formatting** in Jupyter notebooks with sqlnbfmt

- **Testing** with pytest and automated notebook validation

- **CI/CD** with GitHub Actions (lint + test on every PR)

- **LLM workflows** — example notebooks for code generation, RAG data Q&A, and automated pipelines with multi-provider support (OpenAI, Anthropic, Google, Ollama)

- **Notebook rendering** with Quarto (HTML and PDF with Plotly support)

## Quick Start

### Prerequisites

- Python 3.12+

- [uv](https://docs.astral.sh/uv/) — `curl -LsSf https://astral.sh/uv/install.sh | sh`

- [prek](https://github.com/j178/prek) — `brew install prek` or `uv tool install prek`

### Setup

```bash
git clone https://github.com/flyersworder/analytics-env.git
cd analytics-env
make setup
```

This installs all dependencies and configures git hooks.

### Verify

```bash
make lint && make test
```

## Project Structure

```
analytics-env/
├── notebooks/          # Jupyter notebooks and Quarto documents
│   └── _quarto.yml     # Quarto rendering config (HTML + PDF)
├── scripts/            # Standalone Python scripts
├── docs/               # Documentation and presentations
├── tests/              # pytest test suite
├── models/             # Trained model storage
├── pyproject.toml      # Project config, dependencies, tool settings
├── .pre-commit-config.yaml  # Hook definitions (used by prek)
├── .sqlfluff           # SQL linting rules
├── config.yaml         # SQL formatting config for sqlnbfmt
├── Makefile            # Common task runner
└── .env.example        # Required environment variables (copy to .env)
```

## Dependency Groups

Install only what you need:

| Group | Install command | What's included |
|-------|-----------------|-----------------|
| **Core** | `uv sync` | pandas, scipy, plotly, duckdb, matplotlib, seaborn, statsmodels, notebook, marimo |
| **LLM** | `uv sync --extra llm` | LangChain, langchain-openai, langchain-anthropic, langchain-ollama, langchain-chroma, ChromaDB |
| **PDF** | `uv sync --extra pdf` | docling, pdfplumber, pymupdf4llm, pdf2docx, reportlab |
| **Dev** | `uv sync --extra dev` | black, ruff, pytest, sqlfluff, sqlglot, nbqa, nbstripout |
| **All** | `uv sync --all-extras` | Everything above |
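
To see which groups are present in the active environment, a small check like the following can help. It is a minimal sketch: the probe module chosen for each group is an assumption made for illustration, not something read from the template's `pyproject.toml`.

```python
from importlib.util import find_spec

# One representative top-level module per group. These import names are
# assumptions for illustration, not taken from the template's configuration.
GROUP_PROBES = {
    "core": "pandas",
    "llm": "langchain",
    "pdf": "pdfplumber",
    "dev": "pytest",
}


def installed_groups(probes: dict[str, str]) -> dict[str, bool]:
    """Map each group name to whether its probe module can be located."""
    return {group: find_spec(module) is not None for group, module in probes.items()}


if __name__ == "__main__":
    for group, present in installed_groups(GROUP_PROBES).items():
        print(f"{group:5} {'installed' if present else 'missing'}")
```

Run it with `uv run python` so the lookup happens inside the project's virtual environment.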

## LLM Workflows

The template includes example notebooks for three LLM-driven analytics patterns:

| Notebook | Pattern | What it demonstrates |
|----------|---------|----------------------|
| [llm_code_generation.ipynb](notebooks/llm_code_generation.ipynb) | Code generation | Pandas DataFrame agent: ask questions in English, get code + results |
| [llm_data_qa_rag.ipynb](notebooks/llm_data_qa_rag.ipynb) | Data Q&A with RAG | ChromaDB vector store + data dictionary retrieval for grounded answers |
| [llm_automated_pipeline.ipynb](notebooks/llm_automated_pipeline.ipynb) | Automated pipeline | LCEL chain with Pydantic structured output for batch text processing |
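
The idea behind data dictionary retrieval can be illustrated without any LLM dependencies. The sketch below is not the notebook's code: it swaps the ChromaDB vector store for a plain bag-of-words cosine similarity, and the column names and descriptions in `DATA_DICTIONARY` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical data dictionary: column name -> plain-language description.
DATA_DICTIONARY = {
    "order_total": "total order value in USD including tax",
    "churned": "whether the customer cancelled the subscription",
    "signup_date": "date the customer created an account",
}


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, data_dictionary: dict[str, str], k: int = 1) -> list[str]:
    """Return the k column names whose descriptions best match the query."""
    q = tokenize(query)
    ranked = sorted(
        data_dictionary,
        key=lambda col: cosine(q, tokenize(data_dictionary[col])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, data_dictionary: dict[str, str], k: int = 1) -> str:
    """Ground the question by prepending the retrieved column definitions."""
    context = "\n".join(
        f"- {col}: {data_dictionary[col]}" for col in retrieve(question, data_dictionary, k)
    )
    return f"Column definitions:\n{context}\n\nQuestion: {question}"
```

A real vector store replaces the word-overlap score with embedding similarity, but the flow is the same: retrieve the relevant definitions first, then pass them to the model alongside the question.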

### Provider Setup

Each notebook supports multiple LLM providers. Set the relevant API key in `.env`:

| Provider | Env variable | Notes |
|----------|--------------|-------|
| OpenAI | `OPENAI_API_KEY` | GPT-4o, text-embedding-3-small |
| Anthropic | `ANTHROPIC_API_KEY` | Claude Sonnet |
| Google | `GOOGLE_API_KEY` | Gemini 2.5 Flash |
| Ollama | (none, runs locally) | [Install Ollama](https://ollama.com), then `ollama pull llama3.2` |

### Local Models with Ollama

For fully offline operation, use [Ollama](https://ollama.com) as both LLM and embedding provider:

- **Embeddings** (`nomic-embed-text`): ~2 GB RAM, runs on CPU

- **Generation** (`llama3.2`, `mistral`): 8 GB+ RAM recommended; a GPU significantly improves speed

- Cloud providers are recommended for production workloads
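
To confirm a local model responds before wiring it into a notebook, Ollama's HTTP API can be called directly. This is a sketch using only the standard library; the notebooks themselves presumably go through `langchain-ollama` rather than raw HTTP. It assumes the server is running on Ollama's default address and that the model has already been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    print(generate("llama3.2", "Reply with the single word: ready"))
```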

## Development Workflow

### Environment variables

```bash
cp .env.example .env
# Fill in your API keys
```
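
After copying the file, a quick comparison against `.env.example` shows which variables are still empty. This is a minimal sketch with a deliberately simple `KEY=VALUE` parser; it does not handle multi-line values or `export` prefixes, and how the notebooks themselves load `.env` is not covered here.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unfilled_keys(example_text: str, env_text: str) -> list[str]:
    """Keys declared in the example file that are missing or empty in .env."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]


if __name__ == "__main__":
    missing = unfilled_keys(Path(".env.example").read_text(), Path(".env").read_text())
    print("Unfilled variables:", ", ".join(missing) or "none")
```

Only the keys for the providers you plan to use need values, so an unfilled key is a reminder rather than an error.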

### Git hooks

Hooks run automatically on commit via prek. To run manually:

```bash
prek run --all-files
```

### Linting and formatting

```bash
make lint     # Check style (ruff + black)
make format   # Auto-fix style
```

### Testing

```bash
make test
```
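
As an illustration of what a notebook validation check could look like, the sketch below flags code cells that still carry outputs or execution counts, which nbstripout is meant to remove. The function names are hypothetical and this is not the template's actual test suite.

```python
import json
from pathlib import Path


def cells_with_outputs(notebook: dict) -> list[int]:
    """Indices of code cells that still carry outputs or an execution count."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
        and (cell.get("outputs") or cell.get("execution_count") is not None)
    ]


def unstripped_notebooks(directory: str) -> dict[str, list[int]]:
    """Map each notebook in the directory to its offending cell indices."""
    offenders: dict[str, list[int]] = {}
    for path in sorted(Path(directory).glob("*.ipynb")):
        dirty = cells_with_outputs(json.loads(path.read_text(encoding="utf-8")))
        if dirty:
            offenders[path.name] = dirty
    return offenders
```

A pytest test could then assert that `unstripped_notebooks("notebooks")` returns an empty dict.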

### Rendering notebooks

```bash
make docs     # Render via Quarto
```

## Makefile Targets

| Target | Description |
|--------|-------------|
| `make setup` | Install all deps + git hooks |
| `make lint` | Check code style |
| `make format` | Auto-fix code style |
| `make test` | Run pytest |
| `make docs` | Render Quarto notebooks |
| `make clean` | Remove caches and build artifacts |
