{"id":38679165,"url":"https://github.com/flyersworder/analytics-env","last_synced_at":"2026-04-02T14:01:59.822Z","repository":{"id":258379097,"uuid":"868001925","full_name":"flyersworder/analytics-env","owner":"flyersworder","description":"A standardized analytics environment template with pre-configured tooling for data science, LLM integration, and document processing","archived":false,"fork":false,"pushed_at":"2026-03-26T10:09:19.000Z","size":2404,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-27T03:05:30.695Z","etag":null,"topics":["analytics","data-science","jupyter","langchain","pandas","plotly","python","quarto","template","uv"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flyersworder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-05T08:01:44.000Z","updated_at":"2026-03-26T10:09:24.000Z","dependencies_parsed_at":"2025-01-02T10:27:04.931Z","dependency_job_id":"f7357b2e-a08e-4c64-baa3-46789d246429","html_url":"https://github.com/flyersworder/analytics-env","commit_stats":null,"previous_names":["flyersworder/analytics-env"],"tags_count":1,"template":true,"template_full_name":null,"purl":"pkg:github/flyersworder/analytics-env","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fanalytics-env","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fanalytics-env/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fanalytics-env/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fanalytics-env/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flyersworder","download_url":"https://codeload.github.com/flyersworder/analytics-env/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fanalytics-env/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31307462,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","data-science","jupyter","langchain","pandas","plotly","python","quarto","template","uv"],"created_at":"2026-01-17T10:09:39.632Z","updated_at":"2026-04-02T14:01:59.803Z","avatar_url":"https://github.com/flyersworder.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# analytics-env\n\n[![CI](https://github.com/flyersworder/analytics-env/actions/workflows/ci.yml/badge.svg)](https://github.com/flyersworder/analytics-env/actions/workflows/ci.yml)\n[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)\n[![prek](https://img.shields.io/badge/pre--commit-prek-brightgreen?logo=pre-commit)](https://github.com/j178/prek)\n[![Template](https://img.shields.io/badge/use%20this-template-blue?logo=github)](https://github.com/flyersworder/analytics-env/generate)\n\nA standardized analytics environment template with pre-configured tooling for data science, LLM integration, and document processing.\n\n## Features\n\n- **Dependency management** with [uv](https://docs.astral.sh/uv/) and modular dependency groups (core, llm, pdf, dev)\n- **Code quality** via [prek](https://github.com/j178/prek) git hooks — ruff, black, sqlfluff, nbqa, nbstripout\n- **SQL formatting** in Jupyter notebooks with sqlnbfmt\n- **Testing** with pytest and automated notebook validation\n- **CI/CD** with GitHub Actions (lint + test on every PR)\n- **LLM workflows** — example notebooks for code generation, RAG data Q\u0026A, and automated pipelines with multi-provider support (OpenAI, Anthropic, Google, Ollama)\n- **Notebook rendering** with Quarto (HTML and PDF with Plotly support)\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.12+\n- [uv](https://docs.astral.sh/uv/) — `curl -LsSf https://astral.sh/uv/install.sh | sh`\n- [prek](https://github.com/j178/prek) — `brew install prek` or `uv tool install prek`\n\n### Setup\n\n```bash\ngit clone https://github.com/flyersworder/analytics-env.git\ncd analytics-env\nmake setup\n```\n\nThis installs all dependencies and configures git hooks.\n\n### Verify\n\n```bash\nmake lint \u0026\u0026 make test\n```\n\n## Project Structure\n\n```\nanalytics-env/\n├── notebooks/          # Jupyter notebooks and Quarto documents\n│   └── _quarto.yml     # Quarto rendering config (HTML + PDF)\n├── scripts/            # Standalone Python scripts\n├── docs/               # Documentation and presentations\n├── tests/              # pytest test suite\n├── models/             # Trained model storage\n├── pyproject.toml      # Project config, dependencies, tool settings\n├── .pre-commit-config.yaml  # Hook definitions (used by prek)\n├── .sqlfluff           # SQL linting rules\n├── config.yaml         # SQL formatting config for sqlnbfmt\n├── Makefile            # Common task runner\n└── .env.example        # Required environment variables (copy to .env)\n```\n\n## Dependency Groups\n\nInstall only what you need:\n\n| Group | Install command | What's included |\n|-------|----------------|-----------------|\n| **Core** | `uv sync` | pandas, scipy, plotly, duckdb, matplotlib, seaborn, statsmodels, notebook, marimo |\n| **LLM** | `uv sync --extra llm` | LangChain, langchain-openai, langchain-anthropic, langchain-ollama, langchain-chroma, ChromaDB |\n| **PDF** | `uv sync --extra pdf` | docling, pdfplumber, pymupdf4llm, pdf2docx, reportlab |\n| **Dev** | `uv sync --extra dev` | black, ruff, pytest, sqlfluff, sqlglot, nbqa, nbstripout |\n| **All** | `uv sync --all-extras` | Everything above |\n\n## LLM Workflows\n\nThe template includes example notebooks for three LLM-driven analytics patterns:\n\n| Notebook | Pattern | What it demonstrates |\n|----------|---------|---------------------|\n| [llm_code_generation.ipynb](notebooks/llm_code_generation.ipynb) | Code generation | Pandas DataFrame agent — ask questions in English, get code + results |\n| [llm_data_qa_rag.ipynb](notebooks/llm_data_qa_rag.ipynb) | Data Q\u0026A with RAG | ChromaDB vector store + data dictionary retrieval for grounded answers |\n| [llm_automated_pipeline.ipynb](notebooks/llm_automated_pipeline.ipynb) | Automated pipeline | LCEL chain with Pydantic structured output for batch text processing |\n\n### Provider Setup\n\nEach notebook supports multiple LLM providers. Set the relevant API key in `.env`:\n\n| Provider | Env variable | Notes |\n|----------|-------------|-------|\n| OpenAI | `OPENAI_API_KEY` | GPT-4o, text-embedding-3-small |\n| Anthropic | `ANTHROPIC_API_KEY` | Claude Sonnet |\n| Google | `GOOGLE_API_KEY` | Gemini 2.5 Flash |\n| Ollama | (none — local) | [Install Ollama](https://ollama.com), then `ollama pull llama3.2` |\n\n### Local Models with Ollama\n\nFor fully offline operation, use [Ollama](https://ollama.com) as both LLM and embedding provider:\n\n- **Embeddings** (`nomic-embed-text`): ~2GB RAM, runs on CPU\n- **Generation** (`llama3.2`, `mistral`): 8GB+ RAM recommended, GPU significantly improves speed\n- Cloud providers are recommended for production workloads\n\n## Development Workflow\n\n### Environment variables\n\n```bash\ncp .env.example .env\n# Fill in your API keys\n```\n\n### Git hooks\n\nHooks run automatically on commit via prek. To run manually:\n\n```bash\nprek run --all-files\n```\n\n### Linting and formatting\n\n```bash\nmake lint     # Check style (ruff + black)\nmake format   # Auto-fix style\n```\n\n### Testing\n\n```bash\nmake test\n```\n\n### Rendering notebooks\n\n```bash\nmake docs     # Render via Quarto\n```\n\n## Makefile Targets\n\n| Target | Description |\n|--------|-------------|\n| `make setup` | Install all deps + git hooks |\n| `make lint` | Check code style |\n| `make format` | Auto-fix code style |\n| `make test` | Run pytest |\n| `make docs` | Render Quarto notebooks |\n| `make clean` | Remove caches and build artifacts |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflyersworder%2Fanalytics-env","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflyersworder%2Fanalytics-env","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflyersworder%2Fanalytics-env/lists"}