{"id":23871941,"url":"https://github.com/mta-tech/seeknal","last_synced_at":"2026-04-16T07:03:24.894Z","repository":{"id":270838392,"uuid":"883652861","full_name":"mta-tech/seeknal","owner":"mta-tech","description":"Seeknal is an all-in-one platform for data and AI/ML engineering","archived":false,"fork":false,"pushed_at":"2026-03-06T06:46:48.000Z","size":14854,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-06T09:36:12.975Z","etag":null,"topics":["analytics-engineering","data-engineering","data-science","duckdb","feature-engineering","feature-management","feature-store","machine-learning","mlops"],"latest_commit_sha":null,"homepage":"https://mta-tech.github.io/seeknal/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mta-tech.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-11-05T10:46:51.000Z","updated_at":"2026-03-05T12:16:36.000Z","dependencies_parsed_at":"2025-04-14T15:55:39.479Z","dependency_job_id":null,"html_url":"https://github.com/mta-tech/seeknal","commit_stats":null,"previous_names":["mta-tech/seeknal"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/mta-tech/seeknal","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mta-tech%2Fseeknal","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mta-tech%2Fseeknal/tags","
releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mta-tech%2Fseeknal/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mta-tech%2Fseeknal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mta-tech","download_url":"https://codeload.github.com/mta-tech/seeknal/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mta-tech%2Fseeknal/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30329697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics-engineering","data-engineering","data-science","duckdb","feature-engineering","feature-management","feature-store","machine-learning","mlops"],"created_at":"2025-01-03T15:17:31.318Z","updated_at":"2026-04-16T07:03:24.887Z","avatar_url":"https://github.com/mta-tech.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003eSeeknal\u003c/h1\u003e\n    \u003cp\u003e\u003cstrong\u003eTransform data with SQL and Python. Build ML features with point-in-time joins. 
Materialize to PostgreSQL and Iceberg — all from one CLI.\u003c/strong\u003e\u003c/p\u003e\n    \u003cp\u003e\n        \u003ca href=\"https://pypi.org/project/seeknal/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/seeknal.svg\" alt=\"PyPI version\"\u003e\u003c/a\u003e\n        \u003ca href=\"https://pypi.org/project/seeknal/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/pyversions/seeknal.svg\" alt=\"Python versions\"\u003e\u003c/a\u003e\n        \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/github/license/mta-tech/seeknal.svg\" alt=\"License\"\u003e\u003c/a\u003e\n        \u003ca href=\"https://github.com/mta-tech/seeknal/actions\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/mta-tech/seeknal/release.yml\" alt=\"CI\"\u003e\u003c/a\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nSeeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe `draft → dry-run → apply` workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. 
Python 3.11+ required.\n\n## Quick Start\n\n```bash\npip install seeknal\n\nseeknal init --name my_project\nseeknal draft --name my_pipeline --type transform\nseeknal dry-run\nseeknal apply\n```\n\nExplore your data interactively or search docs from the terminal:\n\n```bash\nseeknal repl          # Interactive SQL on pipeline outputs\nseeknal docs query    # Search documentation from the CLI\n```\n\n```sql\nSELECT customer_id, COUNT(*) as order_count\nFROM target.my_transform\nGROUP BY customer_id;\n```\n\n## Key Features\n\n**Dual Pipeline Authoring** — Write pipelines in YAML, Python decorators, or both:\n\n```python\nfrom seeknal.pipeline import source, transform\n\n@source(name=\"orders\", source=\"csv\", table=\"data/orders.csv\")\ndef orders():\n    pass\n\n@transform(name=\"order_metrics\", inputs=[\"source.orders\"])\ndef order_metrics(ctx):\n    df = ctx.ref(\"source.orders\")\n    return ctx.duckdb.sql(\n        \"SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id\"\n    ).df()\n```\n\n**Multi-Target Materialization** — Write to PostgreSQL and Iceberg from a single node:\n\n```yaml\nmaterializations:\n  - type: postgresql\n    connection: local_pg\n    table: analytics.my_table\n    mode: upsert_by_key\n    unique_keys: [id]\n  - type: iceberg\n    table: atlas.namespace.my_table\n```\n\n**Environment Management** — Isolated namespaces with per-environment profiles:\n\n```bash\nseeknal env plan dev --profile profiles-dev.yml\nseeknal env apply dev\nseeknal run --env dev\n```\n\n**Feature Store** — Define ML features in YAML or Python with entity keys, point-in-time joins, and automatic versioning. 
Supports offline (batch) and online (real-time) serving.\n\n```yaml\n# seeknal/feature_groups/customer_features.yml\nkind: feature_group\nname: customer_features\nentity:\n  name: customer\n  join_keys: [\"customer_id\"]\nmaterialization:\n  event_time_col: latest_order_date\n  offline: { enabled: true, format: parquet }\n  online: { enabled: false, ttl: 7d }\nfeatures:\n  total_orders: { dtype: integer }\n  total_spent: { dtype: float }\n  avg_order_value: { dtype: float }\ninputs:\n  - ref: transform.customer_orders\n```\n\n```python\n# Or use Python decorators\n@feature_group(name=\"customer_rfm\", entity=\"customer\")\ndef customer_rfm(ctx):\n    df = ctx.ref(\"transform.clean_transactions\")\n    return ctx.duckdb.sql(\"\"\"\n        SELECT CustomerID, COUNT(DISTINCT InvoiceNo) as frequency,\n               SUM(TotalAmount) as monetary_value\n        FROM df GROUP BY CustomerID\n    \"\"\").df()\n```\n\n```bash\nseeknal entity list                           # Cross-feature-group consolidation\nseeknal entity show customer                  # Inspect entity schema and feature groups\n```\n\n**Interactive SQL REPL** — Auto-registers Parquet files, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.\n\n**AI-Powered Thinking Partner** — `seeknal ask chat` is your collaborative partner for data work. 
The agent uses 16 tools for fast data access and 11 built-in skills for multi-step workflows like report generation, pipeline building, and data profiling — all loaded on demand to keep responses fast:\n\n```bash\nseeknal ask chat                        # Start a brainstorm / build session\nseeknal ask \"What are the top 5 customers by revenue?\"  # Quick one-shot question\nseeknal ask report \"customer analysis\"  # Generate interactive HTML dashboard\nseeknal ask chat --web                  # Enable web search for benchmarks\n```\n\nAsk it to build a pipeline from scratch, and it will draft a plan, walk you through the design, and wait for your go-ahead before generating code. Publish reports to a self-hosted **Seeknal Report Server** and share them with your team via a URL.\n\n```bash\nseeknal report-server start             # Host published reports\nseeknal gateway start                   # Expose ask as an API (WebSocket/SSE/REST)\n```\n\nSupports Google Gemini (default) and Ollama (local). 
Use `--provider ollama` for fully local, private analysis.\n\n## Documentation\n\n| | |\n|---|---|\n| **[Getting Started](docs/index.md)** | Installation, configuration, first pipeline |\n| **[CLI Reference](docs/reference/cli.md)** | All commands and flags |\n| **[YAML Schema](docs/reference/yaml-schema.md)** | Pipeline YAML reference |\n| **[CLI Docs Search](docs/cli/docs.md)** | Search documentation from the terminal (`seeknal docs`) |\n| **Tutorials** | [YAML Pipelines](docs/tutorials/yaml-pipeline-tutorial.md) · [Python Pipelines](docs/tutorials/python-pipelines-tutorial.md) · [Mixed](docs/tutorials/mixed-yaml-python-pipelines.md) · [Seeknal Ask Agent](docs/tutorials/seeknal-ask-agent.md) · [Report Exposures](docs/tutorials/report-exposures.md) |\n| **Guides** | [Python Pipelines](docs/guides/python-pipelines.md) · [Testing \u0026 Audits](docs/guides/testing-and-audits.md) · [Iceberg Materialization](docs/iceberg-materialization.md) · [Training to Serving](docs/guides/training-to-serving.md) |\n| **Servers** | [Gateway Server](docs/cli/gateway.md) · [Report Server](docs/cli/report-server.md) |\n| **Concepts** | [Point-in-Time Joins](docs/concepts/point-in-time-joins.md) · [Virtual Environments](docs/concepts/virtual-environments.md) · [Exposures](docs/concepts/exposures.md) · [Glossary](docs/concepts/glossary.md) |\n\n## Changelog\n\n### v2.6.0 (April 2026)\n\n**Skills-Powered Agent + Report Server** — The ask agent now uses a thin-tools/fat-skills architecture: 16 lean tools for fast data access, 11 built-in skills for multi-step workflows (reports, pipelines, profiling, metrics, publishing). 
Skills load on demand via progressive disclosure, keeping the agent's context lean.\n\n- **Seeknal Report Server** (`seeknal report-server start`): self-hosted server for publishing and sharing reports via unique URLs — publish from the chat TUI or the agent tool\n- **11 built-in skills**: report generation, pipeline building, data profiling, Python analysis, semantic model bootstrap, metric query/save, report exposure codification, Proof Editor publishing\n- **Chat enhancements**: `--style` (concise/explanatory/formal/conversational), `--budget` (USD cap), `--web` (DuckDuckGo search), `--session`/`--name` (named session resume)\n- **Gateway improvements**: cloud-only backend mode, standalone workers, Redis multi-replica, split topology\n- **Auto `.env` loading**: `--project \u003cpath\u003e` loads `\u003cpath\u003e/.env` automatically\n- **Error UX**: network errors classified with actionable hints; error logs saved to `~/.seeknal/logs/`\n\n### v2.5.0 (April 2026)\n\n**Seeknal as Your Thinking Partner** — `seeknal ask chat` is now a collaborative partner that brainstorms, builds pipelines, and trains models with you through conversation. 
It always asks for confirmation before acting — you stay in control.\n\n- **Interactive chat mode** (`seeknal ask chat`): multi-turn brainstorm and build sessions with persistent history, streaming UI with Claude Code-inspired visual hierarchy\n- **Confirmation-first workflow**: the agent proposes plans and analysis directions, then waits for your go-ahead via interactive menus before executing\n- **Pipeline and ML building**: describe what you want to build in plain language — the agent drafts YAML pipelines, feature groups, or model training code and checks in before generating\n- **Session management**: create, resume, list, and delete sessions with full message persistence (`seeknal session list/show/delete`)\n- **Iceberg REST catalog support**: integrates with any Iceberg REST catalog provider (Lakekeeper, Tabular, Polaris, etc.)\n- **Gateway server**: WebSocket, SSE, and REST endpoints for web clients; optional Telegram bot integration\n- **UI refresh**: animated fox mascot, interactive arrow-key menus, real token/tool counters, subordinate reasoning display\n\n### v2.4.0 (March 2026)\n\n**Seeknal Ask — AI-Powered Data Agent** — Natural language data analysis with 12 built-in tools:\n\n```bash\nseeknal ask \"What are the top 5 customers by revenue?\"\nseeknal ask chat                                        # Interactive multi-turn session\nseeknal ask report \"customer segmentation\"              # AI-guided HTML dashboard\nseeknal ask report --exposure monthly_kpis              # Deterministic report exposure\nseeknal ask report serve my-report                      # Live-preview with Evidence dev server\n```\n\n- **One-shot \u0026 chat modes**: Ask questions or start multi-turn sessions with conversation memory\n- **12 agent tools**: Data discovery, SQL execution, Python analysis (pandas/scipy/matplotlib), pipeline inspection, and report generation\n- **Report exposures**: Define repeatable reports in YAML with pinned SQL queries, chart types (BigValue, 
BarChart, LineChart, AreaChart, DataTable), and LLM-generated narratives\n- **Deterministic reports**: the `sections` key pins SQL and charts — the LLM only writes commentary\n- **Dual output**: Both interactive HTML dashboards and standalone Markdown reports\n- **LLM providers**: Google Gemini (default) and Ollama (local, no API key)\n- **Subprocess sandbox**: Python execution runs in an isolated subprocess with restricted imports\n\n### v2.3.0 (March 2026)\n\n**Incremental Detection** — Automatically skip unchanged data sources and process only new data:\n\n```yaml\n# PostgreSQL watermark-based incremental detection\n- kind: source\n  name: events\n  source: postgresql\n  table: public.events\n  freshness:\n    time_column: created_at  # Tracks MAX(created_at) watermark\n  params:\n    connection: my_pg\n```\n\n- **PostgreSQL Incremental**: Watermark-based detection using `MAX(time_column)` comparison. Automatically generates `WHERE time_col \u003e 'watermark' OR time_col IS NULL` for incremental reads.\n- **Iceberg Incremental**: Snapshot-based detection that compares the current snapshot ID with the stored one. 
Supports partition pruning for time-partitioned tables.\n- **Skip Optimization**: If fingerprint and watermark match, source execution is skipped entirely.\n- **Cascade Invalidation**: Dependent nodes are automatically invalidated when source data changes.\n- **Full Refresh**: Use `--full` flag to ignore stored watermarks and reload all data.\n\n**Other Changes**:\n- Enhanced QA automation with multi-spec execution support\n- Pipeline error logging with `--verbose` mode\n- Security fix: Updated `cryptography` to 46.0.5 (CVE-2026-26007)\n\n### v2.2.2 (February 2026)\n\n- Entity consolidation for per-entity feature views\n- Multi-target materialization (PostgreSQL + Iceberg from single node)\n- Environment-aware execution with namespace prefixing\n\n## Install from Source\n\nFor development or contributing:\n\n```bash\ngit clone https://github.com/mta-tech/seeknal.git\ncd seeknal\nuv venv --python 3.11 \u0026\u0026 source .venv/bin/activate\nuv pip install -e \".[all]\"\n```\n\n## Contributing\n\nContributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, code style, testing, and PR guidelines.\n\n## License\n\nSeeknal is [Apache 2.0 licensed](LICENSE).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmta-tech%2Fseeknal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmta-tech%2Fseeknal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmta-tech%2Fseeknal/lists"}