{"id":46973982,"url":"https://github.com/platob/yggdrasil","last_synced_at":"2026-06-02T11:00:32.133Z","repository":{"id":328006838,"uuid":"1106461036","full_name":"Platob/Yggdrasil","owner":"Platob","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-30T21:51:52.000Z","size":17777,"stargazers_count":3,"open_issues_count":2,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-30T22:03:54.521Z","etag":null,"topics":["arrow","data","databricks","pandas","polars","spark","sql"],"latest_commit_sha":null,"homepage":"https://platob.github.io/Yggdrasil/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Platob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-11-29T09:47:43.000Z","updated_at":"2026-05-30T20:10:40.000Z","dependencies_parsed_at":"2026-05-30T22:00:32.827Z","dependency_job_id":null,"html_url":"https://github.com/Platob/Yggdrasil","commit_stats":null,"previous_names":["platob/yggdrasil"],"tags_count":108,"template":false,"template_full_name":null,"purl":"pkg:github/Platob/Yggdrasil","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Platob%2FYggdrasil","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Platob%2FYggdrasil/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Platob%2FYggdrasil/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/Platob%2FYggdrasil/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Platob","download_url":"https://codeload.github.com/Platob/Yggdrasil/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Platob%2FYggdrasil/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33818568,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","data","databricks","pandas","polars","spark","sql"],"created_at":"2026-03-11T12:24:49.371Z","updated_at":"2026-06-02T11:00:32.025Z","avatar_url":"https://github.com/Platob.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Yggdrasil\n\n**Schema-aware data interchange for Python.** One conversion registry that moves values cleanly between Python types, dataclasses, Arrow, Polars, pandas, Spark, Databricks, and the wire — without losing schema, nullability, or metadata along the way.\n\n| Package | What it is | Where it lives |\n|---|---|---|\n| `ygg` (PyPI) / `yggdrasil` (import) | Pure-Python core: cast registry, Arrow schema, engine bridges, IO/HTTP, Databricks, FastAPI | [`python/`](python/) |\n| Power Query connector | Excel `.pq` and Power BI `.mez` connectors that call the FastAPI 
service | [`powerquery/`](powerquery/) |\n\n📚 **Docs site:** https://platob.github.io/Yggdrasil/\n\n---\n\n## Install\n\n```bash\npip install ygg                   # core\npip install \"ygg[data]\"           # + pandas, numpy, sqlglot\npip install \"ygg[bigdata]\"        # + pyspark, delta-spark\npip install \"ygg[databricks]\"     # + databricks-sdk\npip install \"ygg[api]\"            # + fastapi, uvicorn, pydantic\npip install \"ygg[http]\"           # + urllib3, xxhash\npip install \"ygg[pickle]\"         # + cloudpickle, dill, zstandard, blake3\npip install \"ygg[mongo]\"          # + mongoengine\npip install \"ygg[postgres]\"       # + psycopg, adbc-driver-postgresql\npip install \"ygg[kafka]\"          # + confluent-kafka\npip install \"ygg[delta]\"          # + deltalake\n```\n\nThe only hard runtime deps are `pyarrow\u003e=20` and `polars\u003e=1.3`. Everything else is opt-in.\n\n---\n\n## 60-second tour\n\n### Cast anything into anything\n\n```python\nfrom yggdrasil.data.cast.registry import convert\n\nconvert(\"42\", int)              # 42\nconvert(\"true\", bool)           # True\nconvert(\"2024-01-15\", \"date\")   # datetime.date(2024, 1, 15)\n```\n\n### Dict → typed dataclass (forgiving on input, strict on meaning)\n\n```python\nfrom dataclasses import dataclass\nfrom yggdrasil.data.cast.registry import convert\n\n@dataclass\nclass Order:\n    id: int\n    amount: float\n    paid: bool = False\n\nconvert({\"id\": \"7\", \"amount\": \"99.50\", \"paid\": \"yes\"}, Order)\n# Order(id=7, amount=99.5, paid=True)\n```\n\n### Arrow schema as the contract surface\n\n```python\nimport yggdrasil.arrow as pa\nfrom yggdrasil.arrow.cast import cast_arrow_tabular\nfrom yggdrasil.data.cast.options import CastOptions\n\nraw = pa.table({\"id\": [\"1\", \"2\"], \"score\": [\"9.1\", \"8.7\"]})\ntarget = pa.schema([\n    pa.field(\"id\",    pa.int64(),   nullable=False),\n    pa.field(\"score\", pa.float64(), nullable=False),\n])\n\nout = cast_arrow_tabular(raw, 
CastOptions(target_field=target))\nprint(out.schema)\n```\n\n### Cross-engine in one move\n\n```python\nfrom yggdrasil.databricks import DatabricksClient\n\nstmt = DatabricksClient().sql.execute(\"SELECT * FROM main.default.orders LIMIT 100\")\n\nstmt.to_arrow_table()   # pyarrow.Table\nstmt.to_pandas()        # pandas.DataFrame\nstmt.to_polars()        # polars.DataFrame\nstmt.to_spark()         # pyspark.sql.DataFrame\nstmt.to_pylist()        # list[dict]\n```\n\n---\n\n## What you get\n\n- **One conversion registry.** Register a converter once, dispatch from anywhere. Order: exact match → identity → `Any` wildcard → MRO fallback → one-hop composition.\n- **Arrow schema as the contract.** Field names, order, nullability, metadata, nested structure, timezone intent are preserved across boundaries.\n- **Engines bridge into Arrow.** Polars, pandas, Spark each register on import — `from yggdrasil.polars.cast import cast_polars_dataframe` etc.\n- **Production HTTP stack.** `HTTPSession`, prepared requests, batch dispatch, typed response → Arrow/pandas/Polars/Spark.\n- **Databricks toolkit.** `DatabricksClient` covers SQL, Unity Catalog, Compute, DBFS/Volumes, Secrets, IAM, Genie, Spark Connect.\n- **Optional dep guards.** Base installs stay light. 
`from yggdrasil.polars.lib import polars` is the safe import.\n\n---\n\n## Performance\n\nThe cast registry, schema layer, and engine bridges are all tuned for the hot path — type checks, equality, hash, projection, and same-shape merges live in the nanosecond range so per-batch overhead stays negligible.\n\nRun the benchmark sweep locally:\n\n```bash\ncd python\npython benchmarks/run_all.py --repeat 5\n```\n\nBenches are organized to mirror the source tree:\n\n- [`benchmarks/data/`](python/benchmarks/data) — `Field`, `DataType`, cast registry, equality, merge, options\n- [`benchmarks/dataclasses/`](python/benchmarks/dataclasses) — `ExpiringDict`, `WaitingConfig`, pickle helpers\n- [`benchmarks/concurrent/`](python/benchmarks/concurrent) — `Job`, `JobResult`, `JobPoolExecutor`, `ThreadJob`\n- [`benchmarks/io/`](python/benchmarks/io) — `URL`, `Headers`, `BytesIO`, `Memory`, paths, primitive + nested leaves\n- [`benchmarks/databricks/`](python/benchmarks/databricks) — Databricks-specific code paths (live)\n\n---\n\n## Use cases at a glance\n\n| You want to… | Reach for |\n|---|---|\n| Normalize dicts/JSON into typed dataclasses | `convert(payload, MyDataclass)` |\n| Pin a downstream Arrow schema | `cast_arrow_tabular(t, CastOptions(target_field=schema))` |\n| Convert Polars ↔ Arrow ↔ pandas ↔ Spark | `yggdrasil.{polars,pandas,spark}.cast` |\n| Fan out HTTP requests with retries | `HTTPSession().send_many(reqs, SendManyConfig(...))` |\n| Run SQL on Databricks and get a DataFrame | `DatabricksClient().sql.execute(q).to_polars()` |\n| Read/write DBFS or Volume files | `DatabricksClient().dbfs_path(\"...\").write_text(...)` |\n| Type-check job widget params | `MyConfig.from_environment()` (subclass `NotebookConfig`) |\n| Talk to Databricks from Excel/Power BI | Power Query connector via FastAPI service |\n\n---\n\n## Repository guide\n\n- [`python/`](python/) — `ygg` source, tests, MkDocs site.\n  - [`python/README.md`](python/README.md) — package guide with progressive 
examples (scalars → schema → engines → HTTP → Databricks).\n  - [`python/docs/`](python/docs/) — published documentation source (https://platob.github.io/Yggdrasil/).\n- [`powerquery/`](powerquery/) — Excel `.pq` and Power BI `.mez` connectors over the FastAPI service.\n- [`AGENTS.md`](AGENTS.md) — house style, error-message tone, comment voice, API ergonomics.\n- [`CLAUDE.md`](CLAUDE.md) — agent-facing notes for AI contributors.\n\n---\n\n## Develop locally\n\n```bash\ngit clone https://github.com/Platob/Yggdrasil.git\ncd Yggdrasil/python\n\nuv venv --seed .venv \u0026\u0026 source .venv/bin/activate   # Windows: .venv\\Scripts\\activate\nuv pip install -e \".[dev]\"                     # core + dev tooling\n```\n\n```bash\ncd python\npytest                          # full suite\npytest tests/test_yggdrasil/test_io/test_url.py   # one file\nruff check\nblack .\nmkdocs serve                    # docs at http://127.0.0.1:8000\n```\n\nDatabricks live-integration tests are gated by the `integration` marker and skipped unless `DATABRICKS_HOST` is set.\n\n---\n\n## Release pipeline\n\nThe version in [`python/pyproject.toml`](python/pyproject.toml) is the single source of truth.\n\n| Workflow | Builds | Triggers |\n|---|---|---|\n| [`publish.yml`](.github/workflows/publish.yml) | `ygg` sdist + pure-Python wheel → PyPI, then tags `vX.Y.Z` and cuts a GitHub Release | push to `main` touching `python/src/**`, `pyproject.toml`, README, LICENSE, or the workflow itself |\n| [`docs.yml`](.github/workflows/docs.yml) | MkDocs Material site → GitHub Pages (https://platob.github.io/Yggdrasil/) | push to `main` touching `python/docs/**`, `python/src/**`, `mkdocs.yml`, or the workflow itself |\n\nDo not push to `main` from an agent session — develop on a branch and open a PR.\n\n---\n\n## License\n\n[Apache-2.0](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplatob%2Fyggdrasil","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplatob%2Fyggdrasil","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplatob%2Fyggdrasil/lists"}