{"id":50834990,"url":"https://github.com/prakulhiremath/flashback","last_synced_at":"2026-06-14T02:32:53.858Z","repository":{"id":360866562,"uuid":"1252049900","full_name":"prakulhiremath/flashback","owner":"prakulhiremath","description":"⏪ Git for DataFrames. Time-travel debugging, exact temporal lineage, and feature evolution tracking for Pandas and Polars.","archived":false,"fork":false,"pushed_at":"2026-06-07T16:59:00.000Z","size":100,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-07T18:26:38.775Z","etag":null,"topics":["data-lineage","data-versioning","mlops","polars","time-travel"],"latest_commit_sha":null,"homepage":"http://aliensonearth.in/flashback/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prakulhiremath.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-28T06:30:18.000Z","updated_at":"2026-06-07T17:00:29.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/prakulhiremath/flashback","commit_stats":null,"previous_names":["prakulhiremath/flashback"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/prakulhiremath/flashback","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakulhiremath%2Fflashback","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakulhiremath%2Fflashback/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakulhiremath%2Fflashback/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakulhiremath%2Fflashback/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prakulhiremath","download_url":"https://codeload.github.com/prakulhiremath/flashback/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakulhiremath%2Fflashback/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34307683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-lineage","data-versioning","mlops","polars","time-travel"],"created_at":"2026-06-14T02:32:52.212Z","updated_at":"2026-06-14T02:32:53.853Z","avatar_url":"https://github.com/prakulhiremath.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ⚡ flashback\n\n\u003e **Git for Datasets** — time-travel debugging and transformation lineage tracking for pandas \u0026 
Polars.\n\n[![CI](https://github.com/prakulhiremath/flashback/actions/workflows/ci.yml/badge.svg)](https://github.com/prakulhiremath/flashback/actions)\n[![PyPI](https://img.shields.io/pypi/v/flashback-df.svg)](https://pypi.org/project/flashback-df)\n[![Python](https://img.shields.io/pypi/pyversions/flashback-df.svg)](https://pypi.org/project/flashback-df)\n[![Coverage](https://codecov.io/gh/prakulhiremath/flashback/branch/main/graph/badge.svg)](https://codecov.io/gh/prakulhiremath/flashback)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20440635.svg)](https://doi.org/10.5281/zenodo.20440635)\n[![Medium](https://img.shields.io/badge/Medium-Read%20the%20Story-12100E?style=flat\u0026logo=medium\u0026logoColor=white)](https://medium.com/@prakulhiremath/the-6-hour-training-job-mystery-why-flashback-changes-everything-for-data-engineers-e81290fbdc84)\n[![PyPI Downloads](https://static.pepy.tech/personalized-badge/flashback-df?period=total\u0026units=INTERNATIONAL_SYSTEM\u0026left_color=BLACK\u0026right_color=GREEN\u0026left_text=downloads)](https://pepy.tech/projects/flashback-df)\n\n```\n📂 load  ──▶  🔍 filter  ──▶  ➕ with_columns  ──▶  ⏪ lag  ──▶  HEAD\n                                      │\n                                 (before-lag)  ◀── fb.checkout(\"before-lag\")\n```\n\n---\n\n## Why this exists\n\nEvery ML researcher has asked: **\"Why did my metric change?\"** Too often, nobody can say.\n\nYou ran a 6-hour training job, the Sharpe ratio dropped from 1.4 to 0.9, and\nsomewhere between the raw tick data and the feature matrix a silent\ntransformation introduced look-ahead bias. You have no idea where.\n\n**DVC is too heavy** — it versions entire files and typically drags in S3 backends, CI pipelines,\nand YAML configs.  
You don't want to learn a new orchestration system; you\nwant to know what happened to column `price_lag1` between step 3 and step 7.\n\n**Git doesn't understand columns.** `git diff` on a Parquet file is binary\nnoise.  It cannot tell you \"this `.filter()` removed 412 rows\" or \"this\n`.with_columns()` introduced a null in 3% of rows.\"\n\n**flashback fixes this.**\n\nIt wraps your DataFrame in a lightweight proxy that records every transformation\nas a node in an in-memory Directed Acyclic Graph (DAG).  Each node is\nidentified by a deterministic SHA-256 hash of its parent node IDs, the\noperation name and arguments, and the frame's schema. This design gives you:\n\n- **Instant time-travel** — `fb.checkout(\"before-lag\")` returns the exact\n  frame at that checkpoint with no I/O unless you ask for it.\n- **Structural diffing** — `frame.diff(other)` shows you exactly which rows\n  were added or removed between any two checkpoints.\n- **Beautiful lineage views** — `fb.visualize()` renders a `rich`-powered\n  git-log-style tree in your terminal, or an SVG graph in Jupyter.\n- **Reproducibility** — identical transformations applied to identical data\n  always produce the same node ID, because the ID depends only on the\n  parents, operation, arguments, and schema.\n\n---\n\n## Install\n\n```bash\npip install flashback-df\n# or, if you use uv (recommended):\nuv add flashback-df\n```\n\n**Requirements:** Python ≥ 3.10, Polars ≥ 0.20, pandas ≥ 2.0.\n\n---\n\n## Quickstart\n\n```python\nimport flashback as fb\n\n# ── 1. Load any source ──────────────────────────────────────────────────────\ndf = fb.load(\"trades.parquet\")         # Parquet\ndf = fb.load(\"prices.csv\")             # CSV\ndf = fb.load(my_polars_df)             # existing Polars DataFrame\ndf = fb.load(my_pandas_df)             # existing Pandas DataFrame\n\n# ── 2.
 Transform — every step is recorded automatically ─────────────────────\ndf = df.filter(fb.col(\"price\") \u003e 0)\ndf = df.with_columns(\n    (fb.col(\"price\") * fb.col(\"volume\")).alias(\"notional\")\n)\n\n# Tag a checkpoint before the next risky operation.\ndf = df.tag(\"before-lag\")\n\ndf = df.lag(\"price\", 1)               # sugar for shift(1) + tracking\ndf = df.rolling_mean(\"notional\", 5)\n\n# ── 3. Time-travel ──────────────────────────────────────────────────────────\ndf_clean = fb.checkout(\"before-lag\")  # ← instant; no disk I/O\n\n# ── 4. See what broke your Sharpe ratio ─────────────────────────────────────\nfb.visualize()\n```\n\nTerminal output:\n\n```\n╭─ flashback lineage  •  4 commits  •  HEAD → rolling_mean ──────────────────╮\n│                                                                             │\n│  📂 LOAD  5,000 rows × 4 cols  [14:03:01]                                  │\n│  │                                                                          │\n│  ├─ 🔍 filter  arg_0=...col(\"price\")...  
4,823 rows × 4 cols  #a1b2c3d4   │\n│  │                                                                          │\n│  ├─ ➕ with_columns  arg_0=...alias(\"notional\")  4,823 rows × 5  #e5f6a7  │\n│  │                                                                          │\n│  ├─ ⏪ lag  column='price'  n=1  4,823 rows × 6  [before-lag]  #b8c9d0    │\n│  │                                                                          │\n│  └─ 📈 rolling_mean  window=5  4,823 rows × 7 ● HEAD  #01e2f3a4           │\n│                                                                             │\n╰─────────────────────────────────────────────────────────────────────────────╯\n```\n\n---\n\n## API Reference\n\n### `fb.load(source, *, label=None, track=True)`\n\nLoad a DataFrame from a file path, Polars DataFrame, or Pandas DataFrame and\nbegin tracking its lineage.\n\n| Param | Type | Description |\n|-------|------|-------------|\n| `source` | `str \\| pl.DataFrame \\| pd.DataFrame \\| FlashbackFrame` | Data source |\n| `label` | `str \\| None` | Human-readable root label (default: filename stem or `\"root\"`) |\n| `track` | `bool` | Register with the global registry (default: `True`) |\n\n**Supported formats:** `.parquet`, `.csv`, `.json`, `.ndjson`, `.ipc`, `.arrow`\n\n---\n\n### `fb.col(name)`\n\nRe-export of `polars.col`, so transform chains only need `import flashback as fb`:\n\n```python\ndf = df.filter(fb.col(\"price\") \u003e 0)\n```\n\n---\n\n### `fb.commit(frame, label, *, message=\"\")`\n\nTag the current state of `frame` with a human-readable label — analogous to\n`git tag`.\n\n```python\ndf = fb.commit(df, \"before-normalise\", message=\"Raw features, no scaling\")\n```\n\nOr use the method form:\n\n```python\ndf = df.tag(\"before-normalise\", message=\"Raw features, no scaling\")\n```\n\n---\n\n### `fb.checkout(label, *, frame=None)`\n\nTime-travel to a named checkpoint.  
Returns a new `FlashbackFrame` at that\nexact state, fully materialised.\n\n```python\ndf_original = fb.checkout(\"before-normalise\")\n```\n\nIf `frame` is provided, only that frame's lineage is searched; otherwise,\nthe global registry is searched.\n\n---\n\n### `fb.visualize(frame=None, *, style=\"tree\", max_width=120)`\n\nRender the transformation lineage.\n\n- `style=\"tree\"` — a `rich` tree with icons, timestamps, frame shapes, and node IDs.\n- `style=\"dag\"` — compact ASCII graph (`git log --graph` style).\n- In Jupyter, the lineage is rendered as an SVG/HTML widget instead.\n\n---\n\n### `FlashbackFrame.lag(column, n=1, *, alias=None)`\n\nShift `column` by `n` periods, so each row holds the value from `n` rows\nearlier, and record the step as a node in the lineage DAG.\n\n```python\ndf = df.lag(\"price\", 1)                    # → price_lag1\ndf = df.lag(\"price\", 3, alias=\"price_t3\")  # → price_t3\n```\n\n---\n\n### `FlashbackFrame.rolling_mean(column, window, *, alias=None, min_periods=None)`\n\nRolling mean over `window` periods with lineage tracking.\n\n```python\ndf = df.rolling_mean(\"notional\", 20)  # → notional_rmean20\n```\n\n---\n\n### `FlashbackFrame.diff(other)`\n\nStructural diff between two frames.  
Returns a Polars DataFrame whose `_diff`\ncolumn marks each row as `\"added\"` or `\"removed\"`.\n\n```python\nimport polars as pl\n\ndelta = df_now.diff(df_old)\nprint(delta.filter(pl.col(\"_diff\") == \"removed\"))\n```\n\n---\n\n### `FlashbackFrame.history()`\n\nReturn the full transformation chain as a list of dicts (root → HEAD):\n\n```python\nfor step in df.history():\n    print(step[\"op_name\"], step[\"shape\"], step[\"label\"])\n```\n\n---\n\n## Persistence\n\nLineage graphs can be saved to and loaded from disk:\n\n```python\nfrom flashback.storage import Storage\n\nstore = Storage(\".flashback\")  # or Storage.from_cwd()\nstore.save(df, frame_id=\"experiment-001\")\n\n# Later, in another session:\ndf = store.load(\"experiment-001\")\n```\n\nThe `.flashback/` directory layout:\n\n```\n.flashback/\n├── config.json\n├── graphs/\n│   └── experiment-001.json   # serialised DAG\n└── cache/\n    └── \u003cnode_id\u003e.parquet     # materialised node snapshots\n```\n\n---\n\n## How it works\n\n```\n┌──────────────────────────────────────────────────────────┐\n│  FlashbackFrame                                          │\n│                                                          │\n│  ┌──────────────┐    intercept    ┌───────────────────┐  │\n│  │  Polars API  │ ─────────────▶ │   LineageDAG      │  │\n│  │  .filter()   │                │                   │  │\n│  │  .sort()     │  record node   │  root ──▶ filter  │  │\n│  │  .join()     │ ◀──────────── │         ──▶ sort  │  │\n│  └──────────────┘                │         ──▶ join  │  │\n│         │                        └───────────────────┘  │\n│         ▼                                               │\n│  polars.DataFrame  (unchanged; Polars still optimises)  │\n└──────────────────────────────────────────────────────────┘\n```\n\n**Node identity** is a SHA-256 digest, truncated to 20 hex characters, of:\n```json\n{\n  \"parents\": [\"\u003cparent_node_id\u003e\"],\n  \"op\": \"filter\",\n  \"kwargs\": {\"arg_0\": \"[(col(\\\"price\\\")) \u003e (0)]\"},\n  
\"schema\": {\"id\": \"Int64\", \"price\": \"Float64\", ...}\n}\n```\n\nThis means:\n- Identical pipelines on identical data always hash to the same node → instant\n  cache hits.\n- Changing *any* argument or parent state produces a *different* hash → a\n  modified step never silently reuses a stale cache entry.\n\n---\n\n## Development\n\n```bash\ngit clone https://github.com/prakulhiremath/flashback\ncd flashback\npip install -e \".[dev]\"\n\n# Lint\nruff check flashback tests\nruff format --check flashback tests\n\n# Type-check\nmypy flashback\n\n# Test with coverage\npytest\n```\n\nThe CI matrix runs across **Ubuntu × macOS × Windows** and **Python 3.10 –\n3.13** with a hard 90% coverage threshold.\n\n---\n\n## Roadmap\n\n- [ ] **Branching** — `fb.branch(\"experiment-A\")` for parallel pipeline exploration\n- [ ] **Merge** — reconcile two branches at the DAG level\n- [ ] **Remote storage** — push/pull lineage graphs to S3 / GCS\n- [ ] **Streaming Polars** — track lazy plans before `.collect()`\n- [ ] **Notebook integration** — `%load_ext flashback` magic with live DAG sidebar\n- [ ] **Export to DVC** — generate `.dvc` stage files from a flashback DAG\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\n\u003cp align=\"center\"\u003e\n  Built with \u003ca href=\"https://pola.rs\"\u003ePolars\u003c/a\u003e · \u003ca href=\"https://github.com/Textualize/rich\"\u003eRich\u003c/a\u003e · \u003ca href=\"https://networkx.org\"\u003eNetworkX\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprakulhiremath%2Fflashback","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprakulhiremath%2Fflashback","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprakulhiremath%2Fflashback/lists"}