{"id":50509715,"url":"https://github.com/dmatking/dtlab","last_synced_at":"2026-06-02T19:01:12.565Z","repository":{"id":314209238,"uuid":"1054579401","full_name":"dmatking/dtlab","owner":"dmatking","description":"Date Time Lab ","archived":false,"fork":false,"pushed_at":"2026-04-03T16:48:36.000Z","size":44,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-03T19:41:12.540Z","etag":null,"topics":["csv","data-analysis","data-quality","datetime","python","timezone"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmatking.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-11T03:43:22.000Z","updated_at":"2026-04-03T16:48:39.000Z","dependencies_parsed_at":"2025-09-11T07:51:26.234Z","dependency_job_id":"db1a8b95-074d-4d52-ab60-8cfd5a347324","html_url":"https://github.com/dmatking/dtlab","commit_stats":null,"previous_names":["dmatking/dtlab"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dmatking/dtlab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmatking%2Fdtlab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmatking%2Fdtlab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmatking%2Fdtlab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmatking%2Fdtlab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmatking","download_url":"https://codeload.github.com/dmatking/dtlab/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmatking%2Fdtlab/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33833277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","data-analysis","data-quality","datetime","python","timezone"],"created_at":"2026-06-02T19:01:11.560Z","updated_at":"2026-06-02T19:01:12.555Z","avatar_url":"https://github.com/dmatking.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dtlab — datetime equivalence detector\n\nFinds datetime columns in a CSV or DataFrame that represent the same instant in different formats or timezones, and groups them together.\n\nUseful when working with wide tables that have many timestamp columns — epoch seconds alongside ISO strings alongside RFC5322 headers, all meaning the same thing.\n\n## What it detects\n\n- ISO 8601 (with Z, numeric offsets, naive)\n- RFC 5322 / HTTP-date\n- Epoch integers (seconds, milliseconds, microseconds, nanoseconds — inferred by magnitude)\n- Slash-style dates (MDY/DMY, with ambiguity flagging)\n- ISO week / ordinal formats\n- TZ abbreviations (PST/PDT, CST/CDT, EST/EDT, MST/MDT, UTC/GMT)\n\n## Install\n\n```bash\npip install pandas numpy python-dateutil\n```\n\nRequires Python 3.9+ (uses `zoneinfo`).\n\n## CLI\n\n```bash\npython dt_equivalence.py --in data.csv\n```\n\n```\n=== dt-equivalence report ===\nSource: data.csv\n\n-- Detected columns --\ncolumn         | role        | format         | unit | parse_rate | parser                       | naive_policy | notes\n...\n\n-- Equivalence groups (tol=1s, min_overlap=100, min_match_ratio=98%) --\n  Group 1: ts_iso_utc, ts_iso_cdt, ts_iso_pdt, ts_epoch_s, ts_epoch_ms, ts_rfc5322\n  Singletons (no match): ts_naive_local\n```\n\nAlso writes `data.dt_report.json` with full pairwise details.\n\n### Options\n\n| Flag                | Default  | Description                                                       |\n| ------------------- | -------- | ----------------------------------------------------------------- |\n| `--in`              | required | Input CSV path                                                    |\n| `--delimiter`       | auto     | CSV delimiter                                                     |\n| `--naive-tz`        | UTC      | IANA timezone for naive datetime strings (e.g. `America/Chicago`) |\n| `--encoding`        | utf-8    | File encoding                                                     |\n| `--max-rows`        | all      | Limit rows read                                                   |\n| `--tolerance`       | 1        | Max seconds difference to consider two timestamps equivalent      |\n| `--min-overlap`     | 100      | Minimum non-null row overlap required to compare two columns      |\n| `--min-match-ratio` | 0.98     | Fraction of overlapping rows that must match within tolerance     |\n| `--include-columns` | all      | Comma-separated list of columns to analyze                        |\n| `--exclude-columns` | none     | Comma-separated list of columns to skip                           |\n| `--preview`         | off      | Write a normalized UTC preview CSV (first 50 rows)                |\n\n## Notebook / script API\n\n```python\nimport pandas as pd\nfrom dt_equivalence import analyze\n\ndf = pd.read_parquet(\"events.parquet\")\nresult = analyze(df, naive_tz=\"America/Chicago\")\n\nresult.report()              # print text report\nresult.summary()             # pd.DataFrame of column metadata\nresult.normalized()          # pd.DataFrame of detected columns as UTC ISO strings\nresult.equivalent_groups()   # list of groups with 2+ members\nresult.groups                # all groups including singletons\nresult.parsed                # dict of col → UTC pd.Series\nresult.sim                   # pairwise {overlap, match_ratio, equivalent}\n```\n\n`analyze()` accepts the same parameters as the CLI flags:\n\n```python\nresult = analyze(\n    df,\n    naive_tz=\"America/New_York\",\n    tolerance_seconds=5,\n    min_overlap=50,\n    min_match_ratio=0.95,\n    include_columns=[\"created_at\", \"event_ts\", \"ts_epoch\"],\n    exclude_columns=[\"id\"],\n)\n```\n\n## How equivalence works\n\nAll detected columns are normalized to UTC. Two columns are considered equivalent if:\n\n1. They share at least `min_overlap` non-null rows (or 1% of total rows for large files)\n2. At least `min_match_ratio` of those rows have timestamps within `tolerance_seconds` of each other\n\nGrouping uses union-find, so transitivity is handled correctly (if A≡B and B≡C, all three end up in the same group).\n\n## Caveats\n\n- **Naive timestamps**: without `--naive-tz`, naive strings are assumed to be UTC. If your data has naive local times, set `--naive-tz` to get correct grouping.\n- **Ambiguous slash dates**: `03/08/2025` is ambiguous (MDY vs DMY). These are flagged in the notes column but still parsed by pandas using its default interpretation.\n- **Floating-point epoch loss**: epoch values stored as floats with few significant digits may not match string timestamps exactly — raise `--tolerance` if needed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmatking%2Fdtlab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmatking%2Fdtlab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmatking%2Fdtlab/lists"}