{"id":51433163,"url":"https://github.com/tha-guy-nate/tha-csv-runner","last_synced_at":"2026-07-05T05:03:45.604Z","repository":{"id":357233306,"uuid":"1235301261","full_name":"tha-guy-nate/tha-csv-runner","owner":"tha-guy-nate","description":"A Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors.","archived":false,"fork":false,"pushed_at":"2026-06-30T02:24:11.000Z","size":115,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-30T04:14:48.951Z","etag":null,"topics":["cli","csv","data-processing","python","tabular-helper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tha-guy-nate.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-11T07:38:37.000Z","updated_at":"2026-06-30T02:24:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tha-guy-nate/tha-csv-runner","commit_stats":null,"previous_names":["tha-guy-nate/tha-csv-runner"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/tha-guy-nate/tha-csv-runner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tha-guy-nate%2Ftha-csv-runner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tha-guy-nate%2Ftha-csv-runner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tha-guy-nate%2Ftha-csv-runner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tha-guy-nate%2Ftha-csv-runner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tha-guy-nate","download_url":"https://codeload.github.com/tha-guy-nate/tha-csv-runner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tha-guy-nate%2Ftha-csv-runner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35143815,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-05T02:00:06.290Z","response_time":100,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","csv","data-processing","python","tabular-helper"],"created_at":"2026-07-05T05:03:44.032Z","updated_at":"2026-07-05T05:03:45.582Z","avatar_url":"https://github.com/tha-guy-nate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tha-csv-runner\n\n[![CI](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/tha-guy-nate/tha-csv-runner/graph/badge.svg)](https://codecov.io/gh/tha-guy-nate/tha-csv-runner)\n[![PyPI](https://img.shields.io/pypi/v/tha-csv-runner)](https://pypi.org/project/tha-csv-runner/)\n[![Python](https://img.shields.io/pypi/pyversions/tha-csv-runner)](https://pypi.org/project/tha-csv-runner/)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)\n[![wheel size](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fpypi.org%2Fpypi%2Ftha-csv-runner%2Fjson\u0026label=wheel%20size\u0026query=%24.urls%5B0%5D.size\u0026suffix=%20B)](https://pypi.org/project/tha-csv-runner/#files)\n\nA Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors. Runs a function against every row — with a progress bar, required header validation, and structured error capture per row.\n\n## Install\n\n```bash\npip install tha-csv-runner\n```\n\n## Quick start\n\n```python\nfrom tha_csv_runner import ThaCSV\n\ndef process(row: dict) -\u003e None:\n    \"\"\"Raise any exception to mark the row as an error. Return value is ignored.\"\"\"\n    if not row[\"email\"].endswith(\"@example.com\"):\n        raise ValueError(\"invalid email domain\")\n\nrunner = ThaCSV()\n\nrows = runner.read(\"Step 1 of 2\", \"data.csv\", [\"name\", \"email\"], process)\nrunner.write(\"Step 2 of 2\", \"output.csv\")\n```\n\n## How it works\n\n1. Opens the CSV and validates that all `required_headers` are present — raises immediately if any are missing\n2. Iterates every row with a `tqdm` progress bar labelled with `desc`\n3. Calls your `validator(row)` function — if it raises, that row is marked as an error and processing continues\n4. Appends three columns to every row: `row number`, `row status`, and `message`\n   - `row number` starts at 2 (row 1 is the header)\n   - On success: `row status` and `message` are blank\n   - On error: `row status = \"error\"`, `message = str(exception)`\n5. `write()` writes all rows (success and error) to a CSV\n\n## API\n\n### `ThaCSV`\n\n```python\nThaCSV(\n    delimiter=\",\",        # optional — pass \"\\t\" for TSV, or any single-character separator\n    encoding=\"utf-8\",     # optional — pass \"cp1252\" or \"latin-1\" for Excel exports\n)\n```\n\n### `runner.read()`\n\n```python\nrunner.read(\n    \"Step 1 of 2\",           # progress bar label — pass None to use the filename\n    \"data.csv\",              # path to input CSV\n    [\"a\", \"b\"],              # columns that must exist — raises CsvError if missing\n    validator=my_func,       # optional: callable(row: dict) -\u003e None\n    enrich=True,             # optional: set False to skip row number/status/message columns\n)\n```\n\nReads and processes all rows. Returns the rows as a `list[dict]` (same object as `runner.rows`).\n\nThe `validator` is designed for **offline, in-memory checks** — field presence, format, business rules. It runs synchronously on each row; don't use it for API calls or database lookups.\n\nWhen `enrich=False`, validator exceptions are re-raised instead of captured.\n\n### `runner.write()`\n\n```python\nrunner.write(\n    \"Step 2 of 2\",                     # progress bar label — pass None for \"Writing {stem} CSV\"\n    output_path=\"output.csv\",          # optional — auto-named input_processed_TIMESTAMP.csv if omitted\n    rows=my_rows,                      # optional — use these rows instead of runner.rows\n    sort_by=\"name\",                    # optional — column name, or list of column names\n    ascending=True,                    # optional — bool or list of bools matching sort_by\n    column_order=[\"name\", \"email\"],    # optional — listed columns come first, rest follow\n    keep=[\"name\", \"email\"],            # optional — keep only these columns (mutually exclusive with drop)\n    drop=[\"row number\"],               # optional — remove these columns (mutually exclusive with keep)\n    chunk_size=1000,                   # optional — split output into files of this many rows\n)\n```\n\nPrints `✅ Done! CSV was written to: {path}` on completion. Override by setting `runner.status_cb = my_fn`.\n\nReturns the `Path` that was written, or a `list[Path]` when `chunk_size` is set.\n\n#### `chunk_size`\n\nWhen provided, `write()` splits the output into multiple files named `output_001.csv`, `output_002.csv`, etc. and returns a `list[Path]`.\n\n```python\npaths = runner.write(\"Step 2 of 2\", \"output.csv\", chunk_size=1000)\n# [\"output_001.csv\", \"output_002.csv\", ...]\n```\n\n## Alternatives\n\nThis library is intentionally limited in scope — it handles row-by-row processing with error capture and a progress bar, not data analysis or transformation. For heavier workloads:\n\n- [**pandas**](https://pandas.pydata.org) — the standard for CSV processing and in-memory data manipulation; use when you need filtering, grouping, joins, or vectorized operations\n- [**polars**](https://pola.rs) — faster alternative to pandas for large files with a cleaner API and lazy evaluation\n- [**csv**](https://docs.python.org/3/library/csv.html) (stdlib) — raw CSV reading/writing with no dependencies; sufficient when you don't need progress tracking or structured error capture\n\nChoose this library when you need per-row error capture with `row status` and `message` columns baked in — pandas and polars process data, they don't track individual row failures.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftha-guy-nate%2Ftha-csv-runner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftha-guy-nate%2Ftha-csv-runner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftha-guy-nate%2Ftha-csv-runner/lists"}