https://github.com/tha-guy-nate/tha-csv-runner
A Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors.
cli csv data-processing python tabular-helper
- Host: GitHub
- URL: https://github.com/tha-guy-nate/tha-csv-runner
- Owner: tha-guy-nate
- License: other
- Created: 2026-05-11T07:38:37.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-06-30T02:24:11.000Z (6 days ago)
- Last Synced: 2026-06-30T04:14:48.951Z (6 days ago)
- Topics: cli, csv, data-processing, python, tabular-helper
- Language: Python
- Size: 112 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# tha-csv-runner
A Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors. It runs your function against every row and, instead of stopping at the first failure, records each row's outcome in its own status and message columns.
## Install
```bash
pip install tha-csv-runner
```
## Quick start
```python
from tha_csv_runner import ThaCSV


def process(row: dict) -> None:
    """Raise any exception to mark the row as an error. Return value is ignored."""
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")


runner = ThaCSV()
rows = runner.read("Step 1 of 2", "data.csv", ["name", "email"], process)
runner.write("Step 2 of 2", "output.csv")
```
## How it works
1. Opens the CSV and validates that all `required_headers` are present; raises `CsvError` immediately if any are missing
2. Iterates every row with a `tqdm` progress bar labelled with `desc`
3. Calls your `validator(row)` function; if it raises, that row is marked as an error and processing continues (with `enrich=False`, the exception is re-raised instead)
4. Appends three columns to every row (unless `enrich=False`): `row number`, `row status`, and `message`
   - `row number` starts at 2 (row 1 is the header)
   - On success: `row status` and `message` are blank
   - On error: `row status = "error"`, `message = str(exception)`
5. `write()` writes all rows (success and error) to a CSV
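The steps above can be sketched with the standard library alone. This is a minimal approximation under stated assumptions, not the library's implementation: `run_rows` and the local `CsvError` class are hypothetical names, the `tqdm` progress bar is left out, and the CSV is parsed from an in-memory string so the example runs on its own.

```python
import csv
import io
from typing import Callable


class CsvError(Exception):
    """Hypothetical stand-in for the library's CsvError."""


def run_rows(text: str, required_headers: list[str],
             validator: Callable[[dict], None]) -> list[dict]:
    """Hypothetical sketch of the read loop described in "How it works"."""
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    # Step 1: fail fast if any required header is missing.
    missing = [h for h in required_headers if h not in headers]
    if missing:
        raise CsvError(f"missing required headers: {missing}")

    rows = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        status, message = "", ""
        try:
            validator(row)  # Step 3: any exception marks this row as an error
        except Exception as exc:
            status, message = "error", str(exc)
        # Step 4: append the three tracking columns and keep going.
        row["row number"] = row_number
        row["row status"] = status
        row["message"] = message
        rows.append(row)
    return rows
```

Because failing rows stay in the output instead of aborting the run, the errors can be collected afterwards with `[r for r in rows if r["row status"] == "error"]`.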
## API
### `ThaCSV`
```python
ThaCSV(
    delimiter=",",     # optional: pass "\t" for TSV, or any single-character separator
    encoding="utf-8",  # optional: pass "cp1252" or "latin-1" for Excel exports
)
```
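As a hedged illustration of what these two options control, assuming they map onto the `csv` module's `delimiter` argument and `open()`'s `encoding` argument in the usual way, here is a tab-separated, cp1252-encoded file parsed with the standard library. The `read_rows` helper is hypothetical and is not part of this library.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path: Path, delimiter: str = ",", encoding: str = "utf-8") -> list[dict]:
    """Hypothetical helper: parse a delimited text file into a list of dicts."""
    # newline="" is what the csv module documentation recommends for CSV files.
    with path.open(newline="", encoding=encoding) as f:
        return list(csv.DictReader(f, delimiter=delimiter))


with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "data.tsv"
    # Simulate an Excel-style cp1252 export; decoding it as utf-8 would fail on the "é".
    tsv.write_text("name\temail\nRené\trene@example.com\n", encoding="cp1252")
    rows = read_rows(tsv, delimiter="\t", encoding="cp1252")
```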
### `runner.read()`
```python
runner.read(
    "Step 1 of 2",      # progress bar label; pass None to use the filename
    "data.csv",         # path to input CSV
    ["a", "b"],         # columns that must exist; raises CsvError if missing
    validator=my_func,  # optional: callable(row: dict) -> None
    enrich=True,        # optional: set False to skip row number/status/message columns
)
```
Reads and processes all rows. Returns the rows as a `list[dict]` (same object as `runner.rows`).
The `validator` is designed for **offline, in-memory checks** such as field presence, format, and business rules. It runs synchronously on each row, so don't use it for API calls or database lookups.
When `enrich=False`, validator exceptions are re-raised instead of captured.
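As an illustration of the kind of offline check the `validator` is meant for, here is a hypothetical validator covering field presence, format, and a business rule. The column names (`name`, `email`, `age`) and the rules are made up for the example; only the contract (take a `dict`, raise to reject the row, return value ignored) comes from this README.

```python
import re

# Simple shape check only; this is not full RFC 5322 email validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_customer(row: dict) -> None:
    """Hypothetical validator: raise to mark the row as an error."""
    # Presence: required fields must be non-empty.
    for field in ("name", "email"):
        if not (row.get(field) or "").strip():
            raise ValueError(f"{field} is required")
    # Format: the email must look like local@domain.tld.
    if not EMAIL_RE.match(row["email"]):
        raise ValueError(f"malformed email: {row['email']!r}")
    # Business rule: age is optional, but when present it must be a whole number >= 18.
    age = (row.get("age") or "").strip()
    if age and (not age.isdigit() or int(age) < 18):
        raise ValueError("age must be a whole number of at least 18")
```

Passed as `runner.read(..., validator=validate_customer)`, each raised message would end up in that row's `message` column.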
### `runner.write()`
```python
runner.write(
    "Step 2 of 2",                   # progress bar label; pass None for "Writing {stem} CSV"
    output_path="output.csv",        # optional: auto-named input_processed_TIMESTAMP.csv if omitted
    rows=my_rows,                    # optional: use these rows instead of runner.rows
    sort_by="name",                  # optional: column name, or list of column names
    ascending=True,                  # optional: bool or list of bools matching sort_by
    column_order=["name", "email"],  # optional: listed columns come first, rest follow
    keep=["name", "email"],          # optional: keep only these columns (mutually exclusive with drop)
    drop=["row number"],             # optional: remove these columns (mutually exclusive with keep)
    chunk_size=1000,                 # optional: split output into files of this many rows
)
```
Prints `✅ Done! CSV was written to: {path}` on completion. Override by setting `runner.status_cb = my_fn`.
Returns the `Path` that was written, or a `list[Path]` when `chunk_size` is set.
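One way to picture the ordering and column options is as transformations applied to a list of dicts before writing. The hypothetical `shape_rows` helper below is a stdlib sketch of how `sort_by`, `ascending`, `column_order`, `keep`, and `drop` could combine; the order in which the library actually applies them, and how it treats missing columns, may differ.

```python
from operator import itemgetter


def shape_rows(rows, sort_by=None, ascending=True, column_order=None,
               keep=None, drop=None):
    """Hypothetical helper: returns (columns, rows) ready for csv.DictWriter."""
    if keep is not None and drop is not None:
        raise ValueError("keep and drop are mutually exclusive")
    rows = [dict(r) for r in rows]  # copy so the caller's rows are untouched

    if sort_by is not None:
        keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)
        dirs = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
        if len(dirs) != len(keys):
            raise ValueError("ascending must have one entry per sort_by column")
        # Stable sorts applied from the last key to the first produce a
        # multi-key sort in which each key keeps its own direction.
        for key, asc in reversed(list(zip(keys, dirs))):
            rows.sort(key=itemgetter(key), reverse=not asc)

    columns = list(rows[0]) if rows else []
    if keep is not None:
        columns = [c for c in columns if c in keep]
    elif drop is not None:
        columns = [c for c in columns if c not in drop]
    if column_order:
        # Listed columns come first; the rest follow in their existing order.
        first = [c for c in column_order if c in columns]
        columns = first + [c for c in columns if c not in first]
    return columns, [{c: r.get(c, "") for c in columns} for r in rows]
```

Sorting one key at a time works only because Python's `list.sort` is stable, including when `reverse=True`.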
#### `chunk_size`
When provided, `write()` splits the output into multiple files named `output_001.csv`, `output_002.csv`, etc. and returns a `list[Path]`.
```python
paths = runner.write("Step 2 of 2", "output.csv", chunk_size=1000)
# [Path("output_001.csv"), Path("output_002.csv"), ...]
```
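The chunk naming described here can be reproduced with `pathlib` and list slicing. The hypothetical `chunk_plan` helper below only computes which rows go to which file; it assumes the three-digit, zero-padded suffix seen in the example and is a sketch of the naming scheme, not the library's code.

```python
from pathlib import Path


def chunk_plan(output_path, rows, chunk_size):
    """Hypothetical helper: pair each chunk file path with the rows it would hold."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    base = Path(output_path)
    plan = []
    for index, start in enumerate(range(0, len(rows), chunk_size), start=1):
        # output.csv -> output_001.csv, output_002.csv, ...
        target = base.with_name(f"{base.stem}_{index:03d}{base.suffix}")
        plan.append((target, rows[start:start + chunk_size]))
    return plan
```

With 2,500 rows and `chunk_size=1000`, this yields three files holding 1,000, 1,000, and 500 rows.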
## Alternatives
This library is intentionally limited in scope — it handles row-by-row processing with error capture and a progress bar, not data analysis or transformation. For heavier workloads:
- [**pandas**](https://pandas.pydata.org) — the standard for CSV processing and in-memory data manipulation; use when you need filtering, grouping, joins, or vectorized operations
- [**polars**](https://pola.rs) — faster alternative to pandas for large files with a cleaner API and lazy evaluation
- [**csv**](https://docs.python.org/3/library/csv.html) (stdlib) — raw CSV reading/writing with no dependencies; sufficient when you don't need progress tracking or structured error capture
Choose this library when you need per-row error capture with `row status` and `message` columns built in. pandas and polars process data, but they don't track individual row failures.
## License
MIT