https://github.com/tha-guy-nate/tha-csv-runner

A Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors.
cli csv data-processing python tabular-helper

Host: GitHub
URL: https://github.com/tha-guy-nate/tha-csv-runner
Owner: tha-guy-nate
License: other
Created: 2026-05-11T07:38:37.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-06-30T02:24:11.000Z (26 days ago)
Last Synced: 2026-06-30T04:14:48.951Z (26 days ago)
Topics: cli, csv, data-processing, python, tabular-helper
Language: Python
Size: 112 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# tha-csv-runner

[![CI](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/tha-guy-nate/tha-csv-runner/actions/workflows/ci.yml)

[![codecov](https://codecov.io/gh/tha-guy-nate/tha-csv-runner/graph/badge.svg)](https://codecov.io/gh/tha-guy-nate/tha-csv-runner)

[![PyPI](https://img.shields.io/pypi/v/tha-csv-runner)](https://pypi.org/project/tha-csv-runner/)

[![Python](https://img.shields.io/pypi/pyversions/tha-csv-runner)](https://pypi.org/project/tha-csv-runner/)

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

[![wheel size](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fpypi.org%2Fpypi%2Ftha-csv-runner%2Fjson&label=wheel%20size&query=%24.urls%5B0%5D.size&suffix=%20B)](https://pypi.org/project/tha-csv-runner/#files)

A Tabular Helper API library that reads and writes CSVs. It runs a function against every row, with a progress bar, required-header validation, and structured per-row error capture.

## Install

```bash
pip install tha-csv-runner
```

## Quick start

```python
from tha_csv_runner import ThaCSV


def process(row: dict) -> None:
    """Raise any exception to mark the row as an error. Return value is ignored."""
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")


runner = ThaCSV()
rows = runner.read("Step 1 of 2", "data.csv", ["name", "email"], process)
runner.write("Step 2 of 2", "output.csv")
```

## How it works

1. Opens the CSV and validates that all `required_headers` are present — raises immediately if any are missing

2. Iterates every row with a `tqdm` progress bar labelled with `desc`

3. Calls your `validator(row)` function — if it raises, that row is marked as an error and processing continues

4. Appends three columns to every row: `row number`, `row status`, and `message`

   - `row number` starts at 2 (row 1 is the header)

   - On success: `row status` and `message` are blank

   - On error: `row status = "error"`, `message = str(exception)`

5. `write()` writes all rows (success and error) to a CSV
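Steps 1 to 4 can be approximated with the standard library alone. The sketch below is an illustration of the described behavior, not the library's actual code: `read_with_status` is a hypothetical name, it raises `ValueError` rather than the library's `CsvError`, and it omits the `tqdm` progress bar to stay dependency-free.

```python
import csv
from typing import Callable, Optional


def read_with_status(
    path: str,
    required_headers: list[str],
    validator: Optional[Callable[[dict], None]] = None,
) -> list[dict]:
    """Hypothetical stand-in for the read step; not the library's code."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)

        # Step 1: fail fast when a required header is absent.
        present = reader.fieldnames or []
        missing = [h for h in required_headers if h not in present]
        if missing:
            raise ValueError(f"missing required headers: {missing}")

        rows = []
        # Steps 2-4: row 1 is the header, so data rows are numbered from 2.
        for row_number, row in enumerate(reader, start=2):
            status, message = "", ""
            if validator is not None:
                try:
                    validator(row)
                except Exception as exc:
                    # Record the failure on the row and keep processing.
                    status, message = "error", str(exc)
            row["row number"] = row_number
            row["row status"] = status
            row["message"] = message
            rows.append(row)
    return rows
```

Catching the exception inside the loop is what lets one bad row be reported without stopping the run.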

## API

### `ThaCSV`

```python
ThaCSV(
    delimiter=",",        # optional: pass "\t" for TSV, or any single-character separator
    encoding="utf-8",     # optional: pass "cp1252" or "latin-1" for Excel exports
)
```

### `runner.read()`

```python
runner.read(
    "Step 1 of 2",           # progress bar label; pass None to use the filename
    "data.csv",              # path to input CSV
    ["a", "b"],              # columns that must exist; raises CsvError if missing
    validator=my_func,       # optional: callable(row: dict) -> None
    enrich=True,             # optional: set False to skip row number/status/message columns
)
```

Reads and processes all rows. Returns the rows as a `list[dict]` (same object as `runner.rows`).

The `validator` is designed for **offline, in-memory checks** — field presence, format, business rules. It runs synchronously on each row; don't use it for API calls or database lookups.

When `enrich=False`, validator exceptions are re-raised instead of captured.
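As an illustration of the kind of offline check a validator is meant for, here is a minimal sketch. The column names `name` and `email` are borrowed from the quick start, and the email pattern is a deliberately loose assumption, not a rule the library enforces.

```python
import re

# Loose shape check only: something@something.something, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_contact(row: dict) -> None:
    """Raise ValueError when a row fails an in-memory check; return None otherwise."""
    name = (row.get("name") or "").strip()
    email = (row.get("email") or "").strip()

    if not name:
        raise ValueError("name is required")
    if not EMAIL_PATTERN.match(email):
        raise ValueError(f"malformed email: {email!r}")
```

A function like this can be passed as `validator=validate_contact` to `runner.read()`. With the default `enrich=True`, the exception text would end up in the row's `message` column.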

### `runner.write()`

```python
runner.write(
    "Step 2 of 2",                     # progress bar label; pass None for "Writing {stem} CSV"
    output_path="output.csv",          # optional: auto-named input_processed_TIMESTAMP.csv if omitted
    rows=my_rows,                      # optional: use these rows instead of runner.rows
    sort_by="name",                    # optional: column name, or list of column names
    ascending=True,                    # optional: bool or list of bools matching sort_by
    column_order=["name", "email"],    # optional: listed columns come first, rest follow
    keep=["name", "email"],            # optional: keep only these columns (mutually exclusive with drop)
    drop=["row number"],               # optional: remove these columns (mutually exclusive with keep)
    chunk_size=1000,                   # optional: split output into files of this many rows
)
```

Prints `✅ Done! CSV was written to: {path}` on completion. Override by setting `runner.status_cb = my_fn`.

Returns the `Path` that was written, or a `list[Path]` when `chunk_size` is set.
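The sorting and column options can be pictured with plain Python. This is a sketch of the documented semantics only: `sort_rows` and `select_columns` are hypothetical helpers, and the order in which the library applies `keep`/`drop` and `column_order` is an assumption here.

```python
from typing import Optional, Sequence, Union


def sort_rows(
    rows: Sequence[dict],
    sort_by: Union[str, Sequence[str]],
    ascending: Union[bool, Sequence[bool]] = True,
) -> list[dict]:
    """Sort by one or more columns, each with its own direction."""
    keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)
    flags = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
    if len(flags) != len(keys):
        raise ValueError("ascending must match sort_by in length")

    result = list(rows)
    # list.sort is stable, so sorting from the least to the most
    # significant key yields a correct multi-column ordering.
    for key, asc in reversed(list(zip(keys, flags))):
        result.sort(key=lambda r: r[key], reverse=not asc)
    return result


def select_columns(
    fieldnames: Sequence[str],
    column_order: Optional[Sequence[str]] = None,
    keep: Optional[Sequence[str]] = None,
    drop: Optional[Sequence[str]] = None,
) -> list[str]:
    """Apply keep or drop, then move column_order entries to the front."""
    if keep is not None and drop is not None:
        raise ValueError("keep and drop are mutually exclusive")

    if keep is not None:
        columns = [c for c in fieldnames if c in keep]
    elif drop is not None:
        columns = [c for c in fieldnames if c not in drop]
    else:
        columns = list(fieldnames)

    if column_order:
        first = [c for c in column_order if c in columns]
        rest = [c for c in columns if c not in first]
        columns = first + rest
    return columns
```

Values are compared as read, so with CSV input a numeric column would sort as text unless it is converted first.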

#### `chunk_size`

When provided, `write()` splits the output into multiple files named `output_001.csv`, `output_002.csv`, etc. and returns a `list[Path]`.

```python
paths = runner.write("Step 2 of 2", "output.csv", chunk_size=1000)
# [Path("output_001.csv"), Path("output_002.csv"), ...]
```
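The naming pattern can be reproduced with `pathlib`. The helper below is a hypothetical sketch, not part of the library: it assumes a three-digit, one-based suffix as in the example above, and how the library names files beyond 999 chunks is not documented here.

```python
from math import ceil
from pathlib import Path


def chunk_paths(output_path: str, row_count: int, chunk_size: int) -> list[Path]:
    """Return one numbered path per chunk, e.g. output_001.csv, output_002.csv."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")

    base = Path(output_path)
    chunk_count = ceil(row_count / chunk_size)
    # Insert the zero-padded chunk number between the stem and the suffix.
    return [
        base.with_name(f"{base.stem}_{index:03d}{base.suffix}")
        for index in range(1, chunk_count + 1)
    ]
```

For 2,500 rows and `chunk_size=1000`, this yields three paths, the last of which would hold the remaining 500 rows.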

## Alternatives

This library is intentionally limited in scope — it handles row-by-row processing with error capture and a progress bar, not data analysis or transformation. For heavier workloads:

- [**pandas**](https://pandas.pydata.org) — the standard for CSV processing and in-memory data manipulation; use when you need filtering, grouping, joins, or vectorized operations

- [**polars**](https://pola.rs) — faster alternative to pandas for large files with a cleaner API and lazy evaluation

- [**csv**](https://docs.python.org/3/library/csv.html) (stdlib) — raw CSV reading/writing with no dependencies; sufficient when you don't need progress tracking or structured error capture

Choose this library when you need per-row error capture with `row status` and `message` columns baked in. Both pandas and polars process data, but they don't track individual row failures.

## License

MIT
