{"id":43582183,"url":"https://github.com/nicopon/dtpipe","last_synced_at":"2026-02-20T16:02:45.508Z","repository":{"id":333003563,"uuid":"1135561729","full_name":"nicopon/dtpipe","owner":"nicopon","description":"A simple, self-contained CLI for performance-focused data streaming \u0026 anonymization.","archived":false,"fork":false,"pushed_at":"2026-02-03T22:03:58.000Z","size":706,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-04T10:54:33.263Z","etag":null,"topics":["cli","csv","data-masking","database","dotnet","duckdb","etl","oracle","parquet","postgresql","sql-server","sqlite"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicopon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-16T09:18:57.000Z","updated_at":"2026-02-03T22:04:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nicopon/dtpipe","commit_stats":null,"previous_names":["nicopon/querydump","nicopon/dtpipe"],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/nicopon/dtpipe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicopon%2Fdtpipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicopon%2Fdtpipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicopon%2Fdtpipe/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicopon%2Fdtpipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nicopon","download_url":"https://codeload.github.com/nicopon/dtpipe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicopon%2Fdtpipe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29199519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T14:35:27.868Z","status":"ssl_error","status_checked_at":"2026-02-07T14:25:51.081Z","response_time":63,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","csv","data-masking","database","dotnet","duckdb","etl","oracle","parquet","postgresql","sql-server","sqlite"],"created_at":"2026-02-04T00:00:42.855Z","updated_at":"2026-02-20T16:02:45.501Z","avatar_url":"https://github.com/nicopon.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DtPipe\n\n**A simple, self-contained CLI for performance-focused data streaming \u0026 anonymization.**\n\nDtPipe streams data from any source (SQL, CSV, Parquet) to any destination, applying intelligent transformations on the fly. 
It is designed for CI/CD pipelines, test data generation, and large dataset migration.\n\n---\n\n### 🚀 [**See the COOKBOOK for Recipes \u0026 Examples**](./COOKBOOK.md) 🍳\n*Go here for anonymization guides, pipeline examples, and detailed tutorials.*\n\n---\n\n## Capabilities\n\n- **Streaming Architecture**: Handles millions of rows with constant, low memory usage.\n- **Multi-Provider**: Native support for **Oracle**, **SQL Server**, **PostgreSQL**, **DuckDB**, **SQLite**, **Parquet**, and **CSV**.\n- **Zero Dependencies**: Single static binary. No drivers to install.\n- **Anonymization Engine**: Built-in **Bogus** integration to fake names, emails, IBANs, and more.\n- **Pipeline Transformation**: Mask, nullify, format, or script (JS) data during export.\n- **Production Ready**: YAML job configuration, environment variable support, and robust logging.\n\n## Installation\n\n### .NET Global Tool (Recommended)\nIf you have the .NET SDK installed, you can install DtPipe as a global tool.\n\n```bash\ndotnet tool install -g dtpipe --prerelease\ndtpipe --help\n```\n\n### Build from Source\n**Prerequisite:** [.NET 10 SDK](https://dotnet.microsoft.com/download/dotnet/10.0) is required to compile.\n\n```bash\n# Bash (macOS/Linux/Windows Git Bash)\n./build.sh\n\n# PowerShell (Windows/Cross-platform)\n./build.ps1\n```\n\nThe binary is written to `./dist/release/dtpipe`.\n\n\u003e **Note:** The pre-compiled binaries in [GitHub Releases](https://github.com/nicopon/DtPipe/releases) are **self-contained**. You do NOT need to install .NET to run them.\n\n\u003e **Developers:** Want to use DtPipe programmatically via NuGet packages? Check out the **[src/DtPipe.Sample](./src/DtPipe.Sample)** project for a hands-on API example.\n\n## Quick Reference\n\n### CLI Usage\n\n```bash\ndtpipe --input [SOURCE] --query [SQL] --output [DEST] [OPTIONS]\n```\n\n### 1. 
Connection Strings (Input \u0026 Output)\n\nDtPipe auto-detects providers from file extensions (`.csv`, `.jsonl`, `.arrow`, `.parquet`, `.duckdb`, `.sqlite`) or from explicit prefixes.\n\n| Provider | Prefix / Format | Example |\n|:---|:---|:---|\n| **DuckDB** | `duck:` | `duck:my.duckdb` |\n| **SQLite** | `sqlite:` | `sqlite:data.sqlite` |\n| **PostgreSQL** | `pg:` | `pg:Host=localhost;Database=mydb` |\n| **Oracle** | `ora:` | `ora:Data Source=PROD;User Id=scott` |\n| **SQL Server** | `mssql:` | `mssql:Server=.;Database=mydb` |\n| **CSV** | `csv:` / `.csv` | `data.csv` |\n| **JSONL** | `jsonl:` / `.jsonl` | `data.jsonl` |\n| **Apache Arrow** | `arrow:` / `.arrow` | `data.arrow` |\n| **Parquet** | `parquet:` / `.parquet` | `data.parquet` |\n| **Data Gen** | `generate:` | `generate:1000000` (generates a `GenerateIndex` column) |\n| **Keyring** | `keyring://` | `keyring://my-prod-db` |\n| **STDIN/STDOUT** | `csv`, `jsonl`, `arrow`, or `parquet` | `csv` (no file path) |\n\n### 2. Anonymization \u0026 Fakers\n\nUse `--fake \"Col:Generator\"` to replace sensitive data.\n*See [COOKBOOK.md](./COOKBOOK.md#anonymization-the-fakers) for more examples.*\n\n| Category | Key Generators |\n|:---|:---|\n| **Identity** | `name.fullName`, `name.firstName`, `internet.email` |\n| **Address** | `address.fullAddress`, `address.city`, `address.zipCode` |\n| **Finance** | `finance.iban`, `finance.creditCardNumber` |\n| **Phone** | `phone.phoneNumber` |\n| **Dates** | `date.past`, `date.future`, `date.recent` |\n| **System** | `random.uuid`, `random.number`, `random.boolean` |\n\n\u003e Use `--fake-list` to print all available generators.\n\n### 3. 
Positional CLI Option Scoping (Reader vs Writer)\n\nDtPipe scopes each option according to its position relative to the **output flag (`-o`)**.\n\n* **Global / Reader Scope:** Options placed *before* `-o` apply to the whole pipeline. They configure the Reader and act as global defaults.\n* **Writer Scope:** Options placed *after* `-o` apply only to the Writer and override those global defaults.\n\n```bash\n# Example: Use a comma separator for the Reader, but a semicolon separator for the Writer\ndtpipe -i input.csv --csv-separator \",\" -o output.csv --csv-separator \";\"\n```\n\n### 4. CLI Options Reference\n\n#### Core\n| Flag | Description |\n|:---|:---|\n| `-i`, `--input` | **Required**. Source connection string or file path. |\n| `-q`, `--query` | **Required** (for queryable sources). SQL statement. |\n| `-o`, `--output` | **Required**. Target connection string or file path. |\n| `--limit` | Stop after N rows. |\n| `--batch-size` | Rows per buffer (default: 50,000). |\n| `--dry-run` | Preview data, **validate constraints**, and check schema compatibility. |\n| `--key` | Comma-separated primary key columns for `Upsert`/`Ignore`. Auto-detected from the target if omitted. |\n| `--sampling-rate` | Probability (0.0 to 1.0) that each row is included (default: 1.0). |\n| `--sampling-seed` | Random seed for sampling, making samples reproducible. |\n\n#### Automation\n| Flag | Description |\n|:---|:---|\n| `--job [FILE]` | Execute a YAML job file. |\n| `--export-job` | Save the current CLI arguments as a YAML job file. |\n| `--log [FILE]` | Write execution statistics to a file (optional). |\n\n#### Transformation Pipeline\n| Flag | Description |\n|:---|:---|\n| `--fake \"[Col]:[Method]\"` | Generate fake data using Bogus. |\n| `--mask \"[Col]:[Pattern]\"` | Mask characters with a pattern: `#` keeps the original character, any other character replaces it. |\n| `--null \"[Col]\"` | Set the column to NULL. |\n| `--overwrite \"[Col]:[Val]\"` | Set the column to a fixed value. |\n| `--format \"[Col]:[Fmt]\"` | Apply a .NET format string. 
|\n| `--compute \"[Col]:[JS]\"` | Apply JavaScript logic to the `row` object. If `[Col]` doesn't exist, it is created as a **new virtual column**. Accepts inline code or a file path (`@file.js`). Example: `TITLE:row.TITLE.substring(0,5)` |\n| `--filter \"[JS]\"` | Drop rows based on JS logic (must return true/false). |\n| `--expand \"[JS]\"` | Expand each row into multiple rows. The JS expression must return an array. |\n| `--window-count [N]` | Accumulate rows in a window of size N. |\n| `--window-script \"[JS]\"` | Script executed on each window's `rows` (must return an array). |\n| `--project`, `--drop` | Keep only (`--project`) or remove (`--drop`) the listed columns. |\n\n#### Pipeline Modifiers\n| Flag | Description |\n|:---|:---|\n| `--fake-locale [LOC]` | Locale for fakers (e.g. `fr`, `en_US`). |\n| `--fake-seed-column [COL]` | Make faking deterministic based on a column value. |\n| `--[type]-skip-null` | Skip the transformation when the value is NULL. |\n\n#### Database Writer Options\n| Flag | Description |\n|:---|:---|\n| `--strategy` | `Append`, `Truncate`, `DeleteThenInsert`, `Recreate`, `Upsert`, `Ignore`. Works for all providers. |\n| `--insert-mode` | `Standard`, `Bulk`. Works for supported providers (SQL Server, Oracle, PostgreSQL). |\n| `--table` | Target table name (default: `export`). |\n| `--unsafe-query` | Allow non-SELECT queries (use with caution). |\n\n---\n\n## 🔒 Secret Management\n\nDtPipe includes a built-in secret manager that uses your **Operating System's Keyring** (Windows Credential Manager, macOS Keychain, or Linux Secret Service) to store connection strings securely.\n\n### 1. Store a Secret\n```bash\ndtpipe secret set prod-db \"ora:Data Source=PROD;User Id=scott;Password=tiger\"\n```\n\n### 2. Use it in a Transfer\nUse the `keyring://` prefix followed by your alias.\n```bash\ndtpipe -i keyring://prod-db -q \"SELECT * FROM users\" -o users.parquet\n```\n\n### 3. Manage Secrets\n| Command | Description |\n|:---|:---|\n| `dtpipe secret list` | List all stored aliases. 
|\n| `dtpipe secret get \u003calias\u003e` | Print the secret value (useful for verification). |\n| `dtpipe secret delete \u003calias\u003e`| Delete a specific secret. |\n| `dtpipe secret nuke` | Delete ALL secrets. |\n\n---\n\n\n## Contributing\nWant to add a new database adapter or a custom transformer? Check out the [Developer Guide](./EXTENDING.md).\n\n## License\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicopon%2Fdtpipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicopon%2Fdtpipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicopon%2Fdtpipe/lists"}