{"id":34113226,"url":"https://github.com/alastairtree/crump","last_synced_at":"2026-03-04T01:29:25.423Z","repository":{"id":320518973,"uuid":"1080751000","full_name":"alastairtree/crump","owner":"alastairtree","description":"Python \u0026 CLI tool for getting data from files into a DB fast.","archived":false,"fork":false,"pushed_at":"2026-02-26T13:30:56.000Z","size":5306,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-26T20:13:44.511Z","etag":null,"topics":["cdf-files","csv-files","parquet-files","postgresql","sqlite"],"latest_commit_sha":null,"homepage":"https://alastairtree.github.io/crump/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alastairtree.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-21T20:16:07.000Z","updated_at":"2026-02-26T13:30:13.000Z","dependencies_parsed_at":"2026-02-10T20:03:12.689Z","dependency_job_id":null,"html_url":"https://github.com/alastairtree/crump","commit_stats":null,"previous_names":["alastairtree/claudedemo","alastairtree/crump"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/alastairtree/crump","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alastairtree%2Fcrump","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alastairtree%2Fcrump/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/alastairtree%2Fcrump/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alastairtree%2Fcrump/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alastairtree","download_url":"https://codeload.github.com/alastairtree/crump/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alastairtree%2Fcrump/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30068411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T01:03:42.280Z","status":"ssl_error","status_checked_at":"2026-03-04T01:03:23.410Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdf-files","csv-files","parquet-files","postgresql","sqlite"],"created_at":"2025-12-14T19:10:38.387Z","updated_at":"2026-03-04T01:29:25.406Z","avatar_url":"https://github.com/alastairtree.png","language":"Python","readme":"# Welcome to Crump\n\nExamines and syncs CSV, Parquet, and CDF files into PostgreSQL or SQLite databases in batched files using easy to edit configuration files.\n\n[![CI](https://github.com/alastairtree/crump/workflows/CI/badge.svg)](https://github.com/alastairtree/crump/actions)\n[![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)\n[![Code 
style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)\n\n## Overview\n\n**crump** is a command-line tool and Python library for easily syncing CSV, Parquet, and CDF files to PostgreSQL or SQLite databases, and for extracting data from CDF files. It provides a declarative, configuration-based approach to data synchronization with automatic schema management.\n\n## Key Features\n\n### Data File Support\n- **CSV Support**: Read and sync standard CSV files\n- **Native CDF Processing**: Built-in support for Common Data Format (CDF) science files\n- **Automatic Extraction**: Extracts CDF variables to CSV, Parquet, or directly to a database\n- **Array Variable Handling**: Automatically expands multi-dimensional array variables\n- **Apache Parquet Support**: Built-in support for Apache Parquet files, including syncing Parquet files directly to a database\n- **Extract to Parquet**: Convert CDF files to Parquet format with the `--parquet` flag\n\n### Data Synchronization\n- **Configuration-Based**: Examines your CSV files with the prepare command and defines sync jobs in YAML with sensible column mappings\n- **Column Mapping**: Sync all columns, rename them, or only sync a subset\n- **Automatic Table Creation**: Creates target tables if they don't exist\n- **Schema Evolution**: Automatically adds new columns as needed and never deletes existing columns. Optionally keeps a history of data changes in a history table.\n- **Index Management**: Suggests and creates database indexes based on column types\n- **Dual Interface**: Use as a CLI tool or import as a Python library\n- **Filename-Based Extraction**: Extract values from filenames (dates, versions, etc.) 
and store in database columns\n- **Automatic Cleanup**: Delete stale records based on extracted filename values\n- **Compound Primary Keys**: Support for multi-column primary keys\n- **Dry-Run Mode**: Preview all changes without modifying the database\n- **Idempotent Operations**: Safe to run multiple times because rows are upserted\n- **Rich Output**: Beautiful terminal output with the Rich library\n\n## Quick Example\n\n```bash\nuv pip install crump # or pip install crump if you prefer\n\n# Create a configuration file\ncrump prepare users.csv --config crump_config.yml --job users_sync\n\n# Look at the mapping it generated for you in crump_config.yml and edit as needed.\n# Crump has mapped your columns and suggested keys and indexes\n\n# Get ready to sync - your database must be available\nexport DATABASE_URL=\"sqlite:///test.db\"\n# Or for Postgres\n# export DATABASE_URL=\"postgresql://user:pass@localhost:5432/mydb\"\n\n# Preview changes first (requires --db-url or DATABASE_URL)\ncrump sync users.csv --config crump_config.yml --job users_sync --dry-run\n\n# Sync the file to the database\ncrump sync users.csv --config crump_config.yml --job users_sync\n\n# Later that day, v2 of the file arrives\n# Sync the new file: old records from v1 are removed automatically, and updates are applied to rows that match on primary key\ncrump sync users_v2.csv --config crump_config.yml --job users_sync\n```\n\n## Example Configuration\n\n```yaml\njobs:\n  daily_sales:\n    target_table: sales\n    id_mapping:\n      sale_id: id\n    filename_to_column:\n      template: \"sales_[date].csv\"\n      columns:\n        date:\n          db_column: sync_date\n          type: date\n          use_to_delete_old_rows: true\n    columns:\n      product_id: product_id\n      amount: amount\n```\n\nThis configuration:\n- Syncs `sales_YYYY-MM-DD.csv` files to the `sales` table\n- Extracts the date from the filename and stores it in the `sync_date` column\n- Automatically deletes stale records for the same date after sync\n- Maps 
CSV columns to database columns\n\n## Documentation\n\n📚 **[Read the full documentation](https://alastairtree.github.io/crump)**\n\n- [Installation Guide](https://alastairtree.github.io/crump/installation/) - Install crump\n- [Quick Start](https://alastairtree.github.io/crump/quick-start/) - Get started in 5 minutes\n- [Configuration](https://alastairtree.github.io/crump/configuration/) - YAML configuration reference\n- [CLI Reference](https://alastairtree.github.io/crump/cli-reference/) - Command-line documentation\n- [Features](https://alastairtree.github.io/crump/features/) - Detailed feature documentation\n- [API Reference](https://alastairtree.github.io/crump/api-reference/) - Python API documentation\n- [Development](https://alastairtree.github.io/crump/development/) - Contributing guide\n\n\n## Programmatic Usage\n\n```python\nfrom pathlib import Path\nfrom crump import sync_csv_to_db, CrumpConfig\n\n# Load configuration\nconfig = CrumpConfig.from_yaml(Path(\"crump_config.yml\"))\njob = config.get_job(\"my_job\")\n\n# Sync CSV to database (PostgreSQL or SQLite)\nrows_synced = sync_csv_to_db(\n    csv_path=Path(\"data.csv\"),\n    job=job,\n    db_connection_string=\"postgresql://localhost/mydb\"\n)\nprint(f\"Synced {rows_synced} rows\")\n```\n\n## Development\n\n```bash\n# Clone repository\ngit clone https://github.com/alastairtree/crump.git\ncd crump\n\n# Install with development dependencies\nuv sync --all-extras\n\n# Run tests\nuv run pytest -v\n\n# Generate documentation locally\n./generate-docs.sh\n```\n\nSee the [Development Guide](https://alastairtree.github.io/crump/development/) for detailed instructions.\n\n## Contributing\n\nContributions are welcome! 
Please see the [Contributing Guide](https://alastairtree.github.io/crump/contributing/) for details.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Support\n\n- 📖 [Documentation](https://alastairtree.github.io/crump)\n- 🐛 [Issue Tracker](https://github.com/alastairtree/crump/issues)\n- 💬 [Discussions](https://github.com/alastairtree/crump/discussions)\n\n## Acknowledgments\n\nBuilt with [Click](https://click.palletsprojects.com/), [Rich](https://rich.readthedocs.io/), [psycopg3](https://www.psycopg.org/psycopg3/), and [pytest](https://pytest.org/).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falastairtree%2Fcrump","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falastairtree%2Fcrump","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falastairtree%2Fcrump/lists"}