{"id":30523856,"url":"https://github.com/pmgraham/datagrunt","last_synced_at":"2025-08-26T20:51:39.389Z","repository":{"id":281917620,"uuid":"847924948","full_name":"pmgraham/datagrunt","owner":"pmgraham","description":"Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.","archived":false,"fork":false,"pushed_at":"2025-07-13T13:06:39.000Z","size":6842,"stargazers_count":9,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-17T07:38:43.015Z","etag":null,"topics":["csv","csv-parser","data-analysis","data-engineering","data-science","data-wrangling","dataframe","duckdb","open-source","polars","python","python3"],"latest_commit_sha":null,"homepage":"https://pmgraham.github.io/datagrunt","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pmgraham.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-26T20:02:20.000Z","updated_at":"2025-07-13T12:55:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"43d25113-1982-4571-916f-c83ac83f83f8","html_url":"https://github.com/pmgraham/datagrunt","commit_stats":null,"previous_names":["pmgraham/datagrunt"],"tags_count":24,"template":false,"template_full_name":null,"purl":"pkg:github/pmgraham/datagrunt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmgraham%2Fdatagrunt","tags_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmgraham%2Fdatagrunt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmgraham%2Fdatagrunt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmgraham%2Fdatagrunt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pmgraham","download_url":"https://codeload.github.com/pmgraham/datagrunt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pmgraham%2Fdatagrunt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272254476,"owners_count":24901049,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-parser","data-analysis","data-engineering","data-science","data-wrangling","dataframe","duckdb","open-source","polars","python","python3"],"created_at":"2025-08-26T20:51:30.857Z","updated_at":"2025-08-26T20:51:39.377Z","avatar_url":"https://github.com/pmgraham.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Welcome To Datagrunt\n\nDatagrunt is a Python library designed to simplify the way you work with CSV files. 
It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.\n\n## Why Datagrunt?\n\nBorn out of real-world frustration, Datagrunt eliminates the need for repetitive coding when handling CSV files. Whether you're a data analyst, data engineer, or data scientist, Datagrunt empowers you to focus on insights, not tedious data wrangling.\n\n## Key Features\n\n- **Intelligent Delimiter Inference:**  Datagrunt automatically detects and applies the correct delimiter for your CSV files.\n- **Seamless Data Processing:** Leverage the robust capabilities of [DuckDB](https://duckdb.org) and [Polars](https://pola.rs) to perform advanced data processing tasks directly on your CSV data.\n- **Flexible Transformation:** Easily convert your processed CSV data into various formats to suit your needs.\n- **AI-Powered Schema Analysis:** Use Google's Gemini models to automatically generate detailed schema reports for your CSV files, including data types, column classifications, and data quality checks.\n- **Pythonic API:** Enjoy a clean and intuitive API that integrates seamlessly into your existing Python workflows.\n\n## Installation\nWe recommend using [UV](https://docs.astral.sh/uv/). However, you may get started with Datagrunt in seconds using UV or pip.\n\nGet started with UV:\n\n```bash\nuv pip install datagrunt\n```\n\nGet started with pip:\n\n```bash\npip install datagrunt\n```\n\n## Getting Started\n\n```python\nfrom datagrunt import CSVReader\n\n# Load your CSV file\ncsv_file = 'electric_vehicle_population_data.csv'\nengine = 'duckdb'\n\n# Set duckdb as the processing engine. 
Engine set to 'polars' by default\ndg = CSVReader(csv_file, engine=engine)\n\n# return sample of the data to get a peek at the schema\ndg.get_sample()\n┌────────────┬───────────┬──────────────┬───┬──────────────────────┬──────────────────────┬───────────────────┐\n│ VIN (1-10) │  County   │     City     │ … │   Vehicle Location   │   Electric Utility   │ 2020 Census Tract │\n│  varchar   │  varchar  │   varchar    │   │       varchar        │       varchar        │      varchar      │\n├────────────┼───────────┼──────────────┼───┼──────────────────────┼──────────────────────┼───────────────────┤\n│ 5YJSA1E28K │ Snohomish │ Mukilteo     │ … │ POINT (-122.29943 …  │ PUGET SOUND ENERGY…  │ 53061042001       │\n│ 1C4JJXP68P │ Yakima    │ Yakima       │ … │ POINT (-120.468875…  │ PACIFICORP           │ 53077001601       │\n│ WBY8P6C05L │ Kitsap    │ Kingston     │ … │ POINT (-122.517835…  │ PUGET SOUND ENERGY…  │ 53035090102       │\n│ JTDKARFP1J │ Kitsap    │ Port Orchard │ … │ POINT (-122.653005…  │ PUGET SOUND ENERGY…  │ 53035092802       │\n│ 5UXTA6C09N │ Snohomish │ Everett      │ … │ POINT (-122.203234…  │ PUGET SOUND ENERGY…  │ 53061041605       │\n│ 5YJYGDEF8L │ King      │ Seattle      │ … │ POINT (-122.378886…  │ CITY OF SEATTLE - …  │ 53033004703       │\n│ JTMAB3FV7P │ Thurston  │ Rainier      │ … │ POINT (-122.677141…  │ PUGET SOUND ENERGY…  │ 53067012530       │\n│ JN1AZ0CPXC │ King      │ Kirkland     │ … │ POINT (-122.192596…  │ PUGET SOUND ENERGY…  │ 53033022402       │\n│ JN1AZ0CP7B │ King      │ Kirkland     │ … │ POINT (-122.192596…  │ PUGET SOUND ENERGY…  │ 53033022603       │\n│ 1N4AZ0CP0F │ Thurston  │ Olympia      │ … │ POINT (-122.86491 …  │ PUGET SOUND ENERGY…  │ 53067010300       │\n│     ·      │   ·       │    ·         │ · │          ·           │          ·           │      ·            │\n│     ·      │   ·       │    ·         │ · │          ·           │          ·           │      ·            │\n│     ·      │   ·       │    ·         
│ · │          ·           │          ·           │      ·            │\n│ 5YJYGDEE7M │ Clark     │ Vancouver    │ … │ POINT (-122.515805…  │ BONNEVILLE POWER A…  │ 53011041310       │\n│ 7SAYGAEE0P │ Snohomish │ Monroe       │ … │ POINT (-121.968385…  │ PUGET SOUND ENERGY…  │ 53061052203       │\n│ 2C4RC1N75P │ King      │ Burien       │ … │ POINT (-122.347227…  │ CITY OF SEATTLE - …  │ 53033027600       │\n│ 1FTVW1EVXP │ King      │ Kirkland     │ … │ POINT (-122.202653…  │ PUGET SOUND ENERGY…  │ 53033022300       │\n│ 4JGGM1CB2P │ King      │ Seattle      │ … │ POINT (-122.2453 4…  │ CITY OF SEATTLE - …  │ 53033011700       │\n│ 1N4BZ0CP0G │ King      │ Seattle      │ … │ POINT (-122.334079…  │ CITY OF SEATTLE - …  │ 53033008300       │\n│ 7SAYGDEF2N │ King      │ Bellevue     │ … │ POINT (-122.144149…  │ PUGET SOUND ENERGY…  │ 53033024704       │\n│ 1N4BZ1DP7L │ King      │ Bellevue     │ … │ POINT (-122.144149…  │ PUGET SOUND ENERGY…  │ 53033024902       │\n...\n├────────────┴───────────┴──────────────┴───┴──────────────────────┴──────────────────────┴───────────────────┤\n│ ? 
rows (\u003e9999 rows, 20 shown)                                                          17 columns (6 shown) │\n└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘\n```\n\n## DuckDB Integration for Performant SQL Queries\n```python\nfrom datagrunt import CSVReader\n\ncsv_file = 'electric_vehicle_population_data.csv'\nengine = 'duckdb'\n\ndg = CSVReader(csv_file, engine=engine)\n\n# Construct your SQL query\nquery = f\"\"\"\nWITH core AS (\n    SELECT\n        City AS city,\n        \"VIN (1-10)\" AS vin\n    FROM {dg.db_table}\n)\nSELECT\n    city,\n    COUNT(vin) AS vehicle_count\nFROM core\nGROUP BY 1\nORDER BY 2 DESC\n\"\"\"\n\n# Execute the query and get results as a Polars DataFrame\ndf = dg.query_data(query).pl()\nprint(df)\n┌────────────────┬───────────────┐\n│ city           ┆ vehicle_count │\n│ ---            ┆ ---           │\n│ str            ┆ i64           │\n╞════════════════╪═══════════════╡\n│ Seattle        ┆ 32602         │\n│ Bellevue       ┆ 9960          │\n│ Redmond        ┆ 7165          │\n│ Vancouver      ┆ 7081          │\n│ Bothell        ┆ 6602          │\n│ …              ┆ …             │\n│ Glenwood       ┆ 1             │\n│ Walla Walla Co ┆ 1             │\n│ Pittsburg      ┆ 1             │\n│ Decatur        ┆ 1             │\n│ Redwood City   ┆ 1             │\n└────────────────┴───────────────┘\n```\n## License\nThis project is licensed under the [MIT License](https://opensource.org/license/mit).\n\n## Acknowledgements\nA HUGE thank you to the open source community and the creators of [DuckDB](https://duckdb.org) and [Polars](https://pola.rs) for their fantastic libraries that power Datagrunt.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpmgraham%2Fdatagrunt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpmgraham%2Fdatagrunt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpmgraham%2Fdatagrunt/lists"}