{"id":13557353,"url":"https://github.com/simonw/csv-diff","last_synced_at":"2025-05-16T07:05:38.157Z","repository":{"id":44687313,"uuid":"175321497","full_name":"simonw/csv-diff","owner":"simonw","description":"Python CLI tool and library for diffing CSV and JSON files","archived":false,"fork":false,"pushed_at":"2024-09-06T05:20:04.000Z","size":38,"stargazers_count":311,"open_issues_count":26,"forks_count":50,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-09T14:16:21.647Z","etag":null,"topics":["click","csv","csv-diff","datasette-io","datasette-tool","diff","git-scraping","tsv-diff"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-13T01:11:26.000Z","updated_at":"2025-04-15T07:21:25.000Z","dependencies_parsed_at":"2023-02-12T12:50:27.686Z","dependency_job_id":"f14252ab-cb29-4b33-96b1-294f9cc84f64","html_url":"https://github.com/simonw/csv-diff","commit_stats":{"total_commits":44,"total_committers":6,"mean_commits":7.333333333333333,"dds":"0.13636363636363635","last_synced_commit":"26903b74eefcd65be761810f51b0e55c033bde66"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Fcsv-diff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Fcsv-diff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Fcsv-diff/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Fcsv-diff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simonw","download_url":"https://codeload.github.com/simonw/csv-diff/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254485061,"owners_count":22078767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["click","csv","csv-diff","datasette-io","datasette-tool","diff","git-scraping","tsv-diff"],"created_at":"2024-08-01T12:04:17.902Z","updated_at":"2025-05-16T07:05:33.142Z","avatar_url":"https://github.com/simonw.png","language":"Python","readme":"# csv-diff\n\n[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)\n[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases\u0026label=changelog)](https://github.com/simonw/csv-diff/releases)\n[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)\n\nTool for viewing the difference between two CSV, TSV or JSON files. 
See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.\n\n## Installation\n\n    pip install csv-diff\n\n## Usage\n\nConsider two CSV files:\n\n`one.csv`\n\n    id,name,age\n    1,Cleo,4\n    2,Pancakes,2\n\n`two.csv`\n\n    id,name,age\n    1,Cleo,5\n    3,Bailey,1\n\n`csv-diff` can show a human-readable summary of differences between the files:\n\n    $ csv-diff one.csv two.csv --key=id\n    1 row changed, 1 row added, 1 row removed\n\n    1 row changed\n\n      Row 1\n        age: \"4\" =\u003e \"5\"\n\n    1 row added\n\n      id: 3\n      name: Bailey\n      age: 1\n\n    1 row removed\n\n      id: 2\n      name: Pancakes\n      age: 2\n\nThe `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.\n\nThe tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`.\n\nYou can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. 
Use `--format=json` if your input files are JSON.\n\nUse `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:\n\n    % csv-diff one.csv two.csv --key=id --show-unchanged\n    1 row changed\n\n      id: 1\n        age: \"4\" =\u003e \"5\"\n\n        Unchanged:\n          name: \"Cleo\"\n\n### JSON output\n\nYou can use the `--json` option to get a machine-readable difference:\n\n    $ csv-diff one.csv two.csv --key=id --json\n    {\n        \"added\": [\n            {\n                \"id\": \"3\",\n                \"name\": \"Bailey\",\n                \"age\": \"1\"\n            }\n        ],\n        \"removed\": [\n            {\n                \"id\": \"2\",\n                \"name\": \"Pancakes\",\n                \"age\": \"2\"\n            }\n        ],\n        \"changed\": [\n            {\n                \"key\": \"1\",\n                \"changes\": {\n                    \"age\": [\n                        \"4\",\n                        \"5\"\n                    ]\n                }\n            }\n        ],\n        \"columns_added\": [],\n        \"columns_removed\": []\n    }\n\n### Adding templated extras\n\nYou can specify additional keys to be displayed in the human-readable format using the `--extra` option:\n\n    --extra name \"Python format string with {id} for variables\"\n\nFor example, to output a link to `https://news.ycombinator.com/latest?id={id}` for each item with an ID, you could use this:\n\n```bash\ncsv-diff one.csv two.csv --key=id \\\n  --extra latest \"https://news.ycombinator.com/latest?id={id}\"\n```\nThese extras display something like this:\n```\n1 row changed\n\n  id: 41459472\n    points: \"24\" =\u003e \"25\"\n    numComments: \"5\" =\u003e \"6\"\n  extras:\n    latest: https://news.ycombinator.com/latest?id=41459472\n```\n\n## As a Python library\n\nYou can also import the Python library into your own code like so:\n\n    from csv_diff import 
load_csv, compare\n    diff = compare(\n        load_csv(open(\"one.csv\"), key=\"id\"),\n        load_csv(open(\"two.csv\"), key=\"id\")\n    )\n\n`diff` will now contain the same data structure as the output in the `--json` example above.\n\nIf the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.\n\n## As a Docker container\n\n### Build the image\n\n    $ docker build -t csvdiff .\n\n### Run the container\n\n    $ docker run --rm -v $(pwd):/files csvdiff\n\nSuppose the current directory contains two CSV files, `one.csv` and `two.csv`:\n\n    $ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv\n    \n## Alternatives\n\n- [csvdiff](https://github.com/aswinkarthik/csvdiff) is a \"fast diff tool for comparing CSV files\" - you may get better results from this than from `csv-diff` against larger files.\n","funding_links":[],"categories":["Python","others","\u003ca name=\"diff\"\u003e\u003c/a\u003eDiff"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonw%2Fcsv-diff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonw%2Fcsv-diff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonw%2Fcsv-diff/lists"}