{"id":39840393,"url":"https://github.com/datacoon/datadifflib","last_synced_at":"2026-01-18T13:26:48.423Z","repository":{"id":73757806,"uuid":"206727909","full_name":"datacoon/datadifflib","owner":"datacoon","description":"Python library to track changes and generate deltas for JSON, CSV and BSON files. ","archived":false,"fork":false,"pushed_at":"2024-07-10T07:38:35.000Z","size":22,"stargazers_count":2,"open_issues_count":3,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-07-10T09:37:43.095Z","etag":null,"topics":["bson","csv","diff","json"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datacoon.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-06T06:28:40.000Z","updated_at":"2022-10-18T17:13:15.000Z","dependencies_parsed_at":"2024-01-22T16:25:54.542Z","dependency_job_id":"7f3ac43f-e88b-4554-8f34-edd0031c1f40","html_url":"https://github.com/datacoon/datadifflib","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/datacoon/datadifflib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacoon%2Fdatadifflib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacoon%2Fdatadifflib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacoon%2Fdatadifflib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacoon%2Fdatadi
fflib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datacoon","download_url":"https://codeload.github.com/datacoon/datadifflib/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacoon%2Fdatadifflib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28536751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T13:04:05.990Z","status":"ssl_error","status_checked_at":"2026-01-18T13:01:44.092Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bson","csv","diff","json"],"created_at":"2026-01-18T13:26:48.350Z","updated_at":"2026-01-18T13:26:48.414Z","avatar_url":"https://github.com/datacoon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"====================================================\ndatadiff -- library and tool to compare data files JSON, CSV and BSON and to create and apply changes between dataset versions\n====================================================\n\n.. image:: https://img.shields.io/travis/datacoon/datadifflib/master.svg?style=flat-square\n    :target: https://travis-ci.org/ivbeg/qddate\n    :alt: travis build status\n\n.. 
image:: https://img.shields.io/pypi/v/datadifflib.svg?style=flat-square\n    :target: https://pypi.python.org/pypi/datadifflib\n    :alt: pypi version\n\n.. image:: https://readthedocs.org/projects/datadifflib/badge/?version=latest\n    :target: http://datadifflib.readthedocs.org/en/latest/?badge=latest\n    :alt: Documentation Status\n\n\n\n`datadifflib` is a Python 3 library that helps track changes between two versions of a dataset and produce a delta file describing those changes.\nIt supports the JSON, BSON and CSV file formats; delta files can currently be generated and applied only for JSON (see Limitations).\n\n\n\nDocumentation\n=============\n\nDocumentation is built automatically and can be found on\n`Read the Docs \u003chttps://datadifflib.readthedocs.org/en/latest/\u003e`_.\n\n\nFeatures\n========\n\n* As simple as possible\n* Minimal memory footprint\n* File formats supported: BSON, JSON, CSV\n\n\nLimitations\n===========\n\n* Delta files can be generated and applied only for JSON files\n* Limited support for very large files (100 GB+); the largest files tested were 5 GB\n* Files are read twice to generate a delta: the first pass builds an index, and the second pass extracts added, deleted and changed records\n* The library and tool do not check whether a patch is applicable to a given dataset. You have to manage version control of datasets yourself\n\n\n\nCommand-line tool\n=================\nUsage: datadiffcli.py [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  --help  Show this message and exit.\n\nCommands:\n\n* compare   Compares records in two files by unique key and reports whether changes exist\n* delta     Generates a delta file\n* patch     Applies a patch from a delta file\n\nExamples\n========\n\nCompare two versions of the same dataset, with the unique key defined in the 'regnum' field of each dataset\n\n    python datadiffcli.py compare regnum reestrgp_2018.json reestrgp_2019.json\n\nGenerate a delta file after comparing two versions of the same dataset, with the unique key defined in the 'regnum' field\n\n    python datadiffcli.py delta regnum reestrgp_2018.json reestrgp_2019.json reestrgp_delta.json\n\nApply the delta file to the original dataset and produce an updated dataset\n\n    python datadiffcli.py patch reestrgp_2018.json reestrgp_delta.json reestrgp_proc.json\n\n\nHow to use library\n==================\n\nGenerate a report on changes between the 'reestrgp_2018.json' and 'reestrgp_2019.json' versions of a dataset with the unique key 'regnum'\n    \u003e\u003e\u003e from datadiff.diff import jsondiff\n    \u003e\u003e\u003e key = 'regnum'\n    \u003e\u003e\u003e left = 'reestrgp_2018.json'\n    \u003e\u003e\u003e right = 'reestrgp_2019.json'\n    \u003e\u003e\u003e report = jsondiff(key, left, right)\n\n\nGenerate a delta file between two versions of a dataset\n    \u003e\u003e\u003e from datadiff.delta import json_delta\n    \u003e\u003e\u003e key = 'regnum'\n    \u003e\u003e\u003e left = 'reestrgp_2018.json'\n    \u003e\u003e\u003e right = 'reestrgp_2019.json'\n    \u003e\u003e\u003e outfile = 'reestrgp_delta.json'\n    \u003e\u003e\u003e json_delta(key, left, right, outfile, difftype='full')\n\n\nApply the patch to the first version of the dataset\n    \u003e\u003e\u003e from datadiff.delta import apply_json_delta\n    \u003e\u003e\u003e key = 'regnum'\n    \u003e\u003e\u003e dataset = 'reestrgp_2018.json'\n    \u003e\u003e\u003e delta = 'reestrgp_delta.json'\n    \u003e\u003e\u003e outfile = 'reestrgp_proc.json'\n    \u003e\u003e\u003e apply_json_delta(key, dataset, delta, outfile)\n\n\nPatch file format\n==================\nThe patch file is simple: it is a serialized JSON structure.\nEach record in the 'records' field has the following fields:\n\n- mode    - 'a' for add, 'c' for change and 'd' for delete\n- uniqkey - unique key of the selected record\n- obj     - object value taken from the original or the compared dataset file\n\nThe unique key is copied outside 'obj' because in the future 'obj' could be replaced by a patch to the selected record rather than the record itself\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacoon%2Fdatadifflib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatacoon%2Fdatadifflib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacoon%2Fdatadifflib/lists"}