{"id":20956415,"url":"https://github.com/bluebrain/dir-content-diff","last_synced_at":"2025-05-14T05:31:44.590Z","repository":{"id":40482942,"uuid":"436993324","full_name":"BlueBrain/dir-content-diff","owner":"BlueBrain","description":"Simple tool to compare directory contents and get differences using smart comparators.","archived":false,"fork":false,"pushed_at":"2024-11-13T13:39:43.000Z","size":221,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-11-13T14:32:42.689Z","etag":null,"topics":["compare-directories","compare-files","compare-folders","diff","dir-compare","directory","directory-comparator","directory-comparison-tool","directory-diff","file-comparison","file-diff","file-differences","folder-compare","folder-comparisation","folder-comparison","folder-diff","python"],"latest_commit_sha":null,"homepage":"https://dir-content-diff.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BlueBrain.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-10T13:44:37.000Z","updated_at":"2024-11-13T13:39:42.000Z","dependencies_parsed_at":"2023-01-30T19:15:46.949Z","dependency_job_id":"98d8aa89-9357-4dbd-8875-6a7e2f3a636c","html_url":"https://github.com/BlueBrain/dir-content-diff","commit_stats":{"total_commits":53,"total_committers":6,"mean_commits":8.833333333333334,"dds":0.09433962264150941,"last_synced_commit":"77e0e9babb3776883e134f38d71e6d74f4db79b9"},"previous_names":[],"tags_coun
t":30,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlueBrain%2Fdir-content-diff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlueBrain%2Fdir-content-diff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlueBrain%2Fdir-content-diff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlueBrain%2Fdir-content-diff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BlueBrain","download_url":"https://codeload.github.com/BlueBrain/dir-content-diff/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225277030,"owners_count":17448607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compare-directories","compare-files","compare-folders","diff","dir-compare","directory","directory-comparator","directory-comparison-tool","directory-diff","file-comparison","file-diff","file-differences","folder-compare","folder-comparisation","folder-comparison","folder-diff","python"],"created_at":"2024-11-19T01:25:52.506Z","updated_at":"2024-11-19T01:25:52.972Z","avatar_url":"https://github.com/BlueBrain.png","language":"Python","readme":"[![Version](https://img.shields.io/pypi/v/dir-content-diff)](https://github.com/BlueBrain/dir-content-diff/releases)\n[![Build 
status](https://github.com/BlueBrain/dir-content-diff/actions/workflows/run-tox.yml/badge.svg?branch=main)](https://github.com/BlueBrain/dir-content-diff/actions)\n[![Coverage](https://codecov.io/github/BlueBrain/dir-content-diff/coverage.svg?branch=main)](https://codecov.io/github/BlueBrain/dir-content-diff?branch=main)\n[![License](https://img.shields.io/badge/License-Apache%202-blue)](https://github.com/BlueBrain/dir-content-diff/blob/main/LICENSE.txt)\n[![Documentation status](https://readthedocs.org/projects/dir-content-diff/badge/?version=latest)](https://dir-content-diff.readthedocs.io/)\n\n\n# Directory Content Difference\n\nThis project provides simple tools to compare the content of a directory against a reference\ndirectory.\n\nThis is useful for checking the results of a process that generates several files, such as a Luigi\nworkflow.\n\n\n## Installation\n\nThis package should be installed using pip:\n\n```bash\npip install dir-content-diff\n```\n\n\n## Usage\n\nThe ``dir-content-diff`` package introduces a framework to compare two directories. A comparator\nis associated with each file extension, and each file in the reference directory is then compared to\nthe file with the same relative path in the compared directory. By default, a few comparators are\nprovided for common file types, but others can be associated with new file extensions or can even replace\nthe default ones. The comparators should be able to report the differences between two files\naccurately, reporting which elements are different among the data. 
When no comparator is associated\nwith an extension, a default comparator is used; it only compares the raw binary content of\nthe files, so it cannot report which values differ.\n\n### Compare two directories\n\nConsider two directories with the following structures:\n\n```bash\n└── reference_dir\n    ├── sub_dir_1\n    |   ├── sub_file_1.a\n    |   └── sub_file_2.b\n    └── file_1.c\n```\n\n```bash\n└── compared_dir\n    ├── sub_dir_1\n    |   ├── sub_file_1.a\n    |   ├── sub_file_2.b\n    |   └── sub_file_3.b\n    └── file_1.c\n```\n\nThese two directories can be compared with the following code:\n\n```python\nimport dir_content_diff\n\ndir_content_diff.compare_trees(\"reference_dir\", \"compared_dir\")\n```\n\nThis code returns an empty dictionary because no difference is detected: only the files of the reference\ndirectory are checked, so the extra ``sub_file_3.b`` in ``compared_dir`` is ignored.\n\nIf ``reference_dir/file_1.c`` is the following JSON-like file:\n\n```json\n{\n    \"a\": 1,\n    \"b\": [1, 2]\n}\n```\n\nAnd ``compared_dir/file_1.c`` is the following JSON-like file:\n\n```json\n{\n    \"a\": 2,\n    \"b\": [10, 2, 0]\n}\n```\n\nThe following code registers the ``JsonComparator`` for the file extension ``.c`` and compares the\ntwo directories:\n\n```python\nimport dir_content_diff\n\ndir_content_diff.register_comparator(\".c\", dir_content_diff.JsonComparator())\ndir_content_diff.compare_trees(\"reference_dir\", \"compared_dir\")\n```\n\nThe previous code returns the following dictionary:\n\n```python\n{\n    'file_1.c': (\n        'The files \\'reference_dir/file_1.c\\' and \\'compared_dir/file_1.c\\' are different:\\n'\n        'Added the value(s) \\'{\"2\": 0}\\' in the \\'[b]\\' key.\\n'\n        'Changed the value of \\'[a]\\' from 1 to 2.\\n'\n        'Changed the value of \\'[b][0]\\' from 1 to 10.'\n    )\n}\n```\n\nIt is also possible to assert that the two directories are equal with the following code:\n\n```python\nimport dir_content_diff\n\ndir_content_diff.register_comparator(\".c\", 
dir_content_diff.JsonComparator())\ndir_content_diff.assert_equal_trees(\"reference_dir\", \"compared_dir\")\n```\n\nThis raises the following ``AssertionError``:\n\n```bash\nAssertionError: The files 'reference_dir/file_1.c' and 'compared_dir/file_1.c' are different:\nAdded the value(s) '{\"2\": 0}' in the '[b]' key.\nChanged the value of '[a]' from 1 to 2.\nChanged the value of '[b][0]' from 1 to 10.\n```\n\nFinally, comparator parameters can be set either for all files of a\ngiven extension or only for a specific file:\n\n```python\nimport dir_content_diff\n\n# Get the default comparators\ncomparators = dir_content_diff.get_comparators()\n\n# Replace the comparator for JSON files to perform the comparison with a given tolerance\ncomparators[\".json\"] = dir_content_diff.JsonComparator(default_diff_kwargs={\"tolerance\": 0.1})\n\n# Use a specific tolerance for the file ``sub_dir_1/sub_file_1.a``\n# In this case, the kwargs are used to compute the difference by default, except the following\n# specific kwargs: ``return_raw_diffs``, ``load_kwargs``, ``format_data_kwargs``, ``filter_kwargs``,\n# ``format_diff_kwargs``, ``sort_kwargs``, ``concat_kwargs`` and ``report_kwargs``.\nspecific_args = {\"sub_dir_1/sub_file_1.a\": {\"tolerance\": 0.5}}\n\ndir_content_diff.assert_equal_trees(\n    \"reference_dir\",\n    \"compared_dir\",\n    comparators=comparators,\n    specific_args=specific_args,\n)\n```\n\nEach comparator has different arguments that are detailed in the documentation.\n\nIt's also possible to specify an arbitrary comparator for a specific file:\n\n```python\nspecific_args = {\n    \"sub_dir_1/sub_file_1.a\": {\n        \"comparator\": dir_content_diff.JsonComparator(),\n        \"tolerance\": 0.5,\n    }\n}\n```\n\nAnd last but not least, it's possible to use regular expressions to associate specific arguments with\na set of files:\n\n```python\nspecific_args = {\n    \"all files with *.a or *.b extensions\": {\n        
\"patterns\": [r\".*\\.[ab]$\"],\n        \"comparator\": dir_content_diff.BaseComparator(),\n    }\n}\n```\n\n\n### Export formatted data\n\nSome comparators have to format the data before comparing them. For example, if one wants to\ncompare data with file paths inside, it's likely that only a relative part of each path is\nrelevant, not the entire absolute path. To do this, a specific comparator can be defined with a\ncustom ``format_data()`` method which is automatically called after the data are loaded but before\nthe data are compared. It is then possible to export the data just after they have been formatted,\nfor example for checking purposes. To do this, the ``export_formatted_files`` argument of the\n``dir_content_diff.compare_trees`` and ``dir_content_diff.assert_equal_trees`` functions can be set\nto ``True``. All the files processed by a comparator with a ``save()`` method are then exported\nto a new directory. This new directory has the same name as the compared directory with a suffix\nappended. By default, the suffix is `` _FORMATTED ``, but it can be overridden by passing a non-empty\nstring to the ``export_formatted_files`` argument.\n\n## Pytest plugin\n\nThis package can be used as a pytest plugin. When ``pytest`` is run and ``dir-content-diff`` is\ninstalled, it is automatically detected and registered as a plugin. 
It is then possible to trigger\nthe export of formatted data with the following ``pytest`` option: ``--dcd-export-formatted-data``.\nIt is also possible to define a custom suffix for the new directory with the following option:\n``--dcd-export-suffix``.\n\n\n## Funding \u0026 Acknowledgment\n\nThe development of this software was supported by funding to the Blue Brain Project, a research\ncenter of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government’s ETH\nBoard of the Swiss Federal Institutes of Technology.\n\nFor license and authors, see `LICENSE.txt` and `AUTHORS.md` respectively.\n\nCopyright © 2021-2023 Blue Brain Project/EPFL\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbluebrain%2Fdir-content-diff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbluebrain%2Fdir-content-diff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbluebrain%2Fdir-content-diff/lists"}