{"id":17247671,"url":"https://github.com/joshdata/xml_diff","last_synced_at":"2025-08-16T23:31:31.944Z","repository":{"id":62590063,"uuid":"21174540","full_name":"JoshData/xml_diff","owner":"JoshData","description":"Compares two XML documents by diffing their text.","archived":false,"fork":false,"pushed_at":"2024-06-13T03:52:16.000Z","size":30,"stargazers_count":39,"open_issues_count":1,"forks_count":11,"subscribers_count":8,"default_branch":"primary","last_synced_at":"2024-12-08T05:12:57.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JoshData.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-06-24T17:39:52.000Z","updated_at":"2024-11-15T22:55:47.000Z","dependencies_parsed_at":"2024-10-31T07:02:46.462Z","dependency_job_id":"7a66bf99-89d6-4ecf-aef5-92eb2c00a327","html_url":"https://github.com/JoshData/xml_diff","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fxml_diff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fxml_diff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fxml_diff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoshData%2Fxml_diff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JoshData","download_url":"https://codeload.github.c
om/JoshData/xml_diff/tar.gz/refs/heads/primary","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230066930,"owners_count":18167546,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T06:38:35.834Z","updated_at":"2024-12-17T05:13:32.564Z","avatar_url":"https://github.com/JoshData.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"xml_diff\n========\n\nCompares the text inside two XML documents and marks up the differences with ``\u003cdel\u003e`` and ``\u003cins\u003e`` tags.\n\nThis is the result of about 7 years of trying to get this right and coded simply. I've used code like this in one form or another to compare bill text on GovTrack.us \u003chttps://www.govtrack.us\u003e.\n\nThe comparison is completely blind to the structure of the two XML documents. It does a word-by-word comparison on the text content only, and then it goes back into the original documents and wraps changed text in new ``\u003cdel\u003e`` and ``\u003cins\u003e`` wrapper elements.\n\nThe documents are then concatenated to form a new document and the new document is printed on standard output. 
Or use this as a library and call ``compare`` yourself with two ``lxml.etree.Element`` nodes (the roots of your documents).\n\nThe script is written in Python 3.\n\nExample\n-------\n\nComparing these two documents::\n\n\t\u003chtml\u003e\n\t\tHere is \u003cb\u003esome bold\u003c/b\u003e text.\n\t\u003c/html\u003e\n\nand::\n\n\t\u003chtml\u003e\n\t\tHere is \u003ci\u003esome italic\u003c/i\u003e content that shows how \u003ctt\u003exml_diff\u003c/tt\u003e works.\n\t\u003c/html\u003e\t\n\nYields::\n\n\t\u003cdocuments\u003e\n\t\t\u003chtml\u003e\n\t\t\tHere is \u003cb\u003esome \u003cdel\u003ebold\u003c/del\u003e\u003c/b\u003e\u003cdel\u003e text\u003c/del\u003e.\n\t\t\u003c/html\u003e\n\t\t\u003chtml\u003e\n\t\t\tHere is \u003ci\u003esome \u003cins\u003eitalic\u003c/ins\u003e\u003c/i\u003e\u003cins\u003e content that shows how \u003c/ins\u003e\u003ctt\u003e\u003cins\u003exml_diff\u003c/ins\u003e\u003c/tt\u003e\u003cins\u003e works\u003c/ins\u003e.\n\t\t\u003c/html\u003e\n\t\u003c/documents\u003e\n\nOn Ubuntu, get dependencies with::\n\n\tapt-get install python3-lxml libxml2-dev libxslt1-dev\n\nFor really fast comparisons, get Google's Diff Match Patch library \u003chttps://code.google.com/p/google-diff-match-patch/\u003e, as re-written and sped-up by @leutloff \u003chttps://github.com/leutloff/diff-match-patch-cpp-stl\u003e and then turned into a Python extension module by me \u003chttps://github.com/JoshData/diff_match_patch-python\u003e::\n\n\tpip3 install diff_match_patch_python\n\nOr if you can't install that for any reason, use the pure-Python library::\n\n\tpip3 install diff-match-patch\n\nThis is also at \u003chttps://code.google.com/p/google-diff-match-patch/source/browse/trunk/python3/diff_match_patch.py\u003e. 
xml_diff will use whichever is installed.\n\nFinally, install this module::\n\n\tpip3 install xml_diff\n\nThen call the module from the command line::\n\n\tpython3 -m xml_diff  --tags del,ins doc1.xml doc2.xml \u003e changes.xml\n\nOr use the module from Python::\n\n\timport lxml.etree\n\tfrom xml_diff import compare\n\n\tdom1 = lxml.etree.parse(\"doc1.xml\").getroot()\n\tdom2 = lxml.etree.parse(\"doc2.xml\").getroot()\n\tcomparison = compare(dom1, dom2)\n\nThe two DOMs are modified in-place.\n\nOptional Arguments\n------------------\n\nThe ``compare`` function takes other optional keyword arguments:\n\n``merge`` is a boolean (default false) that indicates whether the comparison function should perform a merge. If true, ``dom1`` will contain not just ``\u003cdel\u003e`` nodes but also ``\u003cins\u003e`` nodes and, similarly, ``dom2`` will contain not just ``\u003cins\u003e`` nodes but also ``\u003cdel\u003e`` nodes. Although the two DOMs will now contain the same semantic information about changes, and the same text content, each preserves its original structure, since the comparison is only over text and not structure. The new ``ins``/``del`` nodes contain content from the other document (including whole subtrees), and so there's no guarantee that the final documents will conform to any particular structural schema after this operation.\n\n``word_separator_regex`` (default ``r\"\\s+|[^\\s\\w]\"``) is a regular expression for how to separate words. The default splits on one or more whitespace characters in a row and single instances of non-word characters.\n\n``differ`` is a function that takes two arguments ``(text1, text2)`` and returns an iterator over difference operations given as tuples of the form ``(operation, text_length)``, where ``operation`` is one of ``\"=\"`` (no change in text), ``\"+\"`` (text inserted into ``text2``), or ``\"-\"`` (text deleted from ``text1``). 
(See xml_diff/__init__.py's ``default_differ`` function for how the default differ works.)\n\n``tags`` is a two-tuple of tag names to use for deleted and inserted content. The default is ``('del', 'ins')``.\n\n``make_tag_func`` is a function that takes one argument, which is either ``\"ins\"`` or ``\"del\"``, and returns a new ``lxml.etree.Element`` to be inserted into the DOM to wrap changed content. If given, the ``tags`` argument is ignored.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoshdata%2Fxml_diff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoshdata%2Fxml_diff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoshdata%2Fxml_diff/lists"}