{"id":16972188,"url":"https://github.com/anastasia/htmldiffer","last_synced_at":"2025-03-22T14:31:28.075Z","repository":{"id":62569569,"uuid":"84677002","full_name":"anastasia/htmldiffer","owner":"anastasia","description":null,"archived":false,"fork":false,"pushed_at":"2022-06-30T13:22:37.000Z","size":95,"stargazers_count":8,"open_issues_count":10,"forks_count":9,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2025-03-01T02:38:24.161Z","etag":null,"topics":["diff","html","html-diff","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anastasia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-11T20:13:06.000Z","updated_at":"2022-06-30T13:21:50.000Z","dependencies_parsed_at":"2022-11-03T17:15:34.196Z","dependency_job_id":null,"html_url":"https://github.com/anastasia/htmldiffer","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anastasia%2Fhtmldiffer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anastasia%2Fhtmldiffer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anastasia%2Fhtmldiffer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anastasia%2Fhtmldiffer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anastasia","download_url":"https://codeload.github.com/anastasia/htmldiffer/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_cou
nt":244217920,"owners_count":20417677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diff","html","html-diff","python"],"created_at":"2024-10-14T00:57:55.244Z","updated_at":"2025-03-22T14:31:27.802Z","avatar_url":"https://github.com/anastasia.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"### This is unmaintained. If you would like to maintain it, please DM me with your thoughts about this project. \n\n## htmldiffer\n[![Build Status](https://travis-ci.org/anastasia/htmldiffer.svg?branch=develop)](https://travis-ci.org/anastasia/htmldiff)\n#### highlight the differences between two html files\n#\n### To install:\n```\npip install htmldiffer\n```\n\nOr\n```\n$ git clone git@github.com:anastasia/htmldiffer.git\n$ cd htmldiffer\n$ python -m htmldiffer file_one.html file_two.html\n```\n\nHTMLDiffer takes strings or files and returns three HTML diffs: a deleted diff, an inserted diff, and a combined diff (showing both the deleted and inserted highlights).\n\nHTMLDiffer will\n+ surround any text-level changes with `\u003cspan class=\"htmldiffer_[insert|delete]\"\u003e`\n+ insert htmldiffer classes (`class=\"htmldiffer-tag-change_[insert|delete]\"`) into any tag-level changes (that is, if a tag name or any attribute inside a tag has changed)\n\nTo use this as a library:\n\n```python\nfrom htmldiffer import diff\n\nstr_a = \"\u003chtml\u003e\u003cbody\u003eHello world!\u003c/body\u003e\u003c/html\u003e\"\nstr_b = \"\u003chtml\u003e\u003cbody\u003eHello wanda! 
Hello!\u003c/body\u003e\u003c/html\u003e\"\nd = diff.HTMLDiffer(str_a, str_b)\n\nprint(d.deleted_diff)\n# get a string of the HTML with deleted elements highlighted:\n# \u003chtml\u003e\u003cbody\u003eHello \u003cspan class=\"diff_delete\"\u003eworld!\u003c/span\u003e\u003c/body\u003e\u003c/html\u003e\n\nprint(d.inserted_diff)\n# get a string of the HTML with inserted elements highlighted:\n# \u003chtml\u003e\u003cbody\u003eHello \u003cspan class=\"diff_insert\"\u003ewanda! \u003c/span\u003e\u003cspan class=\"diff_insert\"\u003eHello!\u003c/span\u003e\u003c/body\u003e\u003c/html\u003e\n\nprint(d.combined_diff)\n# get a string of the HTML with both deleted and inserted elements highlighted:\n# \u003chtml\u003e\u003cbody\u003eHello \u003cspan class=\"diff_delete\"\u003eworld!\u003c/span\u003e\u003cspan class=\"diff_insert\"\u003ewanda! \u003c/span\u003e\u003cspan class=\"diff_insert\"\u003eHello!\u003c/span\u003e\u003c/body\u003e\u003c/html\u003e\n```\n\nThat's it!\n\n### How does this work?\n\nhtmldiffer takes a string or a file of HTML, splits it into a list of string entities[1], then diffs those lists using [SequenceMatcher][seqmatch] \nand produces deleted, inserted, and combined (deleted and inserted) HTML, each with spans wrapping the changed text.\n\nExample:\n```python\nfrom htmldiffer.diff import HTMLDiffer\n\nold_html = \"\u003ch1\u003eThis is a simple header\u003c/h1\u003e\"\nnew_html = \"\u003ch1\u003eThis is a newer, better header\u003c/h1\u003e\"\n\nd = HTMLDiffer(old_html, new_html)\n# Expected values of the three diffs:\nd.deleted_diff == '\u003ch1\u003eThis is a \u003cspan class=\"diff_delete\"\u003esimple \u003c/span\u003eheader\u003c/h1\u003e'\nd.inserted_diff == '\u003ch1\u003eThis is a \u003cspan class=\"diff_insert\"\u003enewer, \u003c/span\u003e\u003cspan class=\"diff_insert\"\u003ebetter \u003c/span\u003eheader\u003c/h1\u003e'\nd.combined_diff == '\u003ch1\u003eThis is a \u003cspan class=\"diff_delete\"\u003esimple \u003c/span\u003e\u003cspan class=\"diff_insert\"\u003enewer, \u003c/span\u003e\u003cspan 
class=\"diff_insert\"\u003ebetter \u003c/span\u003eheader\u003c/h1\u003e'\n```\n\n[1] An entity can be one of several things:\n+ A word\n+ An opening tag: `\u003cli class=\"list-element\" style=\"some:style;\"\u003e`\n+ A closing tag: `\u003c/li\u003e`\n+ A whitelisted tag (self-closing tags whose changes you want highlighted are good candidates)\n    + for instance, image tags are whitelisted by default, so the entity will be: `\u003cimg src=\"some/source.jpg\"/\u003e`\n+ The entirety of a blacklisted tag (such as a script or head tag, since it's difficult to show changes in those, for now)\n    + `\u003cscript\u003eThe entirety of a script tag will be a single entity\u003c/script\u003e`\n\nTo maintain the integrity and structure of the original HTML, we don't remove any whitespace or change the HTML itself in any way before iterating through it and wrapping the changes in span tags.\n\n[seqmatch]:https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher\n\n\n### Tell me more\n\n+ htmldiffer's `diff` method ([diff.py][diffpy]) calls the\n  `html2list` method, which iterates through the HTML string and returns a list of entities (see above for explanation).\n\n[diffpy]:https://github.com/anastasia/htmldiffer/blob/develop/htmldiffer/diff.py\n\n+ `diff` adds a style string (the default lives in settings.py) to the `\u003chead\u003e` of the HTML (if a head tag exists)\n  so that our diff highlights show up\n\n+ `diff` compares the two newly created lists (one for the old HTML string, one for the new HTML string) using\n  `SequenceMatcher`, and gets back a list describing, with the codes 'replace', 'delete', 'insert', and 'equal',\n   how each run of elements in the old list maps to the new list\n\n+ `diff` iterates through that list, calling `wrap_text` to wrap each element according to its change code\n\nMore complexities! 
How does `wrap_text` work?\n\n+ For each element, if the element is not an HTML tag, `wrap_text` wraps it in a `\u003cspan\u003e` tag with a `diff_insert` or `diff_delete` class.\n\n+ If the element is an HTML tag, `wrap_text` will skip the element *unless* the tag is in the `settings.WHITELISTED_TAGS` list.\n  The reason is that we want to wrap the changes within an `\u003cli\u003e` element, not the `\u003cli\u003e` opening tag itself.\n\n\n  Things to note:\n\n  + HTML `\u003c!-- comments --\u003e` will be read as a tag and therefore skipped. \n  + all text that is changed should therefore be wrapped by appropriate `span` diff tags.\n  + the default whitelisted tags include the self-closing tags `\u003cimg\u003e` and `\u003cinput\u003e`, which will therefore be wrapped in `span` diff tags \n\n\n***\n\nThis repository is a fork of https://github.com/aaronsw/htmldiff. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanastasia%2Fhtmldiffer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanastasia%2Fhtmldiffer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanastasia%2Fhtmldiffer/lists"}