{"id":18356663,"url":"https://github.com/aaravmalani/htmlparse","last_synced_at":"2025-09-29T07:08:31.880Z","repository":{"id":152069818,"uuid":"625314871","full_name":"AaravMalani/htmlparse","owner":"AaravMalani","description":"A basic HTML parser in Python","archived":false,"fork":false,"pushed_at":"2023-04-14T10:31:31.000Z","size":6,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-22T00:11:16.235Z","etag":null,"topics":["collaborate","html","module","package","parser","pip","pypi","python","re","recursive","regex"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AaravMalani.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-08T18:21:24.000Z","updated_at":"2023-07-02T02:22:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"290a32c3-1610-4818-bb3f-3622969ddddc","html_url":"https://github.com/AaravMalani/htmlparse","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AaravMalani%2Fhtmlparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AaravMalani%2Fhtmlparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AaravMalani%2Fhtmlparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AaravMalani%2Fhtmlparse/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/AaravMalani","download_url":"https://codeload.github.com/AaravMalani/htmlparse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247484480,"owners_count":20946388,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collaborate","html","module","package","parser","pip","pypi","python","re","recursive","regex"],"created_at":"2024-11-05T22:11:07.288Z","updated_at":"2025-09-29T07:08:26.837Z","avatar_url":"https://github.com/AaravMalani.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# htmlparse: A basic HTML parser in Python\n## Installation\n```sh\n# Linux\npython3 -m pip install parser-html\n# Windows\npython -m pip install parser-html\n# Build from source\npython -m pip install git+https://github.com/AaravMalani/htmlparse\n```\n\n## Usage\n```py\nimport htmlparse\n\nwith open('index.html', 'r') as f:\n    element = htmlparse.parse_html(f.read())\n    if not element:\n        raise ValueError(\"Parsing failed!\")\nprint(element.children) # Sub-elements\nprint(element.innerHTML) # Data enclosed by tag\nprint(element.outerHTML) # Data enclosed by tag as well as the tag itself\nelement.innerHTML = 'e\u0026gt;' # Rebuilds this element and sets the innerHTML of all the parent elements\nprint(element.children) # ['e\u003e'] (The HTMLText element is represented as a string literal)\nprint(element.children[0].text) # e\u003e (Use HTMLText.outerHTML for an HTML escaped string (e\u0026gt;) however don't set it)\nelement.outerHTML = '\u003cdiv 
class=\"black blue\"\u003e\u003ca href=\"https://github.com/\" id=\"abc\"\u003e\u003c/a\u003e\u003c/div\u003e' # Rebuilds the element, as described for innerHTML above\ntag = element.children[0] # Assumed here: the first child is the \u003ca\u003e element\nprint(tag.attrs) # {\"href\":\"https://github.com/\", \"id\":\"abc\"}\nprint(tag.tag_name) # a\nelement.children = []\nelement.attrs = {} # WARNING! Assign a new dict; element.attrs.update(...) and element.attrs |= ... are not supported\nprint(element.outerHTML) # \u003cdiv\u003e\u003c/div\u003e\n```\n\n## ToDo\n- [ ] Support for CSS styles\n- [ ] Support for JS scripts\n- [x] Support for assignment to `HTMLElement.children` list\n- [x] Support for text between strings\n- [ ] Support for CSS selectors\n- [ ] Support for XPath\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaravmalani%2Fhtmlparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaravmalani%2Fhtmlparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaravmalani%2Fhtmlparse/lists"}