{"id":13813743,"url":"https://github.com/sihaelov/harser","last_synced_at":"2025-05-15T01:31:28.422Z","repository":{"id":44442951,"uuid":"75213472","full_name":"sihaelov/harser","owner":"sihaelov","description":"Easy way for HTML parsing and building XPath","archived":false,"fork":false,"pushed_at":"2022-07-06T19:20:57.000Z","size":6,"stargazers_count":138,"open_issues_count":3,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-09-26T20:05:13.700Z","etag":null,"topics":["html","html-parser","parser","python","xpath"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sihaelov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-11-30T18:07:04.000Z","updated_at":"2024-01-30T14:50:58.000Z","dependencies_parsed_at":"2022-09-09T23:22:51.023Z","dependency_job_id":null,"html_url":"https://github.com/sihaelov/harser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sihaelov%2Fharser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sihaelov%2Fharser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sihaelov%2Fharser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sihaelov%2Fharser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sihaelov","download_url":"https://codeload.github.com/sihaelov/harser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposit
ories_count":225319333,"owners_count":17455750,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","html-parser","parser","python","xpath"],"created_at":"2024-08-04T04:01:28.256Z","updated_at":"2024-11-19T08:30:58.929Z","avatar_url":"https://github.com/sihaelov.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n# Harser\n\n[![Build Status](https://travis-ci.org/sihaelov/harser.svg?branch=master)](https://travis-ci.org/sihaelov/harser) [![Coverage Status](https://img.shields.io/codecov/c/github/sihaelov/harser.svg)](https://codecov.io/gh/sihaelov/harser) [![Wheel Status](https://img.shields.io/badge/wheel-yes-brightgreen.svg)](https://pypi.python.org/pypi/harser) ![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg) [![PyPI Version](https://img.shields.io/pypi/v/harser.svg)](https://pypi.python.org/pypi/harser)\n\n\nHarser is a library for easily extracting data from HTML and building XPath expressions.\n\n## Installation\n\n```shell\npip install harser\n```\n## Examples\n\n```python\n\u003e\u003e\u003e from harser import Harser\n\n\u003e\u003e\u003e HTML = '''\n    \u003chtml\u003e\u003cbody\u003e\n    \u003cdiv class=\"header\" id=\"id-header\"\u003e\n        \u003cli class=\"nav-item\" data-nav=\"first-item\" href=\"/nav1\"\u003eFirst item\u003c/li\u003e\n        \u003cli class=\"nav-item\" data-nav=\"second-item\" href=\"/nav2\"\u003eSecond item\u003c/li\u003e\n        \u003cli class=\"nav-item\" data-nav=\"third-item\" href=\"/nav3\"\u003eThird item\u003c/li\u003e\n    \u003c/div\u003e\n    \u003cdiv\u003eFirst 
layer\n        \u003ch3\u003eLorem Ipsum\u003c/h3\u003e\n        \u003cspan\u003eDolor sit amet\u003c/span\u003e\n    \u003c/div\u003e\n    \u003cdiv\u003eSecond layer\u003c/div\u003e\n    \u003cdiv\u003eThird layer\n        \u003cspan class=\"text\"\u003efirst block\u003c/span\u003e\n        \u003cspan class=\"text\"\u003esecond block\u003c/span\u003e\n        \u003cspan\u003ethird block\u003c/span\u003e\n    \u003c/div\u003e\n    \u003cspan\u003efourth layer\u003c/span\u003e\n    \u003cimg /\u003e\n    \u003cdiv class=\"footer\" id=\"id-foobar\" foobar=\"ab bc cde\"\u003e\n        \u003ch3 some-attr=\"hey\"\u003e\n            \u003cspan id=\"foobar-span\"\u003efoo ter\u003c/span\u003e\n        \u003c/h3\u003e\n    \u003c/div\u003e\n    \u003c/body\u003e\u003c/html\u003e\n'''\n\n\u003e\u003e\u003e harser = Harser(HTML)\n\n\u003e\u003e\u003e harser.find('div', class_='header').children(class_='nav-item').find('text').extract()\n# Or, more simply:\n# harser.find(class_='nav-item').find('text').extract()\n['First item', 'Second item', 'Third item']\n\n\u003e\u003e\u003e harser.find(class_='nav-item').get_attr('href').extract()\n['/nav1', '/nav2', '/nav3']\n\n# These two calls are equivalent\n\u003e\u003e\u003e harser.find('div', class_='header', id='id-header')\n\u003e\u003e\u003e harser.find('div', attrs={'class': 'header', 'id': 'id-header'})\n\n\u003e\u003e\u003e harser.find(id__contains='bar').get_attr('class').extract()\n['footer']\n\n\u003e\u003e\u003e harser.find(href__not_contains='2').find('text').extract()\n['First item', 'Third item']\n\n\u003e\u003e\u003e harser.find(attrs={'data-nav__contains': 'second'}).next_siblings().find('text').extract()\n['Third item']\n\n\u003e\u003e\u003e harser.find('li').parent().next_siblings(filters={'text__contains': 'Second'}).clean_extract()\n['\u003cdiv\u003eSecond layer\u003c/div\u003e']\n\n\u003e\u003e\u003e harser.find('h3', filters={'span.@id__starts_with': 'foo'}).get_attr('some-attr').extract()\n['hey']\n\n\u003e\u003e\u003e 
harser.find('div').children('h3').xpath\n'//descendant::div/h3'\n\n```\n\n## Support the project\n\nPlease contact [Michael Sinov](mailto:sihaelov@gmail.com?subject=Harser) if you want to support the Harser project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsihaelov%2Fharser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsihaelov%2Fharser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsihaelov%2Fharser/lists"}