{"id":34035462,"url":"https://github.com/chuanconggao/html2json","last_synced_at":"2026-04-07T06:31:10.333Z","repository":{"id":57437643,"uuid":"61140946","full_name":"chuanconggao/html2json","owner":"chuanconggao","description":"Lightweight library that converts a HTML webpage to JSON data using a template defined in JSON.","archived":false,"fork":false,"pushed_at":"2025-06-02T06:26:11.000Z","size":107,"stargazers_count":23,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-28T05:23:50.978Z","etag":null,"topics":["html","html2json","json"],"latest_commit_sha":null,"homepage":"https://git.io/html2json","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chuanconggao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-14T17:06:08.000Z","updated_at":"2025-08-11T04:22:00.000Z","dependencies_parsed_at":"2022-09-15T11:14:15.692Z","dependency_job_id":null,"html_url":"https://github.com/chuanconggao/html2json","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/chuanconggao/html2json","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuanconggao%2Fhtml2json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuanconggao%2Fhtml2json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuanconggao%2Fhtml2json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuanconggao%2Fhtml2json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chuanconggao","downl
oad_url":"https://codeload.github.com/chuanconggao/html2json/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuanconggao%2Fhtml2json/sbom","scorecard":{"id":281831,"data":{"date":"2025-08-11","repo":{"name":"github.com/chuanconggao/html2json","commit":"1f3befd9fcfc9870bab4162d52febf0e4c9d53d8"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.7,"checks":[{"name":"Code-Review","score":0,"reason":"Found 2/26 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Maintained","score":5,"reason":"7 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least 
privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 3 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T16:04:24.447Z","repository_id":57437643,"created_at":"2025-08-17T16:04:24.447Z","updated_at":"2025-08-17T16:04:24.447Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31503380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","html2json","json"],"created_at":"2025-12-13T20:02:13.979Z","updated_at":"2026-04-07T06:31:10.328Z","avatar_url":"https://github.com/chuanconggao.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPi version](https://img.shields.io/pypi/v/html2json.svg)](https://pypi.python.org/pypi/html2json/)\n[![PyPi pyversions](https://img.shields.io/pypi/pyversions/html2json.svg)](https://pypi.python.org/pypi/html2json/)\n[![PyPi license](https://img.shields.io/pypi/l/html2json.svg)](https://pypi.python.org/pypi/html2json/)\n\nConvert a HTML webpage to JSON data using a template defined in JSON.\n\nInstallation Guide\n----\n\nThis package is available on PyPi. Just use `pip install -U html2json` to install it. 
Then you can import it using `from html2json import collect`.\n\n- Note that starting with version 0.3.0, Python 3.9 or later is required.\n\nAPI\n----\n\nThe method is `collect(html, template)`. `html` is the page's HTML loaded as a string, and `template` is the JSON template loaded as Python objects.\n\nNote that the HTML must contain the root node, like `\u003chtml\u003e...\u003c/html\u003e` or `\u003cdiv\u003e...\u003c/div\u003e`.\n\nTemplate Syntax\n----\n\n| For detailed syntax examples, please refer to unit tests (with 100% coverage).\n\nThe basic syntax is `keyName: [selector, attr, [listOfRegexes]]`.\n    1. `selector` is a CSS selector (supported by [lxml](http://lxml.de/)).\n        - When the selector is `null`, the root node itself is matched.\n        - When the selector cannot be matched, `null` is returned.\n        - When the selector matches a single element, a string is returned.\n        - When the selector matches multiple elements, a list of strings is returned.\n        - If only the selector is needed, you can specify a string instead of a list.\n    2. `attr` matches the attribute value. It can be `null` to match either the inner text or the outer text when the inner text is empty.\n        - Optional when only the selector is needed.\n    3. The list of regexes `[listOfRegexes]` supports two forms of regex operations. The operations within the list are executed sequentially.\n        - Replacement: `s/regex/replacement/g`. The trailing `g` is optional and enables multiple replacements.\n        - Extraction: `/regex/`.\n        - Note that you can use any character as the separator instead of `/`.\n        - Optional when only the selector and/or attribute are needed.\n\nFor example:\n\n```json\n{\n    \"Color\": [\"head link:nth-of-type(1)\", \"href\", [\"/\\\\w+(?=\\\\.css)/\"]]\n}\n```\n\nStarting with version 0.3.1, besides the value, the key can also be matched, like `\"[selector, ...]\": ...`. 
Note that the key must be a string for the JSON to be valid.\n\n- When the selector cannot be matched, the key is not added to the JSON.\n- When the selector matches a single element, the returned string is used as the key.\n- When the selector matches multiple elements, the returned strings are used as multiple keys.\n\nStarting with version 0.3.1, you can also replace part of the value's selector with the current key using the syntax `...{key}...`. This is especially useful when the key is dynamic.\n\n\u003cbr/\u003e\n\nBecause the template is JSON, nested structures can be constructed easily.\n\n```json\n{\n    \"Cover\": {\n        \"URL\": [\".cover img\", \"src\", []],\n        \"Number of Favorites\": [\".cover .favorites\", \"value\", []]\n    }\n}\n```\n\n\u003cbr/\u003e\n\nAn alternative simplified syntax `keyName: [subRoot, subTemplate]` can be used.\n    1. `subRoot` is the CSS selector of the new root for each sub-entry.\n    2. `subTemplate` is a sub-template for each entry, recursively.\n\nFor example, the previous example can be simplified as follows.\n\n```json\n{\n    \"Cover\": [\".cover\", {\n        \"URL\": [\"img\", \"src\", []],\n        \"Number of Favorites\": [\".favorites\", \"value\", []]\n    }]\n}\n```\n\n\u003cbr/\u003e\n\nTo extract a list of sub-entries following the same sub-template, the list syntax is `keyName: [[subRoot, subTemplate]]`. Please note the difference (surrounding `[` and `]`) from the previous syntax above.\n    1. `subRoot` is the CSS selector of the new root for each sub-entry.\n    2. 
`subTemplate` is the sub-template for each entry, recursively.\n        - Optional, or `null`, to match the entire sub-root.\n\nFor example:\n\n```json\n{\n    \"Comments\": [[\".comments\", {\n        \"From\": [\".from\", null, []],\n        \"Content\": [\".content\", null, []],\n        \"Photos\": [[\"img\", {\n            \"URL\": [\"\", \"src\", []]\n        }]]\n    }]]\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchuanconggao%2Fhtml2json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchuanconggao%2Fhtml2json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchuanconggao%2Fhtml2json/lists"}