{"id":19448040,"url":"https://github.com/scrapinghub/webstruct","last_synced_at":"2026-01-26T20:11:33.676Z","repository":{"id":9644417,"uuid":"11578385","full_name":"scrapinghub/webstruct","owner":"scrapinghub","description":"NER toolkit for HTML data","archived":false,"fork":false,"pushed_at":"2024-05-03T19:37:19.000Z","size":15314,"stargazers_count":259,"open_issues_count":23,"forks_count":59,"subscribers_count":129,"default_branch":"master","last_synced_at":"2025-09-25T12:22:12.599Z","etag":null,"topics":["crfsuite","data-science","ner"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scrapinghub.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-07-22T10:05:49.000Z","updated_at":"2025-02-07T03:28:56.000Z","dependencies_parsed_at":"2024-06-20T00:15:50.338Z","dependency_job_id":"04a5eaef-3650-427f-969d-2d48b74e78c4","html_url":"https://github.com/scrapinghub/webstruct","commit_stats":{"total_commits":421,"total_committers":11,"mean_commits":38.27272727272727,"dds":0.4323040380047506,"last_synced_commit":"17c8254898368ead7448a9145e98a125f8891558"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/scrapinghub/webstruct","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fwebstruct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fwebstruct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fwebstruct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fwebstruct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scrapinghub","download_url":"https://codeload.github.com/scrapinghub/webstruct/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fwebstruct/sbom","scorecard":{"id":806067,"data":{"date":"2025-08-11","repo":{"name":"github.com/scrapinghub/webstruct","commit":"17c8254898368ead7448a9145e98a125f8891558"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.3,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":0,"reason":"license file not detected","details":["Warn: project does not have a license file"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Vulnerabilities","score":0,"reason":"27 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2022-288 / GHSA-6hrg-qmvc-2xh8","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2014-14 / GHSA-652x-xj99-gmcc","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-9wx4-h78v-vm56","Warn: Project is vulnerable to: PYSEC-2014-13 / GHSA-cfj3-7x9c-4p3h","Warn: Project is vulnerable to: PYSEC-2018-28 / GHSA-x84v-xcm2-53pg","Warn: Project is vulnerable to: GHSA-g7vv-2v7x-gj9p","Warn: Project is vulnerable to: GHSA-55x5-fj6c-h6m8","Warn: Project is vulnerable to: PYSEC-2014-9 / GHSA-57qw-cc2g-pv5p","Warn: Project is vulnerable to: PYSEC-2021-19 / GHSA-jq4v-f5q6-mjqq","Warn: Project is vulnerable to: GHSA-pgww-xf46-h92r","Warn: Project is vulnerable to: PYSEC-2022-230 / GHSA-wrxv-2j5q-m38w","Warn: Project is vulnerable to: PYSEC-2018-12 / GHSA-xp26-p53h-6h2p","Warn: Project is vulnerable to: PYSEC-2018-34 / GHSA-2fc2-6r4j-p65h","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: PYSEC-2019-108 / GHSA-9fq2-x9r6-wfmf","Warn: Project is vulnerable to: PYSEC-2018-33 / GHSA-cw6w-4rcx-xphc","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2017-1 / GHSA-frgw-fgh6-9g52","Warn: Project is vulnerable to: PYSEC-2020-107 / GHSA-jjw5-xxj6-pcv5","Warn: Project is vulnerable to: PYSEC-2024-110 / GHSA-jw8x-6495-233v","Warn: Project is vulnerable to: PYSEC-2020-108","Warn: Project is vulnerable to: PYSEC-2019-156 / GHSA-xp76-357g-9wqq","Warn: Project is vulnerable to: PYSEC-2023-102","Warn: Project is vulnerable to: PYSEC-2023-114"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-23T11:48:00.377Z","repository_id":9644417,"created_at":"2025-08-23T11:48:00.378Z","updated_at":"2025-08-23T11:48:00.378Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28787152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T13:55:28.044Z","status":"ssl_error","status_checked_at":"2026-01-26T13:55:26.068Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crfsuite","data-science","ner"],"created_at":"2024-11-10T16:23:25.106Z","updated_at":"2026-01-26T20:11:33.656Z","avatar_url":"https://github.com/scrapinghub.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"Webstruct\n=========\n\n.. image:: https://img.shields.io/pypi/v/webstruct.svg\n   :target: https://pypi.python.org/pypi/webstruct\n   :alt: PyPI Version\n\n.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master\n   :target: https://travis-ci.org/scrapinghub/webstruct\n   :alt: Build Status\n\n.. image:: https://codecov.io/gh/scrapinghub/webstruct/branch/master/graph/badge.svg\n   :target: https://codecov.io/gh/scrapinghub/webstruct\n   :alt: Code Coverage\n\n.. image:: https://readthedocs.org/projects/webstruct/badge/?version=latest\n   :target: http://webstruct.readthedocs.io/en/latest/\n   :alt: Documentation\n\n\nWebstruct is a library for creating statistical NER_ systems that work\non HTML data, i.e. a library for building tools that extract named\nentities (addresses, organization names, open hours, etc) from webpages.\n\nUnlike most NER systems, webstruct works on HTML data, not only\non text data. This allows to define features that use HTML structure,\nand also to embed annotation results back into HTML.\n\nRead the docs_ for more info.\n\nLicense is MIT.\n\n.. _docs: http://webstruct.readthedocs.io/en/latest/\n.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition\n\nContributing\n------------\n\n* Source code: https://github.com/scrapinghub/webstruct\n* Bug tracker: https://github.com/scrapinghub/webstruct/issues\n\nTo run tests, make sure tox_ is installed, then run\n``tox`` from the source root.\n\n.. _tox: https://tox.readthedocs.io/en/latest/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapinghub%2Fwebstruct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapinghub%2Fwebstruct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapinghub%2Fwebstruct/lists"}