{"id":13458743,"url":"https://github.com/jhy/jsoup","last_synced_at":"2025-05-12T18:08:24.243Z","repository":{"id":765433,"uuid":"442430","full_name":"jhy/jsoup","owner":"jhy","description":"jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.","archived":false,"fork":false,"pushed_at":"2025-05-05T03:41:51.000Z","size":6206,"stargazers_count":11144,"open_issues_count":11,"forks_count":2229,"subscribers_count":394,"default_branch":"master","last_synced_at":"2025-05-05T15:21:32.534Z","etag":null,"topics":["css","css-selectors","dom","html","java","java-html-parser","jsoup","parser","xml","xpath"],"latest_commit_sha":null,"homepage":"https://jsoup.org","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jhy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2009-12-19T01:29:58.000Z","updated_at":"2025-05-05T14:32:56.000Z","dependencies_parsed_at":"2023-12-14T08:23:46.940Z","dependency_job_id":"b8ee58ba-c1d1-468e-b0b6-54b2068faa67","html_url":"https://github.com/jhy/jsoup","commit_stats":{"total_commits":2015,"total_committers":127,"mean_commits":"15.866141732283465","dds":"0.19801488833746894","last_synced_commit":"36ba3edad1de83e61dff71ca929909587eebd834"},"previous_names":[],"tags_count":57,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhy%2Fjsoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhy%2Fjsoup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhy%2Fjsoup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhy%2Fjsoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jhy","download_url":"https://codeload.github.com/jhy/jsoup/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252522178,"owners_count":21761685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css","css-selectors","dom","html","java","java-html-parser","jsoup","parser","xml","xpath"],"created_at":"2024-07-31T09:00:56.428Z","updated_at":"2025-05-05T15:21:46.408Z","avatar_url":"https://github.com/jhy.png","language":"Java","readme":"# jsoup: Java HTML Parser\n\n**jsoup** is a Java library that makes it easy to work with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and xpath selectors.\n\n**jsoup** implements the [WHATWG HTML5](https://html.spec.whatwg.org/multipage/) specification, and parses HTML to the same DOM as modern browsers.\n\n* scrape and [parse](https://jsoup.org/cookbook/input/parse-document-from-string) HTML from a URL, file, or string\n* find and [extract data](https://jsoup.org/cookbook/extracting-data/selector-syntax), using DOM traversal or CSS selectors\n* manipulate the [HTML elements](https://jsoup.org/cookbook/modifying-data/set-html), attributes, and text\n* [clean](https://jsoup.org/cookbook/cleaning-html/safelist-sanitizer) user-submitted content against a safe-list, to prevent XSS attacks\n* output tidy HTML\n\njsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree.\n\nSee [**jsoup.org**](https://jsoup.org/) for downloads and the full [API documentation](https://jsoup.org/apidocs/).\n\n[![Build Status](https://github.com/jhy/jsoup/workflows/Build/badge.svg)](https://github.com/jhy/jsoup/actions?query=workflow%3ABuild)\n\n## Example\nFetch the [Wikipedia](https://en.wikipedia.org/wiki/Main_Page) homepage, parse it to a [DOM](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction), and select the headlines from the *In the News* section into a list of [Elements](https://jsoup.org/apidocs/org/jsoup/select/Elements.html):\n\n```java\nDocument doc = Jsoup.connect(\"https://en.wikipedia.org/\").get();\nlog(doc.title());\nElements newsHeadlines = doc.select(\"#mp-itn b a\");\nfor (Element headline : newsHeadlines) {\n  log(\"%s\\n\\t%s\", \n    headline.attr(\"title\"), headline.absUrl(\"href\"));\n}\n```\n[Online sample](https://try.jsoup.org/~LGB7rk_atM2roavV0d-czMt3J_g), [full source](https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/examples/Wikipedia.java).\n\n## Open source\njsoup is an open source project distributed under the liberal [MIT license](https://jsoup.org/license). The source code is available on [GitHub](https://github.com/jhy/jsoup).\n\n## Getting started\n1. [Download](https://jsoup.org/download) the latest jsoup jar (or add it to your Maven/Gradle build)\n2. Read the [cookbook](https://jsoup.org/cookbook/)\n3. Enjoy!\n\n### Android support\nWhen used in Android projects, [core library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring) with the [NIO specification](https://developer.android.com/studio/write/java11-nio-support-table) should be enabled to support Java 8+ features.\n\n## Development and support\nIf you have any questions on how to use jsoup, or have ideas for future development, please get in touch via [jsoup Discussions](https://github.com/jhy/jsoup/discussions).\n\nIf you find any issues, please file a [bug](https://jsoup.org/bugs) after checking for duplicates.\n\nThe [colophon](https://jsoup.org/colophon) talks about the history of and tools used to build jsoup.\n\n## Status\njsoup is in general, stable release.\n\n## Author\njsoup was created and is maintained by [Jonathan Hedley](//jhedley.com), its primary author.\n\njsoup is an open-source project, and many contributors have helped improve it over the years. You can see their contributions and join the development on [GitHub](https://github.com/jhy/jsoup/graphs/contributors).\n\n## Citing jsoup\nIf you use jsoup in research or technical documentation, you can cite it as:\n\n\u003e **Jonathan Hedley \u0026 jsoup contributors. jsoup: Java HTML Parser (2009–present).** Available at: https://jsoup.org\n\n```plaintext\n@misc{jsoup,\n  author = {Jonathan Hedley and jsoup contributors},\n  title = {jsoup: Java HTML Parser},\n  year = {2025},\n  url = {https://jsoup.org}\n}\n```\n","funding_links":[],"categories":["Java","All","🧩 HTML \u0026 XML Parsing","网络服务","dependency list","III. Network and Integration"],"sub_categories":["Ruby","网络服务_其他","7. Web Crawling and HTML parsering","相关工具"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhy%2Fjsoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhy%2Fjsoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhy%2Fjsoup/lists"}