{"id":13471311,"url":"https://github.com/code4craft/jsoup-learning","last_synced_at":"2025-04-13T02:20:33.453Z","repository":{"id":10358770,"uuid":"12498175","full_name":"code4craft/jsoup-learning","owner":"code4craft","description":"Jsoup study notes, with some added learning code and comments.","archived":false,"fork":false,"pushed_at":"2023-12-16T18:10:39.000Z","size":1004,"stargazers_count":638,"open_issues_count":3,"forks_count":228,"subscribers_count":71,"default_branch":"master","last_synced_at":"2025-04-04T04:12:19.103Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code4craft.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2013-08-31T00:32:57.000Z","updated_at":"2025-03-30T14:29:05.000Z","dependencies_parsed_at":"2024-04-09T11:40:14.969Z","dependency_job_id":"d4a50a61-6731-4d2c-9b1d-cc86f49964cc","html_url":"https://github.com/code4craft/jsoup-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fjsoup-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fjsoup-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fjsoup-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fjsoup-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code4craft","download_url":"https://codeload.github.com/code4c
raft/jsoup-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248654646,"owners_count":21140335,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T16:00:42.932Z","updated_at":"2025-04-13T02:20:33.423Z","avatar_url":"https://github.com/code4craft.png","language":"Java","readme":"Jsoup Study Notes\n------\n**Jsoup** is an HTML parsing tool for Java. It supports selecting DOM elements with CSS selectors and can also filter HTML text to prevent XSS attacks.\n\nI studied Jsoup to better develop another project of mine, the crawler framework [webmagic](https://github.com/code4craft/webmagic). To learn it thoroughly, I made myself write these articles in a disciplined way.\n\nThe code comes from [https://github.com/jhy/jsoup](https://github.com/jhy/jsoup), with some Chinese comments and example code added.\n\n---------------\n\n## Outline\n\n1. [Overview](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup1.md)\n\n2. [DOM-related objects](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup2.md)\n\n3. [Document output](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup3.md)\n\n4. The HTML parser\n\n\t1. [Parsing and state machine basics](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup4.md)\n\t2. [Lexical analysis: the Tokenizer](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup5.md)\n\t3. [Syntax checking and DOM tree construction](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup6.md)\n\n5. [CSS Selector](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup7.md)\n\n6. [Defending against XSS attacks](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup8.md)\n\n7. 
[Adding XPath selection to Jsoup](https://github.com/code4craft/xsoup)\n\t\n\tJsoup has no XPath support by default, so I wrote a project, [Xsoup](https://github.com/code4craft/xsoup), that selects from HTML text using XPath. A commonly used XPath extractor in Java is HtmlCleaner; Xsoup is twice as fast as it.\n\n-------\n\n## License\n\nThe code is licensed under the MIT License.\n\nThe documentation is licensed under CC BY-NC.\n\n[![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/code4craft/jsoup-learning/trend.png)](https://bitdeli.com/free \"Bitdeli Badge\")\n\n","funding_links":[],"categories":["Java"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode4craft%2Fjsoup-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode4craft%2Fjsoup-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode4craft%2Fjsoup-learning/lists"}