https://github.com/code4craft/jsoup-learning
Jsoup study notes, with some study code and comments added.
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/code4craft/jsoup-learning
- Owner: code4craft
- License: mit
- Created: 2013-08-31T00:32:57.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-12-16T18:10:39.000Z (about 1 year ago)
- Last Synced: 2025-01-15T17:52:12.645Z (10 days ago)
- Language: Java
- Size: 980 KB
- Stars: 638
- Watchers: 71
- Forks: 228
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGES
- License: LICENSE
Awesome Lists containing this project
- my-awesome - code4craft/jsoup-learning - 12 star:0.6k fork:0.2k Jsoup study notes, with some study code and comments added. (Java)
README
Jsoup Study Notes
------
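**Jsoup** is an HTML parsing tool for Java. It supports selecting DOM elements with CSS selectors and can also filter HTML text to prevent XSS attacks. I studied Jsoup in order to better develop my other project, the crawler framework [webmagic](https://github.com/code4craft/webmagic). To learn it in detail, I made myself write these articles up in a disciplined way.
Part of the code comes from [https://github.com/jhy/jsoup](https://github.com/jhy/jsoup), with some Chinese comments and sample code added.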
---------------
## Outline
1. [Overview](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup1.md)
2. [DOM-related objects](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup2.md)
3. [Document output](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup3.md)
4. The HTML parser
   1. [Parsing and state machine basics](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup4.md)
   2. [Lexical analysis: the Tokenizer](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup5.md)
   3. [Syntax checking and DOM tree construction](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup6.md)
5. [CSS Selector](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup7.md)
6. [Defending against XSS attacks](https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup8.md)
7. [Adding XPath selection to Jsoup](https://github.com/code4craft/xsoup)
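Jsoup has no XPath support by default, so I wrote a project, [Xsoup](https://github.com/code4craft/xsoup), that lets you select from HTML text using XPath. The most commonly used XPath extractor in Java is HtmlCleaner; Xsoup is about twice as fast.

-------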
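To give a feel for the kind of query that XPath selection makes possible, here is a minimal sketch that uses only the JDK's `javax.xml.xpath` package. It is not Xsoup's API: the class name `XPathSketch`, the helper `select`, and the sample markup are all hypothetical, and the snippet assumes well-formed XML-style input, which real-world HTML often is not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration using the JDK's built-in XPath engine, not Xsoup.
public class XPathSketch {

    // Returns the text content of every node matched by the XPath expression.
    // Assumes the input is well-formed XML (for example, XHTML).
    static List<String> select(String xhtml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Raised for an invalid expression or input that cannot be parsed.
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
                + "<a href=\"https://github.com/code4craft/xsoup\">Xsoup</a>"
                + "<a href=\"https://github.com/jhy/jsoup\">jsoup</a>"
                + "</body></html>";
        // Select the link targets, then the link texts.
        System.out.println(select(xhtml, "//a/@href"));
        System.out.println(select(xhtml, "//a/text()"));
    }
}
```

The JDK engine rejects markup that is not well-formed, such as unclosed tags, so this approach only works on clean input. For ordinary web pages an HTML-aware parser has to sit in front of the XPath step.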
## License:
The code is released under the MIT license.
The documentation is released under the CC BY-NC license.
[![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/code4craft/jsoup-learning/trend.png)](https://bitdeli.com/free "Bitdeli Badge")