Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rainyear/cix-extractor-py
General-purpose web page main-content (and image) extraction based on a line-block distribution function - Python version
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/rainyear/cix-extractor-py
- Owner: rainyear
- Created: 2015-09-02T13:17:26.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-09-22T16:41:53.000Z (over 8 years ago)
- Last Synced: 2024-08-04T04:07:28.361Z (5 months ago)
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 114
- Watchers: 7
- Forks: 42
- Open Issues: 4
Metadata Files:
- Readme: README.md
README
# cix-extractor-py
A Python version of [cx-extractor](https://code.google.com/p/cx-extractor/) that extracts the main text of a web page, with the added ability to [extract images from the main text](http://blog.rainy.im/2015/09/02/web-content-and-main-image-extractor/).
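
To illustrate the general idea behind line-block extraction, here is a minimal sketch. It is not this repository's actual code: the function names (`strip_markup`, `block_lengths`, `extract_main_text`) and the default `block_width` and `threshold` values are assumptions chosen for illustration. The sketch strips markup while keeping line breaks, sums the text length over a sliding window of lines, and returns the longest dense run of text. Image extraction is not covered.

```python
import re


def strip_markup(html):
    """Remove comments, scripts, styles and tags, keeping the remaining line breaks."""
    html = re.sub(r"(?is)<!--.*?-->", "", html)
    html = re.sub(r"(?is)<(script|style)\b.*?>.*?</\1\s*>", "", html)
    return re.sub(r"(?s)<[^>]+>", "", html)


def block_lengths(lines, block_width=3):
    """For each line i, total characters in lines i .. i + block_width - 1."""
    return [
        sum(len(line) for line in lines[i:i + block_width])
        for i in range(len(lines))
    ]


def extract_main_text(html, block_width=3, threshold=40):
    """Return the longest dense run of text lines (illustrative heuristic only)."""
    lines = [line.strip() for line in strip_markup(html).split("\n")]
    blocks = block_lengths(lines, block_width)

    best, start = "", None
    # The trailing 0 is a sentinel that closes a region reaching the last line.
    for i, size in enumerate(blocks + [0]):
        if start is None:
            # A region opens where the block length first exceeds the threshold.
            if size > threshold:
                start = i
        elif size == 0:
            # A region closes at block_width consecutive empty lines.
            candidate = "\n".join(line for line in lines[start:i] if line)
            if len(candidate) > len(best):
                best = candidate
            start = None
    return best
```

Calling `extract_main_text(html)` on a page's HTML string returns the candidate body text. Both `block_width` and `threshold` typically need tuning per site, since short navigation links and long article paragraphs only separate cleanly when the markup puts them on different lines.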