Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/code4everything/visual-spider
Try our new desktop productivity tool, RunFlow: https://myrest.top/myflow
crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/code4everything/visual-spider
- Owner: code4everything
- License: MIT
- Archived: true
- Created: 2017-11-06T06:41:28.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-03-06T02:23:18.000Z (10 months ago)
- Last Synced: 2024-09-25T21:46:29.697Z (3 months ago)
- Topics: crawler, crawler4j-java, java-8, java8, javafx, javafx-application, spider, visualization
- Language: Java
- Homepage: https://myrest.top/myflow
- Size: 256 KB
- Stars: 33
- Watchers: 5
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Try our new desktop productivity tool
**Try our new desktop productivity tool, [RunFlow](https://myrest.top/myflow).**
> [https://myrest.top/myflow](https://myrest.top/myflow)
#### Image crawling
Currently supported image formats: bmp, gif, jpeg, png, tiff, pcx, tga, svg, pic
#### Media crawling
Currently supported media formats: avi, mov, swf, asf, navi, wmv, 3gp, mkv, flv, rmvb, webm, mpg, mp4, qsv, mpeg, mp3, aac, ogg, wav, flac, ape, wma, aif, au, ram, mmf, amr
#### Link crawling
This simply downloads the HTML source of each page.
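
Downloading a page's HTML source can be sketched in plain Java 8 as follows. This is a minimal illustration, not code from visual-spider: the class `HtmlDownloader`, its method names, the 5-second timeouts, and the assumption that pages are UTF-8 encoded are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the visual-spider source.
class HtmlDownloader {

    /** Reads a stream to the end and decodes the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    /** Fetches the raw HTML of a page. Assumption: the page is UTF-8 encoded. */
    static String download(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setConnectTimeout(5000); // milliseconds, arbitrary choice
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = download("https://example.com/");
        System.out.println(html.length() + " characters downloaded");
    }
}
```

A real crawler would also need to detect the page's declared charset and handle redirects and error responses, which this sketch leaves out.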
#### Document crawling
Currently supported document formats: pdf, docx, txt, log, conf, java, xml, json, css, js, html, hml, php, wps, rtf
#### Other file crawling
Currently supported file formats: zip, exe, dmg, iso, jar, msi, rar, tmp, xlsx, mdf, com, casm, for, lib, lst, msg, obj, pas, wki, bas, map, bak, dot, bat, sh, rpm
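
One way to route a discovered URL to the image, media, document, or other-file category is to look at its file extension. The sketch below is an assumption about how such a lookup could work, not the project's actual logic; `ExtensionClassifier` is a hypothetical name, and each set holds only a subset of the extensions listed above to keep the example short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not taken from the visual-spider source.
class ExtensionClassifier {

    // Category name -> extensions (a subset of the README lists, for brevity).
    private static final Map<String, Set<String>> CATEGORIES = new LinkedHashMap<>();

    static {
        CATEGORIES.put("image", set("bmp", "gif", "jpeg", "png", "tiff", "svg"));
        CATEGORIES.put("media", set("avi", "mov", "mkv", "mp4", "mp3", "ogg", "wav", "flac"));
        CATEGORIES.put("document", set("pdf", "docx", "txt", "xml", "json", "css", "js", "html"));
        CATEGORIES.put("other", set("zip", "exe", "iso", "jar", "rar", "xlsx", "sh", "rpm"));
    }

    private static Set<String> set(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    /** Returns the category for a URL, or "unknown" if its extension is not listed. */
    static String classify(String url) {
        // Drop the query string and fragment before inspecting the path.
        String path = url.split("[?#]", 2)[0];
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return "unknown"; // the last path segment has no extension
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Set<String>> entry : CATEGORIES.entrySet()) {
            if (entry.getValue().contains(ext)) {
                return entry.getKey();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("https://example.com/a/logo.PNG?v=2"));
        System.out.println(classify("https://example.com/files/report.pdf"));
        System.out.println(classify("https://example.com/about"));
    }
}
```

Matching on the extension is cheap but only a heuristic: a server can return any content type for any path, so checking the `Content-Type` response header would be more reliable.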
#### Custom crawling
Define custom XPath expressions; the page content they match is stored in a MySQL database.
![xpath](xpath.png)
> [Learn XPath syntax](http://www.w3school.com.cn/xpath/xpath_syntax.asp)
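
To make the idea concrete, here is a minimal sketch of evaluating an XPath expression and inserting the matches over JDBC, using only classes that ship with the JDK. It is not the project's implementation: `XPathExtractor`, the table `crawl_result`, and its columns `url` and `content` are hypothetical, and the input is assumed to be well-formed XHTML (real-world HTML usually needs a lenient parser first).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not taken from the visual-spider source.
class XPathExtractor {

    /** Returns the trimmed text content of every node the expression matches. */
    static List<String> extract(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            matches.add(nodes.item(i).getTextContent().trim());
        }
        return matches;
    }

    /** Stores one row per match. Table and column names are assumptions. */
    static void save(Connection conn, String url, List<String> matches) throws SQLException {
        String sql = "INSERT INTO crawl_result (url, content) VALUES (?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (String match : matches) {
                stmt.setString(1, url);
                stmt.setString(2, match);
                stmt.addBatch();
            }
            stmt.executeBatch();
        }
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h2> First </h2><p>text</p><h2>Second</h2></body></html>";
        System.out.println(extract(page, "//h2"));
    }
}
```

Calling `save` requires a `java.sql.Connection` to a MySQL server, which in turn needs the MySQL JDBC driver on the classpath; `main` only exercises the extraction step so that it runs without a database.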
#### Crawler workflow
![Workflow](workflow.png)
#### Screenshot
![Screenshot](http://oq3iwfipo.bkt.clouddn.com/tutorial/vspider/visualspider.png)
[Download](http://oq3iwfipo.bkt.clouddn.com/tools/zhazhapan/visual-spider-1.1.jar)