https://github.com/spektom/spider
Simple Web crawler written in Java
homework-exercises java web-crawler
- Host: GitHub
- URL: https://github.com/spektom/spider
- Owner: spektom
- Created: 2011-09-06T08:48:36.000Z (almost 14 years ago)
- Default Branch: master
- Last Pushed: 2011-09-06T09:00:11.000Z (almost 14 years ago)
- Last Synced: 2025-01-20T16:53:27.246Z (5 months ago)
- Topics: homework-exercises, java, web-crawler
- Language: Java
- Homepage:
- Size: 102 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Spider
=======
Simple Web crawler written in Java.

Building
---------
`ant`

Running
--------
`java -cp spider.jar org.spektom.spider.SpiderTool`

Usage
------
`java Spider [options] URL`
Where options are:

- `-r <true|false>`: Follow robots.txt and `robots` META tag rules (default: `true`)
- `-t <number>`: Number of concurrent downloads (default: `5`)
- `-f <true|false>`: Follow links to other domains (default: `false`)
- `-c <timeout>`: Connect/read timeout in milliseconds (default: `5000`)
- `-u <string>`: String sent in the User-Agent header (default: none)
- `-p <pattern>`: Follow only URLs that match the given pattern
- `-v <true|false>`: Verbose output (default: `false`)
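The sketch below is a hypothetical illustration of how the `-f`, `-p`, `-c`, `-u` and `-t` options might be applied inside a crawler. It is not Spider's own code: the class `CrawlOptionsSketch` and its methods are invented for this example. It assumes that "other domains" means an exact host mismatch with the seed URL and that `-p` is a `java.util.regex` pattern that must match the whole URL; the README does not specify either detail. Robots handling (`-r`) and verbose output (`-v`) are not covered.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Pattern;

/** Hypothetical sketch of crawl options; not taken from the Spider sources. */
public class CrawlOptionsSketch {

    /**
     * Decides whether a discovered link should be followed.
     * Assumption: -f false means "same host as the seed URL only",
     * and -p (when given) must match the entire URL string.
     */
    static boolean shouldFollow(URI seed, URI candidate,
                                boolean followOtherDomains, Pattern pattern) {
        String scheme = candidate.getScheme();
        // Only HTTP(S) links are crawlable; skip mailto:, javascript:, relative junk, etc.
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return false;
        }
        if (!followOtherDomains) {
            String seedHost = seed.getHost();
            // equalsIgnoreCase(null) is false, so host-less URIs are rejected too
            if (seedHost == null || !seedHost.equalsIgnoreCase(candidate.getHost())) {
                return false;
            }
        }
        // No pattern given: follow everything that passed the checks above
        return pattern == null || pattern.matcher(candidate.toString()).matches();
    }

    /**
     * Opens (but does not yet connect) an HTTP connection using the -c timeout
     * for both connect and read, and the optional -u User-Agent string.
     */
    static HttpURLConnection open(URL url, int timeoutMillis, String userAgent)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        return conn;
    }

    /** A fixed-size pool caps the number of concurrent downloads (-t). */
    static ExecutorService newDownloadPool(int concurrentDownloads) {
        return Executors.newFixedThreadPool(concurrentDownloads);
    }
}
```

With the defaults listed above, a crawler along these lines would call `newDownloadPool(5)`, `open(url, 5000, null)`, and `shouldFollow(seed, link, false, null)` for each extracted link. A fixed thread pool is one simple way to bound concurrency; a real crawler would also need a visited-URL set so that pages are not fetched twice.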