https://github.com/mariusvanderwijden/swcl
Simple Webcrawler
- Host: GitHub
- URL: https://github.com/mariusvanderwijden/swcl
- Owner: MariusVanDerWijden
- Created: 2016-05-24T05:52:24.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-10-08T10:03:07.000Z (about 7 years ago)
- Last Synced: 2024-10-06T16:41:05.266Z (about 1 month ago)
- Language: Java
- Size: 67.4 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# swcl
Simple Webcrawler

This is my first GitHub project, so be gentle ;)
SWCL is a simple web crawler that is meant to crawl sites starting from a specified base URL. It supports the following options:

- crawl only subsites of the base URL
- crawl only subsites of the base URL and save them to a local directory
- crawl every link found on the site, and recursively every link on the linked sites
- crawl based on a dictionary (works like a spider)
- crawl based on a dictionary and save everything to a local directory
- save the found URLs to a database or print them to the command line
- run on multiple cores (the number can be specified) to use the whole bandwidth
- limit the crawl to a specified recursion depth

Doesn't work with js-parameters!
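
To illustrate how two of these options could interact (crawling only subsites of the base URL, and stopping at a given recursion depth), here is a minimal sketch. It is not taken from SWCL's source: the class `CrawlSketch`, the methods `crawl` and `isSubsite`, and the `linkExtractor` parameter are hypothetical names chosen for this example. Fetching and parsing pages is left to the caller through `linkExtractor`, so the sketch runs without network access.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not SWCL's actual implementation.
public class CrawlSketch {

    /** Returns true if candidate is on the same host as baseUrl and below its path. */
    public static boolean isSubsite(String baseUrl, String candidate) {
        try {
            URI base = URI.create(baseUrl);
            URI link = URI.create(candidate);
            return base.getHost() != null
                    && base.getHost().equalsIgnoreCase(link.getHost())
                    && base.getPath() != null
                    && link.getPath() != null
                    && link.getPath().startsWith(base.getPath());
        } catch (IllegalArgumentException malformedUrl) {
            return false; // treat unparsable links as out of scope
        }
    }

    /**
     * Breadth-first crawl from baseUrl. Links are followed only from pages
     * whose depth is below maxDepth; the base URL itself has depth 0.
     * linkExtractor is assumed to return the links found on a page.
     */
    public static Set<String> crawl(String baseUrl, int maxDepth, boolean subsitesOnly,
                                    Function<String, List<String>> linkExtractor) {
        Map<String, Integer> depthOf = new LinkedHashMap<>(); // doubles as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(baseUrl, 0);
        queue.add(baseUrl);

        while (!queue.isEmpty()) {
            String url = queue.poll();
            int depth = depthOf.get(url);
            if (depth >= maxDepth) {
                continue; // recursion depth reached: record the page, do not follow its links
            }
            for (String link : linkExtractor.apply(url)) {
                if (depthOf.containsKey(link)) {
                    continue; // already seen
                }
                if (subsitesOnly && !isSubsite(baseUrl, link)) {
                    continue; // outside the base URL
                }
                depthOf.put(link, depth + 1);
                queue.add(link);
            }
        }
        return new LinkedHashSet<>(depthOf.keySet());
    }

    public static void main(String[] args) {
        // A tiny in-memory "site" standing in for real HTTP fetches.
        Map<String, List<String>> site = Map.of(
                "https://example.org/", List.of("https://example.org/a", "https://other.example/x"),
                "https://example.org/a", List.of("https://example.org/b"));

        Set<String> found = crawl("https://example.org/", 1, true,
                url -> site.getOrDefault(url, List.of()));
        System.out.println(found); // prints [https://example.org/, https://example.org/a]
    }
}
```

A real crawler would replace the in-memory map with a function that downloads each page and extracts its links. A multi-core variant could hand the queue to a thread pool, which this sketch does not attempt.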