https://github.com/forcedotcom/sitecrawler
This is a Java library for crawling the content of web properties such as www.salesforce.com and blogs.salesforce.com. It supports dynamic scaling out of the box, adjusting to the available machine resources (CPU, RAM) and network capacity. It also has a plugin structure that lets others write code (plugins) that acts on the crawled pages.
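To make the plugin idea concrete, here is a minimal sketch of how a plugin contract and a CPU-sized worker pool could fit together. Everything in it is an assumption for illustration: `PagePlugin`, `PageStatsPlugin`, `CrawlSketch`, `workerCount`, and `dispatch` are hypothetical names and are not taken from the sitecrawler code base. The pages are in-memory strings, so no network access is involved, and the pool is sized from the CPU count only (RAM and network capacity are not considered here).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical plugin contract; NOT the actual sitecrawler API.
interface PagePlugin {
    void onPage(String url, String html);
}

// Example plugin: counts the pages and characters it has been given.
class PageStatsPlugin implements PagePlugin {
    private final AtomicInteger pages = new AtomicInteger();
    private final AtomicLong chars = new AtomicLong();

    @Override
    public void onPage(String url, String html) {
        pages.incrementAndGet();
        chars.addAndGet(html.length());
    }

    int pageCount() { return pages.get(); }
    long charCount() { return chars.get(); }
}

class CrawlSketch {
    // Assumption: one worker per available CPU core, at least one.
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Hands each already-fetched page to every plugin on a worker pool,
    // then waits (up to 30 seconds) for the submitted tasks to finish.
    static void dispatch(Map<String, String> pages,
                         List<? extends PagePlugin> plugins) {
        ExecutorService pool = Executors.newFixedThreadPool(workerCount());
        for (Map.Entry<String, String> page : pages.entrySet()) {
            pool.submit(() -> {
                for (PagePlugin plugin : plugins) {
                    plugin.onPage(page.getKey(), page.getValue());
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    public static void main(String[] args) {
        // Stand-in page bodies; a real crawler would fetch these.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("https://www.salesforce.com/", "<html>home</html>");
        pages.put("https://blogs.salesforce.com/", "<html>blog</html>");

        PageStatsPlugin stats = new PageStatsPlugin();
        dispatch(pages, Collections.singletonList(stats));
        System.out.println(stats.pageCount() + " pages, "
                + stats.charCount() + " characters");
    }
}
```

The plugin uses atomic counters because, in this sketch, `onPage` may be called from several worker threads at once. For the library's real plugin interfaces, see the repository README.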
- Host: GitHub
- URL: https://github.com/forcedotcom/sitecrawler
- Owner: forcedotcom
- License: bsd-3-clause
- Created: 2014-03-21T20:45:29.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2021-10-12T10:08:25.000Z (over 4 years ago)
- Last Synced: 2024-04-14T19:33:41.233Z (about 2 years ago)
- Language: Java
- Size: 124 KB
- Stars: 22
- Watchers: 10
- Forks: 9
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Codeowners: CODEOWNERS