https://github.com/noqcks/iron-crawler
A generic web crawler
- Host: GitHub
- URL: https://github.com/noqcks/iron-crawler
- Owner: noqcks
- License: mit
- Created: 2016-02-06T18:31:31.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-02-09T21:20:07.000Z (almost 9 years ago)
- Last Synced: 2024-10-11T16:09:53.671Z (3 months ago)
- Language: Ruby
- Size: 39.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Iron Crawler
A generic web crawler.
## Features
From a starting URL, it crawls all links found on that page and prints a list of the URLs visited.
- Follows `href` attributes in `<a>` tags that point to the same domain
- Ignores `href` attributes in `<a>` tags that point to other domains (including subdomains)
- Captures the `src` attribute of `<script>` tags and the `href` attribute of `<link>` tags
- Outputs a list of visited URLs
## Getting Started
It's easy to get started!
### Install
```
gem install iron-crawler
```
### Run
```
iron-crawler
```
The above command will crawl any site for you.
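
For readers curious how the same-domain rule and the visited list fit together, here is a minimal sketch using only Ruby's standard library. It is not the gem's actual implementation (the TODO list suggests the gem itself is built on Mechanize). The names `resolve`, `same_host?`, `crawl`, and the `links_for` callable are all hypothetical, and page fetching is left to the caller so the crawl logic can be read and tried without network access.

```ruby
require "uri"
require "set"

# Hypothetical helper: resolve an href against the page it was found on.
# Returns an absolute URI, or nil when the href cannot be parsed.
def resolve(page_url, href)
  URI.join(page_url, href)
rescue URI::Error
  nil
end

# True only when the hosts match exactly, so "blog.example.com" counts as
# a different domain from "example.com" (subdomains are ignored).
def same_host?(start_url, candidate)
  URI(start_url).host == candidate.host
end

# Breadth-first crawl from start_url.
# links_for is any callable that takes a URL string and returns the raw
# href values found on that page (an assumption made for this sketch).
def crawl(start_url, links_for)
  visited = Set.new
  queue = [start_url]

  until queue.empty?
    url = queue.shift
    next if visited.include?(url)

    visited << url
    links_for.call(url).each do |href|
      target = resolve(url, href)
      next unless target && same_host?(start_url, target)

      target.fragment = nil # "/about#team" and "/about" are the same page
      queue << target.to_s
    end
  end

  visited.to_a
end
```

Passing a lambda that looks pages up in a hash, such as `crawl("http://example.com/", ->(url) { pages.fetch(url, []) })`, is enough to exercise the logic offline. A real `links_for` would download the page and collect its `href` and `src` values.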
## TODO
- Concurrency (will probably have to move away from Mechanize)
- Test coverage with RSpec
- Set up a CI pipeline with Travis CI to [automatically publish to RubyGems](https://docs.travis-ci.com/user/deployment/rubygems)