https://github.com/buren/site_mapper
Map all links on a given site
- Host: GitHub
- URL: https://github.com/buren/site_mapper
- Owner: buren
- License: mit
- Created: 2014-10-22T12:34:25.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2021-11-29T14:34:43.000Z (almost 3 years ago)
- Last Synced: 2024-10-13T13:36:58.789Z (26 days ago)
- Topics: gem, ruby, sitemapper
- Language: Ruby
- Homepage: https://rubygems.org/gems/site_mapper
- Size: 38.1 KB
- Stars: 11
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# SiteMapper
[![Code Climate](https://codeclimate.com/github/buren/site_mapper.png)](https://codeclimate.com/github/buren/site_mapper)
[![Coverage Status](https://img.shields.io/coveralls/buren/site_mapper.svg)](https://coveralls.io/r/buren/site_mapper)
[![Docs badge](https://inch-ci.org/github/buren/site_mapper.svg?branch=master)](http://www.rubydoc.info/github/buren/site_mapper/master)
[![Build Status](https://travis-ci.org/buren/site_mapper.svg?branch=master)](https://travis-ci.org/buren/site_mapper)
[![Dependency Status](https://gemnasium.com/buren/site_mapper.svg)](https://gemnasium.com/buren/site_mapper)
[![Gem Version](https://badge.fury.io/rb/site_mapper.svg)](http://badge.fury.io/rb/site_mapper)

Map all links on a given site.
SiteMapper will try to respect `/robots.txt`.

Works great with [Wayback Archiver](https://github.com/buren/wayback_archiver), a gem that crawls your site and submits each URL to the [Internet Archive (Wayback Machine)](https://archive.org/web/).

## Installation
Install the gem:

```bash
gem install site_mapper
```

## Usage

Command line usage:
```bash
# Crawl every link found on pages
# within the example.com domain
site_mapper example.com
```

Ruby usage:
```ruby
# Crawl every link found on pages
# within the example.com domain
require 'site_mapper'
SiteMapper.map('example.com') do |new_url|
  puts "New URL found: #{new_url}"
end

# Log to STDOUT
SiteMapper.map('example.com', logger: :system) do |new_url|
  puts "New URL found: #{new_url}"
end
```

## Docs

You can find the docs online on [RubyDoc](http://www.rubydoc.info/github/buren/site_mapper/master).
This gem is documented using `yard`. To generate the docs locally, run it from the root of this repository:
```bash
yard # Generates documentation to doc/
```

## Contributing

Contributions, feedback and suggestions are very welcome.
1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request

## Notes

* Special thanks to the [robots](https://rubygems.org/gems/robots) gem, which provided the bulk of the code in `lib/robots.rb`
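
For readers wondering what respecting `/robots.txt` involves, here is a minimal sketch in plain Ruby. It is not the code in `lib/robots.rb`: the helper names `disallowed_prefixes` and `allowed?` are made up for illustration, and the logic is deliberately simplified. It only reads `User-agent: *` groups, treats every `User-agent` line as the start of a new group, and matches `Disallow` values as plain path prefixes (no wildcards, no `Allow` rules).

```ruby
# Illustrative only: a simplified robots.txt check, NOT the gem's implementation.

# Returns the Disallow path prefixes that apply to all crawlers (User-agent: *).
def disallowed_prefixes(robots_txt)
  applies = false
  prefixes = []
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(':', 2).map(&:strip)
    next if value.nil? # not a "Field: value" line

    case field.downcase
    when 'user-agent'
      applies = (value == '*')
    when 'disallow'
      # An empty Disallow value means "nothing is disallowed"
      prefixes << value if applies && !value.empty?
    end
  end
  prefixes
end

# True when no applicable Disallow prefix matches the start of the path.
def allowed?(path, robots_txt)
  disallowed_prefixes(robots_txt).none? { |prefix| path.start_with?(prefix) }
end

ROBOTS = <<~TXT
  User-agent: *
  Disallow: /admin
  Disallow: /tmp/ # scratch files

  User-agent: SomeBot
  Disallow: /
TXT

puts allowed?('/about', ROBOTS)       # prints true
puts allowed?('/admin/users', ROBOTS) # prints false
```

A crawler would run a check like this before fetching each discovered link and skip any path that is not allowed.
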
## Alternatives
There are a couple of __great__ alternatives, which are more mature and have more features than this gem. Please feel free to check them out:
* [spidr](https://github.com/postmodern/spidr#readme)
* [anemone](https://github.com/chriskite/anemone#readme)

## License

[MIT License](LICENSE)