https://github.com/ericlondon/ruby-nokogiri-mongodb-crawler
Ruby class to crawl a website using Nokogiri, MongoDB database, and MongoMapper ORM
- Host: GitHub
- URL: https://github.com/ericlondon/ruby-nokogiri-mongodb-crawler
- Owner: EricLondon
- Created: 2013-05-15T00:49:33.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2025-04-23T18:35:00.000Z (6 months ago)
- Last Synced: 2025-04-23T19:44:30.921Z (6 months ago)
- Topics: crawl, mongodb, mongomapper-orm, nokogiri, ruby
- Language: Ruby
- Homepage: https://ericlondon.com/2012/07/29/a-ruby-class-to-crawl-a-website-using-nokogiri-mongodb-database-and-mongomapper-orm.html
- Size: 22.5 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
A Ruby class to crawl a website using Nokogiri, MongoDB database, and MongoMapper ORM
Usage:
1. Read the [blog post](http://ericlondon.com/2012/07/29/a-ruby-class-to-crawl-a-website-using-nokogiri-mongodb-database-and-mongomapper-orm.html)
2. Run setup.rb
3. usage.rb:

```ruby
# include crawler class
require './ng_crawl.rb'

# instantiate crawler class object
ngc = NG_Crawl.new 'http://example.com'

# recursively crawl unprocessed URLs
ngc.crawl

# output all scanned URLs
puts ngc.all_urls

# output all external URLs
puts ngc.all_urls_external
```
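For context, here is a rough sketch of how Nokogiri and MongoMapper might fit together in such a class. The real implementation lives in ng_crawl.rb; the `Page` model, key names, and error handling below are illustrative assumptions rather than the repo's code, and external-URL tracking is omitted:

```ruby
# Illustrative sketch only -- not the repository's actual ng_crawl.rb.
# Assumes a local MongoDB on the default port and Ruby 2.5+ (for URI.open).
require 'mongo_mapper'
require 'nokogiri'
require 'open-uri'
require 'uri'

MongoMapper.database = 'ng_crawl'

# One document per discovered URL, flagged once its links are extracted
class Page
  include MongoMapper::Document
  key :url, String, required: true
  key :processed, Boolean, default: false
  timestamps!
end

class NG_Crawl
  def initialize(start_url)
    @host = URI.parse(start_url).host
    Page.create(url: start_url) if Page.find_by_url(start_url).nil?
  end

  # Repeatedly pull an unprocessed page, parse it with Nokogiri,
  # and queue any same-host links that haven't been seen yet.
  def crawl
    while (page = Page.first(processed: false))
      begin
        doc = Nokogiri::HTML(URI.open(page.url))
        doc.css('a[href]').each do |a|
          link = (URI.join(page.url, a['href']).to_s rescue next)
          next unless URI.parse(link).host == @host
          Page.create(url: link) if Page.find_by_url(link).nil?
        end
      rescue OpenURI::HTTPError, SocketError
        # skip pages that fail to fetch; still mark processed below
      end
      page.update_attributes(processed: true)
    end
  end

  def all_urls
    Page.all.map(&:url)
  end
end
```

Persisting each URL as a MongoMapper document is what makes the crawl resumable: the unprocessed queue survives restarts because it is the `processed: false` query against MongoDB, not an in-memory list.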