Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mislav/nibbler
A cute HTML scraper / data extraction tool in under 70 lines of code
- Host: GitHub
- URL: https://github.com/mislav/nibbler
- Owner: mislav
- License: mit
- Archived: true
- Created: 2009-10-22T19:53:48.000Z (about 15 years ago)
- Default Branch: master
- Last Pushed: 2016-06-06T10:02:29.000Z (over 8 years ago)
- Last Synced: 2024-04-25T19:01:56.971Z (7 months ago)
- Language: Ruby
- Size: 322 KB
- Stars: 142
- Watchers: 6
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Nibbler
=======

*Nibbler* is a small little tool (~100 LOC) that helps you map data structures to objects that you define.

It can be used for HTML screen scraping:

~~~ ruby
require 'nibbler'
require 'open-uri'

class BlogScraper < Nibbler
  element :title

  elements 'div.hentry' => :articles do
    element 'h2' => :title
    element 'a/@href' => :url
  end
end

# URI.open is required on Ruby 3.0+; on Ruby older than 2.5, use plain `open`
blog = BlogScraper.parse URI.open('http://example.com')
blog.title
#=> "My blog title"

blog.articles.first.title
#=> "First article title"

blog.articles.first.url
#=> "http://example.com/article"
~~~

For mapping XML API payloads:

~~~ ruby
require 'nibbler'
require 'net/http'

class Movie < Nibbler
  element './title/@regular' => :name
  element './box_art/@large' => :poster_large
  # :with converts the matched node before the value is assigned
  element 'release_year' => :year, :with => lambda { |node| node.text.to_i }
  element './/link[@title="web page"]/@href' => :url
end

response = Net::HTTP.get_response URI('http://example.com/movie.xml')
movie = Movie.parse response.body

movie.name #=> "Toy Story 3"
movie.year #=> 2010
~~~

Or even for JSON:

~~~ ruby
require 'json'
require 'nibbler/json'

class Movie < NibblerJSON
  element :title
  element :year
  elements :genres
  # JSONPath selectors:
  element '.links.alternate' => :url
  element '.ratings.critics_score' => :critics_score
end

movie = Movie.parse json_string
~~~

There are sample scripts in the "examples/" directory:

    ruby -Ilib -rubygems examples/delicious.rb
    ruby -Ilib -rubygems examples/tweetburner.rb > output.csv

[See the wiki][wiki] for more on how to use *Nibbler*.

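To make the mapping idea behind the examples above more concrete, here is a minimal, self-contained sketch of an `element`-style class macro with a `:with` converter. It is not Nibbler's source and does not require the gem; `TinyMapper` and `HashDoc` are hypothetical names invented for this illustration, and the stand-in document simply looks selectors up as Hash keys.

```ruby
# Hypothetical, stripped-down mapper; NOT Nibbler's actual implementation.
class TinyMapper
  # Rules declared on the class: property name => [selector, converter]
  def self.rules
    @rules ||= {}
  end

  # Accepts `element :title` or `element 'selector' => :name, :with => proc`
  def self.element(arg)
    if arg.is_a?(Hash)
      options   = arg.dup
      converter = options.delete(:with)
      selector, name = options.first
    else
      selector, name, converter = arg.to_s, arg.to_sym, nil
    end
    rules[name] = [selector, converter]
    attr_accessor name
  end

  # Build an instance and fill each property from doc.at(selector)
  def self.parse(doc)
    new.tap do |obj|
      rules.each do |name, (selector, converter)|
        value = doc.at(selector)
        value = converter.call(value) if converter && !value.nil?
        obj.public_send("#{name}=", value)
      end
    end
  end
end

# Stand-in document (assumption): selectors are plain Hash keys.
class HashDoc
  def initialize(data)
    @data = data
  end

  def at(selector)
    @data[selector]
  end
end

class Movie < TinyMapper
  element :title
  element 'release_year' => :year, :with => lambda { |raw| raw.to_i }
end

movie = Movie.parse(HashDoc.new('title' => 'Toy Story 3', 'release_year' => '2010'))
movie.title #=> "Toy Story 3"
movie.year  #=> 2010
```

The sketch keeps rules on the class and values on the instance, which is why each declared property ends up as an ordinary accessor on the parsed object.
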
Requirements
------------

*None*. Well, [Nokogiri][] is a requirement if you pass in an HTML string for parsing, like in the example above. Otherwise you can initialize the scraper with an Hpricot document or anything else that implements `at(selector)` and `search(selector)` methods.

NibblerJSON needs a JSON parser if string content is passed, so the "json" library should be installed on Ruby 1.8.

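As a rough illustration of the "anything else that implements `at(selector)` and `search(selector)`" contract, the sketch below wraps a nested Hash in an object that answers both methods. `HashDocument` is a hypothetical class, the dotted-path selector convention is an assumption made for this example, and the sample data is made up. Whether a given Nibbler version expects more from a document than these two methods has not been verified here.

```ruby
# Hypothetical adapter exposing a nested Hash through `at` and `search`.
# Selectors are dotted paths such as ".ratings.critics_score" (an assumption).
class HashDocument
  def initialize(data)
    @data = data
  end

  # All matches for the selector, always returned as an Array
  def search(selector)
    keys  = selector.split('.').reject(&:empty?)
    value = keys.reduce(@data) do |node, key|
      node.is_a?(Hash) ? node[key] : nil
    end
    return [] if value.nil?
    value.is_a?(Array) ? value : [value]
  end

  # First match for the selector, or nil when nothing matches
  def at(selector)
    search(selector).first
  end
end

# Made-up sample data for the illustration
doc = HashDocument.new(
  'title'   => 'Toy Story 3',
  'genres'  => ['Animation', 'Comedy'],
  'ratings' => { 'critics_score' => 90 }
)

doc.at('title')                  #=> "Toy Story 3"
doc.at('.ratings.critics_score') #=> 90
doc.search('genres')             #=> ["Animation", "Comedy"]
doc.search('missing')            #=> []
```

Defining `at` in terms of `search` keeps the two methods consistent: a singular lookup is just the first element of the plural one.
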
[wiki]: http://wiki.github.com/mislav/nibbler
[nokogiri]: http://nokogiri.rubyforge.org/nokogiri/