https://github.com/gregorym/boilerpipe
Ruby wrapper for the Boilerpipe API.
- Host: GitHub
- URL: https://github.com/gregorym/boilerpipe
- Owner: gregorym
- Created: 2011-03-02T18:42:07.000Z (almost 15 years ago)
- Default Branch: master
- Last Pushed: 2011-11-04T08:31:05.000Z (about 14 years ago)
- Last Synced: 2025-05-29T00:12:16.053Z (8 months ago)
- Language: Ruby
- Size: 89.8 KB
- Stars: 18
- Watchers: 1
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.textile
README
A Ruby wrapper for the Boilerpipe API.
Boilerpipe definition:
bq. The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
For more information: http://code.google.com/p/boilerpipe/
h1. Explanation
The @Boilerpipe@ module exposes a single method, @extract@. It takes two parameters: the URL of the page to process and a hash of options.
The hash accepts three options:
* output => :html, :htmlFragment, :text, :json, :debug
* extractor => :ArticleExtractor, :DefaultExtractor, :LargestContentExtractor, :KeepEverythingExtractor, :CanolaExtractor
* api => the URL of the API endpoint
None of these options is mandatory. To find out more about them, check out the Boilerpipe API: http://boilerpipe-web.appspot.com/
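To make the option list above concrete, here is a minimal sketch of how a wrapper like this might validate the hash and turn it into a request URL. It is not the gem's actual code: the @BoilerpipeSketch@ module and its @build_request_url@ method are hypothetical, and the endpoint path, the query parameter names (@url@, @output@, @extractor@) and the default values are assumptions.

```ruby
require "uri"

# Hypothetical sketch, not part of the boilerpipe gem.
module BoilerpipeSketch
  # Values copied from the option list above.
  OUTPUTS = [:html, :htmlFragment, :text, :json, :debug].freeze
  EXTRACTORS = [:ArticleExtractor, :DefaultExtractor, :LargestContentExtractor,
                :KeepEverythingExtractor, :CanolaExtractor].freeze

  # Assumed default endpoint; the real default may differ.
  DEFAULT_API = "http://boilerpipe-web.appspot.com/extract".freeze

  # Validates the options and returns the URL a request would be sent to.
  # The defaults for :output and :extractor are assumptions.
  def self.build_request_url(url, opts = {})
    output    = opts.fetch(:output, :html)
    extractor = opts.fetch(:extractor, :ArticleExtractor)
    api       = opts.fetch(:api, DEFAULT_API)

    unless OUTPUTS.include?(output)
      raise ArgumentError, "unknown output: #{output.inspect}"
    end
    unless EXTRACTORS.include?(extractor)
      raise ArgumentError, "unknown extractor: #{extractor.inspect}"
    end

    # encode_www_form percent-encodes the page URL and stringifies the symbols.
    query = URI.encode_www_form(:url => url, :output => output, :extractor => extractor)
    "#{api}?#{query}"
  end
end
```

Rejecting unknown symbols before any request is made keeps typos such as @:htmlfragment@ from reaching the remote service; whether the gem itself performs this check is not stated here.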
h1. Example
require "boilerpipe"

# Extract the main content of the article, requesting JSON output
Boilerpipe.extract("http://techcrunch.com/2011/05/12/karma-is-a-bitch/", {:output => :json})