https://github.com/matiasinsaurralde/wikipedia

Tool for extracting plain text from wikipedia data
# wikipedia

A tool for extracting plain text from Wikipedia articles.

## Installing:

A gem is available, so fire up your terminal:

````
$ gem install wikipedia
````

## Usage:

It's easy:

````ruby
irb(main):001:0> require 'wikipedia'
irb(main):002:0> connor = Wikipedia::article 'John Connor'
irb(main):003:0> connor.first # just the first paragraph
=> "John Connor is a fictional character and the main protagonist of the Terminator franchise.
Created by writer and director James Cameron, the character is first referred to in the 1984 film The Terminator
and first appears, portrayed by teenage actor Edward Furlong, in its 1991 sequel Terminator 2: Judgment Day.
The character is subsequently portrayed by 23-year-old Nick Stahl in the 2003 film Terminator 3: Rise of the Machines
and by 19-year-old Thomas Dekker in the 2007 television series Terminator: The Sarah Connor Chronicles.
English actor Christian Bale portrays Connor in the film series' fourth installment, Terminator Salvation."
````

There's a simple method for checking whether a term is ambiguous. An array of the alternative terms will be provided in the future.

A good example is 'apple', which may refer to the company, the fruit, etc.

````ruby
irb(main):001:0> require 'wikipedia'
irb(main):002:0> apple = Wikipedia::article 'apple'
irb(main):003:0> apple.ambiguous?
=> true
````
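One way to use this check is to guard against ambiguous terms before reading any text. The following is a minimal sketch that relies only on the `first` and `ambiguous?` methods shown above. `ArticleStub` and `first_paragraph_unless_ambiguous` are hypothetical names introduced for illustration, and the stub exists only so the example runs without the gem or network access.

```ruby
# ArticleStub is a hypothetical stand-in so the sketch runs without the gem
# or network access; it mimics only the two methods shown in this README.
ArticleStub = Struct.new(:paragraphs, :ambiguous) do
  def first
    paragraphs.first
  end

  def ambiguous?
    ambiguous
  end
end

# Hypothetical helper: return the first paragraph, or nil when the term
# is ambiguous and the text may describe the wrong subject.
def first_paragraph_unless_ambiguous(article)
  return nil if article.ambiguous?

  article.first
end

connor = ArticleStub.new(['John Connor is a fictional character.'], false)
apple  = ArticleStub.new(['Apple may refer to:'], true)

puts first_paragraph_unless_ambiguous(connor).inspect
puts first_paragraph_unless_ambiguous(apple).inspect
```

With the gem installed, the same helper should accept the object returned by `Wikipedia::article 'apple'` in place of the stub, assuming it responds to both methods as shown in the sessions above.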

## TODO

* Integrate it with the [Opensearch API](http://www.mediawiki.org/wiki/API%3aOpensearch).
* Provide a method for classifying text based on context (using data from Wikipedia's disambiguation pages).
* Switch to Nokogiri or provide support for both Nokogiri and Hpricot?
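
As a rough illustration of the first item, an Opensearch integration could start by building the request URL and pairing titles with URLs from the reply. This is only a sketch and not part of the gem: `API_ENDPOINT`, `opensearch_url` and `parse_opensearch` are hypothetical names, the endpoint is assumed to be the English Wikipedia one, and the sample body is hand-written in the array shape Opensearch returns (query, titles, descriptions, URLs) rather than captured from a real request.

```ruby
require 'json'
require 'uri'

# Assumed endpoint: the English Wikipedia API entry point.
API_ENDPOINT = 'https://en.wikipedia.org/w/api.php'

# Hypothetical helper: build an Opensearch request URL for a search term.
def opensearch_url(term, limit: 5)
  query = URI.encode_www_form(action: 'opensearch', search: term,
                              limit: limit, format: 'json')
  "#{API_ENDPOINT}?#{query}"
end

# Opensearch replies with a JSON array: [query, titles, descriptions, urls].
# Hypothetical helper: pair each title with its URL.
def parse_opensearch(body)
  _query, titles, _descriptions, urls = JSON.parse(body)
  titles.zip(urls)
end

# Hand-written sample in that shape, not a real API response.
sample = '["apple",["Apple","Apple Inc."],["",""],' \
         '["https://en.wikipedia.org/wiki/Apple",' \
         '"https://en.wikipedia.org/wiki/Apple_Inc."]]'

puts opensearch_url('apple')
parse_opensearch(sample).each { |title, url| puts "#{title}: #{url}" }
```

The list of titles returned this way could later back the array of alternative terms mentioned in the Usage section.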

## Disclaimer

[Hpricot](https://github.com/whymirror/hpricot) was used as a tribute to [whytheluckystiff](http://en.wikipedia.org/wiki/Why_the_lucky_stiff).

## License

[MIT](https://github.com/matiasinsaurralde/wikipedia/blob/master/LICENSE)