https://github.com/molybdenum-99/mediawiktory

Full-featured MediaWiki client

# MediaWiktory, The MediaWiki Client

[![Gem Version](https://badge.fury.io/rb/mediawiktory.svg)](http://badge.fury.io/rb/mediawiktory)
[![Build Status](https://travis-ci.org/molybdenum-99/mediawiktory.svg?branch=master)](https://travis-ci.org/molybdenum-99/mediawiktory)

**MediaWiktory** is a MediaWiki (think Wikipedia, Wiktionary and others) API client. It is the only
client that allows (almost) full access to the power of the MediaWiki API without losing the power of Ruby.

No, seriously.

The [MediaWiki API](https://www.mediawiki.org/wiki/API:Main_page) is currently very powerful and
full-featured (though not very easy to use). Things like "fetch the first 50 pages from that category
along with their revision history, interwiki links and media file stats" are typically done
with one carefully constructed request, which returns lots of useful information.
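
To make "one carefully constructed request" more concrete, here is a minimal sketch that only builds the raw URL of such a request with Ruby's standard library, without sending it. The category title and the exact selection of `prop` modules are illustrative assumptions; the API help pages of your wiki list the parameters it actually supports.

```ruby
require 'uri'

# Illustrative parameters for a single combined "query" request.
# The category title and the prop selection are assumptions made for this example.
params = {
  action: 'query',
  format: 'json',
  generator: 'categorymembers',                 # use category members as the page list
  gcmtitle: 'Category:1960s_automobiles',
  gcmlimit: 50,                                 # first 50 pages of the category
  prop: %w[revisions iwlinks images].join('|')  # several properties in one request
}

uri = URI('https://en.wikipedia.org/w/api.php')
uri.query = URI.encode_www_form(params)

puts uri
```

Writing and maintaining such parameter sets by hand is the part a client library is meant to take over.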

Yes, there already are [several](https://www.mediawiki.org/wiki/API:Client_code#Ruby)
API clients for Ruby, including the ["official" one](https://github.com/wikimedia/mediawiki-ruby-api).
The typical approach for all of them is a thick wrapper around some functionality (like "login and edit
pages" or "search and analyze pages"), leaving all the other cool things to a generic `action` method
(at best), or without any coverage at all.

MediaWiktory, on the contrary, is:

* a **thin** wrapper...
* around **all** MediaWiki API features...
* making access to them available through idiomatic Ruby code, easy to use and clearly documented.

## Examples

**Example 1:** Fetching a page's text and metadata:

```ruby
require 'mediawiktory'

api = MediaWiktory::Wikipedia::Api.new
response = api.query.        # "query" action is the basis for fetching all pages/categories/meta
  titles('Argentina').       # query page titles: Argentina
  prop(:info, :revisions).   # query page properties: info, revisions
  prop(:url, :content).      # query subproperties of those: full URL (from info) and content (from revisions)
  response                   # perform the query and parse the result

page = response['pages'].values.first
puts page['title']
# Prints:
# Argentina
puts page['fullurl']
# Prints:
# https://en.wikipedia.org/wiki/Argentina
puts page['revisions'].first['*'].slice(0..200) # first 200 chars of page contents
# Prints:
# {{other uses}}
# {{pp-semi|small=yes}}
# {{Use dmy dates|date=March 2017}}
# {{Coord|34|S|64|W|display=title}}
# {{Infobox country
# |coordinates = {{Coord|34|36|S|58|23|W|type:city}}
# |conventional_long_name = A
```

Note that to use the MediaWiktory API wrapper you need to understand the underlying API. While previous
experience might make you expect something like `api.page('Argentina').text`, in fact you should
use the `query` action, request the page title 'Argentina', its `:revisions` property and that property's
`:content` subproperty. And voila, you have a _1-element list of revisions_ for the page, and the last
revision's `'*'` key holds the page's text.
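
The response shape described above can be explored without touching the network. The sketch below uses a hand-written hash that mimics only the relevant part of a response (the page id key and the text are made up), and a hypothetical helper, `page_text`, which is not part of MediaWiktory and simply walks pages, then revisions, then the `'*'` key.

```ruby
# A hand-written stand-in for the parsed response; real responses carry more keys.
sample = {
  'pages' => {
    '18951905' => {
      'title' => 'Argentina',
      'fullurl' => 'https://en.wikipedia.org/wiki/Argentina',
      'revisions' => [
        { 'contentformat' => 'text/x-wiki', '*' => "{{other uses}}\n{{Infobox country" }
      ]
    }
  }
}

# Hypothetical helper (not part of MediaWiktory): text of the first page's
# first listed revision, or nil when the page or its revisions are missing.
def page_text(response)
  page = response['pages']&.values&.first
  page&.dig('revisions', 0, '*')
end

puts page_text(sample)
```

Returning `nil` instead of raising is a design choice of this sketch; it keeps the helper safe for pages that were requested without the `:revisions` property.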

The good news is that all methods are documented at [RubyDoc.info](http://www.rubydoc.info/gems/mediawiktory).
Most of the time, the documentation has enough detail, so you don't need to refer to the official
MediaWiki docs.

**Example 2:** Editing a page (we are editing the [Sandbox](https://en.wikipedia.org/wiki/Wikipedia:Sandbox)
here, which is safe, but be careful while experimenting: this code **really** replaces the page's text!):

```ruby
token = api.query.meta(:tokens).response.dig('tokens', 'csrftoken')
response = api.edit.title('Wikipedia:Sandbox').text("Test '''me''', MediaWiktory!").token(token).response
response.to_h
# => {"result"=>"Success", "pageid"=>16283969, "title"=>"Wikipedia:Sandbox", "contentmodel"=>"wikitext", "oldrevid"=>779502714, "newrevid"=>779502729, "newtimestamp"=>"2017-05-09T08:24:26Z"}

# The same call without a token will raise:
api.edit.title('Wikipedia:Sandbox').text("Test '''me''', MediaWiktory without token!").response
# MediaWiktory::Wikipedia::Response::Error: The "token" parameter must be set.
```

**Example 3:** Fetching the "main" image for every page in a category:

```ruby
response = api.query.                  # "query" action again
  generator(:categorymembers).         # instead of listing titles, use a "page list generator": all members of a category
  title('Category:1960s_automobiles'). # ...of this category
  prop(:pageimages).prop(:thumbnail).  # fetch the "pageimages" property and its "thumbnail" subproperty
  limit('max').                        # limit to the maximum number of pages available in one response
  response

# You can fetch ALL of them with the line below (it will be a lot):
# response = response.continue while response.continue?

response.to_h['pages'].values.each do |page|
  puts "#{page['title']}: #{page.dig('thumbnail', 'source')}"
end
# AC Cobra: https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/Shelby_AC_427_Cobra_vl_blue.jpg/50px-Shelby_AC_427_Cobra_vl_blue.jpg
# Acadian (automobile):
# Alfa Romeo 33 Stradale: https://upload.wikimedia.org/wikipedia/commons/thumb/e/eb/1968_Alfa_Romeo_Tipo_33_Stradale.jpg/50px-1968_Alfa_Romeo_Tipo_33_Stradale.jpg
# Alfa Romeo 105/115 Series Coupés: https://upload.wikimedia.org/wikipedia/commons/thumb/8/81/Alfa_Romeo_GT_1300_Junior.jpg/50px-Alfa_Romeo_GT_1300_Junior.jpg
# Alfa Romeo 1750 Berlina: https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Alfa_Romeo_1750_berlina_grey-front.JPG/50px-Alfa_Romeo_1750_berlina_grey-front.JPG
# Alfa Romeo 2000: https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Alfa_2000_touring_spider.JPG/50px-Alfa_2000_touring_spider.JPG
# Alfa Romeo 2600: https://upload.wikimedia.org/wikipedia/commons/thumb/6/6b/Alfa-Romeo_2600-Spider-Touring.JPG/50px-Alfa-Romeo_2600-Spider-Touring.JPG
# ...
```
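
The continuation loop from the comment in Example 3 can be tried in isolation. The sketch below uses a tiny stand-in class, `FakeResponse` (hypothetical, not part of MediaWiktory), which only mimics the `continue?` / `continue` pair. It assumes that `continue` returns a new response that also contains the pages fetched so far; check the gem's documentation for the real semantics.

```ruby
# Hypothetical stand-in for a paged API response: holds the pages fetched so
# far plus the batches that are still "on the server".
class FakeResponse
  attr_reader :pages

  def initialize(pages, remaining)
    @pages = pages
    @remaining = remaining
  end

  # True while there are more batches to fetch.
  def continue?
    !@remaining.empty?
  end

  # Returns a new response with the next batch merged in (an assumption
  # about the real behaviour, made for this sketch).
  def continue
    next_batch, *rest = @remaining
    FakeResponse.new(pages + next_batch, rest)
  end
end

response = FakeResponse.new(['AC Cobra'], [['Alfa Romeo 2000'], ['Alfa Romeo 2600']])

# Same loop shape as in Example 3: keep continuing until nothing is left.
response = response.continue while response.continue?

puts response.pages.inspect
```

Against a real wiki, each `continue` is another HTTP request, so a loop like this over a large category can take a while.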

## Usage

```
gem install mediawiktory
```

There are a lot of popular installations of MediaWiki besides Wikipedia. They run different
versions, with different features enabled and different custom extensions turned on.

To cope with this multitude of features, MediaWiktory provides two ways of usage.

### 1. Use the default wrapper, generated from English Wikipedia:

```ruby
require 'mediawiktory'
api = MediaWiktory::Wikipedia::Api.new # => English Wikipedia
# or
api = MediaWiktory::Wikipedia::Api.new('http://some.site/w/api.php') # => any other MediaWiki
```

...and wander through the docs of the [MediaWiktory::Wikipedia::Api](http://www.rubydoc.info/gems/mediawiktory/MediaWiktory/Wikipedia/Api)
class to understand what you can do.

### 2. Custom wrapper generation.

```
mediawiktory-gen -u http://some.site/w/api.php --path lib/path/to/wrapper --namespace My::Wrapper
```

This will generate the `My::Wrapper::Api` class and a lot of other classes wrapping all actions and
modules of the target API. The generated code is **independent** of MediaWiktory (so you can exclude it
from your runtime), and depends only on the `addressable`, `faraday` and `faraday_middleware` gems.

Usage of the custom wrapper is basically the same:

```ruby
require 'path/to/wrapper/api'
api = My::Wrapper::Api.new
api.query # .and.so.on
```

You need a custom wrapper if:

* you want to have the exact list of features your site has: for example, with Wikia sites, most of
the generic functionality (like query and edit) will work, but most of the fancy modern Wikipedia actions
will fail with "unknown action";
* your target site has some custom actions and modules: for example, the most informative Wikidata actions
are custom ones, like [wbgetentities](https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities),
and they are not present in the default wrapper;
* you want to keep up with cutting-edge Wikipedia features: the Wikipedia wrapper is generated at gem
release time, but Wikipedia's API changes every day with new small and large experimental features.
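
For reference, the raw request behind a custom action such as `wbgetentities` looks roughly like the following. The sketch only builds the URL with the standard library and does not send it; `Q42` is an arbitrarily chosen example entity, and how a generated wrapper would expose this action is not shown here.

```ruby
require 'uri'

params = {
  action: 'wbgetentities', # Wikidata-specific action, absent from the default wrapper
  ids: 'Q42',              # example entity id, chosen arbitrarily
  props: 'labels',         # restrict the output to labels
  languages: 'en',
  format: 'json'
}

uri = URI('https://www.wikidata.org/w/api.php')
uri.query = URI.encode_www_form(params)

puts uri
```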

**Generator limitations:** The wrapper is generated from the [HTML docs of the API](https://en.wikipedia.org/w/api.php),
but currently the generator can't process the ASCII docs format of old MediaWiki versions, which, unfortunately,
is still in use on [Wikia](https://marvel.wikia.com/api.php), for example. This is subject to further
development, as some "old" installations of MediaWiki provide pretty useful content and a lot of
custom modules.

If you integrate a wrapper generated by MediaWiktory into some other library, you should note that:

* All generated code is documented in YARD format, using the Markdown markup flavour;
* If you use Rubocop, you will find that the generated code breaks some "good code" practices, because
they are hard to follow in large-scale code generation.

## Roadmap

* Expose underlying Faraday client for fine-tuning;
* Handle cookies automatically (for logging in);
* Handle file uploads (should be done as multipart, use appropriate Faraday middleware);
* Add parser for outdated ASCII docs.

## Authors

* [Victor Shepelev](https://zverok.github.io) [@zverok](https://github.com/zverok);
* Serhiy Mostovyi [@smostovoy](https://github.com/smostovoy).

## License

[MIT](https://github.com/molybdenum-99/mediawiktory/blob/master/LICENSE.txt)