https://github.com/molybdenum-99/whatis
WhatIs.this: simple entity resolution through Wikipedia
- Host: GitHub
- URL: https://github.com/molybdenum-99/whatis
- Owner: molybdenum-99
- License: mit
- Created: 2017-12-03T18:50:21.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-01-22T01:54:47.000Z (about 2 years ago)
- Last Synced: 2024-11-15T09:51:46.444Z (3 months ago)
- Topics: entity-resolution, wikipedia
- Language: Ruby
- Size: 3.45 MB
- Stars: 18
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# WhatIs.this
[![Gem Version](https://badge.fury.io/rb/whatis.svg)](http://badge.fury.io/rb/whatis)
[![Build Status](https://travis-ci.org/molybdenum-99/whatis.svg?branch=master)](https://travis-ci.org/molybdenum-99/whatis)

**WhatIs.this** is a quick probe for the meaning and metadata of concepts through Wikipedia.
## Showcase
```ruby
require 'whatis'

sparta = WhatIs.this('Sparta')
# => #
sparta.coordinates
# => #
sparta.image
# => "https://upload.wikimedia.org/wikipedia/commons/6/6c/Sparta_territory.jpg"

sparta.describe
# => Sparta
# title: "Sparta"
# description: "city-state in ancient Greece"
# coordinates: #
# extract: "Sparta (Doric Greek: ; Attic Greek: ) was a prominent city-state in ancient Greece."
#        image: "https://upload.wikimedia.org/wikipedia/commons/6/6c/Sparta_territory.jpg"

# Fetch additional information: categories & translations:
sparta = WhatIs.this('Sparta', categories: true, languages: 'el')
# => #
sparta.describe
# => Sparta
# title: "Sparta"
# description: "city-state in ancient Greece"
# coordinates: #
# categories: ["Former countries in Europe", "Former populated places in Greece", "Locations in Greek mythology", "Populated places in Laconia", "Sparta", "States and territories disestablished in the 2nd century BC", "States and territories established in the 11th century BC"]
# languages: {"el"=>#}
# extract: "Sparta (Doric Greek: ; Attic Greek: ) was a prominent city-state in ancient Greece."
#        image: "https://upload.wikimedia.org/wikipedia/commons/6/6c/Sparta_territory.jpg"

sparta.languages['el'].resolve
# => #

# Multiple entities at once:
WhatIs.these('Paris', 'Berlin', 'Rome', 'Athens')
# => {
# "Paris"=>#,
# "Berlin"=>#,
# "Rome"=>#,
# "Athens"=>#
# }
```
## Applications

The gem is intended to be a simple tool for entity resolution/normalization. Possible uses:
* You have a lot of user-entered answers to "What city are you from". Through `WhatIs.these` it is
pretty easy to resolve them to a "canonical" city name (e.g. "Warsaw", "Warszawa", "Warsaw, Poland" =>
"Warsaw") and map their locations;
* Quick check on user-entered cultural objects, "what is it";
* Canonical Wikipedia-powered translations of toponyms, movie titles and historical people;
* ...and so on.

## Features/problems
* Fetches Wikipedia data by entity names: canonical title, geographical coordinates, main page image,
the first phrase, short entity description from Wikidata;
* Optionally fetches links to other Wikipedia languages and the list of page categories;
* Fetches any number of Wikipedia pages in a minimal number of API requests (50-page batches);
* Note that despite this optimization, Wikipedia API responses are not very small, so resolving,
say, 1000 entities, will errrm _take some time_;
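
To give a feeling for what "50-page batches" means in terms of request count, here is a minimal sketch. It is not the gem's internal code: `title_batches` and `BATCH_SIZE` are hypothetical names, and joining titles with `|` simply mirrors how the MediaWiki API accepts several titles in one request.

```ruby
# Hypothetical helper, not part of the whatis gem: splits a list of titles
# into groups small enough to be sent in one API request each.
BATCH_SIZE = 50 # the batch size mentioned in the feature list

def title_batches(titles, batch_size = BATCH_SIZE)
  # Drop duplicates first, then cut into slices and join each slice
  # with "|", the separator MediaWiki uses for multiple titles.
  titles.uniq.each_slice(batch_size).map { |batch| batch.join('|') }
end

titles = (1..120).map { |i| "Page #{i}" }
batches = title_batches(titles)

puts batches.size                  # prints 3: 120 titles need three requests
puts batches.last.split('|').size  # prints 20: the last batch is only partly filled
```

So 1000 entities would mean at least 20 round trips, each with a fairly large response.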
* Works with any language version of Wikipedia:
```ruby
WhatIs[:de].this('München')
# => #
```
* Handles pages that are not found and allows searching for them in place:
```ruby
g = WhatIs.this('Guardians Of The Galaxy') # Wikipedia page titles are case-sensitive
# => #
g.search(3)
# => [#, #, #]
```
* Handles disambiguation pages:
```ruby
g = WhatIs.this('Guardians of the Galaxy')
# => #
g.describe
# => Guardians of the Galaxy: ambigous (11 options)
# #: Guardians of the Galaxy (1969 team), the original 31st-century team from an alternative timeline of the Marvel Universe (Earth-691)
# #: Guardians of the Galaxy (2008 team), the modern version of the team formed in the aftermath of Annihilation: Conquest
# <...skip...>
# Usage: .variants[0].resolve, .resolve_all
g.variants[1].resolve(categories: true)
# => #
```
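
Disambiguation pages are recognized by their categories (see the note on disambiguation pages below). The idea can be pictured roughly like this. This is only a sketch: the `disambiguation?` helper and the category names in the table are assumptions for illustration, not the list the gem actually ships with.

```ruby
# Illustrative mapping only: these category names are assumptions,
# not the gem's real per-language configuration.
DISAMBIGUATION_CATEGORIES = {
  en: ['Disambiguation pages', 'All disambiguation pages']
  # other languages would need their own category names
}.freeze

# Hypothetical helper: true when at least one of the page's categories
# is a known disambiguation category for the given Wikipedia language.
def disambiguation?(categories, language: :en)
  known = DISAMBIGUATION_CATEGORIES.fetch(language, [])
  (categories & known).any?
end

puts disambiguation?(['Disambiguation pages', 'Marvel Comics titles']) # prints true
puts disambiguation?(['Former countries in Europe', 'Sparta'])         # prints false
puts disambiguation?(['Disambiguation pages'], language: :de)          # prints false: no table for :de
```

The last call shows why per-language category lists matter: without an entry for a language, an ambiguous page just looks like a regular one.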
* Provides command-line tool:
```
$ whatis Paris Berlin Rome
Paris: Paris {48.856700,2.350800} - capital city of France
Berlin: Berlin {52.516667,13.388889} - capital city of Germany
Rome: Rome {41.900000,12.500000} - capital city of Italy

$ whatis --help
Usage: `whatis [options] title1, title2, title3

Options:
-l, --language CODE Which language Wikipedia to ask, 2-letter code. "en" by default
-t, --languages [CODE] Without argument, fetches all translations for entity.
With argument (two-letter code) fetches only one translation.
By default, no translations are fetched.
--categories Whether to fetch entity categories
-f, --format FORMAT Output format: one line per entity ("short"), several lines per
entity ("long"), or "json". Default is "short".
-h, --help Show this message
```

### Note on disambiguation pages
Unfortunately, Wikipedia does not provide a consistent way to tell disambiguation pages from others;
the only way to know is to look at the page's categories (which differ between languages). Therefore,
disambiguation currently works for English, Ukrainian, Russian and Belarusian. Feel free
to contribute disambiguation categories for your language versions!

## Usage
`gem install whatis` or add `gem "whatis"` to your `Gemfile`.
Then use it as a library (see docs for [WhatIs](https://www.rubydoc.info/gems/whatis/WhatIs) and its methods)
or as a command-line tool (try `$ whatis --help`).

## How it works
`WhatIs.this` is the little brother of the larger [reality](https://github.com/molybdenum-99/reality). Under
the hood, it uses the [infoboxer](https://github.com/molybdenum-99/infoboxer) semantic Wikipedia client.

Most of the information is taken from API response metadata, but for some features (ambiguity
resolution), the Wikipedia page is actually parsed.

Unlike `reality` (which tries to be _comprehensive_), `WhatIs.this` tries to be as simple, yet as useful,
as possible.

## Author
[Victor Shepelev](http://zverok.github.io)
## License
MIT