https://github.com/nick-robo/mediawiki-tools
Tools for getting data from MediaWiki websites
data-mining mediawiki mediawiki-api wikipedia wikipedia-api
- Host: GitHub
- URL: https://github.com/nick-robo/mediawiki-tools
- Owner: nick-robo
- License: mit
- Created: 2021-09-04T07:23:13.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-06T06:07:25.000Z (over 1 year ago)
- Last Synced: 2024-10-01T10:03:53.728Z (4 months ago)
- Topics: data-mining, mediawiki, mediawiki-api, wikipedia, wikipedia-api
- Language: Python
- Homepage:
- Size: 209 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MediaWiki Tools
[![Coverage Status](https://coveralls.io/repos/github/nick-robo/MediaWiki-Tools/badge.svg?branch=main)](https://coveralls.io/github/nick-robo/MediaWiki-Tools?branch=main)
A high-level library containing a set of tools for filtering pages using the rich data available in MediaWiki sites, such as categories and infoboxes. It uses both web scraping and API methods (where available and feasible) to gather information.
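
To give a rough idea of what an API-based lookup involves, here is a minimal sketch that builds a MediaWiki Action API `categorymembers` query and reads page titles out of a parsed response. The helper names `category_members_url` and `titles_from_response` are hypothetical and are not part of `mwtools`; the `/w/api.php` path is the one Wikipedia uses and is an assumption for other wikis. This does not show how the library itself gathers data.

```python
from urllib.parse import urlencode


def category_members_url(host, category, limit=50):
    """Build an Action API URL that lists the pages in one category."""
    params = {
        'action': 'query',
        'list': 'categorymembers',
        'cmtitle': f'Category:{category}',
        'cmlimit': limit,
        'format': 'json',
    }
    # Assumption: the wiki serves its API at /w/api.php, as Wikipedia does.
    return f'https://{host}/w/api.php?{urlencode(params)}'


def titles_from_response(data):
    """Extract page titles from a parsed categorymembers response."""
    return [page['title'] for page in data['query']['categorymembers']]


# A hand-written response fragment in the shape the API returns (sample data).
sample = {'query': {'categorymembers': [{'pageid': 1, 'ns': 0, 'title': 'India'}]}}
print(category_members_url('en.wikipedia.org', 'Countries in Asia'))
print(titles_from_response(sample))
```

The sketch stops short of making the request; a real client would also need to follow the API's continuation parameters for large categories.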
# Goals
- Generate useful data (and datasets) from a wiki.
- Work on any MediaWiki (including `fandom.com`), with or without an API.
- Get arbitrary subsets of pages based on categories and template parameters (todo).
- Be very robust to variations and inconsistencies in user input.
- Be efficient.

# Installation
Install it using pip.
```
pip install mediawiki-tools
```

Requires Python `>3.8` because I like the walrus operator.
# Usage
Check out the [basic usage](https://nick-robo.github.io/MediaWiki-Tools/mwtools.html) guide and detailed [API documentation](https://nick-robo.github.io/MediaWiki-Tools/mwtools/mediawikitools.html).
# Example
Question: Which countries in Asia use English as a spoken language?
Answer:
```python
from mwtools import MediaWikiTools

wiki = MediaWikiTools('en.wikipedia.org')
wiki.get_set(['Countries in Asia',
              'English-speaking countries and territories'],
             'and')
# ['Philippines', 'Pakistan', 'Bahrain', 'Singapore', 'Brunei', 'India']
```

Question: Which countries in Asia or Europe use English as a spoken language?
Answer:
```python
wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories'],
             ['or', 'and'])
# ['Philippines',
# 'United Kingdom',
# 'Brunei',
# 'Malta',
# 'India',
# 'Pakistan',
# 'Scotland',
# 'Republic of Ireland',
# 'Singapore',
# 'Bahrain']
```

Question: Which of these countries are not island nations?
Answer:
```python
wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories',
              'Island countries'],
             ['or', 'and', 'not'])
# ['Pakistan', 'India']
```
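
Judging from the results above, the operation list appears to be applied left to right: the first two categories are combined with the first operation, and each later category is folded into the running result. The sketch below illustrates that reading with plain Python sets. `combine_sets` and the sample data are hypothetical and not part of `mwtools`, and the library's actual evaluation order may differ.

```python
def combine_sets(sets, operations):
    """Fold sets left to right using 'and', 'or' and 'not'."""
    # A single string is taken to mean "use this operation everywhere".
    if isinstance(operations, str):
        operations = [operations] * (len(sets) - 1)
    if len(operations) != len(sets) - 1:
        raise ValueError('need exactly one operation between each pair of sets')

    result = set(sets[0])
    for op, other in zip(operations, sets[1:]):
        if op == 'and':
            result &= set(other)   # intersection
        elif op == 'or':
            result |= set(other)   # union
        elif op == 'not':
            result -= set(other)   # difference
        else:
            raise ValueError(f'unknown operation: {op!r}')
    return result


# Made-up sample data, not real category contents.
asia = {'India', 'Japan', 'Singapore'}
europe = {'France', 'Malta'}
english = {'India', 'Malta', 'Singapore'}
islands = {'Japan', 'Malta', 'Singapore'}

print(sorted(combine_sets([asia, europe, english, islands], ['or', 'and', 'not'])))
# ['India']
```

Under this reading, `['or', 'and', 'not']` means "(Asia or Europe) and English-speaking, minus island countries", which matches the shape of the last answer above.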