Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/albertmeronyo/wikilists2skos
A supervised list extractor from Wikipedia to SKOS
- Host: GitHub
- URL: https://github.com/albertmeronyo/wikilists2skos
- Owner: albertmeronyo
- Created: 2014-06-21T17:30:25.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-06-27T16:02:55.000Z (over 10 years ago)
- Last Synced: 2023-03-29T03:19:26.953Z (over 1 year ago)
- Language: Python
- Size: 203 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
WikiLists2SKOS
==============

A supervised extractor of SKOS taxonomies from lists in Wikis

## Why this?
HTML renderings of wiki sites (such as Wikipedia articles) sometimes
contain concept taxonomies represented as nested lists, like the following:

- https://en.wikipedia.org/wiki/List_of_genres
- https://en.wikipedia.org/wiki/List_of_religions_and_spiritual_traditions

In such pages, section titles play the role of top categories of a
concept scheme. Lists contained in such sections generally enumerate
subconcepts of these top categories, and sometimes nested lists
represent subsubconcepts of these subconcepts.

For programs processing Semantic Web data, representing these
taxonomies as RDF SKOS is much more convenient than the human-tailored
HTML.

## So what does it do?
[WikiLists2SKOS](http://github.com/albertmeronyo/WikiLists2SKOS) is a
Python script that reads a target URL, processes its HTML, looks for
lists contained under section headers, generates the corresponding RDF
SKOS taxonomy, and serializes it into a destination Turtle file.

Currently, only the Wikipedia HTML layout is supported.
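
To make the pipeline above concrete, here is a minimal sketch of the same idea. It is not the code of `parseWiki.py`. To stay dependency-free it uses Python 3's standard-library `html.parser` and writes Turtle by hand instead of using lxml and RDFLib. The names `SectionListParser` and `to_turtle`, the `http://example.org/scheme/` base IRI, and the choice of `<h2>` as the section header tag are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class SectionListParser(HTMLParser):
    """Collect (depth, label) pairs: depth 0 is a section heading,
    depth 1 and above is the nesting level of a list item."""

    def __init__(self):
        super().__init__()
        self.items = []       # (depth, label) in document order
        self.list_depth = 0   # current nesting depth of <ul>/<ol>
        self.capture = None   # depth of the entry whose text is being read
        self.text = []

    def _flush(self):
        # Store the label gathered so far, if there is one.
        label = " ".join("".join(self.text).split())
        if self.capture is not None and label:
            self.items.append((self.capture, label))
        self.capture, self.text = None, []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":               # assumption: sections are <h2> elements
            self._flush()
            self.capture = 0
        elif tag in ("ul", "ol"):
            self._flush()             # a nested list ends its parent's label
            self.list_depth += 1
        elif tag == "li":
            self._flush()
            self.capture = self.list_depth

    def handle_endtag(self, tag):
        if tag in ("h2", "li"):
            self._flush()
        elif tag in ("ul", "ol"):
            self._flush()
            self.list_depth -= 1

    def handle_data(self, data):
        if self.capture is not None:
            self.text.append(data)


def to_turtle(items, base="http://example.org/scheme/"):
    """Serialize (depth, label) pairs as a SKOS taxonomy in Turtle.
    Identical labels map to the same IRI; a real tool would disambiguate."""
    scheme = f"<{base}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme .",
    ]
    stack = []  # (depth, IRI) of the ancestors of the current entry
    for depth, label in items:
        while stack and stack[-1][0] >= depth:
            stack.pop()
        iri = f"<{base}{quote(label.replace(' ', '_'), safe='')}>"
        literal = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{iri} a skos:Concept ; skos:prefLabel "{literal}" ; '
                     f"skos:inScheme {scheme} .")
        if stack:
            lines.append(f"{iri} skos:broader {stack[-1][1]} .")
        else:
            lines.append(f"{scheme} skos:hasTopConcept {iri} .")
        stack.append((depth, iri))
    return "\n".join(lines) + "\n"


# Hypothetical input standing in for a fetched wiki page.
sample = ("<h2>Music</h2>"
          "<ul><li>Rock<ul><li>Punk</li></ul></li><li>Jazz</li></ul>")
parser = SectionListParser()
parser.feed(sample)
print(to_turtle(parser.items))
```

The stack of ancestors is what turns the flat stream of headings and list items into `skos:broader` links: each entry pops everything at its own depth or deeper, and whatever remains on top is its parent. Real Wikipedia markup carries extra elements (edit links, navigation lists) that a sketch like this does not filter out.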
## How to use it?
Type
`./parseWiki.py -i http://foo/bar -o foobaz.ttl`
in your favourite shell.
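
For readers wondering how a script might accept the `-i` and `-o` options shown above, the following is a small sketch using Python's standard `argparse` module. How `parseWiki.py` actually parses its arguments is not stated here, so the function name `parse_args`, the attribute names `input` and `output`, and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical argument parser mirroring the -i / -o options above.
    parser = argparse.ArgumentParser(
        description="Extract a SKOS taxonomy from the lists of a wiki page.")
    parser.add_argument("-i", dest="input", required=True,
                        help="URL of the page to process")
    parser.add_argument("-o", dest="output", required=True,
                        help="path of the Turtle file to write")
    return parser.parse_args(argv)


# Same arguments as in the shell command above.
args = parse_args(["-i", "http://foo/bar", "-o", "foobaz.ttl"])
print(args.input, args.output)
```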
## Dependencies
- Python 2.7.5
- [RDFLib](https://github.com/RDFLib)
- [lxml](http://lxml.de/)

## Credits
Author: [Albert Meroño-Peñuela](http://github.com/albertmeronyo)
License: [LGPL v3.0](http://www.gnu.org/licenses/lgpl.html)