Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jacopofar/wikipedia-category-graph
An implementation of the paper "Automatically assigning Wikipedia articles to macro-categories"
neo4j wikipedia
- Host: GitHub
- URL: https://github.com/jacopofar/wikipedia-category-graph
- Owner: jacopofar
- License: apache-2.0
- Created: 2014-12-20T10:00:18.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-12-26T10:44:08.000Z (about 10 years ago)
- Last Synced: 2023-03-30T20:11:02.912Z (almost 2 years ago)
- Topics: neo4j, wikipedia
- Language: Java
- Homepage:
- Size: 152 KB
- Stars: 9
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Wikipedia category graph loader
========================

An implementation of the algorithm of the article "Automatically assigning Wikipedia articles to macro-categories".
It loads the Wikipedia category graph into an embedded Neo4j instance, then proceeds to calculate the distance of each category from a set of chosen ones.
The process follows these steps:
1. load the file category.sql obtained from the periodic Wikimedia database exports, creating a node with ID and name properties for each category
2. load the file categorylinks.sql to create edges between categories and articles (creating article nodes on the fly)
3. calculate the distance from the chosen categories with the algorithm explained in the paper, using a different cost for edges depending on the travelling direction

The program can be used for any Wikipedia edition. For en.wikipedia it took about 20 hours on my laptop and generated a 15 GB graph database instance, including Lucene indexes.
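
To illustrate the idea behind step 3, here is a minimal sketch of a shortest-path computation where the cost of an edge depends on the direction in which it is travelled. It is not the code of this repository and does not use Neo4j: it runs Dijkstra's algorithm on a small in-memory graph. The class name `CategoryDistance`, the methods `addLink` and `distancesFrom`, and the values of `UP_COST` and `DOWN_COST` are all assumptions made for the example; the actual costs are the ones defined in the paper.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class CategoryDistance {
    // Hypothetical costs, NOT taken from the paper.
    static final int UP_COST = 1;   // from a member to the category containing it
    static final int DOWN_COST = 3; // from a category to one of its members

    // member -> containing categories, and category -> members
    private final Map<String, List<String>> parents = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    // Records that `member` belongs to `category`.
    void addLink(String member, String category) {
        parents.computeIfAbsent(member, k -> new ArrayList<>()).add(category);
        children.computeIfAbsent(category, k -> new ArrayList<>()).add(member);
    }

    // Dijkstra's algorithm: cheapest cost from `source` to every reachable node.
    Map<String, Integer> distancesFrom(String source) {
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<Map.Entry<String, Integer>>(
                        Map.Entry.<String, Integer>comparingByValue());
        dist.put(source, 0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            String node = current.getKey();
            int d = current.getValue();
            if (d > dist.get(node)) {
                continue; // stale queue entry, a cheaper path was already found
            }
            relax(d, parents.getOrDefault(node, Collections.emptyList()), UP_COST, dist, queue);
            relax(d, children.getOrDefault(node, Collections.emptyList()), DOWN_COST, dist, queue);
        }
        return dist;
    }

    // Updates the neighbours reachable from a node at distance `base`.
    private void relax(int base, List<String> targets, int cost,
                       Map<String, Integer> dist,
                       PriorityQueue<Map.Entry<String, Integer>> queue) {
        for (String target : targets) {
            int candidate = base + cost;
            Integer known = dist.get(target);
            if (known == null || candidate < known) {
                dist.put(target, candidate);
                queue.add(new AbstractMap.SimpleEntry<>(target, candidate));
            }
        }
    }

    public static void main(String[] args) {
        CategoryDistance graph = new CategoryDistance();
        graph.addLink("Optics", "Physics");
        graph.addLink("Physics", "Science");
        graph.addLink("Chemistry", "Science");
        System.out.println(graph.distancesFrom("Physics"));
    }
}
```

With asymmetric costs, a category reached by going up and then down again (here "Chemistry" from "Physics", through "Science") ends up farther away than a direct parent. On the full graph the same computation would be run once per chosen category.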