https://github.com/pderkowski/wikimap
- Host: GitHub
- URL: https://github.com/pderkowski/wikimap
- Owner: pderkowski
- Created: 2017-04-05T12:07:34.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2022-10-23T04:05:09.000Z (over 3 years ago)
- Last Synced: 2025-05-13T17:38:05.748Z (about 1 year ago)
- Language: C++
- Size: 5.08 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Wikimap
This is a tool that I used in my thesis. It can automatically:
* download and parse a Wikipedia dump,
* compute document embeddings for articles, based on links between them,
using the DeepWalk algorithm,
* compute t-SNE mappings for the obtained vectors,
and more.
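
To give an idea of what the embedding step builds on, here is a minimal sketch of the first stage of DeepWalk: generating truncated random walks over the link graph. It is not the code used in this repo. The function names, the toy graph and the parameter values are made up for illustration. In DeepWalk the resulting walks are then treated like sentences and fed to a skip-gram model, which is not shown here.

```python
import random


def random_walk(links, start, length, rng):
    """Return one truncated random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = links.get(walk[-1], [])
        if not neighbors:
            break  # dead end: article with no outgoing links
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(links, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(links)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(links, node, length, rng))
    return walks


# Toy link graph (hypothetical data): article -> articles it links to.
links = {
    "Warsaw": ["Poland", "Vistula"],
    "Poland": ["Warsaw", "Europe"],
    "Vistula": ["Poland"],
    "Europe": ["Poland"],
}

walks = generate_walks(links, walks_per_node=2, length=5)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Articles that often appear close together in such walks end up with similar vectors, which is what makes link structure usable as a training signal.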
## Requirements
* Python 2.7
* GCC 4.8.4
## Installation
1. Clone this repo:
```
git clone git@github.com:pderkowski/wikimap.git
```
2. Download pybind11:
```
cd wikimap
git submodule update --init --recursive
```
3. Build C++ libs:
```
make
```
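4. Install the Python libs listed in requirements.txt. I recommend using pip and
virtualenv (virtualenv 20 and later no longer accept --no-site-packages; omit
the flag there, since isolation is already the default):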
```
virtualenv env --no-site-packages
source env/bin/activate
pip install -r requirements.txt
```
## Usage
run.py is the entry point to the application. Type
```
python ./run.py -h
```
to see usage info.
For example, to compute embeddings of the 100,000 most popular articles from the
Polish Wikipedia:
```
python ./run.py -t embed -b builds --lang pl
```
The results will be written to the builds/ directory.
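
If you want to drive the tool from another script, a thin wrapper along these lines may help. This is only a sketch: the helper names are hypothetical, and it uses just the flags from the example above (-t, -b, --lang). Check `python ./run.py -h` for the full list of options.

```python
import subprocess


def build_embed_command(lang, build_dir):
    """Build the argument list for one `embed` run of run.py."""
    return ["python", "./run.py", "-t", "embed", "-b", build_dir, "--lang", lang]


def run_embed(lang, build_dir):
    """Run the command from the repo root; raises CalledProcessError on failure."""
    subprocess.check_call(build_embed_command(lang, build_dir))


# Show the command without executing it.
print(" ".join(build_embed_command("pl", "builds")))
```

Calling `run_embed("pl", "builds")` from the repo root, with the virtualenv active, would then be equivalent to the command shown above.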