https://github.com/jehna/wikigolf
Find the fewest clicks between two Wikipedia pages
- Host: GitHub
- URL: https://github.com/jehna/wikigolf
- Owner: jehna
- License: mit
- Created: 2019-05-10T21:22:12.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-01-11T02:46:35.000Z (over 2 years ago)
- Last Synced: 2023-03-22T22:37:08.147Z (about 2 years ago)
- Language: TypeScript
- Homepage: https://wikigolf1.herokuapp.com/
- Size: 1.03 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE
# Wikigolf
> Find the fewest clicks between two Wikipedia pages
If you start from the Wikipedia page
[Helsinki](https://fi.wikipedia.org/wiki/Helsinki), what is the smallest number
of clicks it takes to get to the page for
[Avengers: Endgame](https://fi.wikipedia.org/wiki/Avengers:_Endgame)?

The answer is 4 clicks:
1. [Winter Solstice](https://fi.wikipedia.org/wiki/Talvip%C3%A4iv%C3%A4nseisaus)
2. [December 21st](https://fi.wikipedia.org/wiki/21._joulukuuta)
3. [Samuel L. Jackson](https://fi.wikipedia.org/wiki/Samuel_L._Jackson)
4. [Avengers: Endgame](https://fi.wikipedia.org/wiki/Avengers:_Endgame)

This project answers the same question for any two pages on Wikipedia by running
a _breadth-first_ search against a Wikipedia database imported into Google
BigQuery.

## Try it out!
You can try the search yourself at:

https://wikigolf1.herokuapp.com/

(Note that this service runs on a free Heroku instance, so a cold start may take
a while.)

## Developing
To test this project out yourself you'll need to load all page and pagelinks
data from Wikipedia. Here's how:

### 1. Download necessary data from Wikipedia
Here are links to download all data from the Finnish Wikipedia:
- All links between all Wikipedia pages: https://dumps.wikimedia.your.org/fiwiki/latest/fiwiki-latest-pagelinks.sql.gz
- All pages' metadata: https://dumps.wikimedia.your.org/fiwiki/latest/fiwiki-latest-page.sql.gz

There's a script in the [converter/](converter/) folder that downloads the data
and streams it to .ndjson files. The `load_to_bigquery.sh` script both loads the
files and uploads the results to BigQuery. Note that you need the `bq` CLI
client installed on your machine.

Usage:
```sh
cd converter
nvm use 13
./convert_and_upload.sh fi
./convert_and_upload.sh en
```

Running this script creates and uploads the needed Wikipedia tables to BigQuery.
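Conceptually, the conversion step turns each MySQL dump `INSERT` statement into one JSON object per row, the .ndjson format that BigQuery can ingest. A toy TypeScript sketch of that idea follows (the real converter is written in Rust with nom parser combinators; the column names assume the standard `pagelinks` dump schema, and the regex ignores escaping edge cases the real parser handles):

```typescript
// Toy sketch: extract rows from a `pagelinks` dump INSERT statement and
// emit one JSON object per row (the .ndjson format BigQuery can ingest).
// The real converter (Rust + nom) handles quoting and escaping robustly;
// this regex only illustrates the output format.
function insertLineToNdjson(line: string): string[] {
  const rows: string[] = [];
  // Matches tuples like (12345,0,'Some_Title') from the SQL dump.
  const tuple = /\((\d+),(\d+),'((?:[^'\\]|\\.)*)'\)/g;
  let match: RegExpExecArray | null;
  while ((match = tuple.exec(line)) !== null) {
    rows.push(
      JSON.stringify({
        pl_from: Number(match[1]),
        pl_namespace: Number(match[2]),
        pl_title: match[3],
      })
    );
  }
  return rows;
}

const sample =
  "INSERT INTO `pagelinks` VALUES (1,0,'Helsinki'),(2,0,'Avengers:_Endgame');";
console.log(insertLineToNdjson(sample).join("\n"));
// {"pl_from":1,"pl_namespace":0,"pl_title":"Helsinki"}
// {"pl_from":2,"pl_namespace":0,"pl_title":"Avengers:_Endgame"}
```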
### 2. Allow API access to BigQuery

Then create a Google Cloud IAM service account and fill in your .env file with
the keys you can find in [`.env-example`](.env-example).

Remember to enable BigQuery API access from the console too!
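The keys from the .env file end up configuring the BigQuery client at startup; a minimal sketch of assembling them (the variable names below are hypothetical, use the actual keys listed in `.env-example`):

```typescript
// Sketch: assemble BigQuery client options from environment variables.
// The variable names below are hypothetical — use the actual keys listed
// in .env-example.
interface BigQueryOptions {
  projectId: string;
  credentials: { client_email: string; private_key: string };
}

function bigQueryOptionsFromEnv(
  env: Record<string, string | undefined>
): BigQueryOptions {
  const projectId = env.GOOGLE_PROJECT_ID;
  const clientEmail = env.GOOGLE_CLIENT_EMAIL;
  const privateKey = env.GOOGLE_PRIVATE_KEY;
  if (!projectId || !clientEmail || !privateKey) {
    throw new Error("Missing BigQuery credentials; see .env-example");
  }
  return {
    projectId,
    credentials: {
      client_email: clientEmail,
      // Keys pasted into a .env file often contain literal "\n" sequences;
      // restore real newlines before handing the key to the client.
      private_key: privateKey.replace(/\\n/g, "\n"),
    },
  };
}
```

The resulting object is the shape accepted by `new BigQuery(options)` from `@google-cloud/bigquery`.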
### 3. Start the dev server
You can install the npm dependencies by running:
```
yarn
```

And start the project locally by running:
```
yarn dev
```

This starts Heroku and Webpack, and watches your files for changes.
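The search itself is a breadth-first expansion over page links, capped at 5 steps. It can be sketched in plain TypeScript as an in-memory toy (the real project runs the equivalent expansion as queries against BigQuery, not in memory):

```typescript
// Toy breadth-first search over an in-memory link graph.
// The real project expands each step as a BigQuery join over the
// pagelinks tables instead of a Map lookup.
function shortestPath(
  links: Map<string, string[]>,
  from: string,
  to: string,
  maxSteps = 5
): string[] | null {
  if (from === to) return [from];
  // Maps each reached page to the page we came from ("" marks the start).
  const previous = new Map<string, string>([[from, ""]]);
  let frontier = [from];
  for (let step = 0; step < maxSteps && frontier.length > 0; step++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const target of links.get(page) ?? []) {
        if (previous.has(target)) continue; // already reached earlier
        previous.set(target, page);
        if (target === to) {
          // Walk the `previous` chain back to reconstruct the path.
          const path = [to];
          for (let p = page; p !== ""; p = previous.get(p)!) path.unshift(p);
          return path;
        }
        next.push(target);
      }
    }
    frontier = next;
  }
  return null; // not reachable within maxSteps clicks
}

// Example mirroring the Helsinki → Avengers: Endgame chain above:
const links = new Map<string, string[]>([
  ["Helsinki", ["Winter_solstice"]],
  ["Winter_solstice", ["December_21"]],
  ["December_21", ["Samuel_L._Jackson"]],
  ["Samuel_L._Jackson", ["Avengers:_Endgame"]],
]);
console.log(shortestPath(links, "Helsinki", "Avengers:_Endgame"));
// 5 pages → 4 clicks
```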
## Features

This project
- Is a website and an API
  - The website uses React with hooks
- Full stack
  - React
  - Koa
  - Google BigQuery
  - TypeScript
  - Heroku
  - Styled Components
- Converter
  - Rust
  - Nom parser combinators
- Does a breadth-first search from one Wikipedia page name to another by using
  the link relations between those pages
- The website works with the Finnish and English Wikipedia sites for now, but
  the search should work just fine with any Wikipedia dump
- Limits the search to a maximum of 5 steps
  - With an average page having ~30 links to other pages, 5 steps should cover
    the whole Finnish Wikipedia over 33 times

## Deploying
This project can be deployed to Heroku with the same environment variables
you're using locally.

You can use GitHub Actions (or any other CI) to keep your project up to date
with periodic data updates.

### Setup CI

Create a service account in Google Cloud with at least the BigQuery Admin
permission and download the `secrets.json` file to your computer.

## Contributing
If you'd like to contribute, please fork the repository and use a feature
branch. Pull requests are warmly welcome.

## Licensing

The code in this project is licensed under the MIT license.