https://github.com/meditativeape/wikiracer
Finds the shortest path between two Wikipedia articles, using only Wikipedia links.
- Host: GitHub
- URL: https://github.com/meditativeape/wikiracer
- Owner: meditativeape
- License: mit
- Created: 2017-10-23T00:39:46.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-10-27T18:02:26.000Z (over 8 years ago)
- Last Synced: 2024-06-20T13:37:54.420Z (about 2 years ago)
- Topics: go, golang, wikipedia
- Language: Go
- Homepage:
- Size: 18.6 KB
- Stars: 33
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# wikiracer
Finds a path between two Wikipedia articles, using only Wikipedia links.
## Approach
Wikiracer runs a one-way parallel breadth-first search (BFS) from the given start URL, crawling the graph of Wikipedia articles until it reaches the target URL.
At each level of the BFS, the work is shared across a number of goroutines. These goroutines take links from a common input channel, which streams the links found at the previous level, crawl the corresponding articles, and send the links found in those articles to a common output channel. The main function collects the output into an array, removes duplicates and links that have already been crawled, and starts the next batch of goroutines to crawl the new links.
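The level-by-level scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `FindPath`, `LinkFetcher`, `edge`, and `buildPath` are hypothetical names, and the HTTP crawl is replaced by an in-memory graph so the example runs offline.

```go
package main

import (
	"fmt"
	"sync"
)

// LinkFetcher returns the outgoing links of an article.
// Hypothetical stand-in for fetching and parsing a Wikipedia page.
type LinkFetcher func(url string) []string

// edge records that article `to` was discovered on article `from`.
type edge struct{ from, to string }

// FindPath runs a one-way BFS from start to target, sharing each level's
// work across `workers` goroutines. It returns nil if no path exists.
func FindPath(start, target string, fetch LinkFetcher, workers int) []string {
	if start == target {
		return []string{start}
	}
	if workers < 1 {
		workers = 1
	}
	parent := map[string]string{start: ""} // doubles as the visited set
	frontier := []string{start}
	for len(frontier) > 0 {
		in := make(chan string) // links to crawl at this level
		out := make(chan edge)  // links discovered at this level
		var wg sync.WaitGroup
		for i := 0; i < workers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for u := range in {
					for _, v := range fetch(u) {
						out <- edge{u, v}
					}
				}
			}()
		}
		// Feed the current level to the workers, then signal completion.
		go func(level []string) {
			for _, u := range level {
				in <- u
			}
			close(in)
		}(frontier)
		// Close the output channel once every worker has finished.
		go func() {
			wg.Wait()
			close(out)
		}()
		// Collect results, dropping duplicates and already-crawled links.
		var next []string
		for e := range out {
			if _, seen := parent[e.to]; seen {
				continue
			}
			parent[e.to] = e.from
			next = append(next, e.to)
		}
		if _, found := parent[target]; found {
			return buildPath(parent, start, target)
		}
		frontier = next
	}
	return nil
}

// buildPath walks the parent links back from target to start.
func buildPath(parent map[string]string, start, target string) []string {
	var path []string
	for n := target; ; n = parent[n] {
		path = append([]string{n}, path...)
		if n == start {
			return path
		}
	}
}

func main() {
	// Toy link graph used in place of real articles.
	graph := map[string][]string{
		"A": {"B", "C"},
		"B": {"D"},
		"C": {"D", "E"},
		"D": {"F"},
		"E": {"F"},
	}
	fetch := func(u string) []string { return graph[u] }
	fmt.Println(FindPath("A", "F", fetch, 4))
}
```

Because workers run concurrently, which of several equally short paths is returned can vary between runs; only the path length is deterministic.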
For simplicity, Wikiracer only uses English articles (URL prefix: `en.wikipedia.org/wiki/`).
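One way such a restriction could be enforced is a simple prefix check on each link. The helper below is a hypothetical sketch, not taken from the project; `isEnglishArticle` and `articlePrefix` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// articlePrefix is the URL prefix named in the README.
const articlePrefix = "en.wikipedia.org/wiki/"

// isEnglishArticle reports whether raw points at an English Wikipedia
// article. It strips an optional http(s) scheme before comparing.
func isEnglishArticle(raw string) bool {
	s := strings.TrimPrefix(raw, "https://")
	s = strings.TrimPrefix(s, "http://")
	return strings.HasPrefix(s, articlePrefix) && len(s) > len(articlePrefix)
}

func main() {
	fmt.Println(isEnglishArticle("https://en.wikipedia.org/wiki/Blade_Runner"))
	fmt.Println(isEnglishArticle("https://de.wikipedia.org/wiki/Blade_Runner"))
}
```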
## Installation
```
$ go get github.com/meditativeape/wikiracer
$ cd $GOPATH/src/github.com/meditativeape/wikiracer
$ make install
```
## Usage
Start wikiracer by running `wikiracer`. It spins up an HTTP server that listens on port `8080`.
Wikiracer offers one REST endpoint, `POST /race`, which expects two keys in the POST form: `startUrl` and `endUrl`. It returns the path found as JSON. You can query this endpoint with any HTTP client, such as cURL or Postman.
Example request as a cURL command:
```
curl localhost:8080/race -F startUrl=https://en.wikipedia.org/wiki/Computer_programming -F endUrl=https://en.wikipedia.org/wiki/Blade_Runner
```
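The cURL request above can also be issued from Go. The sketch below mirrors `curl -F`, which sends the fields as `multipart/form-data`. `buildRaceRequest` is a hypothetical helper, and because the README does not document the response schema, the body is printed as-is rather than decoded.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

// buildRaceRequest builds a multipart POST to <baseURL>/race carrying the
// startUrl and endUrl form fields described in the README.
func buildRaceRequest(baseURL, startURL, endURL string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("startUrl", startURL); err != nil {
		return nil, err
	}
	if err := w.WriteField("endUrl", endURL); err != nil {
		return nil, err
	}
	// Close writes the trailing multipart boundary.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/race", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

// run sends the request to a locally running wikiracer and prints the reply.
func run() error {
	req, err := buildRaceRequest(
		"http://localhost:8080",
		"https://en.wikipedia.org/wiki/Computer_programming",
		"https://en.wikipedia.org/wiki/Blade_Runner",
	)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	fmt.Println(resp.Status)
	fmt.Println(string(out)) // raw JSON; the schema is not documented here
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

This assumes the server is already running on port `8080` as described above.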
## Logging
Wikiracer keeps a lightweight log at `/tmp/wikiracer/service.log`.
## License
The MIT License