https://github.com/rashikakarki/related-articles-wikipedia
Using Wikipedia Clickstream data to find articles related to the searched term.
- Host: GitHub
- URL: https://github.com/rashikakarki/related-articles-wikipedia
- Owner: RashikaKarki
- Created: 2021-04-30T08:48:22.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-10T14:57:50.000Z (about 2 years ago)
- Last Synced: 2024-10-18T17:49:41.969Z (19 days ago)
- Language: Python
- Size: 1.67 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
# Related Articles Wikipedia
![Demo](https://github.com/RashikaKarki/Related-Articles-Wikipedia/blob/main/resource/demo.gif)
A simple web application that lists Wikipedia articles related to a searched term. It uses Wikipedia Clickstream data.
## About the data:
> The data contains counts of (referer, resource) or (source, destination) pairs extracted from the request logs of Wikipedia.
### Format of the data
The data has 4 columns: Source, Destination, Type and TotalNum
- **Source :** The result of mapping the source/referrer URL to a fixed set of values
  - **the article title** : an article in the main namespace
  - **other-internal** : a page from any other Wikimedia project
  - **other-search** : an external search engine
  - **other-external** : any other external site
  - **other-empty** : an empty source
  - **other-other** : anything else
- **Destination :** The title of the article the client requested
- **Type :** Describes the relation between source and destination
  - **link** : if the source and request are both articles and the source links to the request
  - **external** : if the source host is not `en(.m)?.wikipedia.org`
  - **other** : if the source and request are both articles but the source does not link to the request. This can happen when clients search or spoof their referer.
- **TotalNum :** The number of occurrences of the (source, destination) pair

The data can be found at: https://dumps.wikimedia.org/other/clickstream/
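
The four-column layout described above lends itself to a simple lookup. The following is a minimal sketch, not the application's actual code: the helper names `read_rows` and `related_articles`, the sample rows, and the choice to count clicks in both directions while skipping `external` rows are all assumptions made for illustration. It also assumes the file is tab-separated with the columns in the order Source, Destination, Type, TotalNum.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows (not real counts) in the assumed tab-separated layout:
# Source, Destination, Type, TotalNum
SAMPLE = (
    "other-search\tPython_(programming_language)\texternal\t5000\n"
    "Python_(programming_language)\tGuido_van_Rossum\tlink\t320\n"
    "Python_(programming_language)\tCPython\tlink\t210\n"
    "Python_(programming_language)\tMonty_Python\tother\t15\n"
    "Guido_van_Rossum\tPython_(programming_language)\tlink\t180\n"
)


def read_rows(handle):
    """Yield rows from a tab-separated clickstream file object."""
    # QUOTE_NONE: article titles may contain quote characters that are not CSV quoting.
    return csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def related_articles(rows, term, top_n=5):
    """Return up to top_n (title, clicks) pairs most connected to `term`."""
    counts = Counter()
    for source, destination, link_type, total in rows:
        if link_type == "external":
            # Traffic from outside Wikipedia says nothing about related articles.
            continue
        if source == term:
            counts[destination] += int(total)
        elif destination == term:
            counts[source] += int(total)
    return counts.most_common(top_n)


if __name__ == "__main__":
    rows = read_rows(io.StringIO(SAMPLE))
    for title, clicks in related_articles(rows, "Python_(programming_language)"):
        print(title, clicks)
```

For a real dump, the `io.StringIO(SAMPLE)` handle would be replaced by an opened (and, if needed, decompressed) file downloaded from the location given above.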
## How to run the application locally?
- Step 1:
  Clone the repo
- Step 2:
  Install the required packages: `pip install -r requirements.txt`
- Step 3:
  Rename the `.env.example` file to `.env`
- Step 4:
  Run the application
  ```
  flask run
  ```
  The server will run on port `5000`