Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eggpi/citationhunt
A fun tool for quickly browsing unsourced snippets on Wikipedia.
https://github.com/eggpi/citationhunt
citations flask gamification javascript python wikipedia
Last synced: about 19 hours ago
JSON representation
A fun tool for quickly browsing unsourced snippets on Wikipedia.
- Host: GitHub
- URL: https://github.com/eggpi/citationhunt
- Owner: eggpi
- License: mit
- Created: 2015-02-24T21:37:41.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2024-12-02T12:19:57.000Z (25 days ago)
- Last Synced: 2024-12-25T08:06:38.866Z (2 days ago)
- Topics: citations, flask, gamification, javascript, python, wikipedia
- Language: Python
- Homepage: https://citationhunt.toolforge.org
- Size: 2.12 MB
- Stars: 109
- Watchers: 14
- Forks: 48
- Open Issues: 20
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
## Citation Hunt
Citation Hunt is a simple tool for finding unsourced statements on
Wikipedia in different languages. It is hosted at
[https://citationhunt.toolforge.org](https://citationhunt.toolforge.org).This repository contains the full server and client code. The
[scripts/](https://github.com/eggpi/citationhunt/tree/master/scripts)
directory contains all the scripts used for processing Wikipedia dumps.
Hopefully they will be illustrative and reusable for similar applications.#### I want to help!
That's great! There are many ways you can help. Please take a look at
[CONTRIBUTING.md](https://github.com/eggpi/citationhunt/blob/master/CONTRIBUTING.md)
for guidelines and instructions.#### Running in Toolforge
There are three major components to Citation Hunt and they are each set up in
slightly different ways in
[Toolforge](https://wikitech.wikimedia.org/wiki/Help:Toolforge):* The HTTP serving job.
* The periodic jobs that update the database.
* The continuous job that identifies snippets that were fixed.All of them run on
[Kubernetes](https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes). The
Kubernetes configuration is in the `k8s` directory.After logging in to `login.tools.wmflabs.org`, run the following commands to
create the directory structure and enter the virtualenv:```
mkdir www/python/
webservice --backend=kubernetes python3.9 shell
python3 -m venv www/python/venv/
. www/python/venv/bin/activate
```Now, clone this repository, point uwsgi to it and install the dependencies:
```
git clone https://github.com/eggpi/citationhunt.git
ln -s ../../citationhunt www/python/src
pip install -r citationhunt/requirements.txt
```and start the webservice:
```
webservice --backend=kubernetes python3.9 start
```Then, generate the Cron jobs for Kubernetes:
```
kubectl get cronjobs | tail -n +1 | grep -E -o '^citationhunt-update-[^ ]+' | xargs kubectl delete cronjob # delete existing jobs
(cd citationhunt; k8s/crontab.py | kubectl apply -f -)
kubectl get cronjobs # verify it
```See [scripts/README.md](https://github.com/eggpi/citationhunt/blob/master/scripts/README.md)
for more information about those jobs.Finally, use `k8s/compute_fixed_snippets.yaml` to launch `scripts/compute_fixed_snippets.py`
to detect snippets that get fixed:```
kubectl create --validate=true -f k8s/compute_fixed_snippets.yaml
```