{"id":13680713,"url":"https://github.com/openzim/wp1","last_synced_at":"2025-08-02T18:06:53.189Z","repository":{"id":38107382,"uuid":"50779114","full_name":"openzim/wp1","owner":"openzim","description":"Wikipedia 1.0 engine \u0026 selection tools","archived":false,"fork":false,"pushed_at":"2025-04-22T22:58:12.000Z","size":34086,"stargazers_count":38,"open_issues_count":85,"forks_count":39,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-22T23:36:29.475Z","etag":null,"topics":["wikidata","wikipedia","zim"],"latest_commit_sha":null,"homepage":"https://wp1.openzim.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openzim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"kiwix","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2016-01-31T14:47:41.000Z","updated_at":"2025-04-22T22:57:10.000Z","dependencies_parsed_at":"2023-10-03T09:57:53.218Z","dependency_job_id":"d40eb7ef-d203-4117-aa4c-5ed0c22f9a87","html_url":"https://github.com/openzim/wp1","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openzim","download_url":"https://codeload.github.com/openzim/wp1/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251607347,"owners_count":21616744,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["wikidata","wikipedia","zim"],"created_at":"2024-08-02T13:01:20.902Z","updated_at":"2025-08-02T18:06:53.155Z","avatar_url":"https://github.com/openzim.png","language":"Python","funding_links":["https://github.com/sponsors/kiwix"],"categories":["Python"],"sub_categories":[],"readme":"# Wikipedia 1.0 engine\n\nThis directory contains the code of Wikipedia 1.0 supporting\nsoftware. 
More information about the Wikipedia 1.0 project can be\nfound [on the Wikipedia in\nEnglish](https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team).\n\n[![build status](https://github.com/openzim/wp1/actions/workflows/workflow.yml/badge.svg)](https://github.com/openzim/wp1/actions?query=branch%3Amain)\n[![codecov](https://codecov.io/gh/openzim/wp1/branch/main/graph/badge.svg)](https://codecov.io/gh/openzim/wp1)\n[![CodeFactor](https://www.codefactor.io/repository/github/openzim/wp1/badge)](https://www.codefactor.io/repository/github/openzim/wp1)\n[![Doc](https://readthedocs.org/projects/wp1/badge/?style=flat)](https://wp1.readthedocs.io/en/latest/?badge=latest)\n[![License: GPL v2](https://img.shields.io/badge/License-GPL%20v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)\n\n## Contents\n\nThe `wp1` subdirectory includes code for updating the `enwp10`\ndatabase, specifically the `ratings` table (but also other\ntables). The library code itself isn't directly runnable, but instead\nis loaded and run in various docker images that are maintained in the\n`docker` directory.\n\n`requirements.txt` is a list of python dependencies in pip format that\nneed to be installed in a virtual env in order to run the library code.\nBoth the `web` and `workers` docker images use the same requirements,\nthough [Flask](https://www.palletsprojects.com/p/flask/) and its\ndependencies are not utilized by the worker code.\n\nThe `cron` directory contains wrapper scripts for cron jobs that are\nrun [inside the workers image](https://github.com/openzim/wp1/blob/master/docker/workers/Dockerfile#L15).\n\nThe `setup` directory contains a historical record of the database\nschema used by the tool for what is referred to in code as the `wp10`\ndatabase. 
This file has been heavily edited, but should be able to be\nused to re-create the `enwp10` database if necessary.\n\n`wp1-frontend` contains the code for the Vue-CLI based frontend,\nwhich is encapsulated and served from the `frontend` docker image.\nSee that directory for instructions on how to set up a development\nenvironment for the frontend.\n\n`conf.json` is a configuration file that is used by the `wp1`\nlibrary code.\n\n`docker-compose.yml` is a file read by the `docker-compose`\n[command](https://docs.docker.com/compose/) in order to generate the\ngraph of required docker images that represent the production environment.\n\n`docker-compose-dev.yml` is a similar file which sets up a dev environment,\nwith Redis and a MariaDB server for the `enwp10` database. Use it like so:\n\n```bash\ndocker-compose -f docker-compose-dev.yml up -d\n```\n\n`docker-compose-test.yml` is another docker file which sets up the test db\nfor python \"nosetests\" (unit tests). Run it similarly:\n\n```bash\ndocker-compose -f docker-compose-test.yml up -d\n```\n\nThe `*.dockerfile` symlinks allow for each docker image in this repository\n(there are many) to be more easily organized.\n\n`openapi.yml` is a YAML file that describes the API of the `web` image\nin [OpenAPI](https://swagger.io/specification/) format. If you visit\nthe [index of the API server](https://api.wp1.openzim.org) you will\nget a swagger-ui documentation frontend that utilizes this file. It\nis symlinked into the `wp1/web` directory.\n\nThe `wp10_test.*.sql` and `wiki_test.*.sql` files are rough\napproximations of the schemas of the two databases that the library\ninterfaces with. They are used for unit testing.\n\n## Installation\n\nThis code is targeted to and tested on Python 3.12.0. 
For now, all development\nhas been on Linux; use other platforms at your own risk.\n\n### Installing dependencies\n\nWP1 uses [Pipenv](https://pipenv.pypa.io/en/latest/) to manage dependencies.\nA `Pipfile` and `Pipfile.lock` are provided. You should have the pipenv tool\ninstalled in your global Python install (not in a virtualenv):\n\n```bash\npip3 install pipenv\n```\n\nThen you can use:\n\n```bash\npipenv install --dev\n```\n\nWhich will install the dependencies at the precise versions specified in the\n`Pipfile.lock` file. Behind the scenes, Pipenv creates a virtualenv for you\nautomatically, which it keeps up to date when you run Pipenv commands. You\ncan use the `pipenv shell` command to start a shell using the environment,\nwhich is similar to \"activating\" a virtualenv. You can also use `pipenv run`\nto run arbitrary individual shell commands within that environment. In many\ncases, it will be more convenient to use commands like `pipenv run pytest`\nthan actually spawning a subshell.\n\n### Installing frontend requirements\n\nThe frontend requires [Node.js](https://nodejs.org/) version 18 to build and\nrun. Once node is installed, to install the requirements for the frontend\nserver, cd into `wp1-frontend` and use:\n\n```bash\nyarn install\n```\n\nIf you do not have yarn, it can be installed with:\n\n```bash\nnpm i -g yarn\n```\n\n### Docker\n\nYou will also need to have [Docker](https://www.docker.com/) on your system\nin order to run the development server.\n\n### Populating the credentials module\n\nThe script needs access to the enwiki_p replica database (referred to\nin the code as `wikidb`), as well as its own toolsdb application database\n(referred to in the code as `wp10db`). If you are a part of the toolforge\n`enwp10` [project](https://tools.wmflabs.org/admin/tool/enwp10), you can\nfind the credentials for these on toolforge in the replica.my.cnf file in\nthe tool's home directory. 
They need to be formatted in a way that is\nconsumable by the library and pymysql. Look at `credentials.py.example`\nand create a copy called `credentials.py` with the relevant information\nfilled in. The production version of this code also requires English Wikipedia\nAPI credentials for automatically editing and updating\n[tables like this one](https://en.wikipedia.org/wiki/User:WP_1.0_bot/Tables/Project/Catholicism).\nCurrently, if your environment is DEVELOPMENT, jobs that utilize the API\nto edit Wikipedia are disabled. There is no development wiki that gets edited\nat this time.\n\nThe \"development\" credentials files, `credentials.py.dev` and\n`credentials.py.dev.example` are for running the docker graph of development\nresources. They are copied into the docker container that is run when using\n`docker-compose-dev.yml`.\n\nThe `credentials.py` file proper also contains a section for TEST database\ncredentials. These are used in unit tests. If you use the database provided\nin `docker-compose-test.yml` you can copy these directly from the example\nfile. However, you are free to provide your own test database that will\nbe destroyed after every test run. See the next section on running the tests.\n\n### Running the backend (Python/pytest) tests\n\nThe backend/python tests require a MariaDB or MySQL instance to connect to in\norder to verify various statements and logic. This database does not need to be\npersistent and in fact part of the test setup and teardown is to recreate (destroy)\na fresh schema for the test databases each time. You also will need two databases\nin your server: `enwp10_test` and `enwikip_test`. They can use default settings\nand be empty. 
**If you've followed the steps under 'Development' below to\ncreate a running dev database with docker-compose, you're all set.**\n\nIf you have that, and you've already installed the requirements above,\nyou should be able to simply run the following command from this\ndirectory to run the tests:\n\n```bash\npipenv run pytest\n```\n\n### Running the frontend (Cypress) integration tests\n\nFor frontend tests, you need to have a full working local development\nenvironment. You should follow the steps in 'Installation' above, as well as the\nsteps in 'Development' below. Your frontend should be running on port 5173 (the\ndefault) and the backend should be on port 5000 (also the default).\n\nTo run the tests:\n\n```bash\ncd wp1-frontend\n$(yarn bin)/cypress run\n```\n\nThen follow the GUI prompts to run \"Electron E2E tests\".\n\n# Development\n\nFor development, you will need to have Docker installed as explained above.\n\n## Running docker-compose\n\nThere is a Docker setup for a development database. It lives in\n`docker-compose-dev.yml`.\n\nBefore you run the docker-compose command below, you must copy the file\n`wp1/credentials.py.dev.example` to `wp1/credentials.py.dev` and fill out the\nsection for `STORAGE`, if you wish to properly materialize builder lists into\nbackend selections.\n\nAfter that is done, use the following command to run the dev environment:\n\n```bash\ndocker-compose -f docker-compose-dev.yml up -d\n```\n\n## Migrating and updating the dev database.\n\nSee the instructions in the associated [README file](https://github.com/openzim/wp1/blob/main/docker/dev-db/README.md)\n\n## Starting the API server\n\nUsing pipenv, you can start the API server with:\n\n```bash\npipenv run flask --app wp1.web.app --debug run\n```\n\nIf you're having difficulties connecting to the backend server from the\nfrontend, especially in cypress e2e tests, and especially on macOS, it might have\nsomething to do with IPv4 versus IPv6 networking stacks. 
You can try adding the\noption `--host 127.0.0.1` to the command line above (see\nhttps://github.com/openzim/wp1/pull/859).\n\n## Starting the web frontend\n\nAssuming you've installed the frontend deps (`yarn install`), the web frontend\ncan be started with the following command in the `wp1-frontend` directory:\n\n```bash\nyarn dev\n```\n\n## Development credentials.py\n\nThe DEVELOPMENT section of credentials.py.example is already filled out with\nthe proper values for the servers listed in docker-compose-dev.yml. You should\nbe able to simply copy it to credentials.py.\n\nIf you wish to connect to a wiki replica database on toolforge, you will need\nto fill out your credentials in the WIKIDB section. This is not required for\ndeveloping the frontend.\n\n## Running a ZIM Farm\n\nIf you wish to run a ZIM Farm instance for testing purposes, the easiest way is to \nclone the zimfarm repository and then set up a development instance of it:\n\n```bash\ngit clone https://github.com/openzim/zimfarm.git\ncd zimfarm/dev\ndocker compose -p zimfarm up -d\n```\n\nFor detailed setup instructions, refer to `dev/README.md` in the zimfarm repository.\nThe `ZIMFARM` section in your `credentials.py` file contains pre-configured default\nvalues for the development instance. If you encounter connection issues, verify\nthese credentials match your local setup.\n\n## Development overlay\n\nThe API server has a built-in development overlay, currently used for manual\nupdate endpoints. What this means is that the endpoints defined in\n`wp1.web.dev.projects` are used with priority, instead of the production endpoints,\n**only if the credentials.py ENV == Environment.DEVELOPMENT**. This is to allow\nfor easier manual and CI testing of the manual update page.\n\nIf you wish to test the manual update job with a real Wikipedia replica database\nand RQ jobs, you will have to disable this overlay. 
The easiest way would be to\nchange the following line in wp1.web.app:\n\n```\n  if ENV == environment.Environment.DEVELOPMENT:\n    # In development, override some project endpoints, mostly manual\n    # update, to provide an easier env for developing the frontend.\n    print('DEVELOPMENT: overlaying dev_projects blueprint. '\n          'Some endpoints will be replaced with development versions')\n    app.register_blueprint(dev_projects, url_prefix='/v1/projects')\n```\n\nto something like:\n\n```\n  if False:  # False while manually testing\n    # In development, override some project endpoints, mostly manual\n    ...\n```\n\n# Building/editing the docs\n\nDocumentation lives at [Read the Docs](https://wp1.readthedocs.io/en/latest/). It is\nbuilt using [mkdocs](https://www.mkdocs.org/). The Read the Docs site automatically\nmonitors the WP1 github HEAD and re-builds the documentation on every push.\n\n## Local docs\n\nIf you are editing the docs and would like to view them locally before pushing:\n\n```bash\n$ cd docs\n$ python -m venv venv\n$ source venv/bin/activate\n$ pip install -r requirements.txt\n$ cd ..\n$ mkdocs serve\n```\n\nThe `serve` command should print out the port to view the docs at, likely localhost:8000.\n\n# Updating production\n\n- Push to the release branch of the github repository:\n  - `git checkout main`\n  - `git pull origin main`\n  - `git checkout release`\n  - `git merge main`\n  - `git push origin release`\n- Wait for the release images [to be built](https://github.com/openzim/wp1/actions/workflows/publish.yml)\n- Log in to the box that contains the production docker images. 
It is\n  called mwcurator.\n- `cd /data/code/wp1/`\n- `sudo git pull origin main`\n- Pull the docker images from the GitHub Container Registry (ghcr.io):\n  - `sudo docker pull ghcr.io/openzim/wp1-workers:release`\n  - `sudo docker pull ghcr.io/openzim/wp1-web:release`\n  - `sudo docker pull ghcr.io/openzim/wp1-frontend:release`\n- If you've made changes to the format or contents of `credentials.py`, update `/data/wp1bot/credentials.py`.\n- Run docker-compose to bring the production images online.\n  - `sudo docker-compose up -d`\n- Run the production database migrations in the worker container:\n  - `sudo docker exec -ti -e PYTHONPATH=. wp1bot-workers yoyo -c /usr/src/app/db/production/yoyo.ini apply`\n\n# Pre-commit hooks\n\nThis project is configured to use git pre-commit hooks managed by the\nPython program `pre-commit` ([website](https://pre-commit.com/)).\nPre-commit checks let us ensure that the code is properly formatted with\n[yapf](https://github.com/google/yapf) amongst other things.\n\nIf you've installed the requirements for this repository, the pre-commit\nbinary should be available to you. To install the hooks, use:\n\n```bash\npre-commit install\n```\n\nThen, when you try to commit a change that would fail pre-commit, you get:\n\n```\n(venv) host:wikimedia_wp1_bot audiodude$ git commit -am 'Test commit'\nTrim Trailing Whitespace.................................................Passed\nFix End of Files.........................................................Passed\nyapf.....................................................................Failed\nhookid: yapf\n```\n\nFrom there, the pre-commit hook will have modified and thus unstaged some or all\nof the files you were trying to commit. 
Look through the changes to make sure\nthey are sane, then re-add them with git add, before trying your commit again.\n\n# License\n\nGPLv2 or later, see [LICENSE](LICENSE) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fwp1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenzim%2Fwp1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fwp1/lists"}