{"id":19794195,"url":"https://github.com/openzim/wp1_selection_tools","last_synced_at":"2025-05-01T02:30:59.985Z","repository":{"id":5280899,"uuid":"50768077","full_name":"openzim/wp1_selection_tools","owner":"openzim","description":"Create selections with the best articles of a WM project","archived":false,"fork":false,"pushed_at":"2025-01-10T14:49:52.000Z","size":7547,"stargazers_count":6,"open_issues_count":11,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-06T08:01:54.528Z","etag":null,"topics":["selection","wikipedia","wp1"],"latest_commit_sha":null,"homepage":"https://download.kiwix.org/wp1/","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openzim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"kiwix","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2016-01-31T10:01:52.000Z","updated_at":"2025-01-31T07:23:56.000Z","dependencies_parsed_at":"2023-01-11T16:37:27.518Z","dependency_job_id":null,"html_url":"https://github.com/openzim/wp1_selection_tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1_selection_tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1_selection_tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fwp1_selection_tools/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/openzim%2Fwp1_selection_tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openzim","download_url":"https://codeload.github.com/openzim/wp1_selection_tools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251812307,"owners_count":21647884,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["selection","wikipedia","wp1"],"created_at":"2024-11-12T07:12:29.498Z","updated_at":"2025-05-01T02:30:58.552Z","avatar_url":"https://github.com/openzim.png","language":"Perl","readme":"The **WP1 Selection tools** gather and compile multiple indicators to\nprovide [Wikipedia](http://wikipedia.org) article subset\nselections. 
They were created for the [Wikipedia\n1.0](https://en.wikipedia.org/wiki/Wikipedia:1) project and are\ncomplementary to the [WP1 engine](https://github.com/openzim/wp1).\n\nThe results are made available at\n[https://download.openzim.org/wp1](https://download.openzim.org/wp1).\n\n[![CodeFactor](https://www.codefactor.io/repository/github/openzim/wp1_selection_tools/badge)](https://www.codefactor.io/repository/github/openzim/wp1_selection_tools)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n\nRequirements\n------------\n\nTo run the tools, you need:\n* MANDATORY: a GNU/Linux system\n* MANDATORY: Internet access\n* MANDATORY: access to a Wikipedia database\n* OPTIONAL: access to the enwp10 rating database for Wikipedia in English\n\nContext\n-------\n\nMany Wikipedias, in different languages, have more than 500,000\narticles. Even if we can provide offline versions of a\nreasonable size, this is still too much for many devices. That's why\nwe need to build offline versions containing only a selection of the\nbest articles.\n\nPrinciple\n---------\n\nThis tool builds lists of key values (pageviews, links, ...) about\nWikipedia articles and puts them in a directory. These key values are\nthe only input available for building smart selection algorithms. 
To\nget more details about the lists, read the README in the\nlanguage-specific directory.\n\nTools\n-----\n\n* build_biggest_wikipedia_list.sh gives you the list of all\n  Wikipedia languages with more than 500,000 entries.\n\n* build_selections.sh takes a language code ('en' for example) as first\n  argument and creates the directory with all the key values.\n\n* build_all_selections.sh builds and uploads lists for all Wikipedias with\n  more than 500,000 pages.\n\n* build_en_vital_articles_list.sh generates the list of vital articles\n  for Wikipedia in English\n  (https://en.wikipedia.org/wiki/Wikipedia:Vital_articles)\n\n* build_custom_selections.sh generates selections which need custom\n  (non-standard) handling.\n\n* build_projects_lists.pl generates the lists for projects with\n  articles sorted (reverse order) by scores. Works only for Wikipedia\n  in English.\n\n* build_translated_list.pl translates a list into the given language\n  based on Wikipedia in English language links and local language\n  scores.\n\nDownload\n--------\n\nYou can download the output of these scripts directly from\ndownload.kiwix.org/wp1/ using FTP, HTTP(s) or rsync.\n\nIf you are only interested in the latest version, here is\na small command (based on rsync) to retrieve the right directory names.\n\n```bash\n# Entries are sorted in reverse order, so the newest one comes first for each wiki\nfor ENTRY in $(rsync --recursive --list-only download.kiwix.org::download.kiwix.org/wp1/ | tr -s ' ' | cut -d ' ' -f5 | grep wiki | grep -v '/' | sort -r)\ndo\n    # Strip the _YYYY-MM date suffix to get the base name\n    RADICAL=$(echo \"$ENTRY\" | sed 's/_20[0-9][0-9]-[0-9][0-9]//g')\n    if [[ \"$LAST\" != \"$RADICAL\" ]]\n    then\n        echo \"$ENTRY\"\n        LAST=$RADICAL\n    fi\ndone\n```\n\nVPS\n---\n\nTo run it on a VPS via Docker:\n\n```bash\ndocker run -d --name wp1_selection_tools \\\n  -v /srv/wp1_selection_tools/data:/data \\\n  -v /srv/wp1_selection_tools/.ssh/:/root/.ssh \\\n  -v /srv/wp1_selection_tools/replica.my.cnf:/root/replica.my.cnf \\\n  
ghcr.io/openzim/wp1_selection_tools\n```\n\nLicense\n-------\n\n[GPLv3](https://www.gnu.org/licenses/gpl-3.0) or later, see\n[LICENSE](LICENSE) for more details.\n","funding_links":["https://github.com/sponsors/kiwix"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fwp1_selection_tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenzim%2Fwp1_selection_tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fwp1_selection_tools/lists"}