{"id":13586090,"url":"https://github.com/openzim/zimfarm","last_synced_at":"2025-04-04T10:02:54.293Z","repository":{"id":39801710,"uuid":"85113832","full_name":"openzim/zimfarm","owner":"openzim","description":"Farm operated by bots to grow and harvest new zim files","archived":false,"fork":false,"pushed_at":"2025-03-24T13:54:34.000Z","size":6159,"stargazers_count":100,"open_issues_count":143,"forks_count":28,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-28T09:01:46.617Z","etag":null,"topics":["distributed-systems","docker-images","flask","python3","zim-files"],"latest_commit_sha":null,"homepage":"https://farm.openzim.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openzim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"kiwix","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2017-03-15T19:40:12.000Z","updated_at":"2025-03-26T16:15:30.000Z","dependencies_parsed_at":"2024-06-11T09:35:20.135Z","dependency_job_id":"4c4e4801-45f1-429d-a34e-0be1aa9165cd","html_url":"https://github.com/openzim/zimfarm","commit_stats":{"total_commits":904,"total_committers":17,"mean_commits":53.1764705882353,"dds":0.5907079646017699,"last_synced_commit":"7ae9cd0a3ea12ffb8bc37c3db5ae2e601d1dec33"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/openzim%2Fzimfarm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fzimfarm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fzimfarm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fzimfarm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openzim","download_url":"https://codeload.github.com/openzim/zimfarm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247149510,"owners_count":20891954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-systems","docker-images","flask","python3","zim-files"],"created_at":"2024-08-01T15:05:19.219Z","updated_at":"2025-04-04T10:02:54.259Z","avatar_url":"https://github.com/openzim.png","language":"Python","funding_links":["https://github.com/sponsors/kiwix"],"categories":["Python","flask"],"sub_categories":[],"readme":"# ZIM Farm\n\n[![Build Status](https://github.com/openzim/zimfarm/workflows/CI/badge.svg?query=branch%3Amain)](https://github.com/openzim/zimfarm/actions?query=branch%3Amain)\n[![CodeFactor](https://www.codefactor.io/repository/github/openzim/zimfarm/badge)](https://www.codefactor.io/repository/github/openzim/zimfarm)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![codecov](https://codecov.io/gh/openzim/zimfarm/branch/main/graph/badge.svg)](https://codecov.io/gh/openzim/zimfarm)\n\nThe ZIM farm (zimfarm) is a semi-decentralised software 
solution to\nbuild [ZIM files](http://www.openzim.org/) efficiently. This means scraping web content,\npackaging it into a ZIM file and uploading the result to an online\nZIM file repository.\n\n## How does it work?\n\nThe Zimfarm platform is a combination of different tools:\n\n### dispatcher\n\nThe [dispatcher](https://ghcr.io/openzim/zimfarm-dispatcher) is a central database and [API](https://api.farm.openzim.org/v1) that records *recipes* (metadata of the ZIM files to produce) and *tasks*. It includes a scheduler that decides when a ZIM file should be recreated (based on the recipe) and a dispatcher that creates and assigns *tasks* to *workers*.\n\n### frontend\n\nThe [frontend](https://ghcr.io/openzim/zimfarm-ui), available at [farm.openzim.org](https://farm.openzim.org/), is a simple consumer of the API.\n\nIt is used to create, clone and edit recipes, and also to monitor the progress of tasks and *workers*.\n\nAnybody can use it in read-only mode.\n\n### workers\n\nWorkers are always-running computers which get assigned ZIM creation tasks by the dispatcher. If you are interested in providing us with worker resources, please [read these instructions](https://github.com/openzim/zimfarm/blob/main/workers/README.md).\n\nA worker is made of two software components:\n\n#### worker-manager\n\nThe [manager](https://ghcr.io/openzim/zimfarm-worker-manager) is responsible for declaring its available resources and configuration and for receiving the tasks assigned to it by the dispatcher. It's a very low-resource container whose job is to spawn `task-worker` containers.\n\n#### task-worker\n\nThe [task-worker](https://ghcr.io/openzim/zimfarm-task-worker) is responsible for running a specific task. 
It's also a very low-resource container but, unlike the manager, one is spawned for each task assigned to the worker (the manager defines the concurrency based on resources).\n\nThe task-worker's role is to start and monitor the scraper's container for the task and to spawn uploader containers for both created ZIM files and logs.\n\n#### uploader\n\nThe [uploader](https://ghcr.io/openzim/uploader) is instantiated by the task-worker to upload, individually, each created ZIM file, as well as the scraper's container log.\n\nThe uploader supports both SCP and SFTP. We are currently using SFTP for all uploads due to a slight speed gain.\n\nThe uploader is very fast and convenient (it can watch and resume files) but works only off files at the moment.\n\n### receiver\n\nThe [receiver](https://ghcr.io/openzim/zimfarm-receiver) is a jailed OpenSSH server that receives scraper logs and ZIM files and passes the latter through a quarantine via the [zimcheck](https://github.com/openzim/zim-tools) tool, which then either sets them aside (invalid ZIM) or moves them to the [public download server](download.kiwix.org/zim/).\n\n### scrapers\n\nScrapers are the tools used to actually convert a *scraping request* (recorded in a Zimfarm recipe) into one or several ZIM files.\n\nThe most important one is the MediaWiki scraper, called [mwoffliner](https://ghcr.io/openzim/mwoffliner/), but there are many others, for Stack Exchange, Project Gutenberg, PhET and more.\n\nScrapers are not part of the Zimfarm. 
They are completely independent projects for which the requirements to integrate into the Zimfarm are minimal:\n\n* Runs entirely from a Docker image\n* Arguments are set on the command line\n* The ZIM output folder is settable via an argument\n\n# How do I request a ZIM file?\n\nZIM file requests are handled in the [zim-requests](https://github.com/openzim/zim-requests/issues/new/choose) repository.\n\nIf there's already a scraper for the website you want to convert to ZIM, someone with editor access to the Zimfarm will create the recipe and, within a few days, a ZIM file should be available.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fzimfarm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenzim%2Fzimfarm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fzimfarm/lists"}