{"id":19794186,"url":"https://github.com/openzim/python-scraperlib","last_synced_at":"2025-04-07T13:04:56.528Z","repository":{"id":41828449,"uuid":"237924365","full_name":"openzim/python-scraperlib","owner":"openzim","description":"Collection of Python code to re-use across Python-based scrapers","archived":false,"fork":false,"pushed_at":"2024-10-24T15:03:56.000Z","size":5908,"stargazers_count":19,"open_issues_count":24,"forks_count":16,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-10-25T03:25:00.405Z","etag":null,"topics":["library","python","webscraping","zim"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openzim.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"kiwix","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-02-03T09:05:32.000Z","updated_at":"2024-10-22T07:11:49.000Z","dependencies_parsed_at":"2024-02-12T09:48:28.468Z","dependency_job_id":"863abc5e-487a-43a6-9316-4ffb1a6cf332","html_url":"https://github.com/openzim/python-scraperlib","commit_stats":{"total_commits":249,"total_committers":6,"mean_commits":41.5,"dds":"0.46586345381526106","last_synced_commit":"fa7b42107319dd722e1a7698931fc742d6d6d5b8"},"previous_names":["openzim/python_scraperlib"],"tags_count":41,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/openzim%2Fpython-scraperlib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-scraperlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-scraperlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-scraperlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openzim","download_url":"https://codeload.github.com/openzim/python-scraperlib/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247657276,"owners_count":20974344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["library","python","webscraping","zim"],"created_at":"2024-11-12T07:12:27.113Z","updated_at":"2025-04-07T13:04:56.491Z","avatar_url":"https://github.com/openzim.png","language":"Python","funding_links":["https://github.com/sponsors/kiwix"],"categories":[],"sub_categories":[],"readme":"# zimscraperlib\n\n[![Build Status](https://github.com/openzim/python-scraperlib/workflows/CI/badge.svg?query=branch%3Amain)](https://github.com/openzim/python-scraperlib/actions?query=branch%3Amain)\n[![CodeFactor](https://www.codefactor.io/repository/github/openzim/python-scraperlib/badge)](https://www.codefactor.io/repository/github/openzim/python-scraperlib)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/zimscraperlib.svg)](https://pypi.org/project/zimscraperlib/)\n[![PyPI - 
Python Version](https://img.shields.io/pypi/pyversions/zimscraperlib.svg)](https://pypi.org/project/zimscraperlib)\n[![codecov](https://codecov.io/gh/openzim/python-scraperlib/branch/master/graph/badge.svg)](https://codecov.io/gh/openzim/python-scraperlib)\n[![Read the Docs](https://img.shields.io/readthedocs/python-scraperlib)](https://python-scraperlib.readthedocs.io/)\n\nCollection of python code to re-use across python-based scrapers\n\n# Usage\n\n- This library is meant to be installed via PyPI ([`zimscraperlib`](https://pypi.org/project/zimscraperlib/)).\n- Make sure to reference it using a version code as the API is subject to frequent changes.\n- API should remain the same only within the same _minor_ version.\n\nExample usage:\n\n```pip\nzimscraperlib\u003e=1.1,\u003c1.2\n```\n\nSee documentation at [Read the Docs](https://python-scraperlib.readthedocs.io/) for details.\n\n# Dependencies\n\n- libmagic\n- wget\n- libzim (auto-installed, not available on Windows)\n- Pillow\n- FFmpeg\n- gifsicle (\u003e=1.92)\n- libcairo (if you use the image manipulation, this is used for svg conversion)\n\n## macOS\n\n```sh\nbrew install libmagic wget libtiff libjpeg webp little-cms2 ffmpeg gifsicle\n```\n\n## Linux\n\n```sh\nsudo apt install libmagic1 wget ffmpeg \\\n    libtiff5-dev libjpeg8-dev libopenjp2-7-dev zlib1g-dev \\\n    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \\\n    libharfbuzz-dev libfribidi-dev libxcb1-dev gifsicle\n```\n\n## Alpine\n\n```\napk add ffmpeg gifsicle libmagic wget libjpeg\n```\n\n# Contribution\n\nThis project adheres to openZIM's [Contribution Guidelines](https://github.com/openzim/overview/wiki/Contributing).\n\nThis project has implemented openZIM's [Python bootstrap, conventions and policies](https://github.com/openzim/_python-bootstrap/docs/Policy.md) **v1.0.2**.\n\n```shell\npip install hatch\npip install \".[dev]\"\npre-commit install\n# For tests\ninvoke coverage\n```\n\n# Users\n\nNon-exhaustive list 
of scrapers using it (check status when updating API):\n\n- [openzim/freecodecamp](https://github.com/openzim/freecodecamp)\n- [openzim/gutenberg](https://github.com/openzim/gutenberg)\n- [openzim/ifixit](https://github.com/openzim/ifixit)\n- [openzim/kolibri](https://github.com/openzim/kolibri)\n- [openzim/nautilus](https://github.com/openzim/nautilus)\n- [openzim/openedx](https://github.com/openzim/openedx)\n- [openzim/sotoki](https://github.com/openzim/sotoki)\n- [openzim/ted](https://github.com/openzim/ted)\n- [openzim/warc2zim](https://github.com/openzim/warc2zim)\n- [openzim/wikihow](https://github.com/openzim/wikihow)\n- [openzim/youtube](https://github.com/openzim/youtube)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fpython-scraperlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenzim%2Fpython-scraperlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fpython-scraperlib/lists"}