{"id":13994193,"url":"https://github.com/openzim/python-libzim","last_synced_at":"2025-05-16T14:08:42.614Z","repository":{"id":37814977,"uuid":"248276881","full_name":"openzim/python-libzim","owner":"openzim","description":"Libzim binding for Python: read/write ZIM files in Python","archived":false,"fork":false,"pushed_at":"2025-03-26T10:35:15.000Z","size":28195,"stargazers_count":80,"open_issues_count":12,"forks_count":27,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-12T12:55:05.968Z","etag":null,"topics":["binding","library","libzim","offline","python","webscraping"],"latest_commit_sha":null,"homepage":"https://python-libzim.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openzim.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"kiwix","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-03-18T15:55:46.000Z","updated_at":"2025-04-01T06:28:54.000Z","dependencies_parsed_at":"2023-11-14T16:44:18.879Z","dependency_job_id":"68a54630-e85b-4859-981c-1f2fcf1520cc","html_url":"https://github.com/openzim/python-libzim","commit_stats":{"total_commits":255,"total_committers":13,"mean_commits":"19.615384615384617","dds":0.592156862745098,"last_synced_commit":"cd8766fb89ed1c8a6af88158cc6cdb1e40969f23"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-libzim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-libzim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-libzim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fpython-libzim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openzim","download_url":"https://codeload.github.com/openzim/python-libzim/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254544146,"owners_count":22088807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binding","library","libzim","offline","python","webscraping"],"created_at":"2024-08-09T14:02:45.384Z","updated_at":"2025-05-16T14:08:37.607Z","avatar_url":"https://github.com/openzim.png","language":"Python","funding_links":["https://github.com/sponsors/kiwix"],"categories":["Python"],"sub_categories":[],"readme":"# python-libzim\n\n`libzim` module allows you to read and write [ZIM\nfiles](https://openzim.org) in Python. 
It provides a shallow Python\ninterface on top of the [C++ `libzim` library](https://github.com/openzim/libzim).\n\nIt is primarily used in [openZIM](https://github.com/openzim/) scrapers like [`sotoki`](https://github.com/openzim/sotoki) or [`youtube2zim`](https://github.com/openzim/youtube).\n\n[![Build Status](https://github.com/openzim/python-libzim/workflows/test/badge.svg?query=branch%3Amain)](https://github.com/openzim/python-libzim/actions?query=branch%3Amain)\n[![CodeFactor](https://www.codefactor.io/repository/github/openzim/python-libzim/badge)](https://www.codefactor.io/repository/github/openzim/python-libzim)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/libzim.svg)](https://pypi.org/project/libzim/)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/libzim.svg)](https://pypi.org/project/libzim)\n[![codecov](https://codecov.io/gh/openzim/python-libzim/branch/main/graph/badge.svg)](https://codecov.io/gh/openzim/python-libzim)\n[![Read the Docs](https://img.shields.io/readthedocs/python-libzim)](https://python-libzim.readthedocs.io/)\n\n## Installation\n\n```sh\npip install libzim\n```\n\nOur [PyPI wheels](https://pypi.org/project/libzim/) bundle a [recent release](https://download.openzim.org/release/libzim/) of the C++ libzim and are available for the following platforms:\n\n- macOS for `x86_64` and `arm64`\n- GNU/Linux for `x86_64`, `armhf` and `aarch64`\n- Linux+musl for `x86_64` and `aarch64`\n- Windows for `x64`\n\nWheels are available for CPython only (but can be built for PyPy).\n\nUsers on other platforms can install the source distribution (see [Building](#Building) below). 
\n\n\n## Contributions\n\n```sh\ngit clone git@github.com:openzim/python-libzim.git \u0026\u0026 cd python-libzim\n# hatch run test:coverage\n```\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md) for additional details, then [open a ticket](https://github.com/openzim/python-libzim/issues/new) or submit a Pull Request on GitHub 🤗!\n\n## Usage\n\n### Read a ZIM file\n\n```python\nfrom libzim.reader import Archive\nfrom libzim.search import Query, Searcher\nfrom libzim.suggestion import SuggestionSearcher\n\nzim = Archive(\"test.zim\")\nprint(f\"Main entry is at {zim.main_entry.get_item().path}\")\nentry = zim.get_entry_by_path(\"home/fr\")\nprint(f\"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.\")\nprint(bytes(entry.get_item().content).decode(\"UTF-8\"))\n\n# searching using full-text index\nsearch_string = \"Welcome\"\nquery = Query().set_query(search_string)\nsearcher = Searcher(zim)\nsearch = searcher.search(query)\nsearch_count = search.getEstimatedMatches()\nprint(f\"there are {search_count} matches for {search_string}\")\nprint(list(search.getResults(0, search_count)))\n\n# accessing suggestions\nsearch_string = \"kiwix\"\nsuggestion_searcher = SuggestionSearcher(zim)\nsuggestion = suggestion_searcher.suggest(search_string)\nsuggestion_count = suggestion.getEstimatedMatches()\nprint(f\"there are {suggestion_count} matches for {search_string}\")\nprint(list(suggestion.getResults(0, suggestion_count)))\n```\n\n### Write a ZIM file\n\n```py\nimport base64\nimport pathlib\n\nfrom libzim.writer import Creator, Item, StringProvider, FileProvider, Hint\n\n\nclass MyItem(Item):\n    def __init__(self, title, path, content=\"\", fpath=None):\n        super().__init__()\n        self.path = path\n        self.title = title\n        self.content = content\n        self.fpath = fpath\n\n    def get_path(self):\n        return self.path\n\n    def get_title(self):\n        return self.title\n\n    def get_mimetype(self):\n        return \"text/html\"\n\n    def 
get_contentprovider(self):\n        if self.fpath is not None:\n            return FileProvider(self.fpath)\n        return StringProvider(self.content)\n\n    def get_hints(self):\n        return {Hint.FRONT_ARTICLE: True}\n\n\ncontent = \"\"\"\u003chtml\u003e\u003chead\u003e\u003cmeta charset=\"UTF-8\"\u003e\u003ctitle\u003eWeb Page Title\u003c/title\u003e\u003c/head\u003e\n\u003cbody\u003e\u003ch1\u003eWelcome to this ZIM\u003c/h1\u003e\u003cp\u003eKiwix\u003c/p\u003e\u003c/body\u003e\u003c/html\u003e\"\"\"\n\npathlib.Path(\"home-fr.html\").write_text(\n    \"\"\"\u003chtml\u003e\u003chead\u003e\u003cmeta charset=\"UTF-8\"\u003e\n    \u003ctitle\u003eBonjour\u003c/title\u003e\u003c/head\u003e\n    \u003cbody\u003e\u003ch1\u003ethis is home-fr\u003c/h1\u003e\u003c/body\u003e\u003c/html\u003e\"\"\"\n)\n\nitem = MyItem(\"Hello Kiwix\", \"home\", content)\nitem2 = MyItem(\"Bonjour Kiwix\", \"home/fr\", None, \"home-fr.html\")\n\n# illustration = pathlib.Path(\"icon48x48.png\").read_bytes()\nillustration = base64.b64decode(\n    \"iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0d2FyZQBB\"\n    \"ZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0Uk5TAEDm2GYAAAAN\"\n    \"SURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC\"\n)\n\nwith Creator(\"test.zim\").config_indexing(True, \"eng\") as creator:\n    creator.set_mainpath(\"home\")\n    creator.add_item(item)\n    creator.add_item(item2)\n    creator.add_illustration(48, illustration)\n    for name, value in {\n        \"creator\": \"python-libzim\",\n        \"description\": \"Created in python\",\n        \"name\": \"my-zim\",\n        \"publisher\": \"You\",\n        \"title\": \"Test ZIM\",\n        \"language\": \"eng\",\n        \"date\": \"2024-06-30\",\n    }.items():\n\n        creator.add_metadata(name.title(), value)\n```\n\n#### Thread safety\n\n\u003e The reading part of the libzim is most of the time thread safe. Searching and creating part are not. 
[libzim documentation](https://libzim.readthedocs.io/en/latest/usage.html#introduction)\n\n`python-libzim` releases the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock) on most C++ libzim calls. You **must prevent concurrent access** yourself. This is easily done by wrapping all creator calls with a [`threading.Lock()`](https://docs.python.org/3/library/threading.html#lock-objects).\n\n```py\nimport threading\n\nlock = threading.Lock()\nwith Creator(\"test.zim\") as creator:\n\n    # Thread #1\n    with lock:\n        creator.add_item(item1)\n\n    # Thread #2\n    with lock:\n        creator.add_item(item2)\n```\n\n#### Type hints\n\nBecause `libzim` is a binary extension, there is no Python source to provide type information. We provide it as type stub files. When using `pyright`, you would normally receive a warning when importing from `libzim`, as there could be discrepancies between the actual sources and the (manually crafted) stub files.\n\nYou can disable the warning via `reportMissingModuleSource = \"none\"`.\n\n## Building\n\nBuilding the `libzim` package offers different behaviors via environment variables:\n\n| Variable                         | Example                                  | Use case |\n| -------------------------------- | ---------------------------------------- | -------- |\n| `LIBZIM_DL_VERSION`              | `8.1.1` or `2023-04-14`                  | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded. |\n| `USE_SYSTEM_LIBZIM`              | `1`                                      | Uses `LDFLAGS` and `CFLAGS` to find the libzim to link against. Resulting wheel won't bundle C++ libzim. |\n| `DONT_DOWNLOAD_LIBZIM`           | `1`                                      | Disable downloading of C++ libzim. Place headers in `include/` and libzim dylib/so in `libzim/` if not using system libzim. It will be bundled in the wheel. 
|\n| `PROFILE`                        | `0`                                      | Enable profile tracing in Cython extension. Required for Cython code coverage reporting. |\n| `SIGN_APPLE`                     | `1`                                      | Set to sign and notarize the extension for macOS. Requires the following variables. |\n| `APPLE_SIGNING_IDENTITY`         | `Developer ID Application: OrgName (ID)` | Required for signing on macOS |\n| `APPLE_SIGNING_KEYCHAIN_PATH`    | `/tmp/build.keychain`                    | Path to the Keychain containing the certificate to sign for macOS with |\n| `APPLE_SIGNING_KEYCHAIN_PROFILE` | `build`                                  | Name of the profile in the specified Keychain |\n\n\n### Building on Windows\n\nOn Windows, built wheels need to be fixed post-build to move the bundled DLLs (libzim and libicu)\nnext to the wrapper (Windows does not support runtime path).\n\nAfter building your wheel, run:\n\n```ps\npython setup.py repair_win_wheel --wheel=dist/xxx.whl --destdir wheels\\\n```\n\nSimilarly, if you install as editable (`pip install -e .`), you need to place those DLLs at the root\nof the repo.\n\n```ps\nMove-Item -Force -Path .\\libzim\\*.dll -Destination .\\\n```\n\n### Examples\n\n#### Default: downloading and bundling most appropriate libzim release binary\n\n```sh\npython3 -m build\n```\n\n#### Using system libzim (brew, debian or manually installed) - not bundled\n\n```sh\n# using system-installed C++ libzim\nbrew install libzim  # macOS\napt-get install libzim-dev  # debian\ndnf install libzim-devel  # fedora\nUSE_SYSTEM_LIBZIM=1 python3 -m build --wheel\n\n# using a specific C++ libzim\nUSE_SYSTEM_LIBZIM=1 \\\nCFLAGS=\"-I/usr/local/include\" \\\nLDFLAGS=\"-L/usr/local/lib\" \\\nDYLD_LIBRARY_PATH=\"/usr/local/lib\" \\\nLD_LIBRARY_PATH=\"/usr/local/lib\" \\\npython3 -m build --wheel\n```\n\n#### Other platforms\n\nOn platforms for which there is no [official 
binary](https://download.openzim.org/release/libzim/) available, you'd have to [compile C++ libzim from source](https://github.com/openzim/libzim) first then either use `DONT_DOWNLOAD_LIBZIM` or `USE_SYSTEM_LIBZIM`.\n\n\n## License\n\n[GPLv3](https://www.gnu.org/licenses/gpl-3.0) or later, see\n[LICENSE](LICENSE) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fpython-libzim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenzim%2Fpython-libzim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fpython-libzim/lists"}