{"id":13994195,"url":"https://github.com/openzim/sotoki","last_synced_at":"2025-05-15T18:03:35.643Z","repository":{"id":36693545,"uuid":"41000050","full_name":"openzim/sotoki","owner":"openzim","description":"StackExchange websites to ZIM scraper","archived":false,"fork":false,"pushed_at":"2024-11-01T10:03:36.000Z","size":2703,"stargazers_count":227,"open_issues_count":23,"forks_count":28,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-31T23:33:40.051Z","etag":null,"topics":["scraper","stackexchange","stackoverflow","zim"],"latest_commit_sha":null,"homepage":"https://library.kiwix.org/?category=stack_exchange","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openzim.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"kiwix","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2015-08-18T22:12:07.000Z","updated_at":"2025-02-22T14:23:26.000Z","dependencies_parsed_at":"2024-01-18T05:10:00.710Z","dependency_job_id":"d95cd3cd-7b63-4cba-b8c9-618cbd8f2fa2","html_url":"https://github.com/openzim/sotoki","commit_stats":{"total_commits":717,"total_committers":16,"mean_commits":44.8125,"dds":0.7210599721059971,"last_synced_commit":"26716426acb9107be017b306e354188241c11584"},"previous_names":[],"tags_count":40,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fsotoki","tags_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fsotoki/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fsotoki/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openzim%2Fsotoki/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openzim","download_url":"https://codeload.github.com/openzim/sotoki/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247755557,"owners_count":20990620,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scraper","stackexchange","stackoverflow","zim"],"created_at":"2024-08-09T14:02:45.412Z","updated_at":"2025-04-08T00:35:05.639Z","avatar_url":"https://github.com/openzim.png","language":"Python","funding_links":["https://github.com/sponsors/kiwix"],"categories":["Python"],"sub_categories":[],"readme":"Sotoki\n======\n\n`Sotoki` (*Stack Overflow to Kiwix*) is an\n[openZIM](https://github.com/openzim) scraper to create offline\nversions of [Stack Exchange](https://stackexchange.com) websites such\nas [Stack Overflow](https://stackoverflow.com/).\n\nIt is based on Stack Exchange's Data Dumps hosted by [The Internet\nArchive](https://archive.org/download/stackexchange/).\n\n[![CodeFactor](https://www.codefactor.io/repository/github/openzim/sotoki/badge)](https://www.codefactor.io/repository/github/openzim/sotoki)\n[![Docker](https://ghcr-badge.egpl.dev/openzim/sotoki/latest_tag?label=docker)](https://ghcr.io/openzim/sotoki)\n[![License: GPL 
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/sotoki.svg)](https://pypi.org/project/sotoki/)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sotoki.svg)](https://pypi.org/project/sotoki)\n\n## Usage\n\n`Sotoki` works off a `domain` that you must provide. That is the\ndomain name of the Stack Exchange website you want to scrape. Run\n`sotoki --list-all` to get a list of the available domains.\n\n### Docker\n\n```bash\ndocker run -v my_dir:/output ghcr.io/openzim/sotoki sotoki --help\n```\n\n### Installation\n\n`sotoki` is a Python 3 application. If you are not using the\n[Docker](https://ghcr.io/openzim/sotoki/) image, you are advised to use it in a\nvirtual environment to avoid installing software dependencies on your\nsystem.\n\n```sh\npython3 -m venv ./env  # creates a virtual python environment in ./env folder\n./env/bin/pip install -U pip  # upgrade pip (package manager). recommended\n./env/bin/pip install -U sotoki  # install/upgrade sotoki inside virtualenv\n\n# direct access to in-virtualenv sotoki binary, without shell-attachment\n./env/bin/sotoki --help\n# alias or link it for convenience\nsudo ln -s $(pwd)/env/bin/sotoki /usr/local/bin/\n\n# alternatively, attach virtualenv to shell\nsource env/bin/activate\nsotoki --help\ndeactivate  # unloads virtualenv from shell\n```\n\n## Developers\n\nAnybody is welcome to improve Sotoki.\n\nTo run Sotoki off the git repository, you'll need to download a few\nexternal dependencies that we pack in Python releases. Just run\n`python src/sotoki/dependencies.py`.\n\nSee `requirements.txt` for the list of Python dependencies.\n\n## Users\n\nYou don't have to make your own ZIM files of Stack Exchange's\nwebsites. Updated ZIM files are built on a regular basis for all\nof them. 
Look at https://library.kiwix.org/?category=stack_exchange\nto download them.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fsotoki","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenzim%2Fsotoki","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenzim%2Fsotoki/lists"}