{"id":34109111,"url":"https://github.com/saveweb/dokuwiki-dumper","last_synced_at":"2026-04-08T19:32:58.088Z","repository":{"id":116028108,"uuid":"601300133","full_name":"saveweb/dokuwiki-dumper","owner":"saveweb","description":"A tool for archiving DokuWiki","archived":false,"fork":false,"pushed_at":"2026-01-30T00:10:42.000Z","size":362,"stargazers_count":28,"open_issues_count":5,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-03-29T16:55:47.084Z","etag":null,"topics":["archive","dokuwiki","internet-archive"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/dokuwikidumper","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saveweb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-02-13T19:27:39.000Z","updated_at":"2026-02-19T11:57:15.000Z","dependencies_parsed_at":"2023-12-04T12:25:39.386Z","dependency_job_id":"b731c766-d1ac-4f7a-8cbc-2ae8392b2249","html_url":"https://github.com/saveweb/dokuwiki-dumper","commit_stats":null,"previous_names":[],"tags_count":51,"template":false,"template_full_name":null,"purl":"pkg:github/saveweb/dokuwiki-dumper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saveweb%2Fdokuwiki-dumper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saveweb%2Fdokuwiki-dumper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saveweb%2Fdokuwiki-dumper/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saveweb%2Fdokuwiki-dumper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saveweb","download_url":"https://codeload.github.com/saveweb/dokuwiki-dumper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saveweb%2Fdokuwiki-dumper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31571600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","dokuwiki","internet-archive"],"created_at":"2025-12-14T18:26:36.708Z","updated_at":"2026-04-08T19:32:58.073Z","avatar_url":"https://github.com/saveweb.png","language":"Python","readme":"# DokuWiki Dumper\n\n![Dynamic JSON Badge](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Farchive.org%2Fadvancedsearch.php%3Fq%3Dsubject%3AdokuWikiDumper%26rows%3D1%26page%3D1%26output%3Djson\u0026query=%24.response.numFound\u0026label=DokuWiki%20Dumps%40IA)\n[![PyPI version](https://badge.fury.io/py/dokuwikidumper.svg)](https://badge.fury.io/py/dokuwikidumper)\n\n\n\u003e A tool for archiving DokuWiki.\n\nRecommend using `dokuWikiDumper` on _modern_ 
filesystems, such as `ext4` or `btrfs`. `NTFS` is not recommended because it disallows many special characters in filenames.\n\n# For webmasters\n\nWe crawl every DokuWiki site (with a 1.5s crawl-delay) every year and upload the dumps to the Internet Archive. If you don’t want your wiki to be archived, add the following to your `\u003cdomain\u003e/robots.txt`:\n\n```robots.txt\nUser-agent: dokuWikiDumper\nDisallow: /\n```\n\nOur bots are running on the following IPs: [wikiteam3.txt](https://static.saveweb.org/bots_ips/wikiteam3.txt) (ips, contact) | [wikiteam3.ips.txt](https://static.saveweb.org/bots_ips/wikiteam3.ips.txt) (ips)\n\n## Requirements\n\n### dokuWikiDumper\n\n- Python 3.8+ (developed on py3.10)\n- beautifulsoup4\n- requests\n- lxml\n- rich\n\n### dokuWikiUploader\n\n\u003e Uploads a wiki dump to the [Internet Archive](https://archive.org/).\n\u003e `dokuWikiUploader -h` for help.\n\n- internetarchive\n- p7zip (`7z` command) (`p7zip-full` package)\n\n## Install `dokuWikiDumper`\n\n\u003e `dokuWikiUploader` is included in `dokuWikiDumper`.\n\n### Install `dokuWikiDumper` with `pip` (recommended)\n\n\u003e \u003chttps://pypi.org/project/dokuwikidumper/\u003e\n\n```bash\npip3 install dokuWikiDumper --upgrade\n```\n\n## Usage\n\n```bash\nusage: dokuWikiDumper [-h] [--content] [--media] [--html] [--pdf] [--current-only] [--path PATH] [--no-resume] [--threads THREADS] [--i-love-retro] [--insecure] [--ignore-errors] [--ignore-action-disabled-edit] [--trim-php-warnings]\n                      [--export-xhtml-action {export_html,export_xhtml}] [--delay DELAY] [--retry RETRY] [--hard-retry HARD_RETRY] [--parser PARSER] [--username USERNAME] [--password PASSWORD] [--verbose] [--cookies COOKIES] [--auto] [-u]\n                      [-g UPLOADER_ARGS] [--force]\n                      url\n\ndokuWikiDumper Version: 0.1.48\n\npositional arguments:\n  url                   URL of the dokuWiki (provide the doku.php URL)\n\noptions:\n  -h, --help            show this help message and 
exit\n  --current-only        Dump only the latest revision, no history [default: false]\n  --path PATH           Specify the dump directory [default: \u003csite\u003e-\u003cdate\u003e]\n  --no-resume           Do not resume a previous dump [default: resume]\n  --threads THREADS     Number of sub threads to use [default: 1]; not recommended to set \u003e 5\n  --i-love-retro        Do not check for the latest version of dokuWikiDumper (on pypi.org) before running [default: False]\n  --insecure            Disable SSL certificate verification\n  --ignore-errors       !DANGEROUS! Ignore errors in the sub threads. This may cause incomplete dumps.\n  --ignore-action-disabled-edit\n                        Some sites disable the edit action for anonymous users and for some core pages. This option ignores that error and the textarea-not-found error, but you may only get a partial dump. (only works with --content)\n  --trim-php-warnings   Trim PHP warnings from requests.Response.text\n  --export-xhtml-action {export_html,export_xhtml}\n                        HTML export action [default: export_xhtml]\n  --delay DELAY         Delay between requests [default: 0.0]\n  --retry RETRY         Maximum number of retries [default: 5]\n  --hard-retry HARD_RETRY\n                        Maximum number of retries for hard errors [default: 3]\n  --parser PARSER       HTML parser [default: lxml]\n  --username USERNAME   login: username\n  --password PASSWORD   login: password\n  --verbose             Verbose output\n  --cookies COOKIES     cookies file\n  --auto                dump: content+media+html, threads=3, ignore-action-disabled-edit. 
(threads is overridable)\n  -u, --upload          Upload the wiki dump to the Internet Archive after a successful dump (only works with --auto)\n  -g UPLOADER_ARGS, --uploader-arg UPLOADER_ARGS\n                        Arguments for the uploader.\n  --force               Dump even if a recent dump exists on IA\n\nData to download:\n  What info to download from the wiki\n\n  --content             Dump content\n  --media               Dump media\n  --html                Dump HTML\n  --pdf                 Dump PDF [default: false] (Only available on some wikis with the PDF export plugin) (Only dumps the latest PDF revision)\n```\n\nIn most cases, you can use `--auto` to dump the site:\n\n```bash\ndokuWikiDumper https://example.com/wiki/ --auto\n```\n\nwhich is equivalent to\n\n```bash\ndokuWikiDumper https://example.com/wiki/ --content --media --html --threads 3 --ignore-action-disabled-edit\n```\n\n\u003e We highly recommend logging in with `--username` and `--password` (or `--cookies`), because some sites may block anonymous users from accessing some pages or viewing the raw wikitext.\n\n`--cookies` accepts a Netscape cookies file; you can use the [cookies.txt Extension](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) to export cookies from Firefox. It also accepts a JSON cookies file created by [Cookie Quick Manager](https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/). Bring a cookies file when the wiki requires login (e.g. company ACLs or Keycloak/SSO frontends); the dumper loads these cookies before its first request, so it can access the authenticated wiki immediately.\n\n## Dump structure\n\n\u003c!-- Dump structure --\u003e\n| Directory or File       | Description                                 |\n|-----------              |-------------                                |\n| `attic/`                | old revisions of pages (wikitext).          |\n| `dumpMeta/`             | (dokuWikiDumper only) metadata of the dump. 
|\n| `dumpMeta/check.html`   | ?do=check page of the wiki.                 |\n| `dumpMeta/config.json`  | dump's configuration.                       |\n| `dumpMeta/favicon.ico`  | favicon of the site.                        |\n| `dumpMeta/files.txt`    | list of filenames.                          |\n| `dumpMeta/index.html`   | homepage of the wiki.                       |\n| `dumpMeta/info.json`    | information about the wiki.                 |\n| `dumpMeta/titles.txt`   | list of page titles.                        |\n| `html/`                 | (dokuWikiDumper only) HTML of the pages.    |\n| `media/`                | media files.                                |\n| `meta/`                 | metadata of the pages.                      |\n| `pages/`                | latest page content (wikitext).             |\n| `*.mark`                | mark files.                                 |\n\u003c!-- /Dump structure --\u003e\n\n## Available Backups/Dumps\n\nCheck out: \u003chttps://archive.org/search?query=subject%3A\"dokuWikiDumper\"\u003e\n\n## How to import a dump into DokuWiki\n\nIf you want to import a dump into DokuWiki, add the following configuration to `local.php`:\n\n```php\n$conf['fnencode'] = 'utf-8'; // DokuWiki default: 'safe' (url encode)\n# 'safe' =\u003e Non-ASCII characters will be escaped in %xx form.\n# 'utf-8' =\u003e Non-ASCII characters will be preserved as UTF-8 characters.\n\n$conf['compression'] = '0'; // DokuWiki default: 'gz'.\n# 'gz' =\u003e attic/\u003cid\u003e.\u003crev_id\u003e.txt.gz\n# 'bz2' =\u003e attic/\u003cid\u003e.\u003crev_id\u003e.txt.bz2\n# '0' =\u003e attic/\u003cid\u003e.\u003crev_id\u003e.txt\n```\n\nImport the `pages` dir if you only need the latest version of the pages.  \nImport the `meta` dir if you need the **changelog** of the pages.  \nImport the `attic` and `meta` dirs if you need the **content** of old revisions.  
\nImport the `media` dir if you need the media files.\n\nThe `dumpMeta` and `html` dirs are only used by `dokuWikiDumper`; you can ignore them.\n\n## Information\n\n### DokuWiki links\n\n- [DokuWiki](https://www.dokuwiki.org/)\n- [DokuWiki changelog](https://www.dokuwiki.org/changelog)\n- [DokuWiki source code](https://github.com/splitbrain/dokuwiki)\n\n- [DokuWiki - ArchiveTeam Wiki](https://wiki.archiveteam.org/index.php/DokuWiki)\n\n### Other tools\n\n- [wikiteam/WikiTeam](https://github.com/wikiteam/wikiteam/), a tool for archiving MediaWiki, written in Python 2, which you won't want to use nowadays. :(\n- [mediawiki-client-tools/MediaWiki Scraper](https://github.com/mediawiki-client-tools/mediawiki-scraper) (aka `wikiteam3`), a tool for archiving MediaWiki, forked from [WikiTeam](https://github.com/wikiteam/wikiteam/) and rewritten in Python 3. (For lack of code writers and reviewers, STWP no longer maintains this repo.)\n- [saveweb/WikiTeam3](https://github.com/saveweb/wikiteam3), forked from MediaWiki Scraper and maintained by STWP. :)\n- [DigitalDwagon/WikiBot](https://github.com/DigitalDwagon/WikiBot), a Discord and IRC bot that runs dokuWikiDumper and wikiteam3 in the background.\n\n## License\n\nGPLv3\n\n## Contributors\n\nThis tool is based on an unmerged PR (_from 8 years ago!_) against [WikiTeam](https://github.com/WikiTeam/wikiteam/): [DokuWiki dump alpha](https://github.com/WikiTeam/wikiteam/pull/243) by [@PiRSquared17](https://github.com/PiRSquared17).\n\nI ([@yzqzss](https://github.com/yzqzss)) have rewritten the code in Python 3 and added ~~some features, also fixed~~ some bugs.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaveweb%2Fdokuwiki-dumper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaveweb%2Fdokuwiki-dumper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaveweb%2Fdokuwiki-dumper/lists"}