https://github.com/openzim/mwoffliner
MediaWiki scraper: all your wiki articles in one highly compressed ZIM file
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/openzim/mwoffliner
- Owner: openzim
- License: gpl-3.0
- Created: 2016-02-08T01:01:27.000Z (about 9 years ago)
- Default Branch: main
- Last Pushed: 2025-04-12T14:55:34.000Z (12 days ago)
- Last Synced: 2025-04-12T15:44:36.069Z (12 days ago)
- Topics: archive, mediawiki, nodejs, offline, openzim, scraper, wikipedia, zim
- Language: TypeScript
- Homepage: https://www.npmjs.com/package/mwoffliner
- Size: 10.1 MB
- Stars: 330
- Watchers: 18
- Forks: 88
- Open Issues: 203
Metadata Files:
- Readme: README.md
- Changelog: Changelog
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# MWoffliner
MWoffliner is a tool for making a local offline HTML snapshot of any
online [MediaWiki](https://mediawiki.org) instance. It goes through
all online articles (or a selection if specified) and creates the
corresponding [ZIM](https://openzim.org) file. It has mainly been
tested against Wikimedia projects like
[Wikipedia](https://wikipedia.org) and
[Wiktionary](https://wiktionary.org), but it should also work for
any recent MediaWiki.

Read [CONTRIBUTING.md](./CONTRIBUTING.md) to learn more about
MWoffliner development.

User help is available in the form of a
[FAQ](https://github.com/openzim/mwoffliner/wiki/Frequently-Asked-Questions).
## Features
- Scrape with or without image thumbnail
- Scrape with or without audio/video multimedia content
- S3 cache (optional)
- Image size optimiser / Webp converter
- Scrape all articles in the selected namespaces, or only those from a title list
- Specify additional/non-main namespaces to scrape

Run `mwoffliner --help` to get all the possible options.
## Prerequisites
- *NIX Operating System (GNU/Linux, macOS, ...)
- [Redis](https://redis.io/)
- [Node.js](https://nodejs.org/en/) version 22 (we support only a single Node.js version; other versions may or may not work)
- [Libzim](https://github.com/openzim/libzim) (On GNU/Linux & macOS we automatically download it)
- Various build tools which are probably already installed on your
machine (packages `libjpeg-dev`, `libglu1`, `autoconf`, `automake`, `gcc` on
Debian/Ubuntu)

... and an online MediaWiki with its API available.
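
Two of the prerequisites above can be checked from a small script before starting a scrape. The following is only a sketch, not part of MWoffliner: the helper names (`nodeMajor`, `siteInfoUrl`, `checkMediaWikiApi`) are hypothetical, and the `/w/api.php` path is an assumption that holds for Wikimedia wikis but may differ on other MediaWiki installations. It relies on the global `fetch` shipped with recent Node.js versions and on the standard MediaWiki `siteinfo` query.

```javascript
// Preflight sketch (hypothetical helpers, not part of MWoffliner).
const REQUIRED_NODE_MAJOR = 22; // from the prerequisites list

// Extract the major version from a string such as "22.3.0".
function nodeMajor(version = process.versions.node) {
  return Number.parseInt(version.split('.')[0], 10);
}

// Build a siteinfo query URL. The default apiPath is an assumption
// (Wikimedia default); other wikis may expose the API elsewhere.
function siteInfoUrl(mwUrl, apiPath = '/w/api.php') {
  const url = new URL(apiPath, mwUrl);
  url.searchParams.set('action', 'query');
  url.searchParams.set('meta', 'siteinfo');
  url.searchParams.set('format', 'json');
  return url.toString();
}

// Query the API and return the wiki's site name; throws if the
// API does not answer with a successful HTTP status.
async function checkMediaWikiApi(mwUrl, apiPath) {
  const response = await fetch(siteInfoUrl(mwUrl, apiPath));
  if (!response.ok) {
    throw new Error(`MediaWiki API returned HTTP ${response.status}`);
  }
  const body = await response.json();
  return body.query.general.sitename;
}

if (nodeMajor() !== REQUIRED_NODE_MAJOR) {
  console.warn(`Node.js ${REQUIRED_NODE_MAJOR} is expected, found ${process.versions.node}`);
}

// Example (performs a network request):
// checkMediaWikiApi('https://es.wikipedia.org').then(console.log);
```

The script does not cover Redis or the build tools; those are easier to verify with the tools' own commands.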
## Usage
To install MWoffliner globally:
```bash
npm i -g mwoffliner
```

You might need to run this command with `sudo`, depending on
how your `npm` is configured.

`npm` permission checking can be a bit annoying for a
newcomer. Please read the documentation carefully if you hit
problems: https://docs.npmjs.com/cli/v7/using-npm/scripts#user

Then to run it:
```bash
mwoffliner --help
```

To install and run it locally:
```bash
npm i
npm run mwoffliner -- --help
```

To use MWoffliner with an S3 cache, provide an S3 URL like this:
```bash
--optimisationCacheUrl="https://wasabisys.com/?bucketName=my-bucket&keyId=my-key-id&secretAccessKey=my-sac"
```

## API

MWoffliner also provides an API and can therefore be used as a Node.js
library. Here is a stub example that could go in your `index.mjs` file:
```javascript
import * as mwoffliner from 'mwoffliner';

const parameters = {
  mwUrl: "https://es.wikipedia.org", // wiki to scrape
  adminEmail: "[email protected]", // contact email address
  verbose: true,
  format: "nopic", // scrape without pictures
  articleList: "./articleList" // list of article titles to scrape
};

mwoffliner.execute(parameters); // returns a Promise
```

## Background
Complementary information about MWoffliner:
* MediaWiki software is used by thousands of wikis, the most
famous ones being the Wikimedia ones, including [Wikipedia](https://wikipedia.org).
* MediaWiki is a PHP wiki runtime engine.
* Wikitext is the name of the markup language that MediaWiki uses.
* MediaWiki includes a parser for Wikitext into HTML, and this
parser creates the HTML pages displayed in your browser.

License
-------

[GPLv3](https://www.gnu.org/licenses/gpl-3.0) or later, see
[LICENSE](LICENSE) for more details.