# minify-xml

`minify-xml` is a lightweight and fast XML minifier for Node.js with a command-line interface.

Existing XML minifiers, such as `pretty-data`, often do a pretty (*pun intended*) bad job of minifying XML, usually removing only comments and whitespace between tags. `minify-xml`, on the other hand, also minifies the tags themselves, e.g. by collapsing the whitespace between multiple attributes, and applies further minifications such as the removal of unused namespace declarations. `minify-xml` is based on regular expressions and thus executes blazingly fast.
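To give a rough idea of what regex-based minification looks like, here is a minimal sketch that removes comments, strips whitespace between tags and collapses whitespace inside tags. The function name `naiveMinify` and its patterns are illustrative assumptions, not the regular expressions `minify-xml` actually uses. The sketch ignores CDATA sections, `xml:space="preserve"` and `>` characters inside attribute values:

```js
// Hypothetical helper, not part of minify-xml: a deliberately naive
// regex-based minifier that only shows the general idea.
function naiveMinify(xml) {
  return xml
    // drop comments such as <!-- ... -->
    .replace(/<!--[\s\S]*?-->/g, "")
    // remove whitespace that sits between two tags
    .replace(/>\s+</g, "><")
    // visit every tag (but not text content) and tidy it up
    .replace(/<[^<>!?][^<>]*>/g, (tag) =>
      tag
        // collapse whitespace before attributes and around "="
        .replace(/\s+([^\s=\/]+)\s*=\s*("[^"]*"|'[^']*')/g, " $1=$2")
        // remove whitespace before the closing ">" or "/>"
        .replace(/\s+(\/?>)$/, "$1")
    )
    .trim();
}

const sample = `<root  a = "1"   b="2" >
  <!-- note -->
  <child />
  <text> keep = this </text>
</root>`;

console.log(naiveMinify(sample));
```

Text content such as ` keep = this ` is left untouched, because only the tags themselves are rewritten.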

## Online

To minify XML online in your browser, visit:

**[Minify-X.ML](https://minify-x.ml/)** ([https://minify-x.ml/](https://minify-x.ml/))

## Installation

```bash
npm install minify-xml -g
```

## Usage

```js
import minifyXML from "minify-xml";

const xml = `



any valid element content is left unaffected (strangely enough = " ... "
and even > are valid characters in XML, only &lt; must always be encoded)

]]>
`;

console.log(minifyXML(xml));
```

This outputs the minified XML:

```xml

any valid element content is left unaffected (strangely enough = " ... "
and even > are valid characters in XML, only &lt; must always be encoded)
]]>
```

Alternatively, `minifyStream` provides a [Node.js `Transform` stream](https://nodejs.org/api/stream.html#stream_class_stream_transform) to minify XML streams, which is especially helpful for very large files (> 2 GiB, the maximum `Buffer` size in Node.js on 64-bit machines):

```js
import fs from "node:fs";
import { minifyStream as minifyXMLStream } from "minify-xml";

// Stream the file through the minifier and print the result
fs.createReadStream("sitemap.xml", "utf8")
  .pipe(minifyXMLStream())
  .pipe(process.stdout);
```

Node.js 15 also introduced a promise-based variant of the [`stream.pipeline` API](https://nodejs.org/docs/latest-v18.x/api/stream.html#streampipelinesource-transforms-destination-options) in `stream/promises`. `minifyPipeline` builds on it, so you get the advantages of the streaming API (namely no file size limit) together with the convenience of a modern promise-based API:

```js
import fs from "node:fs";
import { minifyPipeline as minifyXMLPipeline } from "minify-xml";

// Resolves once the whole file has been minified and written
await minifyXMLPipeline(fs.createReadStream("catalogue.xml", "utf8"), process.stdout, { end: false });
```

## Options

You may pass in the following options when calling minify:

```js
import { minify as minifyXML, minifyStream as minifyXMLStream } from "minify-xml";
minifyXML(`<tag/>`, { ... });
minifyXMLStream({ ... });
```

- `removeComments` (default: `true`): Remove comments like `<!-- ... -->`.

- `removeWhitespaceBetweenTags` (default: `true`): Remove whitespace between tags like `<anyTag/>   <anyOtherTag/>`. Can be limited to tags only by passing the string `"strict"`; otherwise, other XML constructs such as the prolog `<?xml ... ?>`, processing instructions `<?pi ... ?>`, the document type declaration `<!DOCTYPE ... >`, CDATA sections `<![CDATA[ ... ]]>` and comments `<!-- ... -->` are also treated as tags.

- `considerPreserveWhitespace` (default: `true`): Respect the `xml:space="preserve"` attribute or `<pre>` tags in any namespace when `removeWhitespaceBetweenTags` is enabled. If set to `true` and `xml:space="preserve"` is specified, whitespace between tags like `<tag xml:space="preserve"><anyTag/>   <anyOtherTag/></tag>` will _not_ be removed.

- `collapseWhitespaceInTags` (default: `true`): Collapse whitespace in tags like `<anyTag   attributeA   =   "..."   attributeB   =   "..."   />`.

- `collapseEmptyElements` (default: `true`): Collapse empty elements like `<anyTag></anyTag>` to `<anyTag/>`.

- `trimWhitespaceFromTexts` (default: `false`): Remove leading and trailing whitespace in elements containing text only or a mixture of text and other elements, like `<anyTag>  Hello World  </anyTag>`.

- `collapseWhitespaceInTexts` (default: `false`): Collapse whitespace in elements containing text or a mixture of text and other elements (useful for (X)HTML), like `<anyTag>Hello   World</anyTag>`.

- `collapseWhitespaceInProlog` (default: `true`): Collapse and remove whitespace in the XML prolog, like `<?xml   version = "1.0"   ?>`.

- `collapseWhitespaceInDocType` (default: `true`): Collapse and remove whitespace in the XML document type declaration, like `<!DOCTYPE   html   >`.

- `removeSchemaLocationAttributes` (default: `false`): Remove any `xsi:schemaLocation` and `xsi:noNamespaceSchemaLocation` attributes, like `<anyTag xsi:schemaLocation="..." />`.

- `removeUnnecessaryStandaloneDeclaration` (default: `true`): Remove an unnecessary standalone declaration in the XML prolog, like `<?xml version="1.0" standalone="yes" ?>`. According to the W3C, the standalone declaration has "no meaning" if there are no external markup declarations, so it is removed in that case.

- `removeUnusedNamespaces` (default: `true`): Remove any namespaces from tags which are not used anywhere in the document, like `<anyTag xmlns:unused="any_url" />`. Note the word *anywhere*: the minifier does not consider the structure of the XML document, so a namespace declaration is kept in a sub-tree that does not use it as long as the namespace is used somewhere else in the document.

- `removeUnusedDefaultNamespace` (default: `true`): Remove a default namespace declaration like `<anyTag xmlns="any_url" />` in case there is no tag without a namespace in the whole document.

- `shortenNamespaces` (default: `true`): Shorten namespaces, like `<tag xmlns:namespace="any_url">`, to a minimal length, e.g. `<tag xmlns:n="any_url">`. First, an attempt is made to shorten the existing namespace to one letter only (e.g. `namespace` is shortened to `n`); in case that letter is already taken, the shortest possible other namespace is used.

- `ignoreCData` (default: `true`): Ignore any content inside of CDATA sections `<![CDATA[ ... ]]>`.
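The prefix selection described for `shortenNamespaces` can be sketched as follows. This is a minimal illustration under assumptions, not the library's code: the helper name `shortenPrefixes` is hypothetical, and "the shortest possible other namespace" is interpreted here as the first unused name in the sequence `a`, `b`, ..., `z`, `aa`, `ab` and so on.

```js
// Hypothetical helper (not part of minify-xml): maps each namespace prefix
// to a short replacement, preferring the prefix's own first letter.
function shortenPrefixes(prefixes) {
  const taken = new Set();
  const mapping = new Map();

  // Yields "a".."z", then "aa", "ab", ... (assumed fallback order)
  function* candidates() {
    const letters = "abcdefghijklmnopqrstuvwxyz";
    let current = [""];
    while (true) {
      const next = [];
      for (const stem of current) {
        for (const letter of letters) {
          next.push(stem + letter);
        }
      }
      yield* next;
      current = next;
    }
  }

  for (const prefix of prefixes) {
    // first attempt: the first letter of the existing prefix
    let short = prefix[0];
    if (taken.has(short)) {
      // fallback: the first short name that is still free
      for (const candidate of candidates()) {
        if (!taken.has(candidate)) {
          short = candidate;
          break;
        }
      }
    }
    taken.add(short);
    mapping.set(prefix, short);
  }
  return mapping;
}

console.log(shortenPrefixes(["namespace", "news", "xsi"]));
```

Here `namespace` gets its first letter `n`, while `news` has to fall back to another free name because `n` is already taken.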

For stream processing, the following additional option can be supplied:

- `streamMaxMatchLength` (default: `262144`, 256 KiB): The maximum size of matches between chunks. See [`replacestream`](https://www.npmjs.com/package/replacestream#does-that-apply-across-more-than-2-chunks-how-does-it-work-with-regexes) for a detailed explanation.

### Stream Limitations

Note that the default `streamMaxMatchLength` was deliberately chosen to be a multiple of the Node.js default stream buffer size (16 KiB for readable streams, 64 KiB for file system streams). The stream option is specifically meant for very large files / read streams, and a larger `streamMaxMatchLength` results in a more accurate minification, because some very large tags have to be read into the buffer all at once in order to be minified.
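Why a larger match length gives a more accurate result can be illustrated with a small sketch. It is a simplified stand-in and not how `replacestream` or `minify-xml` are implemented: the hypothetical function `removeCommentsChunked` holds back an unfinished comment until its end arrives, but only while the held-back text stays within `maxMatchLength` characters. It also assumes that a chunk boundary never splits the `<!--` opener itself.

```js
// Hypothetical illustration (not the replacestream algorithm): remove XML
// comments from a sequence of string chunks with a bounded look-behind.
function removeCommentsChunked(chunks, maxMatchLength) {
  const comment = /<!--[\s\S]*?-->/g;
  let pending = ""; // text held back because a comment may still be open
  let output = "";

  for (const chunk of chunks) {
    // remove every comment that is complete by now
    pending = (pending + chunk).replace(comment, "");

    // any remaining opener belongs to a comment that is not closed yet
    const open = pending.indexOf("<!--");
    if (open === -1 || pending.length - open > maxMatchLength) {
      // nothing open, or the open comment is too long to keep buffering
      output += pending;
      pending = "";
    } else {
      // emit the safe part and keep the unfinished comment for later
      output += pending.slice(0, open);
      pending = pending.slice(open);
    }
  }
  return output + pending;
}

const chunks = ["<a><!-- spl", "it --></a>"];
console.log(removeCommentsChunked(chunks, 64)); // <a></a>
console.log(removeCommentsChunked(chunks, 4));  // <a><!-- split --></a>
```

With the small limit the comment is flushed before its end is seen, so it survives in the output.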

The maximum buffer size in Node.js is 1 GiB on 32-bit machines and 2 GiB on 64-bit machines (see [this issue](https://bugs.chromium.org/p/v8/issues/detail?id=4153)). `minify-xml` can handle strings up to that size, and up to that size the `minify` function should be preferred over `minifyStream`. For larger files / streams the streaming API has to be used, which comes with certain limitations, because no prior knowledge about the document can be obtained for the minification. This is mainly because the stream is assumed to be readable only once; an option to obtain the required information, e.g. by first parsing a file and then minifying it, might be added in the future. For now the options `removeUnusedNamespaces`, `removeUnusedDefaultNamespace`, `shortenNamespaces` and `ignoreCData` cannot be used with the streaming API, and calling `minifyStream` with any of these options enabled results in an error.
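If one options object is shared between `minify` and `minifyStream`, a small helper can switch off the four options that are not supported for streaming. The following is only a sketch: `toStreamOptions` and `STREAM_UNSUPPORTED` are hypothetical names and not part of the package.

```js
// Options that, according to the text above, cannot be used with streaming
const STREAM_UNSUPPORTED = [
  "removeUnusedNamespaces",
  "removeUnusedDefaultNamespace",
  "shortenNamespaces",
  "ignoreCData",
];

// Hypothetical helper: returns a copy of the options that is safe to pass
// to the streaming API, leaving the original object untouched.
function toStreamOptions(options = {}) {
  const streamOptions = { ...options };
  for (const name of STREAM_UNSUPPORTED) {
    streamOptions[name] = false;
  }
  return streamOptions;
}

const shared = { removeComments: true, shortenNamespaces: true };
console.log(toStreamOptions(shared));
```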

Furthermore, multiple buffers of the set size are created for each minification option enabled (some minifications even require multiple buffers / replacements). Enabling more options therefore allocates more memory, depending on the `streamMaxMatchLength` option, whenever the file / read stream is larger than the buffer size set. As the input is pumped through all minifications as a stream, roughly `1.5 * n * buffer size` gets allocated, where `n` is the number of enabled minification options. For example, the default buffer size of 256 KiB with all default streaming options enabled results in 11 buffers / replacements, so 11 * 256 KiB = 2.75 MiB is allocated if the input stream is 256 KiB or larger.
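The arithmetic above can be reproduced in a few lines. `estimateStreamMemory` is a hypothetical helper for illustration only; the count of 11 replacements is the figure quoted above for the default streaming options, and the result is a rough estimate rather than a measured value.

```js
// Hypothetical helper: rough memory estimate for the streaming API,
// assuming one buffer of streamMaxMatchLength bytes per replacement.
function estimateStreamMemory(replacements, streamMaxMatchLength = 262144) {
  return replacements * streamMaxMatchLength;
}

const bytes = estimateStreamMemory(11);
console.log(`${bytes / (1024 * 1024)} MiB`); // 2.75 MiB
```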

## CLI

You can run `minify-xml` from the command line to minify XML files:

```bash
minify-xml sitemap.xml
minify-xml blog.atom --in-place
minify-xml view.xml --output view.min.xml
minify-xml db.xml --stream > out.xml
```

Use any of the options above like:

```bash
minify-xml index.html --collapse-whitespace-in-texts --ignore-cdata false
```

## Author

XML minifier by [Kristian Kraljić](https://kra.lc/). Original package and CLI by [Mathias Bynens](https://mathiasbynens.be/).

## Bugs

Please file any issues [on GitHub](https://github.com/kristian/minify-xml/issues).

## License

This library is dual licensed under the [MIT and Apache 2.0](LICENSE) licenses.