https://github.com/axelpale/filterxml

Simplify your XML by removing XML nodes that match XPath expressions

# filterxml

[![npm](https://img.shields.io/npm/v/filterxml.svg?colorB=green)](https://www.npmjs.com/package/filterxml)
[![npm](https://img.shields.io/npm/dm/filterxml.svg)](https://www.npmjs.com/package/filterxml)
[![Node Version](https://img.shields.io/node/v/filterxml.svg)](https://github.com/axelpale/filterxml)
[![GitHub Actions workflow status](https://img.shields.io/github/actions/workflow/status/axelpale/filterxml/filterxml-ci.yml)](https://github.com/axelpale/filterxml/actions/workflows/filterxml-ci.yml)

Keep it simple! Here is a Node.js module to remove unnecessary XML nodes that match given XPath expressions. It uses [xpath](https://www.npmjs.com/package/xpath) and [xmldom](https://github.com/xmldom/xmldom) under the hood.

[Command-line usage](#command-line-usage) – [Node API](#node-api-usage) – [Examples](#example) – [Working with namespaces](#working-with-namespaces)

![Logo](logo.png?raw=true "Fight the power!")

## Command-line usage

Install with `$ npm install filterxml -g` and then:

    $ filterxml -e pattern -n prefix=namespaceURI input.xml output.xml

For example, remove `Style` and `StyleMap` from a [Keyhole Markup Language](https://en.wikipedia.org/wiki/Keyhole_Markup_Language) document with:

    $ filterxml --exclude kml:Style --exclude kml:StyleMap \
        --namespace kml=http://www.opengis.net/kml/2.2 \
        source.kml simplified.kml

Specify multiple patterns and namespaces with additional `-e, --exclude` and `-n, --namespace` flags. See `filterxml --help` for details.
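
The `prefix=namespaceURI` pairs given to `--namespace` carry the same information as the `namespaces` map of the Node API described in the next section. As a minimal sketch, and not the CLI's actual implementation, a hypothetical helper `parseNamespaceFlags` could turn such pairs into that map:

```javascript
// Hypothetical helper, not part of filterxml: converts 'prefix=URI' strings
// into a { prefix: URI } object.
function parseNamespaceFlags (pairs) {
  const namespaces = {}
  for (const pair of pairs) {
    // Split at the first '=' only, because a URI may itself contain '='.
    const index = pair.indexOf('=')
    if (index < 1) {
      throw new Error('Expected prefix=namespaceURI but got: ' + pair)
    }
    namespaces[pair.slice(0, index)] = pair.slice(index + 1)
  }
  return namespaces
}

const namespaces = parseNamespaceFlags(['kml=http://www.opengis.net/kml/2.2'])
console.log(namespaces) // { kml: 'http://www.opengis.net/kml/2.2' }
```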

## Node API usage

Install with `$ npm install filterxml` and then:

    > const filterxml = require('filterxml')
    > filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) { ... })

Where
- `xmlIn` is a string representing the input XML document.
- `patterns` is an array of XPath expressions, like `'book'`, `'/bookstore/book'`, or `'//html:title'`. The matching XML nodes will be removed.
- `namespaces` is a map from prefixes to namespace URIs, for example `{ html: 'http://www.w3.org/TR/html4/' }`.
- `xmlOut` is a string representing the filtered output XML document.
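
Because the callback follows the Node.js `(err, result)` convention, the function can presumably be wrapped with the built-in `util.promisify` for use with `async`/`await`. The sketch below takes the filter function as a parameter, so `filterWithPromise` and `fakeFilter` are illustrative names and not part of filterxml. In real use you would pass `require('filterxml')` instead of the stand-in.

```javascript
const { promisify } = require('util')

// Hypothetical wrapper: accepts any function with the signature
// (xmlIn, patterns, namespaces, callback) and returns a promise.
function filterWithPromise (filterFn, xmlIn, patterns, namespaces) {
  return promisify(filterFn)(xmlIn, patterns, namespaces)
}

// Stand-in for require('filterxml') so that the sketch runs without the
// package installed. It only echoes its input and does no real filtering.
function fakeFilter (xmlIn, patterns, namespaces, callback) {
  process.nextTick(() => callback(null, xmlIn))
}

async function main () {
  try {
    const xmlOut = await filterWithPromise(fakeFilter, '<a/>', ['b'], {})
    console.log(xmlOut)
  } catch (err) {
    console.error(err.message)
  }
}

main()
```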

Common XPath expressions to match nodes include:
- `x:book` to match all book nodes under a namespace associated with the `x` prefix in `namespaces`.
- `x:bookstore/x:book` to match all books **directly** under a bookstore.
- `x:bookstore//x:book` to match all books **somewhere** under a bookstore.
- `x:bookstore/x:book[1]` to match the **first** book directly under a bookstore.
- `book` to match all book nodes that **are not** under a namespace. This is quite rare in real-world XML documents.
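
Since namespaced documents need a prefix on every element step, prefixed patterns like the ones above can be generated from plain paths. The following is a rough sketch only: `qualifyPath` is a hypothetical helper, not part of filterxml, and it handles only simple paths whose predicates contain no `/`.

```javascript
// Hypothetical helper, not part of filterxml: adds a namespace prefix to the
// plain element steps of a simple XPath. Steps such as '@id', '*', 'text()',
// or already prefixed names are left untouched.
function qualifyPath (prefix, path) {
  // A plain step is an element name with an optional positional predicate.
  const plainStep = /^[A-Za-z_][\w.-]*(\[\d+\])?$/
  return path
    .split('/')
    .map(step => (plainStep.test(step) ? prefix + ':' + step : step))
    .join('/')
}

console.log(qualifyPath('x', 'bookstore/book'))    // x:bookstore/x:book
console.log(qualifyPath('x', 'bookstore//book'))   // x:bookstore//x:book
console.log(qualifyPath('x', 'bookstore/book[1]')) // x:bookstore/x:book[1]
```

The chosen prefix must still be associated with the right namespace URI in `namespaces`.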

## Limitations

Internally, filterxml depends on [xmldom](https://www.npmjs.com/package/@xmldom/xmldom), which follows the standard Web API [XMLSerializer](https://developer.mozilla.org/en-US/docs/Web/API/XMLSerializer/serializeToString). The serializer can sometimes produce unexpected results:
- Most empty-element tags like `<name/>` are converted to begin and end tags like `<name></name>`.
- Some common empty-element tags like `<br />` are preserved but lose the space before the slash, like `<br/>`.

## Example

Let us filter out all `book` nodes:

    const xmlIn = '<bookstore>' +
      '<book>Animal Farm</book>' +
      '<book>Nineteen Eighty-Four</book>' +
      '<essay>Reflections on Writing</essay>' +
      '</bookstore>'

    filterxml(xmlIn, ['book'], {}, function (err, xmlOut) {
      if (err) { throw err }
      console.log(xmlOut)
    })

Outputs:

    <bookstore><essay>Reflections on Writing</essay></bookstore>

## Real-world example

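Let us remove Style tags from a Keyhole Markup Language (KML) file:

    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
      <Document>
        <name>Awesome locations</name>
        <Style>
          <IconStyle>
            <scale>1.1</scale>
            <Icon>
              <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
            </Icon>
            <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
          </IconStyle>
          <PolyStyle>
            <fill>0</fill>
          </PolyStyle>
        </Style>
        <Placemark>
          <name>Reykjavik</name>
          <Point>
            <coordinates>-21.933333,64.133333,0</coordinates>
          </Point>
        </Placemark>
      </Document>
    </kml>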

We read the file, filter it, and save the result. Note how we must add a namespace prefix to our pattern to match nodes under the namespace defined in the `kml` node. Note also that every prefix we use must be associated with a namespace URI.

    const filterxml = require('filterxml')
    const fs = require('fs')

    // Read as UTF-8 text, because filterxml expects xmlIn to be a string.
    const xmlIn = fs.readFileSync('./norway.kml', 'utf8')
    const patterns = ['x:Style']
    const namespaces = {
      x: 'http://www.opengis.net/kml/2.2'
    }

    filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) {
      if (err) { throw err }
      fs.writeFileSync('./norway-simplified.kml', xmlOut)
    })

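The resulting `norway-simplified.kml`:

    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
      <Document>
        <name>Awesome locations</name>

        <Placemark>
          <name>Reykjavik</name>
          <Point>
            <coordinates>-21.933333,64.133333,0</coordinates>
          </Point>
        </Placemark>
      </Document>
    </kml>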

## Working with namespaces

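Often XML documents specify multiple namespaces. For example:

    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
      <Point>
        <gx:drawOrder>1</gx:drawOrder>
        <coordinates>25.59176188650433,45.6493071755744,0</coordinates>
      </Point>
    </kml>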

To match and remove the `gx:drawOrder` node, we **cannot** just call `filterxml(xmlIn, ['gx:drawOrder'], {}, callback)`. The callback would receive an error `No namespace associated with prefix gx in gx:drawOrder`. Instead, we must specify what the prefix `gx` in our XPath pattern means. It may look as if the prefix has already been specified in the `kml` tag, but we cannot blindly trust that declaration, because the same prefix can map to different namespace URIs in different parts of the document:

    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
      <Document xmlns:gx="http://www.opengis.net/kml/2.2">
        <gx:name>A place described with OpenGIS markup</gx:name>
      </Document>
      <Document xmlns:gx="http://www.google.com/kml/ext/2.2">
        <gx:name>A place described with Google's KML markup</gx:name>
      </Document>
    </kml>

Therefore we must always specify the prefixes we use in our XPath patterns. To remove `gx:drawOrder` the following is a valid approach. Note that we can use whatever prefix we want as long as we associate it with a correct namespace URI.

    const patterns = ['foo:drawOrder']
    const namespaces = { foo: 'http://www.google.com/kml/ext/2.2' }
    filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) {
      ...
    })

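The snippet above results in `xmlOut` equal to:

    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
      <Point>

        <coordinates>25.59176188650433,45.6493071755744,0</coordinates>
      </Point>
    </kml>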

So, always declare your prefixes!
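
One way to catch a forgotten declaration early is to check the patterns against the `namespaces` map before calling filterxml. The sketch below is an assumption-laden illustration: `findUndeclaredPrefixes` is a hypothetical helper, not part of filterxml, and its regular expression is a simplification that may misread a colon inside a quoted string literal.

```javascript
// Hypothetical pre-flight check, not part of filterxml: lists prefixes that
// appear in the patterns but are missing from the namespaces map.
function findUndeclaredPrefixes (patterns, namespaces) {
  // Matches 'prefix:' directly followed by a name or '*'. The lookahead
  // skips axis specifiers such as 'child::' and URLs such as 'http://'.
  const prefixRe = /(?<![\w.-])([A-Za-z_][\w.-]*):(?=[A-Za-z_*])/g
  const missing = new Set()
  for (const pattern of patterns) {
    for (const match of pattern.matchAll(prefixRe)) {
      if (!Object.prototype.hasOwnProperty.call(namespaces, match[1])) {
        missing.add(match[1])
      }
    }
  }
  return Array.from(missing)
}

const namespaces = { kml: 'http://www.opengis.net/kml/2.2' }
console.log(findUndeclaredPrefixes(['kml:Style', 'gx:drawOrder'], namespaces))
// [ 'gx' ]
```

An empty result means every prefix found in the patterns has a namespace URI.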

## Licence

[MIT](LICENSE)