Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/axelpale/filterxml
Simplify your XML by removing XML nodes that match XPath expressions
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/axelpale/filterxml
- Owner: axelpale
- License: mit
- Created: 2017-10-07T11:55:19.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-03-14T13:11:31.000Z (10 months ago)
- Last Synced: 2024-03-14T22:58:41.532Z (10 months ago)
- Language: JavaScript
- Homepage:
- Size: 243 KB
- Stars: 2
- Watchers: 3
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# filterxml
[![npm](https://img.shields.io/npm/v/filterxml.svg?colorB=green)](https://www.npmjs.com/package/filterxml)
[![npm](https://img.shields.io/npm/dm/filterxml.svg)](https://www.npmjs.com/package/filterxml)
[![Node Version](https://img.shields.io/node/v/filterxml.svg)](https://github.com/axelpale/filterxml)
[![GitHub Actions workflow status](https://img.shields.io/github/actions/workflow/status/axelpale/filterxml/filterxml-ci.yml)](https://github.com/axelpale/filterxml/actions/workflows/filterxml-ci.yml)

Keep it simple! Here is a Node.js module to remove unnecessary XML nodes that match given XPath expressions. It uses [xpath](https://www.npmjs.com/package/xpath) and [xmldom](https://github.com/xmldom/xmldom) under the hood.
[Command-line usage](#command-line-usage) – [Node API](#node-api-usage) – [Examples](#example) – [Working with namespaces](#working-with-namespaces)
![Logo](logo.png?raw=true "Fight the power!")
## Command-line usage
Install with `$ npm install filterxml -g` and then:

    $ filterxml -e pattern -n prefix=namespaceURI input.xml output.xml

For example, remove `Style` and `StyleMap` from a [Keyhole Markup Language](https://en.wikipedia.org/wiki/Keyhole_Markup_Language) document with:

    $ filterxml --exclude kml:Style --exclude kml:StyleMap \
        --namespace kml=http://www.opengis.net/kml/2.2 \
        source.kml simplified.kml

Specify multiple patterns and namespaces with additional `-e, --exclude` and `-n, --namespace` flags. See `filterxml --help` for details.
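
Each `-n prefix=namespaceURI` flag carries the same information as one entry of the `namespaces` map used by the Node API. The sketch below shows one way to turn such flag values into a map. Note that `parseNamespaceFlags` is a hypothetical helper written for illustration; it is an assumption, not a description of how the filterxml command-line tool parses its arguments.

```javascript
// Hypothetical helper, not part of filterxml: turn "prefix=namespaceURI"
// strings into a map from prefixes to namespace URIs.
function parseNamespaceFlags (flags) {
  const namespaces = {}
  for (const flag of flags) {
    const i = flag.indexOf('=')
    if (i < 1) {
      throw new Error('Expected prefix=namespaceURI but got: ' + flag)
    }
    // Split on the first "=" only, because a URI may itself contain "=".
    namespaces[flag.slice(0, i)] = flag.slice(i + 1)
  }
  return namespaces
}

console.log(parseNamespaceFlags(['kml=http://www.opengis.net/kml/2.2']))
// → { kml: 'http://www.opengis.net/kml/2.2' }
```

The resulting object has the shape that the `namespaces` argument expects.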
## Node API usage
Install with `$ npm install filterxml` and then:

    > const filterxml = require('filterxml')
    > filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) { ... })

Where

- `xmlIn` is a string representing the input XML document.
- `patterns` is an array of XPath expressions, like `'book'`, `'/bookstore/book'`, or `'//html:title'`. The matching XML nodes will be removed.
- `namespaces` is a map from prefixes to namespace URIs, for example `{ html: 'http://www.w3.org/TR/html4/' }`.
- `xmlOut` is a string representing the filtered output XML document.

Common XPath expressions to match nodes include:
- `x:book` to match all book nodes under a namespace associated with the `x` prefix in `namespaces`.
- `x:bookstore/x:book` to match all books **directly** under a bookstore.
- `x:bookstore//x:book` to match all books **somewhere** under a bookstore.
- `x:bookstore/x:book[1]` to match the **first** book directly under a bookstore.
- `book` to match all book nodes that **are not** under a namespace. This is quite rare in real-world XML documents.

## Limitations
Internally, filterxml depends on [xmldom](https://www.npmjs.com/package/@xmldom/xmldom), which follows the standard Web API [XMLSerializer](https://developer.mozilla.org/en-US/docs/Web/API/XMLSerializer/serializeToString). The serializer can sometimes produce unexpected results:
- most empty-element tags like `<name />` will be converted to begin and end tags like `<name></name>`.
- some common empty-element tags like `<br />` are preserved but will lose the space before the slash like `<br/>`.

## Example
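Let us filter out all `book` nodes:

    const filterxml = require('filterxml')

    const xmlIn = '<bookstore>' +
      '<book>Animal Farm</book>' +
      '<book>Nineteen Eighty-Four</book>' +
      '<essay>Reflections on Writing</essay>' +
      '</bookstore>'

    filterxml(xmlIn, ['book'], {}, function (err, xmlOut) {
      if (err) { throw err }
      console.log(xmlOut)
    })

Outputs:

    <bookstore><essay>Reflections on Writing</essay></bookstore>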
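
If you prefer promises over callbacks, the error-first callback used above can be wrapped with Node's built-in `util.promisify`. The following is a minimal sketch under stated assumptions: `promisifyFilter` is a hypothetical helper, not part of filterxml, and it assumes the wrapped function calls back with `(err, xmlOut)` as in the example above.

```javascript
const { promisify } = require('util')

// Hypothetical helper, not part of filterxml: wrap a function that follows
// the (xmlIn, patterns, namespaces, callback) convention so that it returns
// a promise instead of taking a callback.
function promisifyFilter (filterFn) {
  const asyncFilter = promisify(filterFn)
  // Default to an empty namespace map for patterns without prefixes.
  return function (xmlIn, patterns, namespaces = {}) {
    return asyncFilter(xmlIn, patterns, namespaces)
  }
}

// Usage, assuming filterxml is installed:
//   const filterxmlAsync = promisifyFilter(require('filterxml'))
//   const xmlOut = await filterxmlAsync(xmlIn, ['book'])
```

Errors that would have been passed as `err` reject the returned promise, so they can be handled with `try`/`catch` around `await` or with `.catch()`.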
## Real-world example
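Let us remove `Style` tags from a Keyhole Markup Language (KML) file:

    <kml xmlns="http://www.opengis.net/kml/2.2">
      <Document>
        <name>Awesome locations</name>
        <Style>
          <IconStyle>
            <scale>1.1</scale>
            <Icon>
              <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
            </Icon>
            <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
          </IconStyle>
          <PolyStyle>
            <fill>0</fill>
          </PolyStyle>
        </Style>
        <Placemark>
          <name>Reykjavik</name>
          <Point>
            <coordinates>-21.933333,64.133333,0</coordinates>
          </Point>
        </Placemark>
      </Document>
    </kml>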
We read the file, filter it, and save the result. Note how we must add a namespace prefix into our pattern to match nodes under the namespace defined in the `kml` node. Note also how we must associate every prefix we use with a namespace URI.

    const filterxml = require('filterxml')
    const fs = require('fs')

    // Read the file as a UTF-8 string because xmlIn must be a string.
    const xmlIn = fs.readFileSync('./norway.kml', 'utf8')
    const patterns = ['x:Style']
    const namespaces = {
      x: 'http://www.opengis.net/kml/2.2'
    }

    filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) {
      if (err) { throw err }
      fs.writeFileSync('./norway-simplified.kml', xmlOut)
    })

The resulting `norway-simplified.kml`:

    <kml xmlns="http://www.opengis.net/kml/2.2">
      <Document>
        <name>Awesome locations</name>
        <Placemark>
          <name>Reykjavik</name>
          <Point>
            <coordinates>-21.933333,64.133333,0</coordinates>
          </Point>
        </Placemark>
      </Document>
    </kml>
## Working with namespaces
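Often XML documents specify multiple namespaces. For example:

    <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
      <Placemark>
        <Point>
          <gx:drawOrder>1</gx:drawOrder>
          <coordinates>25.59176188650433,45.6493071755744,0</coordinates>
        </Point>
      </Placemark>
    </kml>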
To match and remove the `gx:drawOrder` node we cannot just call `filterxml(xmlIn, ['gx:drawOrder'], {}, callback)`. The callback would receive the error `No namespace associated with prefix gx in gx:drawOrder`. Instead, we must specify what the prefix `gx` in our XPath pattern means. Misleadingly, it looks as if the prefix has already been specified in the `kml` tag, but we cannot blindly trust that declaration, because the same prefix can map to different namespace URIs in different parts of the document:

    <places>
      <place xmlns:gx="http://www.opengis.net/kml/2.2">
        <gx:description>A place described with OpenGIS markup</gx:description>
      </place>
      <place xmlns:gx="http://www.google.com/kml/ext/2.2">
        <gx:description>A place described with Google's KML markup</gx:description>
      </place>
    </places>
Therefore we must always specify the prefixes we use in our XPath patterns. To remove `gx:drawOrder`, the following is a valid approach. Note that we can use whatever prefix we want as long as we associate it with the correct namespace URI.

    const patterns = ['foo:drawOrder']
    const namespaces = { foo: 'http://www.google.com/kml/ext/2.2' }
    filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) {
      ...
    })

The snippet above results in `xmlOut` equal to:

    <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
      <Placemark>
        <Point>
          <coordinates>25.59176188650433,45.6493071755744,0</coordinates>
        </Point>
      </Placemark>
    </kml>
So, always declare your prefixes!
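
To catch a missing declaration before filtering, you could scan your patterns for prefixes that are absent from the `namespaces` map. The sketch below is one rough way to do that. `findUndeclaredPrefixes` is a hypothetical helper, not part of filterxml, and its regular expression is a heuristic rather than a real XPath parser.

```javascript
// Hypothetical helper, not part of filterxml: list the prefixes that appear
// in XPath patterns but are missing from the namespaces map.
// The regular expression is a rough heuristic: it skips axis specifiers
// such as "child::" but may misread colons inside string literals.
function findUndeclaredPrefixes (patterns, namespaces) {
  const prefixRe = /([A-Za-z_][\w.-]*):(?=[A-Za-z_*])/g
  const missing = new Set()
  for (const pattern of patterns) {
    for (const match of pattern.matchAll(prefixRe)) {
      const prefix = match[1]
      if (!Object.prototype.hasOwnProperty.call(namespaces, prefix)) {
        missing.add(prefix)
      }
    }
  }
  return [...missing]
}

console.log(findUndeclaredPrefixes(['gx:drawOrder'], {}))
// → [ 'gx' ]
```

An empty array means every prefix found in the patterns has an entry in `namespaces`.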
## Licence
[MIT](LICENSE)