https://github.com/angrycoding/shallow-xml
XML parser that allows you to parse or extract fragments from invalid xml / html documents
- Host: GitHub
- URL: https://github.com/angrycoding/shallow-xml
- Owner: angrycoding
- Created: 2013-04-26T21:08:10.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2013-04-26T23:34:14.000Z (almost 13 years ago)
- Last Synced: 2025-04-09T18:54:16.154Z (10 months ago)
- Homepage:
- Size: 121 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Shallow XML parser for Node.js
===========
This parser lets you parse XML documents containing fragments that are
impossible to parse using "classical" parsing methods. For instance, suppose you have an
arbitrary HTML document provided by a third party. The only thing you know about
this document is that it contains "something" inside a particular tag. You
have no idea whether the document is valid, but your task is to extract this data and use it
for something else. If the document is invalid, you won't be able to parse it
using a classical XML parser.

Instead of the classical approach, shallow-xml uses a technique called
[shallow parsing](http://en.wikipedia.org/wiki/Shallow_parsing). Using regular expressions,
it converts the original XML document into a list of tokens, trying to extract as much
useful information as possible, based on the known structure defined by the developer.
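
To make the idea of regex-based tokenizing concrete, here is a minimal sketch. It is not shallow-xml's actual implementation: the `tokenize` function, the token shape and the regular expression are assumptions made for illustration, and attribute values containing `>` are not handled.

```javascript
// Hypothetical sketch, not shallow-xml's real code.
// Splits markup into tag tokens and text tokens with one regular expression.
function tokenize(source) {
	// matches an opening, closing or self-closing tag; everything else is text
	var tagPattern = /<(\/?)([A-Za-z][\w:.-]*)([^<>]*?)(\/?)>/g;
	var tokens = [];
	var lastIndex = 0;
	var match;
	while ((match = tagPattern.exec(source)) !== null) {
		// text between the previous tag and this one
		if (match.index > lastIndex) {
			tokens.push({ type: 'text', value: source.slice(lastIndex, match.index) });
		}
		tokens.push({
			type: match[1] ? 'close' : (match[4] ? 'empty' : 'open'),
			name: match[2],
			raw: match[0]
		});
		lastIndex = tagPattern.lastIndex;
	}
	// trailing text after the last tag
	if (lastIndex < source.length) {
		tokens.push({ type: 'text', value: source.slice(lastIndex) });
	}
	return tokens;
}

// unclosed <li> tags do not stop the tokenizer
var tokens = tokenize('<Content><ul><li>111<li>222</ul>hello</Content>');
console.info(tokens);
```

Because no nesting is checked at this stage, unbalanced markup simply produces more tokens instead of a parse error.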

Take a look at the following XML document (an OpenSocial gadget definition):
```xml
- 111
- 222
- 333
hello
```
As you can see, this XML document cannot be parsed using a classical XML parser,
because the HTML code placed inside the `<Content>` tag is invalid.
However, this document is not a problem for a shallow parser: just define the known structure.
By setting up the structure, you are saying that only the listed tags are to be processed during parsing:
```javascript
// assumes `Parser` is the constructor provided by shallow-xml and
// `xml` is a string holding the document source

// create parser instance
var GadgetParser = new Parser([
	// parse only listed tags
	'Module', 'ModulePrefs', 'Content'
]);

// parse xml document
var gadgetDoc = GadgetParser.parse(xml);

// output xml document
console.info(gadgetDoc);
```
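
One way to picture why restricting the parser to listed tags helps: if the tag pattern is built only from the known names, any other markup never becomes a tag and stays plain text. The sketch below illustrates that under stated assumptions; `shallowParse`, its node shape and the sample document are hypothetical and do not describe shallow-xml's internals. Tag names are assumed to contain no regex metacharacters.

```javascript
// Hypothetical sketch; shallowParse and its node shape are illustrative only.
function shallowParse(source, knownTags) {
	// only tags named in knownTags are recognized; all other markup stays text
	var pattern = new RegExp(
		'<(/?)(' + knownTags.join('|') + ')(\\s[^<>]*?)?(/?)>', 'g'
	);
	var root = { name: null, attrs: '', children: [] };
	var stack = [root];
	var lastIndex = 0;
	var match;

	function addText(text) {
		if (text) stack[stack.length - 1].children.push(text);
	}

	while ((match = pattern.exec(source)) !== null) {
		addText(source.slice(lastIndex, match.index));
		lastIndex = pattern.lastIndex;
		if (match[1]) {
			// closing tag: pop only if it matches the element currently open
			if (stack.length > 1 && stack[stack.length - 1].name === match[2]) {
				stack.pop();
			}
		} else {
			var node = { name: match[2], attrs: (match[3] || '').trim(), children: [] };
			stack[stack.length - 1].children.push(node);
			// self-closing tags have no content, so they are not pushed
			if (!match[4]) stack.push(node);
		}
	}
	addText(source.slice(lastIndex));
	return root;
}

// sample input invented for this sketch; the <li> tags are never closed
var sample = '<Module><ModulePrefs title="Demo"/>' +
	'<Content view="home"><ul><li>111<li>222</ul>hello</Content></Module>';
var parsed = shallowParse(sample, ['Module', 'ModulePrefs', 'Content']);
console.info(JSON.stringify(parsed, null, 2));
```

In this sketch the broken HTML inside `Content` is kept verbatim as a single text child, which is the behaviour the paragraph above describes.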
## Parser API ##
```javascript
// find Module element
var moduleEl = gadgetDoc('Module');
// find ModulePrefs element inside Module element
var modulePrefsEl = moduleEl('ModulePrefs');
// extract @title attribute of ModulePrefs element
console.info(modulePrefsEl('@title'));
// find Content elements inside Module element
var contentEls = moduleEl('Content');
// extract @view attribute of first Content element (index 0)
console.info(contentEls(0)('@view'));
```
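
The calls above treat each result as a function that accepts a tag name, an `@attribute` name or a numeric index. A minimal sketch of how such a callable result could be built follows. The `wrap` helper, the node shape and zero-based indexing are assumptions for illustration, not the library's documented behaviour.

```javascript
// Hypothetical sketch of a callable result object; wrap() and the node
// shape ({ name, attrs, children }) are assumptions, not shallow-xml's API.
function wrap(nodes) {
	return function query(selector) {
		if (typeof selector === 'number') {
			// numeric selector: pick one node by zero-based index
			return wrap(selector < nodes.length ? [nodes[selector]] : []);
		}
		if (selector.charAt(0) === '@') {
			// '@name' selector: attribute value of the first node, if any
			return nodes.length ? nodes[0].attrs[selector.slice(1)] : undefined;
		}
		// tag-name selector: collect matching children of every wrapped node
		var found = [];
		nodes.forEach(function (node) {
			node.children.forEach(function (child) {
				if (child.name === selector) found.push(child);
			});
		});
		return wrap(found);
	};
}

// hand-built tree standing in for a parsed gadget document
var tree = { name: null, attrs: {}, children: [
	{ name: 'Module', attrs: {}, children: [
		{ name: 'ModulePrefs', attrs: { title: 'Demo' }, children: [] },
		{ name: 'Content', attrs: { view: 'home' }, children: [] },
		{ name: 'Content', attrs: { view: 'canvas' }, children: [] }
	] }
] };
var doc = wrap([tree]);
console.info(doc('Module')('Content')(1)('@view'));
```

Returning a new wrapper from every lookup is what allows calls to be chained, and a lookup that finds nothing yields an empty wrapper rather than throwing.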