Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/isaacs/sax-js
A sax style parser for JS
https://github.com/isaacs/sax-js
Last synced: 1 day ago
JSON representation
A sax style parser for JS
- Host: GitHub
- URL: https://github.com/isaacs/sax-js
- Owner: isaacs
- License: other
- Created: 2010-02-09T01:49:11.000Z (almost 15 years ago)
- Default Branch: main
- Last Pushed: 2024-05-27T23:46:20.000Z (8 months ago)
- Last Synced: 2025-01-12T20:43:50.584Z (3 days ago)
- Language: JavaScript
- Size: 473 KB
- Stars: 1,097
- Watchers: 26
- Forks: 324
- Open Issues: 98
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Authors: AUTHORS
Awesome Lists containing this project
- awesome-nodejs-pure-js - sax-js
README
# sax js
A sax-style parser for XML and HTML.
Designed with [node](http://nodejs.org/) in mind, but should work fine in
the browser or other CommonJS implementations.## What This Is
* A very simple tool to parse through an XML string.
* A stepping stone to a streaming HTML parser.
* A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
docs.## What This Is (probably) Not
* An HTML Parser - That's a fine goal, but this isn't it. It's just
XML.
* A DOM Builder - You can use it to build an object model out of XML,
but it doesn't do that out of the box.
* XSLT - No DOM = no querying.
* 100% Compliant with (some other SAX implementation) - Most SAX
implementations are in Java and do a lot more than this does.
* An XML Validator - It does a little validation when in strict mode, but
not much.
* A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
masochism.
* A DTD-aware Thing - Fetching DTDs is a much bigger job.## Regarding `Hello, world!').close();
// stream usage
// takes the same options as the parser
var saxStream = require("sax").createStream(strict, options)
saxStream.on("error", function (e) {
// unhandled errors will throw, since this is a proper node
// event emitter.
console.error("error!", e)
// clear the error
this._parser.error = null
this._parser.resume()
})
saxStream.on("opentag", function (node) {
// same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.xml")
.pipe(saxStream)
.pipe(fs.createWriteStream("file-copy.xml"))
```## Arguments
Pass the following arguments to the parser function. All are optional.
`strict` - Boolean. Whether or not to be a jerk. Default: `false`.
`opt` - Object bag of settings regarding string formatting. All default to `false`.
Settings supported:
* `trim` - Boolean. Whether or not to trim text and comment nodes.
* `normalize` - Boolean. If true, then turn any whitespace into a single
space.
* `lowercase` - Boolean. If true, then lowercase tag names and attribute names
in loose mode, rather than uppercasing them.
* `xmlns` - Boolean. If true, then namespaces are supported.
* `position` - Boolean. If false, then don't track line/col/position.
* `strictEntities` - Boolean. If true, only parse [predefined XML
entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
(`&`, `'`, `>`, `<`, and `"`)
* `unquotedAttributeValues` - Boolean. If true, then unquoted
attribute values are allowed. Defaults to `false` when `strict`
is true, `true` otherwise.## Methods
`write` - Write bytes onto the stream. You don't have to do this all at
once. You can keep writing as much as you want.`close` - Close the stream. Once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.`resume` - To gracefully handle errors, assign a listener to the `error`
event. Then, when the error is taken care of, you can call `resume` to
continue parsing. Otherwise, the parser will not continue while in an error
state.## Members
At all times, the parser object will have the following members:
`line`, `column`, `position` - Indications of the position in the XML
document where the parser currently is looking.`startTagPosition` - Indicates the position where the current tag starts.
`closed` - Boolean indicating whether or not the parser can be written to.
If it's `true`, then wait for the `ready` event to write again.`strict` - Boolean indicating whether or not the parser is a jerk.
`opt` - Any options passed into the constructor.
`tag` - The current tag being dealt with.
And a bunch of other stuff that you probably shouldn't touch.
## Events
All events emit with a single argument. To listen to an event, assign a
function to `on`. Functions get executed in the this-context of
the parser object. The list of supported events are also in the exported
`EVENTS` array.When using the stream interface, assign handlers using the EventEmitter
`on` function in the normal fashion.`error` - Indication that something bad happened. The error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. By
listening to this event, you can keep an eye on that kind of stuff. Note:
this happens *much* more in strict mode. Argument: instance of `Error`.`text` - Text node. Argument: string of text.
`doctype` - The ``. Argument:
object with `name` and `body` members. Attributes are not parsed, as
processing instructions have implementation dependent semantics.`sgmldeclaration` - Random SGML declarations. Stuff like ``
would trigger this kind of event. This is a weird thing to support, so it
might go away at some point. SAX isn't intended to be used to parse SGML,
after all.`opentagstart` - Emitted immediately when the tag name is available,
but before any attributes are encountered. Argument: object with a
`name` field and an empty `attributes` set. Note that this is the
same object that will later be emitted in the `opentag` event.`opentag` - An opening tag. Argument: object with `name` and `attributes`.
In non-strict mode, tag names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, then it will contain
namespace binding information on the `ns` member, and will have a
`local`, `prefix`, and `uri` member.`closetag` - A closing tag. In loose mode, tags are auto-closed if their
parent closes. In strict mode, well-formedness is enforced. Note that
self-closing tags will have `closeTag` emitted immediately after `openTag`.
Argument: tag name.`attribute` - An attribute node. Argument: object with `name` and `value`.
In non-strict mode, attribute names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, it will also contains namespace
information.`comment` - A comment node. Argument: the string of the comment.
`opencdata` - The opening tag of a ``) of a `` tags trigger a `"script"`
event, and their contents are not checked for special xml characters.
If you pass `noscript: true`, then this behavior is suppressed.## Reporting Problems
It's best to write a failing test if you find an issue. I will always
accept pull requests with failing tests if they demonstrate intended
behavior, but it is very hard to figure out what issue you're describing
without a test. Writing a test is also the best way for you yourself
to figure out if you really understand the issue you think you have with
sax-js.