https://github.com/Maxim-Mazurok/sax-ts

SAX-style (Simple API for XML) parser in TypeScript

contributions-welcome deno help-wanted node node-js node-module nodejs npm npm-module npm-package sax sax-parser typescript xml xml-parser

# sax-ts 📦

**Simple API for XML in TypeScript**

[![CI status](https://github.com/Maxim-Mazurok/sax-ts/actions/workflows/test.yml/badge.svg)](https://github.com/Maxim-Mazurok/sax-ts/actions/workflows/test.yml)
[![License](https://img.shields.io/badge/license-ISC-brightgreen.svg)](https://github.com/Maxim-Mazurok/sax-ts/blob/master/LICENSE.md)
[![NPM](https://img.shields.io/npm/v/sax-ts.svg)](https://www.npmjs.com/package/sax-ts)
[![DenoLib](https://denolib.com/badge?scope=Maxim-Mazurok&repo=sax-ts)](https://github.com/denolib)
[![Deno Third Party Modules](https://shield.deno.dev/x/sax_ts)](https://deno.land/x/sax_ts)
[![JSR](https://jsr.io/badges/@maxim-mazurok/sax-ts)](https://jsr.io/@maxim-mazurok/sax-ts)

A [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style parser for XML
and HTML.

Designed with [Deno](https://deno.land/) in mind, so it's **browser compatible**.

## What This Is

- A very simple tool to parse through an XML string.
- A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
docs.
- A perfect way to parse 80 GB of XML data without burning up your laptop :)

## Usage

### Deno
```typescript
import { SAXParser } from 'https://unpkg.com/sax-ts@1.2.8/src/sax.ts';
// for semver, use "@%5E1.2.8", which is the urlencoded version of "@^1.2.8"

const strict: boolean = true; // change to false for HTML parsing
const options: {} = {}; // refer to the "Arguments" section
const parser = new SAXParser(strict, options);

parser.onerror = function (e) {
  // an error happened.
  console.error(e);
};
parser.ontext = function (t) {
  // got some text. t is the string of text.
  console.log('onText: ', t);
};
parser.onopentag = function (node) {
  // opened a tag. node has "name" and "attributes"
  console.log('onOpenTag: ', node);
};
parser.onattribute = function (attr) {
  // an attribute. attr has "name" and "value"
  console.log('onAttribute: ', attr);
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
  console.warn('end of XML');
};

// strict mode expects a root element, so the text is wrapped in one
parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
```

## Arguments

Pass the following arguments to the `SAXParser` constructor. All are optional.

`strict` - Boolean. Disables "forgiving" mode. Default: `false`.

`options` - Object bag of settings. All default to `false`, except
`position`, which stays enabled unless it is explicitly set to `false`.

Settings supported:

- `trim` - Boolean. Whether or not to trim text and comment nodes.
- `normalize` - Boolean. If true, then collapse each run of whitespace into a
single space.
- `lowercase` - Boolean. If true, then lowercase tag names and attribute names
in loose mode, rather than uppercasing them.
- `xmlns` - Boolean. If true, then namespaces are supported.
- `position` - Boolean. If false, then don't track line/col/position.
- `strictEntities` - Boolean. If true, only parse [predefined XML
entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
(`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`).
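The `trim`, `normalize`, and `strictEntities` settings are easiest to see on concrete strings. The following is a standalone sketch of the documented behavior, not code taken from sax-ts: `applyTextOptions` and `decodePredefinedEntities` are hypothetical helper names, and applying `trim` before `normalize` is an assumption.

```typescript
// Hypothetical option bag mirroring the documented `trim` / `normalize` settings.
interface TextOptions {
  trim?: boolean;
  normalize?: boolean;
}

// Sketch of post-processing a text node (assumed order: trim, then normalize).
function applyTextOptions(text: string, opt: TextOptions): string {
  let result = text;
  if (opt.trim) {
    result = result.trim(); // drop leading and trailing whitespace
  }
  if (opt.normalize) {
    result = result.replace(/\s+/g, " "); // collapse each whitespace run to one space
  }
  return result;
}

// The five predefined XML entities that `strictEntities` limits parsing to.
const PREDEFINED_ENTITIES: Record<string, string> = {
  amp: "&",
  apos: "'",
  gt: ">",
  lt: "<",
  quot: '"',
};

// Sketch: decode only the predefined entities and leave anything else as-is
// (a real parser may instead report unknown entities as errors).
function decodePredefinedEntities(text: string): string {
  return text.replace(/&([a-z]+);/g, (match, name: string) =>
    Object.prototype.hasOwnProperty.call(PREDEFINED_ENTITIES, name)
      ? PREDEFINED_ENTITIES[name]
      : match,
  );
}

console.log(applyTextOptions("  Hello,\n\t world  ", { trim: true, normalize: true })); // → Hello, world
console.log(decodePredefinedEntities("a &lt; b &amp;&amp; c &copy;")); // → a < b && c &copy;
```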

## Methods

`write` - Write a chunk of XML text to the parser. You don't have to write it
all at once; you can keep calling `write` as many times as you want.

`close` - Close the stream. Once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.

`resume` - To gracefully handle errors, assign a listener to the `error`
event. Then, when the error is taken care of, you can call `resume` to
continue parsing. Otherwise, the parser will not continue while in an error
state.
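Because `write` accepts data incrementally, a large document can be fed in pieces instead of being held as one string. The sketch below relies only on a hypothetical `ChunkWriter` interface with `write` and `close`, shaped like the methods above; `feedInChunks` is an illustrative helper, not part of sax-ts, and the recording object in the usage stands in for a real parser.

```typescript
// Hypothetical minimal interface: anything with write(chunk) and close().
interface ChunkWriter {
  write(chunk: string): unknown;
  close(): unknown;
}

// Feed `source` to `writer` in slices of at most `chunkSize` characters, then close.
// Returns the number of write calls made.
function feedInChunks(source: string, chunkSize: number, writer: ChunkWriter): number {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  let chunks = 0;
  for (let i = 0; i < source.length; i += chunkSize) {
    writer.write(source.slice(i, i + chunkSize));
    chunks++;
  }
  writer.close();
  return chunks;
}

// Stand-in writer that records what it receives.
const received: string[] = [];
let closed = false;
const recorder: ChunkWriter = {
  write(chunk) {
    received.push(chunk);
  },
  close() {
    closed = true;
  },
};

const count = feedInChunks("<doc><item>1</item></doc>", 8, recorder);
console.log(count, received, closed);
```

Note that chunk boundaries can fall in the middle of a tag (here `<it` / `em>`); a streaming SAX parser is expected to carry that partial state over to the next `write` call.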

## Events

All events emit with a single argument. To listen to an event, assign a
function to `on<eventname>` (for example, `ontext` for the `text` event).
Handlers are called with `this` set to the parser object. The list of
supported events is also available in the exported `EVENTS` array.
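The `on<eventname>` convention means dispatching an event is a property lookup followed by a method call, which is also why handlers see the emitting object as `this`. The toy `MiniEmitter` class and its `emit` method below are hypothetical and only illustrate the convention; they are not sax-ts internals.

```typescript
type EventName = "opentag" | "text";
// Handler signature: `this` is bound to the emitting object.
type Handler = (this: MiniEmitter, data: string) => void;

class MiniEmitter {
  // Optional handler slots following the on<eventname> naming convention.
  onopentag?: Handler;
  ontext?: Handler;
  depth = 0;

  emit(event: EventName, data: string): void {
    const key = `on${event}` as const; // "onopentag" | "ontext"
    const handler = this[key];
    if (handler) {
      handler.call(this, data); // run with `this` bound to the emitter
    }
  }
}

const emitter = new MiniEmitter();
const log: string[] = [];
emitter.onopentag = function (name) {
  this.depth++; // `this` is the emitter itself
  log.push(`open ${name}`);
};
emitter.ontext = function (text) {
  log.push(`text "${text}" at depth ${this.depth}`);
};

emitter.emit("opentag", "greeting");
emitter.emit("text", "hello");
console.log(log);
```

Since `this` comes from the call site, use `function` expressions rather than arrow functions for handlers that need to reach the parser through `this`.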

`error` - Indication that something bad happened. The error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. By
listening to this event, you can keep an eye on that kind of stuff. Note:
this happens *much* more in strict mode. Argument: instance of `Error`.
```javascript
//TODO: currently `error` is protected, need to expose it to user somehow.
```

`text` - Text node. Argument: string of text.

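`doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.

`processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument:
object with `name` and `body` members. Attributes are not parsed, as
processing instructions have implementation dependent semantics.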

`sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
would trigger this kind of event. This is a weird thing to support, so it
might go away at some point. SAX isn't intended to be used to parse SGML,
after all.

`opentagstart` - Emitted immediately when the tag name is available,
but before any attributes are encountered. Argument: object with a
`name` field and an empty `attributes` set. Note that this is the
same object that will later be emitted in the `opentag` event.

`opentag` - An opening tag. Argument: object with `name` and `attributes`.
In non-strict mode, tag names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, then it will contain
namespace binding information on the `ns` member, and will have a
`local`, `prefix`, and `uri` member.
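With the `xmlns` option, a name is split into `prefix` and `local` parts and the prefix is resolved to a `uri` through the namespace bindings in scope. The sketch below shows that resolution on its own, using a hypothetical stack of binding maps (one per open element); `splitQName` and `resolveElementName` are illustrative names, this is XML namespace scoping in general rather than sax-ts's implementation, and it ignores details such as the reserved `xml`/`xmlns` prefixes.

```typescript
// One map per open element: prefix -> namespace URI ("" is the default namespace).
type Bindings = Map<string, string>;

interface ResolvedName {
  prefix: string;
  local: string;
  uri: string;
}

// Split "p:local" into its prefix and local part; unprefixed names get prefix "".
function splitQName(name: string): { prefix: string; local: string } {
  const i = name.indexOf(":");
  return i < 0
    ? { prefix: "", local: name }
    : { prefix: name.slice(0, i), local: name.slice(i + 1) };
}

// Resolve an *element* name against the scopes, innermost scope first.
// (Unprefixed attributes do NOT take the default namespace; not handled here.)
function resolveElementName(name: string, scopes: Bindings[]): ResolvedName {
  const { prefix, local } = splitQName(name);
  for (let i = scopes.length - 1; i >= 0; i--) {
    const uri = scopes[i].get(prefix);
    if (uri !== undefined) {
      return { prefix, local, uri };
    }
  }
  // Unbound: this sketch falls back to "", whereas a strict parser would
  // typically report an unbound prefix as an error.
  return { prefix, local, uri: "" };
}

// Usage: the inner element rebinds "x"; the default namespace comes from the outer one.
const scopes: Bindings[] = [
  new Map([["", "urn:example:default"], ["x", "urn:example:x"]]),
  new Map([["x", "urn:example:x2"]]),
];
console.log(resolveElementName("x:item", scopes)); // uri is "urn:example:x2"
console.log(resolveElementName("title", scopes)); // uri is "urn:example:default"
```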

`closetag` - A closing tag. In loose mode, tags are auto-closed if their
parent closes. In strict mode, well-formedness is enforced. Note that
self-closing tags will have `closetag` emitted immediately after `opentag`.
Argument: tag name.
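The difference between loose and strict handling of closing tags comes down to what happens when the closing name does not match the innermost open element. The sketch below models that on a plain stack of open tag names; `closeTag` is a hypothetical helper, its error message is made up, and sax-ts's exact recovery rules (for example, for a closing tag that matches nothing) may differ.

```typescript
// Close `name` against a stack of open element names and return the names that
// get closed, innermost first. Hypothetical helper, not sax-ts code.
function closeTag(stack: string[], name: string, strict: boolean): string[] {
  const idx = stack.lastIndexOf(name);
  const isInnermost = idx >= 0 && idx === stack.length - 1;
  if (strict && !isInnermost) {
    // Strict mode enforces well-formedness: only the innermost element may close.
    throw new Error(`Unexpected close tag </${name}>`);
  }
  if (idx < 0) {
    // Loose mode: a closing tag with no matching open element is ignored here.
    return [];
  }
  // Loose mode auto-closes every element opened inside `name` as well.
  return stack.splice(idx).reverse();
}

// Loose mode: closing <body> also closes the unclosed <p> and <b> inside it.
const open = ["html", "body", "p", "b"];
console.log(closeTag(open, "body", false)); // closes b, p, body (in that order)
console.log(open); // only "html" is still open

// A self-closing <br/> is an open immediately followed by a close.
const strictStack = ["root"];
strictStack.push("br");
console.log(closeTag(strictStack, "br", true)); // closes just "br"
```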

`attribute` - An attribute node. Argument: object with `name` and `value`.
In non-strict mode, attribute names are in upper-case, unless the `lowercase`
option is set. If the `xmlns` option is set, it will also contain namespace
information.

`comment` - A comment node. Argument: the string of the comment.

`opencdata` - The opening tag of a `<![CDATA[` block.

`closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.

In non-strict mode, `<script>` tags trigger a `"script"` event, and their
contents are not checked for special XML characters. If you pass
`noscript: true`, then this behavior is suppressed.

---

# Disclaimers

## What This Is (probably) Not

- An HTML Parser - That's a fine goal, but this isn't it. It's just XML.
- A DOM Builder - You can use it to build an object model out of XML, but it
does not do that out of the box.
- XSLT - No DOM = no querying.
- 100% Compliant with (some other SAX implementation) - Most SAX
implementations are in Java and do a lot more than this does.
- An XML Validator - It does a little validation when in strict mode, but
not much.
- A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
masochism.
- A DTD-aware Thing - Fetching DTDs is a much bigger job.
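Building a small object model yourself mostly means keeping a stack of open elements and attaching children as events arrive. The sketch below uses hypothetical `XmlElement` and `TreeBuilder` types and is driven by hand-written calls; with sax-ts you would call the same methods from `onopentag`, `ontext`, and `onclosetag` handlers, assuming the event arguments described in the Events section (a node with `name` and `attributes`, a text string, a tag name) and assuming `xmlns` is off so attribute values are plain strings.

```typescript
interface XmlElement {
  name: string;
  attributes: Record<string, string>;
  children: Array<XmlElement | string>;
}

class TreeBuilder {
  root: XmlElement | null = null;
  private stack: XmlElement[] = [];

  // Call from an opentag handler: create the element and attach it to its parent.
  openTag(node: { name: string; attributes: Record<string, string> }): void {
    const el: XmlElement = { name: node.name, attributes: { ...node.attributes }, children: [] };
    const parent = this.stack[this.stack.length - 1];
    if (parent) {
      parent.children.push(el);
    } else if (this.root === null) {
      this.root = el; // first top-level element becomes the root
    }
    this.stack.push(el);
  }

  // Call from a text handler: text outside any open element is dropped.
  text(t: string): void {
    const parent = this.stack[this.stack.length - 1];
    if (parent) {
      parent.children.push(t);
    }
  }

  // Call from a closetag handler: the innermost element is now complete.
  closeTag(name: string): void {
    const el = this.stack.pop();
    if (!el || el.name !== name) {
      throw new Error(`Mismatched close tag </${name}>`);
    }
  }
}

const builder = new TreeBuilder();
builder.openTag({ name: "note", attributes: { lang: "en" } });
builder.openTag({ name: "to", attributes: {} });
builder.text("Tove");
builder.closeTag("to");
builder.text("!");
builder.closeTag("note");
console.log(JSON.stringify(builder.root));
// {"name":"note","attributes":{"lang":"en"},"children":[{"name":"to","attributes":{},"children":["Tove"]},"!"]}
```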

## Regarding `