{"id":16506441,"url":"https://github.com/matteodelabre/saxophone","last_synced_at":"2025-03-16T18:32:31.463Z","repository":{"id":40791692,"uuid":"64762939","full_name":"matteodelabre/saxophone","owner":"matteodelabre","description":"Fast and lightweight event-driven streaming XML parser in pure JavaScript","archived":false,"fork":false,"pushed_at":"2023-03-01T08:43:31.000Z","size":639,"stargazers_count":36,"open_issues_count":8,"forks_count":21,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-09T04:31:55.054Z","etag":null,"topics":["javascript","large-dataset","parser","sax","xml"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matteodelabre.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-02T14:17:10.000Z","updated_at":"2024-09-20T11:41:14.000Z","dependencies_parsed_at":"2024-08-27T08:16:59.754Z","dependency_job_id":null,"html_url":"https://github.com/matteodelabre/saxophone","commit_stats":{"total_commits":102,"total_committers":8,"mean_commits":12.75,"dds":"0.36274509803921573","last_synced_commit":"732d7f353e78b2f106792d55d077a8c48c987a4e"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matteodelabre%2Fsaxophone","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matteodelabre%2Fsaxophone/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matteodelabre%2F
saxophone/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matteodelabre%2Fsaxophone/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matteodelabre","download_url":"https://codeload.github.com/matteodelabre/saxophone/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243826780,"owners_count":20354220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","large-dataset","parser","sax","xml"],"created_at":"2024-10-11T15:19:48.136Z","updated_at":"2025-03-16T18:32:27.575Z","avatar_url":"https://github.com/matteodelabre.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Saxophone 🎷\n\nFast and lightweight event-driven streaming XML parser in pure JavaScript.\n\n[![npm version](https://img.shields.io/npm/v/saxophone.svg?style=flat-square)](https://www.npmjs.com/package/saxophone)\n[![npm downloads](https://img.shields.io/npm/dm/saxophone.svg?style=flat-square)](https://www.npmjs.com/package/saxophone)\n[![build status](https://img.shields.io/github/workflow/status/matteodelabre/saxophone/test?style=flat-square)](https://github.com/matteodelabre/saxophone/actions)\n[![coverage](https://img.shields.io/coveralls/matteodelabre/saxophone.svg?style=flat-square)](https://coveralls.io/github/matteodelabre/saxophone)\n\nSaxophone is inspired by SAX parsers such as [sax-js](https://github.com/isaacs/sax-js) and [EasySax](https://github.com/vflash/easysax): unlike most XML parsers, it does not create a Document 
Object Model ([DOM](https://en.wikipedia.org/wiki/Document_Object_Model)) tree as a result of parsing documents.\nInstead, it emits events for each tag or text node encountered as the parsing goes on, which makes it an online algorithm.\nThis means that Saxophone has a low memory footprint, can easily parse large documents, and can parse documents as they come from a stream.\n\nThe parser does not keep track of the document state while parsing and, in particular, does not check whether the document is well-formed or valid, making it super-fast (see the [benchmark](#Benchmark) below).\n\nThis library is best suited when you need to extract simple data out of an XML document that you know is well-formed. The parser will not report precise errors in case of syntax problems. An example would be reading data from an API endpoint.\n\n## Installation\n\nThis library works on both Node.js and recent browsers.\nTo install with `npm`:\n\n```sh\n$ npm install --save saxophone\n```\n\n## Benchmark\n\nThis benchmark compares the performance of four of the most popular SAX parsers against Saxophone’s performance while parsing a 21 KB document. 
Below are the results when run on an Intel® Core™ i7-7500U processor (2.70 GHz, 2 physical cores with 2 logical cores each).\n\nLibrary            | Version | Operations per second (higher is better)\n-------------------|--------:|----------------------------------------:\n**Saxophone**      |   0.8.0 |                         **5,608 ±1.97%**\n**EasySax**        |   0.3.2 |                         **8,192 ±2.33%**\nnode-expat         |   2.4.0 |                               939 ±0.89%\nlibxmljs.SaxParser | 0.19.10 |                               767 ±0.79%\nsax-js             |   1.2.4 |                               771 ±0.82%\n\n```sh\n$ git clone https://github.com/matteodelabre/saxophone.git\n$ cd saxophone\n$ npm install\n$ npm install --no-save easysax node-expat libxmljs sax\n$ npm run benchmark\n```\n\n## Tests and coverage\n\nTo run tests and check coverage, use the following commands:\n\n```sh\n$ git clone https://github.com/matteodelabre/saxophone.git\n$ cd saxophone\n$ npm install\n$ npm test\n$ npm run coverage\n```\n\n## Examples\n\n### Simple example\n\n```js\nconst Saxophone = require('saxophone');\nconst parser = new Saxophone();\n\n// Called whenever an opening tag is found in the document,\n// such as \u003cexample id=\"1\" /\u003e - see below for a list of events\nparser.on('tagopen', tag =\u003e {\n    console.log(\n        `Open tag \"${tag.name}\" with attributes: ${JSON.stringify(Saxophone.parseAttrs(tag.attrs))}.`\n    );\n});\n\n// Called when we are done parsing the document\nparser.on('finish', () =\u003e {\n    console.log('Parsing finished.');\n});\n\n// Triggers parsing - remember to set up listeners before\n// calling this method\nparser.parse('\u003croot\u003e\u003cexample id=\"1\" /\u003e\u003cexample id=\"2\" /\u003e\u003c/root\u003e');\n```\n\nOutput:\n\n```sh\nOpen tag \"root\" with attributes: {}.\nOpen tag \"example\" with attributes: {\"id\":\"1\"}.\nOpen tag \"example\" with attributes: {\"id\":\"2\"}.\nParsing 
finished.\n```\n\n### Streaming example\n\nSame example as above but with `Stream`s.\n\n```js\nconst Saxophone = require('saxophone');\nconst parser = new Saxophone();\n\n// Called whenever an opening tag is found in the document,\n// such as \u003cexample id=\"1\" /\u003e - see below for a list of events\nparser.on('tagopen', tag =\u003e {\n    console.log(\n        `Open tag \"${tag.name}\" with attributes: ${JSON.stringify(Saxophone.parseAttrs(tag.attrs))}.`\n    );\n});\n\n// Called when we are done parsing the document\nparser.on('finish', () =\u003e {\n    console.log('Parsing finished.');\n});\n\n// stdin is '\u003croot\u003e\u003cexample id=\"1\" /\u003e\u003cexample id=\"2\" /\u003e\u003c/root\u003e'\nprocess.stdin.setEncoding('utf8');\nprocess.stdin.pipe(parser);\n```\n\nOutput:\n\n```sh\nOpen tag \"root\" with attributes: {}.\nOpen tag \"example\" with attributes: {\"id\":\"1\"}.\nOpen tag \"example\" with attributes: {\"id\":\"2\"}.\nParsing finished.\n```\n\n## Documentation\n\n### `new Saxophone()`\n\nCreates a new Saxophone parser instance. This object is a writable stream that will emit an event for each tag or node parsed from the incoming data. See [the list of events below.](#events)\n\n### `Saxophone#on()`, `Saxophone#removeListener()`, ...\n\nManage event listeners just like with any other event emitter. Saxophone inherits all `EventEmitter` methods. See the relevant [Node documentation.](https://nodejs.org/api/events.html)\n\n### `Saxophone#parse(xml)`\n\nTrigger the parsing of a whole document. This method will fire registered listeners, so you need to set them up before calling it. 
This is equivalent to writing `xml` to the stream and closing it.\n\n**Note:** the parser cannot be reused afterwards; you need to create a new instance.\n\nArguments:\n\n* `xml` is a UTF-8 string or a `Buffer` containing the XML that you want to parse.\n\nThis method returns the parser instance.\n\n### `Saxophone#write(xml)`\n\nParse a chunk of an XML document. This method will fire registered listeners, so you need to set them up before calling it.\n\n**Note:** an event is emitted for a tag or a node only when it has been closed. If the chunk starts a tag but does not close it, the tag will not be reported until it is closed by a later chunk.\n\nArguments:\n\n* `xml` is a UTF-8 string or a `Buffer` containing a chunk of the XML that you want to parse.\n\n### `Saxophone#end(xml = \"\")`\n\nWrite an optional last chunk, then close the stream. After the stream is closed, a final `finish` event is emitted and no other event will be emitted afterwards. No more data may be written into the stream after closing it.\n\nArguments:\n\n* `xml` is a UTF-8 string or a `Buffer` containing a chunk of the XML that you want to parse.\n\n### `Saxophone.parseAttrs(attrs)`\n\nParse a string list of XML attributes, as produced by the main parsing algorithm. This is not done automatically because it may not be required for every tag and it takes some time.\n\nThe result is an object associating the attribute names (as object keys) to their attribute values (as object values).\n\n### `Saxophone.parseEntities(text)`\n\nParses a piece of XML text and expands all XML entities inside it to the characters they represent. Just like attributes, this is not parsed automatically because it takes some time.\n\nThis ignores invalid entities, including unrecognized ones, leaving them as-is.\n\n### Events\n\n#### `tagopen`\n\nEmitted when an opening tag is parsed. This encompasses both regular tags and self-closing tags. 
An object is passed with the following data:\n\n* `name`: name of the parsed tag.\n* `attrs`: attributes of the tag (as a string). To parse this string, use `Saxophone.parseAttrs`.\n* `isSelfClosing`: true if the tag is self-closing.\n\n#### `tagclose`\n\nEmitted when a closing tag is parsed. An object containing the `name` of the tag is passed.\n\n#### `processinginstruction`\n\nEmitted when a processing instruction (such as `\u003c? contents ?\u003e`) is parsed. An object with the `contents` of the processing instruction is passed.\n\n#### `text`\n\nEmitted when a text node between two tags is parsed. An object with the `contents` of the text node is passed. You might need to expand XML entities inside the contents of the text node, using `Saxophone.parseEntities`.\n\n#### `cdata`\n\nEmitted when a CDATA section (such as `\u003c![CDATA[ contents ]]\u003e`) is parsed. An object with the `contents` of the CDATA section is passed.\n\n#### `comment`\n\nEmitted when a comment (such as `\u003c!-- contents --\u003e`) is parsed. An object with the `contents` of the comment is passed.\n\n#### `error`\n\nEmitted when a parsing error is encountered while reading the XML stream such that the rest of the XML cannot be correctly interpreted:\n\n* when a DOCTYPE node is found (not supported yet);\n* when a comment node contains the `--` sequence;\n* when opening and closing tags are mismatched or missing;\n* when a tag name starts with white space;\n* when nodes are unclosed (missing their final `\u003e`).\n\nBecause this library's goal is not to provide accurate error reports, the passed error will only contain a short description of the syntax error (without giving the position, for example).\n\n#### `finish`\n\nEmitted after all events, without arguments.\n\n## Contributions\n\nThis is free and open source software. All contributions (even small ones) are welcome. 
[Check out the contribution guide to get started!](CONTRIBUTING.md)\n\nThanks to:\n\n* [Norman Rzepka](https://github.com/normanrz) for implementing the streaming API and the check for opening and closing tags mismatch.\n* [winston01](https://github.com/winston01) for spotting and fixing an error in the parser when a tag sits astride two chunks.\n* [MattGson](https://github.com/MattGson) for spotting another similar error.\n\n## License\n\nReleased under the MIT license. [See the full license text.](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatteodelabre%2Fsaxophone","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatteodelabre%2Fsaxophone","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatteodelabre%2Fsaxophone/lists"}