https://github.com/ticky/hastml
πΆπ»ββοΈ Hasty HTML, XML and SGML walker
https://github.com/ticky/hastml
but-really-just-a-walker html quote-unquote-parser sgml xhtml xml
Last synced: 5 months ago
JSON representation
πΆπ»ββοΈ Hasty HTML, XML and SGML walker
- Host: GitHub
- URL: https://github.com/ticky/hastml
- Owner: ticky
- License: mit
- Created: 2017-09-19T22:41:32.000Z (almost 9 years ago)
- Default Branch: develop
- Last Pushed: 2026-01-19T19:35:49.000Z (5 months ago)
- Last Synced: 2026-01-20T01:21:15.890Z (5 months ago)
- Topics: but-really-just-a-walker, html, quote-unquote-parser, sgml, xhtml, xml
- Language: JavaScript
- Homepage:
- Size: 150 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 11
-
Metadata Files:
- Readme: Readme.md
- License: License.md
Awesome Lists containing this project
README
# Hasty HTML, XML and SGML walker
[](https://www.npmjs.com/package/hastml)  [](https://travis-ci.org/ticky/hastml) [](https://codecov.io/gh/ticky/hastml)
πΆπ»ββοΈ A quick and _extremely_ permissive way to process XML-like inputs.
## Installation
```shell
yarn add hastml
```
\~_or_\~
```shell
npm install --save hastml
```
## API
### `walk` (default)
The `walk` function accepts two arguments. The text, and a callback.
```javascript
import walk from 'hastml';
const myHtml = 'hello!';
const callback = (match, tagFragment, offset, string, thisTag, stack) => {
console.log(tagFragment, offset);
};
const output = walk(
myHtml,
callback
);
// Console output:
// => '<', 0
// => '>', 6
// => '', 13
// => '>', 20
```
#### Callback function
The callback function is passed six arguments denoting the state of the walker.
- `match`: The text which will be replaced if the callback returns a string value.
- `tagFragment`: The part of the HTML tag we're currently stepping over.
One of `<`, `>`, ``, `/>`, ``.
- `offset`: Numeric index of the current match.
- `string`: The full HTML string which is being walked over.
- `thisTag`: An object representing the state of the current tag.
- `stack`: Array representing the path to the current tag through the document's structure.
If the callback function returns a string value, that string will replace the value of `match` in the output from `walk`. Otherwise, no change is made.
#### Tag objects
Tag objects contain data about tags being processed. What information they contain depends on where the walker is in relation to the tag it represents.
If the walker is at the start of the tag, for instance, it will only contain the `tagName`, `openIndex` and an `"open"` `state`. If the walker has reached the closing tag, it will contain more indexes, and have changed state.
The tag objects (passed to the callback function as `thisTag`, and within the `stack`) can contain the following information:
- `tagName`: The tag name, i.e. `` would give `html`.
- `state`: The current state of the tag.
- `openIndex`: The index at which the tag was opened.
- `contentIndex`: The index immediately before content of the tag begins. Not included for void elements.
- `closingIndex`: The index where the closing part of the tag begins. Not included for void elements.
- `closeIndex`: The index at which the tag was closed.
The indexes can be visualised as follows:
```
openIndex contentIndex closeIndex
β β β
This span
β
closingIndex
β β
openIndex closeIndex
```
#### Tag object `state`
The `state` property of tag objects indicate where the walker is in relation to the tag. The following states are possible:
- `"open"`: The opening part of the tag has been found.
- `"content"`: The tag's content has been walked to.
- `"closing"`: The closing part of the tag has been found.
These states can be visualised as follows:
```
"open" "content"
β β
This span
β
"closing"
β
"open"
```
Note that there is not currently a "closed" state indicating the tag has no further content or closing tag. This may be added in the future. A tag object with a populated `closeIndex` and `state` of either `"open"` or `"content"` indicates this state.