https://github.com/j-mendez/clean-html-js
convert a url or html into a readability object
- Host: GitHub
- URL: https://github.com/j-mendez/clean-html-js
- Owner: j-mendez
- License: mit
- Created: 2020-02-08T02:17:52.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-05-07T10:36:05.000Z (over 1 year ago)
- Last Synced: 2025-02-27T02:11:17.276Z (7 months ago)
- Language: TypeScript
- Homepage:
- Size: 1.11 MB
- Stars: 13
- Watchers: 4
- Forks: 2
- Open Issues: 18
Metadata Files:
- Readme: README.md
- License: LICENSE
## clean-html-js
[CircleCI build status](https://circleci.com/gh/j-mendez/clean-html-js/tree/master)
Clean HTML content for reading. Pass in your content as HTML and get back a readability object.
## Installation Instructions
```bash
$ yarn add clean-html-js
```

## Example

```ts
import cleanHtml from "clean-html-js";

const url = "https://www.a11ywatch.com";

// Fetch the page yourself, then pass the html together with its source url.
async function grabReaderData() {
  const source = await fetch(url);
  const html = await source.text();
  return await cleanHtml(html, url);
}

// Or pass an empty html string and let clean-html-js fetch the url.
async function grabReaderDataSimple() {
  return await cleanHtml("", url);
}

grabReaderData().then((data) => {
  console.log(data);
});

// or just the url
grabReaderDataSimple().then((data) => {
  console.log(data);
});
```

## Available Params
| param     | default | type   | description                                                           |
| --------- | ------- | ------ | --------------------------------------------------------------------- |
| html      | ""      | string | Required: HTML string to parse (may be `""` when `sourceUrl` is set)  |
| sourceUrl | ""      | string | Optional: URL of the HTML source, to prevent fetching extra resources |
| config    | {}      | Config | Optional: config object (see Config below)                            |

If `html` is empty and a `sourceUrl` is given, clean-html-js attempts to fetch the HTML from `sourceUrl`.
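The html/sourceUrl fallback described above can be pictured as the small sketch below. It is not clean-html-js source code: `resolveHtml` and `FetchText` are hypothetical names, and the fetcher is passed in as a parameter (an assumption made so the logic can run without network access).

```ts
// Hypothetical helper, not part of clean-html-js: mirrors the documented rule
// "if html is empty and sourceUrl is given, fetch the html from sourceUrl".
type FetchText = (url: string) => Promise<string>;

async function resolveHtml(
  html: string,
  sourceUrl: string,
  fetchText: FetchText,
): Promise<string> {
  // Non-empty html wins: nothing is fetched.
  if (html) {
    return html;
  }
  // No html but a source URL: fall back to fetching it.
  if (sourceUrl) {
    return fetchText(sourceUrl);
  }
  // Neither was given: there is nothing to parse.
  return "";
}
```

On a runtime with a global `fetch` (Node 18+ or a browser), the fetcher could be `(u) => fetch(u).then((res) => res.text())`.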
## Config
The config object you pass is merged with the [config](src/clean-html.ts) defined in the library source.
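One way to read "merged" is a shallow object spread in which any prop you set replaces the default. The sketch assumes exactly that; `Config` mirrors the props in the table that follows, while `defaultConfig` and `mergeConfig` are hypothetical names, and the real defaults in `src/clean-html.ts` may hold more fields.

```ts
// Hypothetical sketch of the config merge; the shallow-spread strategy is an assumption.
interface Config {
  allowedTags: string[] | null; // HTML elements to allow (SVGs must be inlined)
  nonTextTags: string[] | null; // HTML elements not treated as text
}

// Defaults taken from the "default" column of the Config table.
const defaultConfig: Config = {
  allowedTags: null,
  nonTextTags: null,
};

// Shallow merge: each prop the caller sets replaces the default wholesale.
function mergeConfig(userConfig: Partial<Config> = {}): Config {
  return { ...defaultConfig, ...userConfig };
}

const merged = mergeConfig({ allowedTags: ["p", "a", "img", "svg"] });
// merged.allowedTags -> ["p", "a", "img", "svg"], merged.nonTextTags -> null
```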
| prop        | default | type             | description                                         |
| ----------- | ------- | ---------------- | --------------------------------------------------- |
| allowedTags | null    | array of strings | HTML elements to allow (note: SVGs must be inlined) |
| nonTextTags | null    | array of strings | HTML elements that should not be treated as text    |

## Testing
To test custom pages, pass your params, separated by commas, to the Jest test, for example `yarn jest '-params=mozilla,https://www.mozilla.com'` or `yarn jest '-params=a11ywatch,https://www.a11ywatch.com'`. The first param is the name of the HTML file to load from the `examples` folder, and the second is an optional URI used for the page's resources.
1. `npm test`
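The `-params=` argument described above could be read inside a test roughly as follows. This is a minimal sketch: the `-params=name,url` format comes from this README, but `parseParams` and `TestParams` are hypothetical names, and the repository's actual test may parse `process.argv` differently.

```ts
// Hypothetical parser for the "-params=name,url" test argument.
interface TestParams {
  name: string; // HTML file name inside the examples folder, e.g. "mozilla"
  url?: string; // optional source URI used for the page's resources
}

function parseParams(argv: readonly string[]): TestParams | null {
  const prefix = "-params=";
  const arg = argv.find((a) => a.startsWith(prefix));
  if (!arg) {
    return null; // no custom page requested
  }
  const value = arg.slice(prefix.length);
  // Split on the first comma only, so a URL containing commas stays intact.
  const comma = value.indexOf(",");
  if (comma === -1) {
    return { name: value };
  }
  const name = value.slice(0, comma);
  const url = value.slice(comma + 1);
  return url ? { name, url } : { name };
}
```

For example, `parseParams(["node", "jest", "-params=mozilla,https://www.mozilla.com"])` returns `{ name: "mozilla", url: "https://www.mozilla.com" }`; a test could pass `process.argv` to it directly.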