https://github.com/millerlogic/htmlstrip
Strips HTML from the input, outputs plain text, streamed in realtime without preloading the whole document
- Host: GitHub
- URL: https://github.com/millerlogic/htmlstrip
- Owner: millerlogic
- License: mit
- Created: 2018-08-07T18:53:48.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-09T22:40:23.000Z (about 7 years ago)
- Last Synced: 2025-12-01T04:21:08.555Z (2 months ago)
- Topics: html, html-parser, html-parser-library, html-strip, parser
- Language: Go
- Size: 6.84 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# htmlstrip
Strips HTML from the input, outputs plain text. It is streamed in realtime without preloading the whole document.
* Easy-to-use Writer interface: \
```io.Copy(&htmlstrip.Writer{W: os.Stdout}, os.Stdin)```
* All it does is strip HTML into plain text.
* Should never use excessive memory as it does not buffer the whole document.
* Script, style and head tags are removed entirely, as they are not part of the page's text.
* The provided command strips HTML from standard input or from the specified files and writes plain text to standard output. \
```go install github.com/millerlogic/htmlstrip/cmd/htmlstrip```
* Could be used as an extremely basic, non-interactive text browser: \
```curl -s -S https://en.wikipedia.org/wiki/Chinchilla | htmlstrip | less```
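
The streaming behavior described above can be illustrated with a small standalone sketch. This is not the htmlstrip implementation: `stripWriter` and `stripChunks` are hypothetical names, and the sketch only drops `<...>` tags. It does not decode entities or remove script, style and head contents the way the real package does. What it does show is how an `io.Writer` can carry its parser state across `Write` calls so that nothing beyond the current chunk is held in memory.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// stripWriter is a hypothetical, simplified tag stripper (NOT the htmlstrip
// implementation). Text outside <...> is forwarded to W; tags are dropped.
type stripWriter struct {
	W     io.Writer // destination for plain text
	inTag bool      // true between '<' and '>'; persists across Write calls
}

// Write scans p once and forwards each run of text that lies outside a tag.
// Scanning bytes is safe for UTF-8 because '<' and '>' are ASCII.
func (s *stripWriter) Write(p []byte) (int, error) {
	start := 0 // start of the current run of plain text
	for i, b := range p {
		switch {
		case !s.inTag && b == '<':
			// A tag begins: flush the text seen so far.
			if i > start {
				if _, err := s.W.Write(p[start:i]); err != nil {
					return start, err
				}
			}
			s.inTag = true
		case s.inTag && b == '>':
			// The tag ends: plain text resumes at the next byte.
			s.inTag = false
			start = i + 1
		}
	}
	// Flush trailing text unless the chunk ended inside a tag.
	if !s.inTag && len(p) > start {
		if _, err := s.W.Write(p[start:]); err != nil {
			return start, err
		}
	}
	return len(p), nil
}

// stripChunks feeds each chunk through a single stripWriter as a separate
// Write call and returns the combined output (hypothetical helper).
func stripChunks(chunks ...string) string {
	var out strings.Builder
	w := &stripWriter{W: &out}
	for _, c := range chunks {
		io.WriteString(w, c) // strings.Builder never returns an error
	}
	return out.String()
}

func main() {
	// Tags split across writes are still removed.
	fmt.Println(stripChunks("<p>Hello, <b>wor", "ld</b>!</", "p>"))

	// Same call shape as the io.Copy one-liner in the list above.
	io.Copy(&stripWriter{W: os.Stdout}, strings.NewReader("<i>streamed</i>\n"))
}
```

Because the only state is a single boolean, memory use stays constant regardless of document size. That is the property the Writer-based design aims for.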