An open API service indexing awesome lists of open source software.

https://github.com/millerlogic/htmlstrip

Strips HTML from the input, outputs plain text, streamed in realtime without preloading the whole document
https://github.com/millerlogic/htmlstrip

html html-parser html-parser-library html-strip parser

Last synced: 22 days ago
JSON representation

Strips HTML from the input, outputs plain text, streamed in realtime without preloading the whole document

Awesome Lists containing this project

README

          

# htmlstrip
Strips HTML from the input, outputs plain text. It is streamed in realtime without preloading the whole document.

* Easy to use Writer interface: \
```io.Copy(&htmlstrip.Writer{W: os.Stdout}, os.Stdin)```

* All it does is strip HTML into plain text.

* Should never use excessive memory as it does not buffer the whole document.

* Script, style and head tags are removed entirely, as they are not part of the page's text.

* The provided command strips HTML from standard input or specified files, writes plain text to standard output. \
```go install github.com/millerlogic/htmlstrip/cmd/htmlstrip```

* Could be used as an extremely basic, non-interactive text browser: \
```curl -s -S https://en.wikipedia.org/wiki/Chinchilla | htmlstrip | less```