https://github.com/aarzilli/sandblast

Library to extract text from HTML files

Library that uses Readability-like heuristics to extract text from an HTML document.
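The README does not spell out sandblast's own scoring rules. As a rough illustration of what "Readability-like heuristics" usually means, here is a minimal sketch loosely modelled on Arc90/Mozilla Readability rather than on sandblast's actual code. Each text block earns points for length and commas, and its score is then scaled down by its link density, so navigation bars full of links lose to prose. The `Block` type and the `score` and `bestBlock` functions are hypothetical names, not part of sandblast's API, and blocks are assumed to be already flattened out of a parsed HTML tree.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical, pre-flattened text block (e.g. a <p> or <div>).
// A real extractor would build these by walking a parsed HTML tree.
type Block struct {
	Text     string // all visible text in the block
	LinkText string // the subset of Text that appears inside <a> elements
}

// score rates how "article-like" a block is: one base point, one point per
// comma, up to three points for length, all scaled by (1 - link density).
func score(b Block) float64 {
	n := len(b.Text)
	if n == 0 {
		return 0
	}
	base := 1 + float64(strings.Count(b.Text, ","))
	lengthBonus := n / 100 // one point per 100 bytes of text...
	if lengthBonus > 3 {
		lengthBonus = 3 // ...capped at three
	}
	base += float64(lengthBonus)
	linkDensity := float64(len(b.LinkText)) / float64(n)
	return base * (1 - linkDensity)
}

// bestBlock returns the index of the highest-scoring block, or -1 if no
// block scores above zero.
func bestBlock(blocks []Block) int {
	best, bestScore := -1, 0.0
	for i, b := range blocks {
		if s := score(b); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	blocks := []Block{
		{Text: "Home | About | Contact", LinkText: "HomeAboutContact"},
		{Text: "The article body is usually long, full of commas, and contains few links."},
	}
	i := bestBlock(blocks)
	fmt.Println(i, blocks[i].Text)
}
```

Here the link-heavy navigation block scores about 0.27 while the prose block scores 3, so the prose block (index 1) is selected. Real implementations also propagate scores to parent elements and strip boilerplate tags first; this sketch omits both.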

Example:
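```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aarzilli/sandblast"
	"golang.org/x/net/html"
)

func main() {
	// Parse an HTML document read from standard input.
	node, err := html.Parse(os.Stdin)
	if err != nil {
		log.Fatal("Parsing error: ", err)
	}
	title, text := sandblast.Extract(node)
	fmt.Printf("Title: %s\n%s", title, text)
}
```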
See also `example/extract.go`, a command line utility to extract text from a URL.