https://github.com/philipjkim/goreadability
Webpage summary extractor using Facebook Open Graph and arc90's readability
- Host: GitHub
- URL: https://github.com/philipjkim/goreadability
- Owner: philipjkim
- License: mit
- Created: 2016-04-20T01:40:14.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2019-04-22T09:46:39.000Z (almost 6 years ago)
- Last Synced: 2024-06-18T21:46:42.701Z (7 months ago)
- Topics: opengraph, readability, scraper
- Language: Go
- Homepage:
- Size: 1000 KB
- Stars: 69
- Watchers: 7
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go-extra - goreadability (Utilities / Fail injection)
README
# goreadability
[![GoDoc](https://godoc.org/github.com/philipjkim/goreadability?status.svg)](https://godoc.org/github.com/philipjkim/goreadability) [![Go Report Card](https://goreportcard.com/badge/github.com/philipjkim/goreadability)](https://goreportcard.com/report/github.com/philipjkim/goreadability) [![Code Coverage](http://gocover.io/_badge/github.com/philipjkim/goreadability)](https://gocover.io/github.com/philipjkim/goreadability) [![Build Status](https://travis-ci.org/philipjkim/goreadability.svg)](https://travis-ci.org/philipjkim/goreadability)
goreadability is a tool for extracting the primary readable content of a webpage. It is a Go port of arc90's readability project, based on [ruby-readability](https://github.com/cantino/ruby-readability).
As of v2.0, goreadability uses Open Graph tag values when they exist. You can disable the Open Graph lookup and fall back to the traditional readability rules by setting `Option.LookupOpenGraphTags` to `false`.
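To illustrate the idea of preferring Open Graph values and falling back to the page's own markup, here is a minimal standard-library sketch. It is not goreadability's implementation: `pickTitle`, the `lookupOG` flag, and the regular expressions are hypothetical names made up for this example, and a real extractor would use a proper HTML parser rather than regular expressions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for this sketch only. They assume the property
// attribute comes before content and that values contain no quote characters.
var (
	ogTitleRe = regexp.MustCompile(`(?i)<meta\s+property=["']og:title["']\s+content=["']([^"']*)["']`)
	titleRe   = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)
)

// pickTitle returns the og:title value when lookupOG is true and the tag
// exists; otherwise it falls back to the text of the <title> element.
func pickTitle(html string, lookupOG bool) string {
	if lookupOG {
		if m := ogTitleRe.FindStringSubmatch(html); m != nil {
			return m[1]
		}
	}
	if m := titleRe.FindStringSubmatch(html); m != nil {
		return strings.TrimSpace(m[1])
	}
	return ""
}

func main() {
	page := `<html><head><title>Lego - Wikipedia</title>` +
		`<meta property="og:title" content="Lego"></head><body></body></html>`

	fmt.Println(pickTitle(page, true))  // prints "Lego"
	fmt.Println(pickTitle(page, false)) // prints "Lego - Wikipedia"
}
```

The flag plays the same role here that `Option.LookupOpenGraphTags` is described as playing in the library: when it is off, only the page's own markup is consulted.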
## Install
```
go get github.com/philipjkim/goreadability
```
## Example
```go
package main

import (
	"log"

	readability "github.com/philipjkim/goreadability"
)

func main() {
	// URL to extract contents (title, description, images, ...)
	url := "https://en.wikipedia.org/wiki/Lego"
	// Default option
	opt := readability.NewOption()
	// You can modify some option values if needed.
	opt.ImageRequestTimeout = 3000 // ms

	content, err := readability.Extract(url, opt)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(content.Title)
	log.Println(content.Description)
	log.Println(content.Images)
}
```
## Testing
```sh
go test
# or if you want to see verbose logs:
DEBUG=true go test -v
```
## Command Line Tool
TODO
## Related Projects
- [ruby-readability](https://github.com/cantino/ruby-readability) is the base of this project.
- [fastimage](https://github.com/rubenfonseca/fastimage) finds the type and/or size of a remote image given its URI, by fetching as little as needed.
## Potential Issues
TODO
## License
[MIT](LICENSE)