https://github.com/fern4lvarez/go-metainspector
Simple web scraping for Go.
https://github.com/fern4lvarez/go-metainspector
Last synced: 6 months ago
JSON representation
Simple web scraping for Go.
- Host: GitHub
- URL: https://github.com/fern4lvarez/go-metainspector
- Owner: fern4lvarez
- License: mit
- Created: 2013-04-14T03:43:54.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2017-04-18T05:28:29.000Z (about 8 years ago)
- Last Synced: 2024-06-20T10:22:49.706Z (11 months ago)
- Language: Go
- Homepage: http://godoc.org/github.com/fern4lvarez/go-metainspector/metainspector
- Size: 196 KB
- Stars: 66
- Watchers: 5
- Forks: 8
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# go-metainspector [](https://travis-ci.org/fern4lvarez/go-metainspector)[](http://godoc.org/github.com/fern4lvarez/go-metainspector/metainspector)
Simple web metadata scraping in Go.
**go-metainspector** is a web scraper package that provides access
to basic info and meta tags of a given URL.
It is inspired by the [metainspector gem](https://github.com/jaimeiniesta/metainspector) by [Jaime Iniesta](https://twitter.com/jaimeiniesta) and completely written in Go.## Install
* Step 1: Get the `metainspector` package
```
go get -u github.com/fern4lvarez/go-metainspector/metainspector
```* Step 2 (Optional): Run tests
```
$ go test -v -cover ./...
```##Usage
### API
```
package mainimport (
"fmt""github.com/fern4lvarez/go-metainspector/metainspector"
)func main() {
url := "http://www.cloudcontrol.com/pricing"
MI, err := metainspector.New(url)
if err != nil {
fmt.Printf("Error: %v", err)
} else {
fmt.Printf("\nURL: %s\n", MI.Url())
fmt.Printf("Scheme: %s\n", MI.Scheme())
fmt.Printf("Host: %s\n", MI.Host())
fmt.Printf("Root: %s\n", MI.RootURL())
fmt.Printf("Title: %s\n", MI.Title())
fmt.Printf("Language: %s\n", MI.Language())
fmt.Printf("Author: %s\n", MI.Author())
fmt.Printf("Description: %s\n", MI.Description())
fmt.Printf("Charset: %s\n", MI.Charset())
fmt.Printf("Feed URL: %s\n", MI.Feed())
fmt.Printf("Links: %v\n", MI.Links())
fmt.Printf("Images: %v\n", MI.Images())
fmt.Printf("Keywords: %v\n", MI.Keywords())
fmt.Printf("Compatibility: %v\n", MI.Compatibility())
}
```### CLI
You can use the go-metainspector as a command line tool.
*Note:* Once you have installed the package, you might want to set an alias for the CLI: add `alias metainspect=go-metainspector` or similar in your appropiated dotfile.Simple to use!
```
$ go-metainspector -help
Usage of go-metainspector:
-u="www.example.com": URL to metainspect.
-all=false: Show full results.$ go-metainspector -u www.cloudcontrol.com
www.cloudcontrol.com
----> Title: cloudControl » Cloud App Platform » supercharging development
----> Author: cloudControl GmbH
----> Description: Cloud hosting secure, easy and fair: Highly available and scalable cloud hosting with no administraton hassle and pay as you go billing
----> Charset: utf-8
----> Language: en
----> Feed URL: https://www.cloudcontrol.com/blog/feed
----> Keywords: cloudcontrol cloud control cloud hosting cloud computing cloud hosting web-hosting platform as a service paas
----> Links: http://www.cloudcontrol.com/ http://www.cloudcontrol.com/pricing http://www.cloudcontrol.com/dev-center http://www.cloudcontrol.com/add-ons https://console.cloudcontrolled.com http://www.cloudcontrol.com/for-isvs http://www.cloudcontrol.com/for-developers http://www.cloudcontrol.com/use-cases/vamos http://www.cloudcontrol.com/use-cases/searchbox http://www.cloudcontrol.com/use-cases/snipclip ...
----> Images: http://www.cloudcontrol.com/assets/spinner-9bde1d21899a52974160da652c0a6622.gif https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/006/thumb/vamos-logo-rgb-vertical.png?1360313785 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/008/thumb/logo.png?1362736875 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/002/thumb/snipclip_logo_2011_rgb.png?1359481602 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/004/thumb/ormigo.png?1359481203 https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/blitz.png https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/sendgrid.png https://s3-eu-west-1.amazonaws.com/cctrl-www-production/solution_providers/logos/000/000/013/small/emind_klein.png?1364402125 ...
````-u` is optional though:
```
$ go-metainspector www.cloudcontrol.com
----> Title: cloudControl » Cloud App Platform » supercharging development
----> Author: cloudControl GmbH
...
```Sometimes one site contains so many links and images that makes it ugly to print them all. By default it's cut on 10 links and images. Use `-all` to print them all:
```
$ go-metainspector -u www.cloudcontrol.com -all
----> Title: cloudControl » Cloud App Platform » supercharging development
----> Author: cloudControl GmbH
----> Description: Cloud hosting secure, easy and fair: Highly available and scalable cloud hosting with no administraton hassle and pay as you go billing
----> Charset: utf-8
----> Language: en
----> Feed URL: https://www.cloudcontrol.com/blog/feed
----> Keywords: cloudcontrol cloud control cloud hosting cloud computing cloud hosting web-hosting platform as a service paas
----> Links: http://www.cloudcontrol.com/ http://www.cloudcontrol.com/pricing http://www.cloudcontrol.com/dev-center http://www.cloudcontrol.com/add-ons https://console.cloudcontrolled.com http://www.cloudcontrol.com/for-isvs http://www.cloudcontrol.com/for-developers http://www.cloudcontrol.com/add-ons/sendgrid http://www.cloudcontrol.com/add-ons/newrelic http://www.cloudcontrol.com/use-cases/ormigo http://www.cloudcontrol.com/use-cases/vamos http://www.cloudcontrol.com/use-cases/afp http://www.cloudcontrol.com/use-cases/adcloud http://www.cloudcontrol.com/use-cases/kinderfee http://www.cloudcontrol.com/use-cases/snipclip http://www.cloudcontrol.com/dev-center/Quickstart http://www.cloudcontrol.com/dev-center/Platform Documentation http://status.cloudcontrol.com http://www.cloudcontrol.com/dev-center/support https://console.cloudcontrolled.com http://www.cloudcontrol.com/team http://www.cloudcontrol.com/jobs http://www.cloudcontrol.com/blog http://www.cloudcontrol.com/contact http://www.cloudcontrol.com/add-on-provider-program http://www.cloudcontrol.com/solution-providers http://www.cloudcontrol.com/tos http://www.cloudcontrol.com/privacy-policy http://www.cloudcontrol.com/imprint
----> Images: http://www.cloudcontrol.com/assets/spinner-9bde1d21899a52974160da652c0a6622.gif https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/sendgrid.png https://s3-eu-west-1.amazonaws.com/cctrl-www-assets/add-ons/newrelic.png https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/004/thumb/ormigo.png?1359481203 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/006/thumb/vamos-logo-rgb-vertical.png?1360313785 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/003/thumb/AFP-logo.png?1359481038 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/001/thumb/adcloud.png?1359563276 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/005/thumb/Kinderfee_Logo_Claim.png?1359481378 https://s3-eu-west-1.amazonaws.com/cctrl-www-production/use_cases/logos/000/000/002/thumb/snipclip_logo_2011_rgb.png?1359481602
```### Web
You can use go-metainspector via web. Check out and run the [`go-metainspector-site`](https://github.com/fern4lvarez/go-metainspector-site) or just try it out: http://gometainspector.cloudcontrolled.com##Contribute!
You all are welcome to take a seat and make a contribution to this repo: reviews, issues, feature suggestions, possible code or functionality enhancements... Everything is appreciated!##TODO (aka Nice To Have)
* Extend documentation
* Write a CHANGELOG
* Mock http requests to speed up unit tests
* Internal links, External links
* Map() method wrapping all data
* Set Timeout optionally (now is 20 secs)
* Your suggestion##License
go-metainspector is MIT licensed, see [here](https://github.com/fern4lvarez/go-metainspector/blob/master/LICENSE)