Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/v-braun/hero-scrape

Find the hero (main) image of a URL


Host: GitHub
URL: https://github.com/v-braun/hero-scrape
Owner: v-braun
License: mit
Created: 2018-12-01T19:52:59.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2018-12-24T22:53:10.000Z (about 6 years ago)
Last Synced: 2024-11-15T09:21:15.862Z (3 months ago)
Topics: crawler, fastimage, hero, hero-image, opengraph, webscraping
Language: Go
Homepage:
Size: 80.1 KB
Stars: 3
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# hero-scrape

> Find the hero (main) image of a URL

[![Build Status](https://travis-ci.org/v-braun/hero-scrape.svg?branch=master)](https://travis-ci.org/v-braun/hero-scrape)

[![codecov](https://codecov.io/gh/v-braun/hero-scrape/branch/master/graph/badge.svg)](https://codecov.io/gh/v-braun/hero-scrape)

By [v-braun - viktor-braun.de](https://viktor-braun.de).

## Demo

See a demo at https://hero-scrape.viktor-braun.de

## Description

hero-scrape extracts the main image of a webpage.

It uses different strategies to find the main image (OpenGraph HTML tags and a heuristic search).

You can use the existing strategies or implement your own.

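To find the "biggest" image, the size of each candidate has to be known, which normally means downloading it. [fastimage](https://github.com/rubenfonseca/fastimage/) is the perfect choice for that job, because it determines an image's type and size by fetching as little of the file as needed.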

## Installation

```bash

go get github.com/v-braun/hero-scrape

```

## Usage

**With pre-configured strategies**

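```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"

	heroscrape "github.com/v-braun/hero-scrape"
)

func main() {
	pageUrl, _ := url.Parse("https://github.com/v-braun/hero-scrape") // constant URL, always valid
	res, err := http.Get(pageUrl.String())
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	result, err := heroscrape.Scrape(pageUrl, res.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result.Image)
}
```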

**With custom strategies**

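```go
pageUrl, _ := url.Parse("https://github.com/v-braun/hero-scrape")
res, err := http.Get(pageUrl.String())
if err != nil {
	log.Fatal(err)
}
defer res.Body.Close()

// Pass the strategies to use after the body. YourOwnStrategy() is a
// placeholder for your own strategy implementation.
result, err := heroscrape.ScrapeWithStrategy(pageUrl, res.Body,
	heroscrape.NewOgStrategy(), heroscrape.NewHeuristicStrategy(), YourOwnStrategy())
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.Image)
```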

## Related Projects

- [hero-scrape-web](https://github.com/v-braun/hero-scrape-web) Demo for this library

- [fastimage](https://github.com/rubenfonseca/fastimage/) Finds the type and/or size of a remote image given its uri, by fetching as little as needed.

- [goquery](https://github.com/PuerkitoBio/goquery) A little like that j-thing, only in Go.

## Known Issues

If you discover any bugs, feel free to create an issue on GitHub, or fork the project and send me a pull request.

[Issues List](https://github.com/v-braun/hero-scrape/issues).

## Authors

![image](https://avatars3.githubusercontent.com/u/4738210?v=3&s=50)  

[v-braun](https://github.com/v-braun/)

## Contributing

1. Fork it

2. Create your feature branch (`git checkout -b my-new-feature`)

3. Commit your changes (`git commit -am 'Add some feature'`)

4. Push to the branch (`git push origin my-new-feature`)

5. Create new Pull Request

## License

See [LICENSE](https://github.com/v-braun/hero-scrape/blob/master/LICENSE).