https://github.com/chaewonkong/surfio

Simple site meta scraper that uses a headless browser
go goquery headlessbrowser scraper surf

Host: GitHub
URL: https://github.com/chaewonkong/surfio
Owner: chaewonkong
Created: 2023-07-24T04:07:06.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-02-01T04:42:45.000Z (about 1 year ago)
Last Synced: 2024-06-19T13:35:52.887Z (10 months ago)
Topics: go, goquery, headlessbrowser, scraper, surf
Language: Go
Homepage:
Size: 13.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 🌊 surfio

Simple image scraper in Go with surf and goquery.

Scrapes thumbnail images from a given site URL, with dynamic crawling (using a headless browser).

## More

- [surf](https://github.com/headzoo/surf)

- [goquery](https://github.com/PuerkitoBio/goquery)

## Usage

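```go
package main

import (
	"fmt"
	"log"

	"github.com/chaewonkong/surfio"
)

func main() {
	url := "https://example.com" // replace with your URL

	su := surfio.New(url)
	imgs, err := su.GetThumbnailImages()
	if err != nil {
		log.Fatal(err) // handle error
	}

	// if there is no error, use the first image (when one was found)
	if len(imgs) > 0 {
		imgUrl := imgs[0].Url
		fmt.Println(imgUrl)
	}
}
```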

`imgs` holds `Image` structs. There may be more than one image.

The scraper first looks for `og:image`; if it is not found, the scraper searches for every image available at that URL and returns them.

```go
type Image struct {
	Url     string `json:"url"`
	SrcType string `json:"src_type"`
}
```
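The `json` tags on `Image` determine the field names used when results are serialized. The following is a minimal sketch using only the standard library; the helper `encodeImages` and the sample `SrcType` value `"og:image"` are hypothetical and not part of surfio.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Image mirrors the struct shown in this README.
type Image struct {
	Url     string `json:"url"`
	SrcType string `json:"src_type"`
}

// encodeImages is a hypothetical helper that serializes images
// using the json tags declared on Image.
func encodeImages(imgs []Image) (string, error) {
	b, err := json.Marshal(imgs)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample data; the SrcType value is an assumption for illustration.
	imgs := []Image{{Url: "https://example.com/og.png", SrcType: "og:image"}}

	out, err := encodeImages(imgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

With the tags above, the encoded keys are `url` and `src_type` rather than the Go field names `Url` and `SrcType`.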

## Todo

- [ ] Crawl site meta (title, description, keywords...)
