https://github.com/chaewonkong/surfio
Simple site meta scraper that uses a headless browser
- Host: GitHub
- URL: https://github.com/chaewonkong/surfio
- Owner: chaewonkong
- Created: 2023-07-24T04:07:06.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-01T04:42:45.000Z (about 1 year ago)
- Last Synced: 2024-06-19T13:35:52.887Z (10 months ago)
- Topics: go, goquery, headlessbrowser, scraper, surf
- Language: Go
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🌊 surfio
Simple image scraper in Go with surf and goquery.
Scrapes thumbnail images from a given site URL, with dynamic crawling (using a headless browser).

## More
- [surf](https://github.com/headzoo/surf)
- [goquery](https://github.com/PuerkitoBio/goquery)

## Usage
```go
package main

import "github.com/chaewonkong/surfio"

func main() {
	url := "[YOUR_URL]" // replace with the URL to scrape

	su := surfio.New(url)
	imgs, err := su.GetThumbnailImages()
	if err != nil {
		// handle error
		return
	}

	// if there is no error, use the image
	imgUrl := imgs[0].Url
	_ = imgUrl // placeholder so the example compiles; use imgUrl as needed
}
```

`imgs` contains `Image` structs. There can be multiple images.
The scraper looks for `og:image` first; if it is not found, it searches for every image available at that URL and returns them.

```go
type Image struct {
	Url     string `json:"url"`
	SrcType string `json:"src_type"`
}
```

## Todo
- [ ] Crawl site meta (title, description, keyword...)
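
The lookup order described under Usage (`og:image` first, then every image on the page) can be sketched roughly as follows. This is only an illustration, not surfio's actual implementation: `pickImages` is a hypothetical helper, its inputs are assumed to have already been extracted from the page (for example with goquery), and the `SrcType` values `"og"` and `"img"` are assumptions, since the README does not say what that field holds.

```go
package main

import "fmt"

// Image mirrors the struct shown in the README.
type Image struct {
	Url     string `json:"url"`
	SrcType string `json:"src_type"`
}

// pickImages is a hypothetical helper, not part of surfio's API.
// ogImage is the content of the og:image meta tag ("" if absent);
// imgSrcs are the src attributes of the <img> tags found on the page.
// The SrcType values "og" and "img" are assumptions made for this sketch.
func pickImages(ogImage string, imgSrcs []string) []Image {
	// Prefer og:image when the page declares one.
	if ogImage != "" {
		return []Image{{Url: ogImage, SrcType: "og"}}
	}

	// Otherwise fall back to every image with a non-empty src.
	imgs := make([]Image, 0, len(imgSrcs))
	for _, src := range imgSrcs {
		if src == "" {
			continue // skip <img> tags without a usable src
		}
		imgs = append(imgs, Image{Url: src, SrcType: "img"})
	}
	return imgs
}

func main() {
	fmt.Println(pickImages("https://example.com/og.png", []string{"/a.png"}))
	fmt.Println(pickImages("", []string{"/a.png", "", "/b.png"}))
}
```

Because the fallback can return an empty slice when a page has neither `og:image` nor any `<img>` tags, callers of a function like this would want to check the length before indexing into the result.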