https://github.com/koshqua/scrapio
Simple and easy-to-use scraper and crawler in Go.
- Host: GitHub
- URL: https://github.com/koshqua/scrapio
- Owner: Koshqua
- License: apache-2.0
- Created: 2020-02-14T13:10:27.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-05-04T23:55:52.000Z (almost 6 years ago)
- Last Synced: 2025-08-03T00:02:18.139Z (8 months ago)
- Topics: crawler, framework, go, golang, json, scraper, spider
- Language: Go
- Homepage:
- Size: 9.42 MB
- Stars: 13
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[Codacy](https://app.codacy.com/manual/Koshqua/scrapio?utm_source=github.com&utm_medium=referral&utm_content=Koshqua/scrapio&utm_campaign=Badge_Grade_Dashboard) · [Go Reference](https://pkg.go.dev/github.com/koshqua/scrapio) · [Go Report Card](https://goreportcard.com/report/github.com/Koshqua/scrapio)
## Scrapio
**Scrapio** is a lightweight and user-friendly web crawling and scraping library.
The main goal of the project is to make scraping large amounts of similar data from the web easy. It can be useful for a wide range of applications, such as data mining, data processing, and archiving.
Over time, I plan to turn it into a standalone service that works as an API.
### Features
At the moment, Scrapio works as a library for crawling and scraping data from the web.
What it can do:
- Crawl all pages on a host and return all the links.
- Scrape text, image URLs, and links from the crawled pages.
- Leave the choice of output format (CSV, JSON, etc.) up to you.
- Free and quite powerful.
- Written in Go and concurrent; depending on network speed, it can crawl and scrape up to 2,000 pages per minute.
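The throughput figure above comes from fetching many pages concurrently. A minimal worker-pool sketch of that pattern (this illustrates the general Go idiom, not scrapio's internals; `fetch` is an injectable stand-in for a real HTTP request) might look like:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchAll processes each URL using a fixed pool of worker goroutines.
// The fetch function is injected so the pattern can be shown without
// real network access.
func fetchAll(urls []string, workers int, fetch func(string) string) []string {
	jobs := make(chan string)
	results := make(chan string, len(urls))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- fetch(u)
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // workers finish in arbitrary order
	return out
}

func main() {
	pages := fetchAll(
		[]string{"https://example.com/a", "https://example.com/b"},
		2,
		func(u string) string { return "fetched " + u },
	)
	fmt.Println(pages)
}
```

With a real `http.Get` in place of the stub, the worker count bounds how many requests are in flight at once.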
### Installation
```sh
go get github.com/koshqua/scrapio
```
### Usage
**Crawler** is easy to use. You just need to specify a starting URL, and it will crawl all the URLs on that host.
```go
// Initialize a new crawler with a start URL.
// It doesn't have to be the root URL of the site.
cr := &crawler.Crawler{StartURL: "https://gulfnews.com/"}
// Start crawling.
// More configuration options (e.g. a maximum number of results) are planned.
cr.Crawl()
// Do something with the result; that part is up to you.
```
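The crawler stays on a single host. The idea of deciding whether a (possibly relative) link belongs to that host can be sketched with the standard `net/url` package (an illustration of the concept, not scrapio's actual filtering code):

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHost resolves link against base and reports whether the result
// lives on the same host; a host-wide crawler would follow only such links.
func sameHost(base, link string) (string, bool) {
	b, err := url.Parse(base)
	if err != nil {
		return "", false
	}
	l, err := url.Parse(link)
	if err != nil {
		return "", false
	}
	resolved := b.ResolveReference(l)
	return resolved.String(), resolved.Host == b.Host
}

func main() {
	abs, ok := sameHost("https://gulfnews.com/", "/uae/news")
	fmt.Println(abs, ok) // https://gulfnews.com/uae/news true

	abs, ok = sameHost("https://gulfnews.com/", "https://twitter.com/share")
	fmt.Println(abs, ok) // https://twitter.com/share false
}
```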
**Scraper** uses the data structure produced by the crawler.
Before initializing a scraper, you need to create a few selectors to assign to it.
Selectors are simple CSS-like selectors.
```go
// Create the selectors you want to scrape.
h2 := scraper.NewSelector("h2", true, true, true)
img := scraper.NewSelector("img", true, true, true)
p := scraper.NewSelector("p:first-of-type", true, true, true)
// Initialize a new scraper with the given selectors.
// The scraper depends on the crawler from the previous snippet:
// it takes the crawled pages and builds a structure holding the selectors and scrape results.
sc := scraper.InitScraper(*cr, []scraper.Selector{h2, img, p})
// And just start scraping.
err := sc.Scrap()
if err != nil {
	log.Fatalln(err)
}
```
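Since the library leaves output formatting up to you, one option is to copy the scraped values into your own struct and marshal it with `encoding/json`. The `Page` type below is hypothetical, defined by the caller rather than by scrapio:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Page is a caller-defined shape for scraped data; fill it from the
// scraper's results before serializing. It is not part of scrapio.
type Page struct {
	URL    string   `json:"url"`
	Titles []string `json:"titles"`
	Images []string `json:"images"`
}

// toJSON renders a page as indented JSON.
func toJSON(p Page) (string, error) {
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := toJSON(Page{
		URL:    "https://gulfnews.com/",
		Titles: []string{"Top story"},
		Images: []string{"/img/lead.jpg"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Swapping `encoding/json` for `encoding/csv` gives CSV output with the same approach.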