https://github.com/telanflow/scrago
A micro crawler framework implemented in Go.
- Host: GitHub
- URL: https://github.com/telanflow/scrago
- Owner: telanflow
- License: apache-2.0
- Created: 2018-03-09T02:17:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-11-04T15:09:02.000Z (over 4 years ago)
- Last Synced: 2025-03-12T21:34:58.604Z (4 months ago)
- Topics: crawler, go, micro-framework, spider
- Language: Go
- Homepage:
- Size: 191 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Teler
A micro crawler framework implemented in Go.
## Quick Start
#### Download and install
    go get github.com/telanflow/scrago
#### Create file `my_spider.go`
```go
package main

import (
	"net/http"
	"net/http/cookiejar"

	"github.com/telanflow/scrago"
	"github.com/telanflow/scrago/downloader"
	"github.com/telanflow/scrago/pages"
)

type MySpider struct {
	jar http.CookieJar
}

// Init
func (m *MySpider) Init(ctx *scrago.Context) {
	// Set the persistent cookie.
	m.jar, _ = cookiejar.New(nil)
	ctx.GetDownloader().UseOptions(downloader.WithCookieJar(m.jar))

	// Add Target Url
	//ctx.AddUrl("https://www.baidu.com")
}

// Page Process
func (m *MySpider) Process(ctx *scrago.Context, page *pages.Page) {}

// Pipeline Output
func (m *MySpider) Output(items *pages.PageItem) {}

func main() {
	// Start Spider
	scrago.New(&MySpider{}).AddUrl("https://www.baidu.com").Run()
}
```
#### Build and run
    go build my_spider.go
    ./my_spider
## Documentation
[Chinese documentation](https://github.com/telanflow/scrago/wiki/%E6%A1%86%E6%9E%B6%E7%AE%80%E4%BB%8B)
## License
Teler is licensed under the Apache License, Version 2.0
(http://www.apache.org/licenses/LICENSE-2.0.html).