Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Golang
- Host: GitHub
- URL: https://github.com/foolin/scrago
- Owner: foolin
- Created: 2017-05-22T11:50:37.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-19T05:58:25.000Z (almost 7 years ago)
- Last Synced: 2024-11-09T22:35:49.639Z (3 months ago)
- Topics: crawler, go, scrago, scrapy
- Language: Go
- Homepage:
- Size: 23.4 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# scrago
Scrago is a simple, fast, extensible page-crawling framework for Golang.
# Install
```
go get github.com/foolin/scrago
```
# Documentation
[Godoc](https://godoc.org/github.com/foolin/scrago "go document")
# Example
### Step 1:
```go
import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

// ExampModel maps page elements to fields through `scrago` struct tags.
type ExampModel struct {
	Title       string   `scrago:"title"`
	Name        string   `scrago:"#main>.intro>h2::text()"`
	Description string   `scrago:"#main>.intro>p::html()"`
	Intro       string   `scrago:"#main>.intro::outerHtml()"`
	Keywords    []string `scrago:"#main .keywords::GetMyKeywords()"`
}

// GetMyKeywords is the custom function named in the Keywords tag.
// It splits the comma-separated keyword text and trims each entry.
func (e *ExampModel) GetMyKeywords(s *goquery.Selection) ([]string, error) {
	v := s.Text()
	if v == "" {
		return nil, fmt.Errorf("keywords not found")
	}
	arr := strings.Split(v, ",")
	for i := range arr {
		arr[i] = strings.TrimSpace(arr[i])
	}
	return arr, nil
}
```
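The string handling inside a custom function such as `GetMyKeywords` can be tried without fetching a page. The following is a minimal sketch that uses only the standard library; `splitKeywords` is a hypothetical helper name, not part of scrago, and it takes the raw text directly instead of a `*goquery.Selection`. It is also slightly stricter than the method above, since it treats whitespace-only input as empty.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeywords is a hypothetical stand-in for the body of GetMyKeywords:
// it splits on commas and trims surrounding whitespace from each item.
func splitKeywords(raw string) ([]string, error) {
	if strings.TrimSpace(raw) == "" {
		return nil, fmt.Errorf("keywords not found")
	}
	parts := strings.Split(raw, ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts, nil
}

func main() {
	kws, err := splitKeywords("Scrago, Scrap, Spider")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", kws) // prints ["Scrago" "Scrap" "Spider"]
}
```

Keeping the parsing in a plain function like this makes it easy to unit-test separately from the selector logic.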
### Step 2:
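```go
// Also add "encoding/json", "log", "os", and "github.com/foolin/scrago"
// to the file's imports.

func main() {
	examp := ExampModel{}
	s := scrago.New()
	// Fetch the page and fill examp according to its struct tags.
	err := s.HttpGetParser("https://raw.githubusercontent.com/foolin/scrago/master/example/data/example.html", &examp)
	if err != nil {
		log.Fatal(err)
	}
	printjson(examp)
}

// printjson writes v to stdout as indented JSON without escaping HTML characters.
func printjson(v interface{}) {
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false)
	enc.SetIndent("", "  ")
	if err := enc.Encode(v); err != nil {
		log.Fatal(err)
	}
}
```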
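The call to `SetEscapeHTML(false)` in `printjson` matters because fields filled by `html()` or `outerHtml()` contain markup. By default, Go's `encoding/json` escapes `<`, `>`, and `&` as `\u003c`, `\u003e`, and `\u0026`. The sketch below shows the difference with the standard library only; `encode` is a hypothetical helper introduced for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encode marshals v with a json.Encoder, with HTML escaping on or off.
func encode(v interface{}, escapeHTML bool) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	v := map[string]string{"Description": "<p>fast & simple</p>"}
	escaped, _ := encode(v, true)
	raw, _ := encode(v, false)
	fmt.Print(escaped) // {"Description":"\u003cp\u003efast \u0026 simple\u003c/p\u003e"}
	fmt.Print(raw)     // {"Description":"<p>fast & simple</p>"}
}
```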
### Step 3:
Execution result:
```json
{
  "Title": "Scrago exmaples",
  "Name": "Scrago framework",
  "Description": "An open source and collaborative framework for extracting the data you need from websites.\n In a fast, simple, yet extensible way.",
  "Intro": "\nScrago framework\nAn open source and collaborative framework for extracting the data you need from websites.\n In a fast, simple, yet extensible way.\nScrago, Scrap, Spider, Crawl, GoLang, Simple, Easy\n",
  "Keywords": [
    "Scrago",
    "Scrap",
    "Spider",
    "Crawl",
    "GoLang",
    "Simple",
    "Easy"
  ]
}
```
Original page:
```html
Scrago exmaples
Scrago framework
An open source and collaborative framework for extracting the data you need from websites.
In a fast, simple, yet extensible way.
Scrago, Scrap, Spider, Crawl, GoLang, Simple, Easy
- true
- 123
- 45.6
- hello
- Aa
- Bb
- Cc
```
# Struct tag
Use `::` to separate the selector from the function:
```go
`scrago:"selector::function"`
```
* selector:
CSS selector; see github.com/PuerkitoBio/goquery for details.
* function:
Function used to extract the data; the default is text().

1. Built-in functions:
- text() gets the text value.
- html() gets the HTML value.
- outerHtml() gets the outer HTML value.
- attr(xxx) gets an attribute value, e.g. attr(href).

2. Write a custom function:
```go
func (e *ExampModel) MyFunc(s *goquery.Selection) (MyReturnType, error) {
	// TODO: extract a value from s
	return ReturnValue, nil
}
```
For example:
```go
type ExampModel struct {
	TextField  string `scrago:"#xxx"`
	TextField2 string `scrago:".xxx::text()"`
	Link       string `scrago:"a::attr(href)"`
	MyField    string `scrago:"#xxx::MyFunc()"`
}

// MyFunc is the custom function named in the MyField tag.
func (e *ExampModel) MyFunc(s *goquery.Selection) (string, error) {
	// TODO: replace with custom extraction logic
	return s.Text(), nil
}
```
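To make the `selector::function` syntax concrete, here is a rough sketch of how such a tag could be read with `reflect` and split into its parts. This is not scrago's actual implementation: `rule`, `parseTag`, and `exampModel` are hypothetical names, and the sketch assumes the selector itself contains no `::`.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// rule is a hypothetical parsed form of a `scrago:"selector::function"` tag.
type rule struct {
	Selector string // CSS selector part
	Func     string // function name, e.g. "text", "attr", "MyFunc"
	Arg      string // text inside the parentheses, may be empty
}

// parseTag splits a tag value on the first "::". A tag without a function
// falls back to text(), the default described in this section.
func parseTag(tag string) rule {
	selector, call := tag, "text()"
	if i := strings.Index(tag, "::"); i >= 0 {
		selector, call = tag[:i], tag[i+2:]
	}
	r := rule{Selector: strings.TrimSpace(selector), Func: strings.TrimSpace(call)}
	// Separate "name(arg)" into name and arg.
	if open := strings.Index(r.Func, "("); open >= 0 && strings.HasSuffix(r.Func, ")") {
		r.Arg = r.Func[open+1 : len(r.Func)-1]
		r.Func = r.Func[:open]
	}
	return r
}

// exampModel mirrors a few of the tagged fields shown above.
type exampModel struct {
	TextField string `scrago:"#xxx"`
	Link      string `scrago:"a::attr(href)"`
	MyField   string `scrago:"#xxx::MyFunc()"`
}

func main() {
	t := reflect.TypeOf(exampModel{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// Tag.Get returns the value stored under the "scrago" key.
		fmt.Printf("%s -> %+v\n", f.Name, parseTag(f.Tag.Get("scrago")))
	}
}
```

A parser along these lines would then dispatch on `Func`: the built-in names map to selection operations, and any other name is looked up as a method on the model.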
# Examples
* [Simple](https://github.com/foolin/scrago/tree/master/example/simple "Simple Example")
* [Parser](https://github.com/foolin/scrago/tree/master/example/parser "Parser Example")
* [Quotesbot](https://github.com/foolin/scrago/tree/master/example/quotesbot "Quotesbot Example")
# Related
* github.com/PuerkitoBio/goquery