Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zhshch2002/goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
- Host: GitHub
- URL: https://github.com/zhshch2002/goribot
- Owner: zhshch2002
- License: apache-2.0
- Archived: true
- Created: 2019-09-08T10:39:47.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-07-20T08:16:29.000Z (almost 4 years ago)
- Last Synced: 2024-01-17T10:54:22.240Z (5 months ago)
- Topics: crawler, go, golang, golang-library, scraper, scrapy, spider, spider-framework, spiderbasic
- Language: Go
- Homepage: https://github.com/zhshch2002/gospider
- Size: 612 KB
- Stars: 210
- Watchers: 11
- Forks: 30
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-go-cn - goribot
- awesome-seeds - Goribot
- awesome-go - goribot - A simple golang spider/scraping framework, build a spider in 3 lines. (Text Processing / HTTP Clients)
- awesome-Char - goribot - A simple golang spider/scraping framework, build a spider in 3 lines. (Text Processing / HTTP Clients)
- awesome-reader - goribot - A simple golang spider/scraping framework, build a spider in 3 lines. (Text Processing / HTTP Clients)
- go-awesome-cn-star - goribot
- awesome-hacking-lists - goribot - [Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework. (Go)
- awesome-hacking-lists - zhshch2002/goribot - [Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework. (Go)
- awesome-hacking-lists - goribot - [Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework. (Go (531))
README
# Goribot
A lightweight, distributed-friendly Golang crawler framework. [Full documentation](https://wiki.imagician.net/goribot/)
> !! Warning !!
>
> Goribot has moved to [Gospider](https://github.com/zhshch2002/gospider). Gospider fixes some scheduling issues and splits the network request code into a separate repository. This repository will be kept, but new users are encouraged to use Gospider.

![GitHub go.mod Go version](https://img.shields.io/github/go-mod/go-version/zhshch2002/goribot)
![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/zhshch2002/goribot)
[![codecov](https://codecov.io/gh/zhshch2002/goribot/branch/master/graph/badge.svg)](https://codecov.io/gh/zhshch2002/goribot)
[![go-report](https://goreportcard.com/badge/github.com/zhshch2002/goribot)](https://goreportcard.com/report/github.com/zhshch2002/goribot)
![license](https://img.shields.io/github/license/zhshch2002/goribot)
![code-size](https://img.shields.io/github/languages/code-size/zhshch2002/goribot.svg)
[![](https://godoc.org/github.com/nathany/looper?status.svg)](https://godoc.org/github.com/zhshch2002/goribot)

## 🚀Feature
* Elegant API
* Clean documentation
* Fast (>1K tasks/sec on a single core)
* Friendly distributed support
* Convenient details
  * Automatic conversion of relative links
  * Automatic decoding of character encodings
  * Automatic parsing of HTML and JSON
* Rich extension support
  * [Request deduplication](https://imagician.net/goribot/extensions.html#reqdeduplicate-%e8%af%b7%e6%b1%82%e5%8e%bb%e9%87%8d) (👈 supports distributed use)
  * [Limits on requests, rate, and concurrency](https://imagician.net/goribot/extensions.html#limiter-%e9%99%90%e5%88%b6%e8%af%b7%e6%b1%82%e3%80%81%e9%80%9f%e7%8e%87%e3%80%81%e5%b9%b6%e5%8f%91)
  * Saving results as [JSON](https://imagician.net/goribot/extensions.html#saveitemsasjson-%e4%bf%9d%e5%ad%98-item-%e5%88%b0-json-%e6%96%87%e4%bb%b6) or [CSV](https://imagician.net/goribot/extensions.html#saveitemsascsv-%e4%bf%9d%e5%ad%98-item-%e5%88%b0-csv-%e6%96%87%e4%bb%b6)
  * [Robots.txt support](https://imagician.net/goribot/extensions.html#robotstxt-robots-txt-%e6%94%af%e6%8c%81)
  * [Logging of request errors](https://imagician.net/goribot/extensions.html#spiderlogerror-%e8%ae%b0%e5%bd%95%e6%84%8f%e5%a4%96%e5%92%8c%e9%94%99%e8%af%af)
  * [Random UA](https://imagician.net/goribot/extensions.html#randomuseragent-%e9%9a%8f%e6%9c%ba-ua) and [random proxy](https://imagician.net/goribot/extensions.html#randomproxy-%e9%9a%8f%e6%9c%ba%e4%bb%a3%e7%90%86)
  * [Retry on failure](https://imagician.net/goribot/extensions.html#retry-%e5%a4%b1%e8%b4%a5%e9%87%8d%e8%af%95)
* Lightweight, suitable for learning or for getting a project running quickly out of the box

> Version warning
>
> Goribot only supports Go 1.13 and above.

## 👜Get Goribot
```sh
go get -u github.com/zhshch2002/goribot
```
> Goribot also includes a historical development version. If you need that version, pull the tag v0.0.1.

## ⚡Build your first project
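```Go
package main

import (
	"fmt"

	"github.com/zhshch2002/goribot"
)

func main() {
	s := goribot.NewSpider()

	// Register a GET request together with the callback that handles its response.
	s.AddTask(
		goribot.GetReq("https://httpbin.org/get"),
		func(ctx *goribot.Context) {
			// Print the response body as text.
			fmt.Println(ctx.Resp.Text)
			// Print one field of the JSON body, selected by path.
			fmt.Println(ctx.Resp.Json("headers.User-Agent"))
		},
	)

	s.Run()
}
```

## 🎉Done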
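The call `ctx.Resp.Json("headers.User-Agent")` in the example above picks a single field out of the JSON response by a dotted path. As a rough illustration of that idea only, the sketch below walks a decoded JSON object using the Go standard library. `lookupPath` is a hypothetical helper and not part of Goribot, the sample body is made up, and Goribot's own path syntax may well support more than plain dotted keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookupPath is a hypothetical helper (not a Goribot API). It decodes body as
// JSON and follows a dotted path such as "headers.User-Agent" through nested
// objects. The second return value is false if the body is not valid JSON, a
// key is missing, or a non-object value is reached before the path ends.
func lookupPath(body []byte, path string) (interface{}, bool) {
	var cur interface{}
	if err := json.Unmarshal(body, &cur); err != nil {
		return nil, false
	}
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = obj[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

func main() {
	// Sample body loosely shaped like an httpbin.org/get response (assumed data).
	body := []byte(`{"headers": {"User-Agent": "example-agent"}, "url": "https://httpbin.org/get"}`)

	if v, ok := lookupPath(body, "headers.User-Agent"); ok {
		fmt.Println(v)
	}
	if _, ok := lookupPath(body, "headers.Missing"); !ok {
		fmt.Println("headers.Missing not found")
	}
}
```

Returning a found/not-found flag alongside the value keeps a missing key distinguishable from a field whose value is JSON `null`.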
You are now ready to use Goribot. To learn more, see [Get started](https://imagician.net/goribot/get-start.html).

## 🙏Thanks
* [ants](https://github.com/panjf2000/ants)
* [chardet](https://github.com/saintfish/chardet)
* [colly](https://github.com/gocolly/colly)
* [gjson](https://github.com/tidwall/gjson)
* [goquery](https://github.com/PuerkitoBio/goquery)
* [go-logging](https://github.com/op/go-logging)
* [go-redis](https://github.com/go-redis/redis)
* [robots](https://github.com/slyrz/robots)
* [glob](https://github.com/gobwas/glob)

Many thanks to the projects above for their help 🙏.