Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fzdwx/go-pachong
A Go web crawler that continuously crawls pages starting from a single entry URL.
- Host: GitHub
- URL: https://github.com/fzdwx/go-pachong
- Owner: fzdwx
- License: afl-3.0
- Created: 2021-08-19T11:11:00.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-06-18T02:49:37.000Z (over 2 years ago)
- Last Synced: 2024-06-20T15:57:19.911Z (8 months ago)
- Topics: crawler, go, golang
- Language: Go
- Homepage:
- Size: 22.5 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 流量粉碎机
* [流量粉碎机](#流量粉碎机)
* [usage](#usage)
* [todo List](#todo-list)
* [last](#last)

_Note: this crawler consumes a lot of bandwidth._
Starting from an entry URL, it parses out the URLs found on each fetched page, then crawls those URLs in turn, repeating the process...

## usage
Call `pa.NewPa(string)` with the entry URL as its first argument, then call `AddCallback(func(string, string))` to register a function that receives the URL and body of each crawled page; implement it as needed. Finally, call `Go()` to start crawling.

```go
var wg sync.WaitGroup

func TestPa(t *testing.T) {
	url := "https://github.com/"
	// Blocks forever because wg.Done() is never called.
	wg.Add(1)
	_ = pa.NewPa(url).AddCallback(func(url, body string) {
		// nothing
	}).Go()
	wg.Wait()
}
```
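The callback receives each page's raw HTML, from which new URLs are extracted and fed back into the crawl queue. As an illustrative sketch only (not the library's actual implementation), the parsing step might look like this with the standard library, where `extractLinks` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefRe matches absolute http/https URLs inside href attributes.
// A production crawler would use a real HTML parser instead of a regexp.
var hrefRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// extractLinks pulls every absolute href value out of a raw HTML body.
func extractLinks(body string) []string {
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(body, -1) {
		links = append(links, m[1])
	}
	return links
}

func main() {
	page := `<a href="https://github.com/">GitHub</a> <a href="/local">skip</a>`
	fmt.Println(extractLinks(page)) // prints [https://github.com/]
}
```

Each URL returned this way would then be crawled in the next round, which is also why the crawler never runs out of work on a well-linked site.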
## todo List

- [ ] todo...
## last
Any issue that helps the project is welcome.