Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/fzdwx/go-pachong

Go web crawler, able to continuously crawl data starting from an entry URL.

Host: GitHub
URL: https://github.com/fzdwx/go-pachong
Owner: fzdwx
License: afl-3.0
Created: 2021-08-19T11:11:00.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-06-18T02:49:37.000Z (over 2 years ago)
Last Synced: 2024-06-20T15:57:19.911Z (8 months ago)
Topics: crawler, go, golang
Language: Go
Homepage:
Size: 22.5 KB
Stars: 4
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# 流量粉碎机 (Traffic Shredder)

* [流量粉碎机 (Traffic Shredder)](#流量粉碎机-traffic-shredder)
    * [usage](#usage)
    * [todo List](#todo-list)
    * [last](#last)

_Consumes a huge amount of bandwidth._

Starting from the entry URL, it crawls the page, parses the URLs found in it, then crawls each of those URLs in turn, and so on...

## usage

Call `pa.NewPa(string)`, passing the entry URL as its first argument. Then call `AddCallback(func(string, string))`: the function you pass in is called with the data of each crawled page (its URL and body), so implement it however you need. Finally, call `Go()` to start crawling.
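One possible callback is sketched here: it pulls the `<title>` out of each crawled page. The `TitleCollector` type and its methods are hypothetical; the sketch assumes only that `AddCallback` accepts a `func(string, string)` receiving a page's URL and body, as the README's test example suggests. The mutex is a precaution, because the README does not say whether callbacks may run concurrently.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
	"sync"
)

// titleRe captures the contents of the first <title> element,
// case-insensitively and across newlines.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// TitleCollector records the <title> of every page it is given (hypothetical).
type TitleCollector struct {
	mu     sync.Mutex // guards titles in case callbacks run concurrently
	titles map[string]string
}

func NewTitleCollector() *TitleCollector {
	return &TitleCollector{titles: make(map[string]string)}
}

// OnPage has the func(url, body string) shape passed to AddCallback.
func (c *TitleCollector) OnPage(url, body string) {
	m := titleRe.FindStringSubmatch(body)
	if m == nil {
		return // page has no <title>
	}
	// Decode entities such as &amp; and trim surrounding whitespace.
	title := strings.TrimSpace(html.UnescapeString(m[1]))
	c.mu.Lock()
	defer c.mu.Unlock()
	c.titles[url] = title
}

// Title returns the recorded title for url and whether one was found.
func (c *TitleCollector) Title(url string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.titles[url]
	return t, ok
}

func main() {
	c := NewTitleCollector()
	// With the library, this would presumably be wired up as
	// pa.NewPa(entry).AddCallback(c.OnPage).Go()
	c.OnPage("https://example.com/", "<html><head><TITLE>\n Example &amp; Co \n</TITLE></head></html>")
	t, _ := c.Title("https://example.com/")
	fmt.Println(t) // Example & Co
}
```

The regexp only handles simple markup; for arbitrary HTML, a real parser such as `golang.org/x/net/html` is more robust.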

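```go
package pa_test

import (
	"sync"
	"testing"

	"github.com/fzdwx/go-pachong/pa" // import path assumed from the repository name
)

var wg sync.WaitGroup

func TestPa(t *testing.T) {
	url := "https://github.com/"

	// Keeps blocking: wg.Done() is never called, so wg.Wait() never returns
	// and the test keeps running while the crawl continues.
	wg.Add(1)

	_ = pa.NewPa(url).AddCallback(func(url, body string) {
		// nothing: handle each crawled page's URL and body here
	}).Go()

	wg.Wait()
}
```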

## todo List

- [ ] todo...

## last

Any issues that help improve the project are welcome.