{"id":13413990,"url":"https://github.com/cyucelen/walker","last_synced_at":"2026-01-15T02:41:11.636Z","repository":{"id":65950366,"uuid":"603178692","full_name":"cyucelen/walker","owner":"cyucelen","description":"Seamlessly fetch paginated data from any source. Simple and high performance API scraping included!","archived":false,"fork":false,"pushed_at":"2023-02-17T20:37:54.000Z","size":662,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-12T13:10:28.011Z","etag":null,"topics":["api-scraper","indexer","pagination","paginator","scraper"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cyucelen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-17T19:32:25.000Z","updated_at":"2025-05-26T02:05:51.000Z","dependencies_parsed_at":"2023-03-13T20:31:41.646Z","dependency_job_id":null,"html_url":"https://github.com/cyucelen/walker","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"d65bd3d935edf8cdf4bc3f3f158325f34e02a9b4"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/cyucelen/walker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyucelen%2Fwalker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyucelen%2Fwalker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyucelen%2Fwalker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/cyucelen%2Fwalker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cyucelen","download_url":"https://codeload.github.com/cyucelen/walker/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyucelen%2Fwalker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28441412,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-15T00:55:22.719Z","status":"online","status_checked_at":"2026-01-15T02:00:08.019Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-scraper","indexer","pagination","paginator","scraper"],"created_at":"2024-07-30T20:01:54.600Z","updated_at":"2026-01-15T02:41:11.598Z","avatar_url":"https://github.com/cyucelen.png","language":"Go","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg height=\"400px\" src=\"assets/logo.png\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003cb\u003eSeamlessly fetch paginated data from any source!\u003c/b\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pkg.go.dev/github.com/cyucelen/walker?tab=doc\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/go.dev-reference-007d9c?logo=go\u0026logoColor=white\" alt=\"godoc\" title=\"godoc\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/cyucelen/walker/tags\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/github/v/tag/cyucelen/walker\" alt=\"semver tag\" title=\"semver tag\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/cyucelen/walker/actions/workflows/go.yml\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/actions/workflow/status/cyucelen/walker/go.yml?branch=master\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/cyucelen/walker\"\u003e\n    \u003cimg src=\"https://codecov.io/gh/cyucelen/walker/branch/master/graph/badge.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://goreportcard.com/report/github.com/cyucelen/walker\"\u003e\n    \u003cimg src=\"https://goreportcard.com/badge/github.com/cyucelen/walker\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/cyucelen/walker/blob/master/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/cyucelen/walker.svg\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n# walker\n\nWalker simplifies fetching paginated data from any data source. You can configure the start position and the number of documents to fetch to suit your needs. Walker also fetches pages in parallel, so you can retrieve data faster.\n\nThe main purpose of the library is walking through the pagination of API endpoints. With `NewApiWalker`, you can fetch data from any paginated API endpoint and process it concurrently. You can also create your own custom walker for your specific use case.\n\n## Features\n\n* Provides a walker that pages through paginated API endpoints. 
This is for scraping an API, if such a term exists.\n* `cursor` and `offset` pagination strategies\n* Concurrent fetching and processing with no extra code\n* Total fetch count limiting\n* Rate limiting\n\n## Examples\n\n### Basic Usage\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/cyucelen/walker\"\n)\n\n// source returns the values it receives, so the output shows which page each call fetched.\nfunc source(start, fetchCount int) ([]int, error) {\n\treturn []int{start, fetchCount}, nil\n}\n\n// sink prints every result returned from source.\nfunc sink(result []int, stop func()) error {\n\tfmt.Println(result)\n\treturn nil\n}\n\nfunc main() {\n\twalker.New(source, sink).Walk()\n}\n```\n**Output:**\n```\n[0 10]\n[1 10]\n[4 10]\n[2 10]\n[3 10]\n[5 10]\n[8 10]\n[9 10]\n[7 10]\n[6 10]\n...\nto Infinity\n```\n\n* The `source` function receives `start` as the page number and `fetchCount` as the number of documents to fetch. Use these values to fetch data from your source.\n* The `sink` function receives the result returned from `source` and a `stop` function. Save the results here, and call `stop` when your results show there is nothing more to fetch; otherwise walking continues forever unless [a limit is provided](#configuration).\n* The order of results is not guaranteed, because `source` and `sink` are called concurrently.\n\n### Walking through the pagination of API endpoints\n\n**Fetching all the breweries from `Open Brewery DB`:**\n\n```go\npackage main\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"net/http\"\n\n\t\"github.com/cyucelen/walker\"\n)\n\n// buildRequest creates the request for one page of results.\nfunc buildRequest(start, fetchCount int) (*http.Request, error) {\n\turl := fmt.Sprintf(\"https://api.openbrewerydb.org/breweries?page=%d\u0026per_page=%d\", start, fetchCount)\n\treturn http.NewRequest(http.MethodGet, url, http.NoBody)\n}\n\n// sink decodes one page and stops the walk when a page comes back empty.\nfunc sink(res *http.Response, stop func()) error {\n\tvar payload []map[string]any\n\tif err := json.NewDecoder(res.Body).Decode(\u0026payload); err != nil {\n\t\treturn err\n\t}\n\n\tif len(payload) == 0 {\n\t\tstop()\n\t\treturn nil\n\t}\n\n\t// saveBreweries is a placeholder for your own storage logic.\n\treturn saveBreweries(payload)\n}\n\nfunc main() {\n\twalker.NewApiWalker(http.DefaultClient, buildRequest, sink).Walk()\n}\n```\n\nTo create an API walker, you only need to provide:\n* a `RequestBuilder` function that builds an HTTP request from the provided values\n* 
a `sink` function that processes the HTTP response\n\nCheck [examples](/example/) for more use cases.\n\n## Configuration\n\n| Option           | Description                                                  | Default                     | Available Values                                          |\n| ---------------- | ------------------------------------------------------------ | --------------------------- | --------------------------------------------------------- |\n| WithPagination   | Defines the pagination strategy                              | `walker.OffsetPagination{}` | `walker.OffsetPagination{}`, `walker.CursorPagination{}`  |\n| WithMaxBatchSize | Defines the maximum document count per call (`fetchCount`)   | `10`                        | `int`                                                     |\n| WithParallelism  | Defines the number of workers that run `source` concurrently | `runtime.NumCPU()`          | `int`                                                     |\n| WithLimiter      | Defines the total document count after which walking stops   | `walker.InfiniteLimiter()`  | `walker.InfiniteLimiter()`, `walker.ConstantLimiter(int)` |\n| WithRateLimit    | Defines the rate limit as **count** per **duration**         | `unlimited`                 | `(int, time.Duration)`                                    |\n| WithContext      | Defines the context for the walk                             | `context.Background()`      | `context.Context`                                         |\n\n## Contribution\n\nContributions that make `walker` better and more feature-rich are welcome. Feel free to contribute your use case!","funding_links":[],"categories":["Text Processing","文本处理","Template Engines"],"sub_categories":["Scrapers","刮刀"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyucelen%2Fwalker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyucelen%2Fwalker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyucelen%2Fwalker/lists"}