{"id":36497487,"url":"https://github.com/dineshsprabu/concurrent-web-crawler","last_synced_at":"2026-01-12T02:04:48.616Z","repository":{"id":57619438,"uuid":"90268454","full_name":"dineshsprabu/concurrent-web-crawler","owner":"dineshsprabu","description":"Flexible and concurrent web crawler implemented in 'go'","archived":false,"fork":false,"pushed_at":"2017-05-05T06:00:06.000Z","size":14,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-06-20T03:34:48.633Z","etag":null,"topics":["concurrent-web-crawler","crawler","go-crawler","spider","web-crawler"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dineshsprabu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-04T13:44:29.000Z","updated_at":"2023-05-13T07:38:40.000Z","dependencies_parsed_at":"2022-09-16T18:50:15.854Z","dependency_job_id":null,"html_url":"https://github.com/dineshsprabu/concurrent-web-crawler","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/dineshsprabu/concurrent-web-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dineshsprabu%2Fconcurrent-web-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dineshsprabu%2Fconcurrent-web-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dineshsprabu%2Fconcurrent-web-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dineshsprabu%2Fconcurrent-web-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dineshsprabu","download_url":"https://codeload.github.com/dineshsprabu/concurrent-web-crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dineshsprabu%2Fconcurrent-web-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28331542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T00:36:25.062Z","status":"online","status_checked_at":"2026-01-12T02:00:08.677Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrent-web-crawler","crawler","go-crawler","spider","web-crawler"],"created_at":"2026-01-12T02:03:42.047Z","updated_at":"2026-01-12T02:04:48.603Z","avatar_url":"https://github.com/dineshsprabu.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Concurrent Web Crawler\n\nHighly configurable crawler with powerful concurrency and better status logging.\n\n[![GoDoc](https://godoc.org/github.com/dineshsprabu/concurrent-web-crawler?status.svg)](https://godoc.org/github.com/dineshsprabu/concurrent-web-crawler). [![Build Status](https://travis-ci.org/dineshsprabu/concurrent-web-crawler.svg?branch=master)](https://travis-ci.org/dineshsprabu/concurrent-web-crawler)\n\n## Installation\n\n```\ngo get github.com/dineshsprabu/concurrent-web-crawler\n\n```\n\n## Usage\n\n```go\n\npackage main\n\nimport(\n\"github.com/dineshsprabu/concurrent-web-crawler\"\n)\n\nfunc main(){\n\t// Creating a web crawler object with configurations.\n\tmyCrawler := web.Crawler{ \n\t\t\tMaxConcurrencyLimit: 2, \n\t\t\tStoragePath: \"crawler/storage\", \n\t\t\tCrawlDelay: 10,\n\t\t}\n\n\t// List of URLS to be crawled as a string array.\n\turls := []string{ \n\t\t\t\t\"https://httpbin.org/ip\", \n\t\t\t\t\"http://example.com\", \n\t\t\t\t\"https://archive.org/details/opensource_movies\",\n\t\t\t}\n\n\t// Starting the crawler by passing the list of URLs.\n\tmyCrawler.Start(urls)\n}\n\n```\n\n## Log\n\n```\n\n\u003e go run crawler_sample.go \n2017/05/04 20:29:59 ||  [Processing] Spawning subroutines :  2\n2017/05/04 20:29:59 ||  [Processing] Fetching page content :  https://archive.org/details/opensource_movies\n2017/05/04 20:29:59 ||  [Processing] Fetching page content :  https://httpbin.org/ip\n2017/05/04 20:30:01 ||  [Processing] Writing to the file :  crawler/ip.html\n2017/05/04 20:30:01 ||  [Success] Crawled page :  https://httpbin.org/ip\n2017/05/04 20:30:03 ||  [Processing] Writing to the file :  crawler/details/opensource_movies.html\n2017/05/04 20:30:03 ||  [Success] Crawled page :  https://archive.org/details/opensource_movies\n2017/05/04 20:30:11 ||  [Processing] Fetching page content :  http://example.com\n2017/05/04 20:30:12 ||  [Processing] Writing to the file :  crawler/example.com/index.html\n2017/05/04 20:30:12 ||  [Success] Crawled page :  http://example.com\n2017/05/04 20:30:22 ||  [Status] Failed urls :  []\n\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdineshsprabu%2Fconcurrent-web-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdineshsprabu%2Fconcurrent-web-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdineshsprabu%2Fconcurrent-web-crawler/lists"}