{"id":7558195,"url":"https://github.com/infinilabs/crawler","last_synced_at":"2026-04-11T05:01:12.432Z","repository":{"id":39575161,"uuid":"96291648","full_name":"infinilabs/crawler","owner":"infinilabs","description":"🕷️ An easy-to-use spider written in Golang. (previous named GOPA.)","archived":false,"fork":false,"pushed_at":"2021-05-19T08:41:59.000Z","size":57257,"stargazers_count":308,"open_issues_count":9,"forks_count":82,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-03-30T17:09:50.182Z","etag":null,"topics":["crawler","crawling","elasticsearch","lightweight","scraping","spider","web-crawler","web-scraping","web-spider"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/infinilabs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/security.md","support":null},"funding":{"patreon":"medcl"}},"created_at":"2017-07-05T07:45:39.000Z","updated_at":"2025-02-15T21:17:12.000Z","dependencies_parsed_at":"2022-08-26T13:23:13.281Z","dependency_job_id":null,"html_url":"https://github.com/infinilabs/crawler","commit_stats":null,"previous_names":["infinilabs/crawler","infinitbyte/gopa"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinilabs%2Fcrawler/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/infinilabs","download_url":"https://codeload.github.com/infinilabs/crawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247526770,"owners_count":20953143,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawling","elasticsearch","lightweight","scraping","spider","web-crawler","web-scraping","web-spider"],"created_at":"2024-04-08T01:51:00.798Z","updated_at":"2026-04-11T05:01:07.056Z","avatar_url":"https://github.com/infinilabs.png","language":"Go","readme":"\u003cimg width=\"200\" alt=\"What a Spider!\" src=\"https://raw.githubusercontent.com/infinitbyte/gopa/master/docs/assets/img/logo.svg?sanitize=true\"\u003e\n\nGOPA, A Spider Written in Go.\n\n[![Travis](https://travis-ci.org/infinitbyte/gopa.svg?branch=master)](https://travis-ci.org/infinitbyte/gopa)\n[![Go Report Card](https://goreportcard.com/badge/github.com/infinitbyte/gopa)](https://goreportcard.com/report/github.com/infinitbyte/gopa)\n[![Join the chat at https://gitter.im/infinitbyte/gopa](https://badges.gitter.im/infinitbyte/gopa.svg)](https://gitter.im/infinitbyte/gopa?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\n\n## Goal\n\n* Light weight, low footprint, memory requirement should \u003c 100MB\n* Easy to deploy, no runtime or dependency required\n* Easy to use, no programming or scripts ability needed, out of box features\n\n\n## Screenshoot\n\n\u003cimg width=\"800\" alt=\"What a Spider! 
GOPA Spider!\" src=\"https://raw.githubusercontent.com/infinitbyte/gopa/master/docs/assets/img/screenshot/2017.10.20_v0.9.gif\"\u003e\n\n\n---\n\n\n- [How to use](#how-to-use)\n  - [Requirements](#requirements)\n  - [Setup](#setup)\n    - [Download Pre Built Package](#download-pre-built-package)\n    - [Compile The Package Manually](#compile-the-package-manually)\n  - [Required Config](#required-config)\n  - [Start](#start)\n  - [Stop](#stop)\n- [Configuration](#configuration)\n- [UI](#ui)\n- [API](#api)\n- [Architecture](#architecture)\n- [Contributing](#contributing)\n- [License](#license)\n\n\n\n## How to use\n\n### Requirements\n\n* Elasticsearch v5.3+\n\n\n### Setup\n\nFirst, get Gopa. There are two options: download the pre-built package or compile it yourself.\n\n#### Download Pre Built Package\n\nGo to the [Releases](https://github.com/infinitbyte/gopa/releases) page and download the right package for your platform.\n\n_Note: Darwin packages are for Mac._\n\n#### Compile The Package Manually\n\nRequirements\n* Golang 1.9+\n\nSupported platforms\n- Mac/Linux: Run `make build` to build Gopa. 
\u003cbr/\u003e\n- Windows: See the wiki page [How to build GOPA on windows](https://github.com/infinitbyte/gopa/wiki/How-to-build-GOPA-on-windows).\n\nFor example:\n```\n#apt  install golang-go\n#brew install golang\nmkdir -p ~/go/src/github.com/infinitbyte/\ncd ~/go/src/github.com/infinitbyte/\ngit clone https://github.com/infinitbyte/gopa.git\ncd gopa\nmake\n```\n\nAfter a few minutes, you should have:\n\n\u003e `gopa`, the main program, a single binary.\u003cbr/\u003e\n\u003e `gopa.yml`, the main configuration file for Gopa.\u003cbr/\u003e\n\n\n### Required Config\n\n_Note: Elasticsearch must be v5.3 or later._\n\n- Enable the elastic module in `gopa.yml` and update the Elasticsearch settings:\n```\nelasticsearch:\n- name: default\n  enabled: true\n  endpoint: http://localhost:9200\n  index_prefix: gopa-\n  basic_auth:\n    username: elastic\n    password: changeme\n\n```\n\n\n### Start\n\nBesides Elasticsearch, Gopa doesn't require any other dependencies. Run `./gopa` to start the program.\n\nGopa can also run as a daemon (_Note: only available on Linux and Mac_):\n\u003cp\u003e\u003cdetails\u003e\n  \u003csummary\u003eExample\u003c/summary\u003e\n  \u003cpre\u003e\n➜  gopa git:(master) ✗ ./bin/gopa --daemon\n  ________ ________ __________  _____\n /  _____/ \\_____  \\\\______   \\/  _  \\\n/   \\  ___  /   |   \\|     ___/  /_\\  \\\n\\    \\_\\  \\/    |    \\    |  /    |    \\\n \\______  /\\_______  /____|  \\____|__  /\n        \\/         \\/                \\/\n[gopa] 0.10.0_SNAPSHOT\n///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///\n\n[10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0\n[gopa] started.\u003c/pre\u003e\n\u003c/details\u003e\u003c/p\u003e\n\nRun `./gopa -h` to see the full list of command line options.\n\u003cp\u003e\u003cdetails\u003e\n  \u003csummary\u003eExample\u003c/summary\u003e\n  \u003cpre\u003e\n➜  gopa git:(master) ✗ 
./bin/gopa -h\n  ________ ________ __________  _____\n /  _____/ \\_____  \\\\______   \\/  _  \\\n/   \\  ___  /   |   \\|     ___/  /_\\  \\\n\\    \\_\\  \\/    |    \\    |  /    |    \\\n \\______  /\\_______  /____|  \\____|__  /\n        \\/         \\/                \\/\n[gopa] 0.10.0_SNAPSHOT\n///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///\n\nUsage of ./bin/gopa:\n  -config string\n    \tthe location of config file (default \"gopa.yml\")\n  -cpuprofile string\n    \twrite cpu profile to this file\n  -daemon\n    \trun in background as daemon\n  -debug\n    \trun in debug mode, gopa will quit with panic error\n  -log string\n    \tthe log level,options:trace,debug,info,warn,error (default \"info\")\n  -log_path string\n    \tthe log path (default \"log\")\n  -memprofile string\n    \twrite memory profile to this file\n  -pidfile string\n    \tpidfile path (only for daemon)\n  -pprof string\n    \tenable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars\u003c/pre\u003e\n\u003c/details\u003e\u003c/p\u003e\n\n\n### Stop\n\nIt is safe to press `ctrl+c` to stop a running Gopa. Gopa handles the rest and saves a checkpoint,\nso you can restore the job later; the world is still in your hands.\n\nIf you are running Gopa as a daemon, you can stop it like this:\n\n```\n kill -QUIT `pgrep gopa`\n```\n\n## Configuration\n\n## UI\n\n* Search Console `http://127.0.0.1:9000/`\n* Admin Console  `http://127.0.0.1:9000/admin/`\n\n## API\n\n## Architecture\n\n\u003cimg width=\"800\" alt=\"What a Spider! GOPA Spider!\" src=\"https://raw.githubusercontent.com/infinitbyte/gopa/master/docs/assets/img/architecture-v1.png\"\u003e\n\n\n\n## Who uses it?\n\nDo you use GOPA and want to be listed here? 
[Contact me](https://medcl.com).\n\n\n\n\n## License\n\nReleased under the [Apache License, Version 2.0](https://github.com/infinitbyte/gopa/blob/master/LICENSE).\n","funding_links":["https://patreon.com/medcl"],"categories":["Go"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinilabs%2Fcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinilabs%2Fcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinilabs%2Fcrawler/lists"}