{"id":17088500,"url":"https://github.com/temoto/robotstxt","last_synced_at":"2025-05-15T21:06:06.271Z","repository":{"id":565815,"uuid":"770149","full_name":"temoto/robotstxt","owner":"temoto","description":"The robots.txt exclusion protocol implementation for Go language","archived":false,"fork":false,"pushed_at":"2022-11-09T09:51:34.000Z","size":97,"stargazers_count":273,"open_issues_count":4,"forks_count":56,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-08T08:11:18.058Z","etag":null,"topics":["go","go-library","golang","golang-library","production-ready","robots-txt","status-active","web"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/temoto.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2010-07-12T10:54:05.000Z","updated_at":"2025-03-09T07:07:45.000Z","dependencies_parsed_at":"2022-08-06T09:15:49.071Z","dependency_job_id":null,"html_url":"https://github.com/temoto/robotstxt","commit_stats":null,"previous_names":["temoto/robotstxt.go","temoto/robotstxt-go"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/temoto%2Frobotstxt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/temoto%2Frobotstxt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/temoto%2Frobotstxt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/temoto%2Frobotstxt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/temoto","download_url":"https://codeload.github.com/temoto/ro
botstxt/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254422756,"owners_count":22068678,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","go-library","golang","golang-library","production-ready","robots-txt","status-active","web"],"created_at":"2024-10-14T13:37:54.246Z","updated_at":"2025-05-15T21:06:06.239Z","avatar_url":"https://github.com/temoto.png","language":"Go","readme":"What\n====\n\nThis is a robots.txt exclusion protocol implementation for the Go language (golang).\n\n\nBuild\n=====\n\nTo build and run tests, run `go test` in the source directory.\n\n\nContribute\n==========\n\nWarm welcome.\n\n* If desired, add your name in README.rst, section Who.\n* Run `script/test \u0026\u0026 script/clean \u0026\u0026 echo ok`\n* You can ignore linter warnings, but everything else must pass.\n* Send your change as a pull request or a regular patch to the current maintainer (see section Who).\n\nThank you.\n\n\nUsage\n=====\n\nAs usual, no special installation is required, just\n\n    import \"github.com/temoto/robotstxt\"\n\nrun `go get` and you're ready.\n\n1. Parse\n^^^^^^^^\n\nFirst of all, you need to parse robots.txt data. 
You can do it with\nthe functions `FromBytes(body []byte) (*RobotsData, error)` or the same for `string`::\n\n    robots, err := robotstxt.FromBytes([]byte(\"User-agent: *\\nDisallow:\"))\n    robots, err := robotstxt.FromString(\"User-agent: *\\nDisallow:\")\n\nAs of 2012-10-03, `FromBytes` is the most efficient method; everything else\nis a wrapper around this core function.\n\nThere are a few convenience constructors for various purposes:\n\n* `FromResponse(*http.Response) (*RobotsData, error)` to initialize robots data\n  from an HTTP response. It *does not* call `response.Body.Close()`::\n\n    robots, err := robotstxt.FromResponse(resp)\n    resp.Body.Close()\n    if err != nil {\n        log.Println(\"Error parsing robots.txt:\", err.Error())\n    }\n\n* `FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error)` or\n  `FromStatusAndString` if you prefer to read the bytes (string) yourself.\n  Passing the status code applies the following logic, in line with Google's\n  interpretation of robots.txt files:\n\n    * status 2xx  -\u003e parse body with `FromBytes` and apply rules listed there.\n    * status 4xx  -\u003e allow all (even 401/403, as recommended by Google).\n    * other (5xx) -\u003e disallow all, consider this a temporary unavailability.\n\n2. Query\n^^^^^^^^\n\nParsing robots.txt content builds a kind of logic database, which you can\nquery with `(r *RobotsData) TestAgent(url, agent string) (bool)`.\n\nPassing the agent explicitly is useful if you want to query for different agents. For\nsingle-agent use there is a more efficient option: `RobotsData.FindGroup(userAgent string)`\nreturns a structure with a `.Test(path string)` method and a `.CrawlDelay time.Duration` field.\n\nA simple query with an explicit user agent. 
Each call will scan all rules.\n\n::\n\n    allow := robots.TestAgent(\"/\", \"FooBot\")\n\nOr query several paths against the same user agent for better performance.\n\n::\n\n    group := robots.FindGroup(\"BarBot\")\n    group.Test(\"/\")\n    group.Test(\"/download.mp3\")\n    group.Test(\"/news/article-2012-1\")\n\n\nWho\n===\n\nHonorable contributors (in no particular order):\n\n    * Ilya Grigorik (igrigorik)\n    * Martin Angers (PuerkitoBio)\n    * Micha Gorelick (mynameisfiber)\n\nInitial commit and other work: Sergey Shepelev temotor@gmail.com\n\n\nFlair\n=====\n\n.. image:: https://travis-ci.org/temoto/robotstxt.svg?branch=master\n    :target: https://travis-ci.org/temoto/robotstxt\n\n.. image:: https://codecov.io/gh/temoto/robotstxt/branch/master/graph/badge.svg\n    :target: https://codecov.io/gh/temoto/robotstxt\n\n.. image:: https://goreportcard.com/badge/github.com/temoto/robotstxt\n    :target: https://goreportcard.com/report/github.com/temoto/robotstxt\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftemoto%2Frobotstxt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftemoto%2Frobotstxt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftemoto%2Frobotstxt/lists"}