{"id":16677237,"url":"https://github.com/folyd/robotstxt","last_synced_at":"2025-05-06T23:45:20.103Z","repository":{"id":57661510,"uuid":"252347373","full_name":"Folyd/robotstxt","owner":"Folyd","description":"A native Rust port of Google's robots.txt parser and matcher C++ library.","archived":false,"fork":false,"pushed_at":"2021-02-13T07:56:55.000Z","size":212,"stargazers_count":96,"open_issues_count":3,"forks_count":15,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-02T15:03:12.501Z","etag":null,"topics":["google-robots-parser","robotstxt","rust"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/robotstxt","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Folyd.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOGS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-02T03:36:39.000Z","updated_at":"2025-04-28T01:24:22.000Z","dependencies_parsed_at":"2022-09-05T23:51:11.871Z","dependency_job_id":null,"html_url":"https://github.com/Folyd/robotstxt","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Folyd%2Frobotstxt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Folyd%2Frobotstxt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Folyd%2Frobotstxt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Folyd%2Frobotstxt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Folyd","download_url":"https://codeload.github.com/Folyd/robotstxt/tar.gz/refs/heads/master","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252788404,"owners_count":21804280,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["google-robots-parser","robotstxt","rust"],"created_at":"2024-10-12T13:25:41.361Z","updated_at":"2025-05-06T23:45:20.082Z","avatar_url":"https://github.com/Folyd.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# robotstxt\n\n![Crates.io](https://img.shields.io/crates/v/robotstxt)\n![Docs.rs](https://docs.rs/robotstxt/badge.svg)\n[![Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n\nA native Rust port of [Google's robots.txt parser and matcher C++ library](https://github.com/google/robotstxt).\n\n- Native Rust port with no third-party crate dependencies\n- Zero unsafe code\n- Preserves all behavior of the original library\n- Consistent API with the original library\n- Passes 100% of Google's original tests\n\n## Installation\n\n```toml\n[dependencies]\nrobotstxt = \"0.3.0\"\n```\n\n## Quick start\n\n```rust\nuse robotstxt::DefaultMatcher;\n\nlet mut matcher = DefaultMatcher::default();\nlet robots_body = \"user-agent: FooBot\\n\\\n                   disallow: /\\n\";\nassert_eq!(false, matcher.one_agent_allowed_by_robots(robots_body, \"FooBot\", \"https://foo.com/\"));\n```\n\n## About\n\nQuoting the README from Google's robots.txt parser and matcher repo:\n\n\u003e The Robots Exclusion Protocol (REP) is a standard that enables website owners to control which URLs may be accessed by automated clients (i.e. 
crawlers) through a simple text file with a specific syntax. It's one of the basic building blocks of the internet as we know it and what allows search engines to operate.\n\u003e\n\u003e Because the REP was only a de-facto standard for the past 25 years, different implementers implement parsing of robots.txt slightly differently, leading to confusion. This project aims to fix that by releasing the parser that Google uses.\n\u003e\n\u003e The library is slightly modified (i.e. some internal headers and equivalent symbols) production code used by Googlebot, Google's crawler, to determine which URLs it may access based on rules provided by webmasters in robots.txt files. The library is released open-source to help developers build tools that better reflect Google's robots.txt parsing and matching.\n\nCrate **robotstxt** aims to be a faithful conversion, from C++ to Rust, of Google's robots.txt parser and matcher.\n\n## Testing\n\n```\n$ git clone https://github.com/Folyd/robotstxt\nCloning into 'robotstxt'...\n$ cd robotstxt/tests \n...\n$ mkdir c-build \u0026\u0026 cd c-build\n...\n$ cmake ..\n...\n$ make\n...\n$ make test\nRunning tests...\nTest project ~/robotstxt/tests/c-build\n    Start 1: robots-test\n1/1 Test #1: robots-test ......................   Passed    0.33 sec\n```\n\n## License\n\nThe robotstxt parser and matcher Rust library is licensed under the terms of the\nApache license. See [LICENSE](LICENSE) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffolyd%2Frobotstxt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffolyd%2Frobotstxt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffolyd%2Frobotstxt/lists"}