{"id":17155300,"url":"https://github.com/glyn/nginx_robot_access","last_synced_at":"2025-04-13T13:04:06.502Z","repository":{"id":233572000,"uuid":"786637345","full_name":"glyn/nginx_robot_access","owner":"glyn","description":"NGINX robot access module","archived":false,"fork":false,"pushed_at":"2025-03-19T10:46:26.000Z","size":112,"stargazers_count":6,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T04:08:11.452Z","etag":null,"topics":["hacktoberfest","nginx","robots-txt"],"latest_commit_sha":null,"homepage":"https://underlap.org/blocking-ai-web-crawlers","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/glyn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-15T02:30:54.000Z","updated_at":"2025-03-19T10:39:50.000Z","dependencies_parsed_at":"2024-04-22T18:49:30.228Z","dependency_job_id":"a4950407-ac68-4fdb-8798-48ef72c52382","html_url":"https://github.com/glyn/nginx_robot_access","commit_stats":null,"previous_names":["glyn/nginx_robot_access"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glyn%2Fnginx_robot_access","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glyn%2Fnginx_robot_access/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glyn%2Fnginx_robot_access/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glyn%2Fnginx_robot_access/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/glyn","download_url":"https://codeload.github.com/glyn/nginx_robot_access/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717247,"owners_count":21150389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest","nginx","robots-txt"],"created_at":"2024-10-14T21:51:09.729Z","updated_at":"2025-04-13T13:04:06.462Z","avatar_url":"https://github.com/glyn.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NGINX robot access module\n\nThis NGINX module enforces the rules in `robots.txt` for web crawlers that choose\nto disregard those rules.\n\nRegardless of the rules in `robots.txt`, the module always allows access to the path `/robots.txt`.\nThis gives web crawlers the _option_ of obeying `robots.txt`.\nIf any other files should always be accessible, allow them in `robots.txt`.\n\nSee the following instructions for how to build and configure the module.\n\n## Building\n\nThis module is written in Rust. 
After [installing Rust](https://www.rust-lang.org/tools/install),\nthe module may be built using `cargo`, but it **must** be built for the version of NGINX that is in use.\n\nFor example, to build the module for NGINX version 1.22.1, run the following command in the root directory of a clone of this repository:\n~~~\nNGX_VERSION=1.22.1 cargo build --release\n~~~\n\nThis builds a shared library in `target/release`.\n\n## Configuring\n\nTo enable the module, load it in the NGINX configuration, e.g.:\n~~~\nload_module /var/lib/libnginx_robot_access.so;\n~~~\n\nFor the module to work correctly, the location of `robots.txt` must be configured with the `robots_txt_path` directive, which takes a single argument: the absolute file path of `robots.txt`, e.g.:\n~~~\nrobots_txt_path /etc/robots.txt;\n~~~\n\nThe directive may be specified in any of the `http`, `server`, or `location` configuration blocks.\nA setting in a `location` block overrides any setting in the enclosing `server` block, and a setting in a `server` block overrides any setting in the enclosing `http` block.\n\nFor example, here's a simple configuration that enables the module and sets the path to `/etc/robots.txt`:\n~~~\nload_module /var/lib/libnginx_robot_access.so;\n...\nhttp {\n    ...\n    server {\n        ...\n        location / {\n            ...\n            robots_txt_path /etc/robots.txt;\n        }\n        ...\n    }\n}\n~~~\n\n## Validating\n\nTo check that the module is working, use `curl` to request a path that your `robots.txt` disallows, specifying a user agent that the rule applies to, e.g.:\n~~~\ncurl -A \"GPTBot\" https://example.org\n~~~\n\n## Debugging\n\nSome debug logging is included in the module. 
To see these log messages, enable debug-level logging in the NGINX configuration (this level only takes effect if NGINX was built with `--with-debug`), e.g.:\n~~~\nerror_log  logs/error.log debug;\n~~~\n\n## Contributing\n\nSee the [Contributor Guide](./CONTRIBUTING.md) if you'd like to submit changes.\n\n## Acknowledgements\n\n* [ngx-rust](https://github.com/nginxinc/ngx-rust): a Rust binding for NGINX.\n* [robotstxt](https://github.com/Folyd/robotstxt): a Rust port of Google's [C++ implementation](https://github.com/google/robotstxt). Thanks @Folyd!\n\n## Alternatives\n\n* Configure NGINX to [block specific user agents](https://www.xmodulo.com/block-specific-user-agents-nginx-web-server.html), although this doesn't reuse the rules in `robots.txt`.\n* [NGINX configuration for AI web crawlers](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/servers/nginx.conf), but again this doesn't reuse the rules in `robots.txt`.\n* [Roboo](https://github.com/yuri-gushin/Roboo) is an NGINX module that protects against robots that fail to implement certain browser features.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglyn%2Fnginx_robot_access","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglyn%2Fnginx_robot_access","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglyn%2Fnginx_robot_access/lists"}