{"id":31655593,"url":"https://github.com/foundata/hugo-component-robotstxt","last_synced_at":"2025-10-07T13:17:14.235Z","repository":{"id":312460557,"uuid":"1047572221","full_name":"foundata/hugo-component-robotstxt","owner":"foundata","description":"Hugo theme component to manage robots.txt","archived":false,"fork":false,"pushed_at":"2025-08-31T21:34:00.000Z","size":40,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-02T04:29:26.794Z","etag":null,"topics":["hugo","hugo-theme-component"],"latest_commit_sha":null,"homepage":"https://foundata.com/en/projects/hugo-component-robotstxt/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/foundata.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSES/GPL-3.0-or-later.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-30T18:05:46.000Z","updated_at":"2025-09-29T04:57:21.000Z","dependencies_parsed_at":"2025-08-30T20:28:17.356Z","dependency_job_id":"2c567154-b6ee-4be0-be81-eec4cae77458","html_url":"https://github.com/foundata/hugo-component-robotstxt","commit_stats":null,"previous_names":["foundata/hugo-component-robotstxt"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/foundata/hugo-component-robotstxt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundata%2Fhugo-component-robotstxt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundata%2Fhugo-component-robotstxt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundata%2Fhugo-component-robotstxt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundata%2Fhugo-component-robotstxt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/foundata","download_url":"https://codeload.github.com/foundata/hugo-component-robotstxt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundata%2Fhugo-component-robotstxt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278780212,"owners_count":26044515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hugo","hugo-theme-component"],"created_at":"2025-10-07T13:17:13.153Z","updated_at":"2025-10-07T13:17:14.230Z","avatar_url":"https
# Hugo theme component: hugo-component-robotstxt (manage robots.txt)

A reusable [theme component](https://gohugo.io/hugo-modules/theme-components/) to manage and generate the site's [`robots.txt`](https://developers.google.com/search/docs/crawling-indexing/robots/intro).


## Table of contents

- [Features](#features)
- [Demo](#demo)
- [Installation](#installation)
  - [Using Hugo modules](#installation-hugo-modules)
  - [Using Git submodules](#installation-git-submodules)
- [Configuration](#configuration)
  - [Settings](#settings)
    - [`excludeNonProduction`](#setting-excludeNonProduction)
    - [`exclude`](#setting-exclude)
    - [`excludeCrawlers`](#setting-excludeCrawlers)
  - [Sitemap handling](#sitemap-handling)
- [Compatibility](#compatibility)
- [Contributing](#contributing)
- [Licensing, copyright](#licensing-copyright)
- [Author information](#author-information)


## Features<a id="features"></a>

* Automatically excludes all bots and crawlers in non-production environments by default.
* Sane and useful `Disallow:` defaults.
* Supports crawler-specific blocking and per-path exclusions.
* Automatically manages sitemap references:
  * If a sitemap is enabled (default: `sitemap.xml`), its URL is added to `robots.txt`.
  * If a sitemap is disabled or renamed, the reference is updated or omitted accordingly.


## Demo<a id="demo"></a>

Clone the repository and run the included [example content](./exampleSite/content/) (requires Hugo, Go, and Git):

```bash
git clone https://github.com/foundata/hugo-component-robotstxt.git
cd ./hugo-component-robotstxt/exampleSite
HUGO_MODULE_WORKSPACE=hugo.work hugo server --ignoreVendorPaths "**"
```

Or have a look at the following pages that use this theme component:

* https://foundata.com/robots.txt
* https://golang.foundata.com/robots.txt


## Installation<a id="installation"></a>

### Using Hugo modules<a id="installation-hugo-modules"></a>

Add the following module path(s) to your [`theme:` configuration](https://gohugo.io/hugo-modules/theme-components/):

```yaml
theme:
  - "golang.foundata.com/hugo-component-robotstxt"
```

Hugo automatically fetches and imports theme module paths as Go/Hugo modules, so you do **not** need to list them under `module.imports` manually. Using modules requires [Hugo, Go, and Git](https://gohugo.io/hugo-modules/use-modules/#prerequisite) to be installed on your system.
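If your site is not yet a Hugo module itself, it has to be initialized as one before Hugo can fetch the component. A minimal sketch of the usual workflow (the module path `github.com/you/your-site` is a placeholder, not part of this project):

```bash
# Turn the site into a Hugo module (one-time step; use your own module path).
hugo mod init github.com/you/your-site

# Fetch and update the modules declared in the site configuration.
hugo mod get -u

# Optional: print the resolved module graph to verify the component was picked up.
hugo mod graph
```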
### Using Git submodules<a id="installation-git-submodules"></a>

From the root directory of your Hugo site, initialize a new Git repository (if you haven't already), then add the theme as a [Git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules):

```bash
git submodule add https://github.com/foundata/hugo-component-robotstxt.git themes/robotstxt
```

Now reference the theme directory name in your [`theme:` configuration](https://gohugo.io/hugo-modules/theme-components/):

```yaml
theme:
  - "robotstxt"
```


## Configuration<a id="configuration"></a>

> ℹ️ **Heads-up:** You have to set [`enableRobotsTXT: true`](https://gohugo.io/configuration/all/#enablerobotstxt) (which is `false` by default) and make sure `robotstxt` is *not* listed in [`disableKinds`](https://gohugo.io/configuration/all/#disablekinds) (it is not by default). Otherwise, no `robots.txt` will be created.

Example:

```yaml
# Enable generation of robots.txt file.
enableRobotsTXT: true

params:
  robotsTxt:
    # Block all user agents ("Disallow: /") in non-production environments.
    excludeNonProduction: true
    exclude:
      # Version control
      - "/.git/"
      # System and metadata dirs
      - "/.well-known/"
      # Log and temp files
      - "/*.log$"
      - "/*.tmp$"
      - "/*.bak$"
    excludeCrawlers:
      - "GPTBot" # OpenAI / ChatGPT indexing
      - "ChatGPT-User" # OpenAI / ChatGPT plugins, used for direct actions on behalf of a ChatGPT user
```


### Settings<a id="settings"></a>

This section documents the theme options you can place under `params.robotsTxt` in your Hugo configuration. The example configurations are safe to copy and paste. All keys are optional and the theme falls back to sensible behavior unless otherwise noted.


#### `excludeNonProduction`<a id="setting-excludeNonProduction"></a>

- Type: Boolean.
- Default: `true`
- Purpose: When `true`, the template adds the following directives in non-production builds (see the sketch after this list for how the environment is usually selected):
  ```
  User-agent: *
  Disallow: /
  ```
  Production detection is based on either:
  * `hugo.IsProduction`
  * `.Site.Params.env == "production"`
- **Example (config):**
  ```yaml
  params:
    robotsTxt:
      excludeNonProduction: true
  ```
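Which build counts as "production" follows Hugo's usual environment handling; the following is a rough sketch of that default behavior, not something specific to this component:

```bash
# "hugo server" defaults to the "development" environment, so with
# excludeNonProduction enabled the preview's robots.txt blocks all crawlers.
hugo server

# A plain "hugo" build defaults to the "production" environment,
# so the regular exclude rules are emitted instead.
hugo

# The environment can also be selected explicitly.
hugo --environment production
```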
#### `exclude`<a id="setting-exclude"></a>

- Type: List of strings.
- Default: `["/.git/", "/*.log$", "/*.tmp$", "/*.bak$", "/.well-known/"]`
- Purpose:
  - List of [path patterns](https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt#url-matching-based-on-path-values).
  - Each entry becomes a `Disallow:` rule for all crawlers (`User-agent: *`).
- **Example (config):**
  ```yaml
  params:
    robotsTxt:
      exclude:
        - "/download/"
        - "*.asc$"
  ```
  becomes the following in `robots.txt`:
  ```
  User-agent: *
  Disallow: /download/
  Disallow: *.asc$
  ```


#### `excludeCrawlers`<a id="setting-excludeCrawlers"></a>

- Type: List of strings.
- Default: `[]` (empty list)
- Purpose:
  - List of crawler user-agent names to exclude. Most companies provide some kind of list, e.g.:
    - https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers
    - https://platform.openai.com/docs/bots/overview-of-openai-crawlers
  - Reminder: `robots.txt` is an *advisory* mechanism. It prevents compliant crawlers from fetching URLs, but does not protect sensitive files from direct access.
- **Example (config):** Each entry creates a crawler-specific block:
  ```yaml
  params:
    robotsTxt:
      excludeCrawlers:
        - "ia_archiver"
        - "GPTBot"
  ```
  becomes the following in `robots.txt`:
  ```
  User-agent: ia_archiver
  Disallow: /

  User-agent: GPTBot
  Disallow: /
  ```


### Sitemap handling<a id="sitemap-handling"></a>

There is nothing to configure, but the component is aware of [Hugo's sitemap configuration](https://gohugo.io/configuration/sitemap/):

* By default, Hugo generates the sitemap as `/sitemap.xml`.
* If the sitemap is disabled (`disableKinds = ["sitemap"]`) or `sitemap.filename` is set to an empty string, no `Sitemap:` line is emitted.
* If a custom filename is set (e.g. `sitemap.filename = "mysite-map.xml"`), the generated `robots.txt` references it correctly (see the sketch below).
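For illustration, a sketch of the corresponding Hugo site configuration; the filename `mysite-map.xml` and the host `example.org` are just example values:

```yaml
# Custom sitemap filename: the generated robots.txt references it,
# typically as something like "Sitemap: https://example.org/mysite-map.xml".
sitemap:
  filename: "mysite-map.xml"

# Alternatively, disable sitemap generation entirely;
# no "Sitemap:" line is emitted in that case.
#disableKinds:
#  - "sitemap"
```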
## Compatibility<a id="compatibility"></a>

This project is compatible with Hugo (extended) ≥ v0.148.0 and should always work with the latest Hugo release (we usually run the latest Hugo ourselves and fix issues promptly). It has been tested at least with:

- [Hugo extended v0.149.0](https://github.com/gohugoio/hugo/releases/tag/v0.149.0)
- [Hugo extended v0.148.0](https://github.com/gohugoio/hugo/releases/tag/v0.148.0)

If your version isn't listed, it might still work. Just give it a try.


## Contributing<a id="contributing"></a>

See [`CONTRIBUTING.md`](./CONTRIBUTING.md) if you want to get involved.

This project's functionality is mature, so there might be little activity on the repository in the future. Don't be fooled by this: the project is under active maintenance and used daily by the maintainers.


## Licensing, copyright<a id="licensing-copyright"></a>

<!--REUSE-IgnoreStart-->
Copyright (c) 2025 foundata GmbH (https://foundata.com)

This project is licensed under the GNU General Public License v3.0 or later (SPDX-License-Identifier: `GPL-3.0-or-later`), see [`LICENSES/GPL-3.0-or-later.txt`](LICENSES/GPL-3.0-or-later.txt) for the full text.

The [`REUSE.toml`](REUSE.toml) file provides detailed licensing and copyright information in a human- and machine-readable format. This includes parts that may be subject to different licensing or usage terms, such as third-party components. The repository conforms to the [REUSE specification](https://reuse.software/spec/). You can use [`reuse spdx`](https://reuse.readthedocs.io/en/latest/readme.html#cli) to create an [SPDX software bill of materials (SBOM)](https://en.wikipedia.org/wiki/Software_Package_Data_Exchange).
<!--REUSE-IgnoreEnd-->

[![REUSE status](https://api.reuse.software/badge/github.com/foundata/hugo-component-robotstxt)](https://api.reuse.software/info/github.com/foundata/hugo-component-robotstxt)


## Author information<a id="author-information"></a>

This project was created and is maintained by [foundata](https://foundata.com/).