
# Create Robots.txt Action

An action to create a [robots.txt](https://en.wikipedia.org/wiki/Robots.txt) file from a variety of sources.

| Input name | Example | Description |
| ------------------------ | ----------------- | -------------------------------------------------------------------------------------------------------------- |
| output-file | `robots.txt` | Where to write the resulting robots.txt file |
| input-file | `base-robots.txt` | An existing robots.txt. Will be added to the top of the output file. Must not be the same as the `output-file` |
| append-allow-rule | `true` | Whether to append a rule allowing all unspecified user agents to the end of the file |
| allowed-bot-names | | Multiline string. Names of bots that should not be included in the blocklist |
| blocked-bot-names | | Multiline string. Names of bots that should be included in the blocklist |
| cloudflare-api-token | | An API token for Cloudflare. Enables Cloudflare's bot categories as a source for bots |
| cloudflare-categories | `AI Crawler` | Bot categories to add to the blocklist. Required if `cloudflare-api-token` is set |
| dark-visitors-api-token | | An API token for Dark Visitors. Enables Dark Visitors' user agent categories as a source for bots |
| dark-visitors-categories | `AI Data Scraper` | User agent categories to add to the blocklist. Required if `dark-visitors-api-token` is set |
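
To make the interplay between these inputs concrete, here is a minimal TypeScript sketch of how an output file could be assembled. It is not the action's actual implementation: `RobotsOptions`, `parseMultiline`, `buildRobotsTxt`, the case-insensitive name matching, and the exact rule layout are all assumptions made for illustration. Only the ordering (base file first, optional allow rule last) and the idea that allowed names are kept out of the blocklist come from the table above. The bot names in the example are placeholders.

```typescript
// Hypothetical option names mirroring the action's inputs; not the action's real API.
interface RobotsOptions {
  baseContent?: string; // contents of `input-file`, if any
  blockedBotNames: string[]; // from `blocked-bot-names` and provider categories
  allowedBotNames: string[]; // from `allowed-bot-names`
  appendAllowRule: boolean; // from `append-allow-rule`
}

// Split a multiline input into trimmed, non-empty names.
function parseMultiline(value: string): string[] {
  return value
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function buildRobotsTxt(options: RobotsOptions): string {
  // Assumption: names are compared case-insensitively.
  const allowed = new Set(options.allowedBotNames.map((n) => n.toLowerCase()));
  const seen = new Set<string>();
  // Drop explicitly allowed bots and duplicates, keeping first-seen order.
  const blocked = options.blockedBotNames.filter((name) => {
    const key = name.toLowerCase();
    if (allowed.has(key) || seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  const sections: string[] = [];
  if (options.baseContent && options.baseContent.trim().length > 0) {
    sections.push(options.baseContent.trim()); // existing file goes on top
  }
  if (blocked.length > 0) {
    // One group: several User-agent lines sharing a single Disallow rule.
    const agents = blocked.map((name) => `User-agent: ${name}`).join("\n");
    sections.push(`${agents}\nDisallow: /`);
  }
  if (options.appendAllowRule) {
    sections.push("User-agent: *\nAllow: /"); // allow everyone else
  }
  return sections.join("\n\n") + "\n";
}

const example = buildRobotsTxt({
  baseContent: "Sitemap: https://example.com/sitemap.xml",
  blockedBotNames: parseMultiline("GPTBot\nCCBot\nChrome-Lighthouse\n"),
  allowedBotNames: ["Chrome-Lighthouse"],
  appendAllowRule: true,
});
console.log(example);
```

In this sketch `Chrome-Lighthouse` is removed from the blocklist because it also appears in the allowed names, which is the behaviour the `allowed-bot-names` description suggests.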

## Example workflow.yml

> [!NOTE]
>
> You will need to enable the "Allow GitHub Actions to create and approve pull requests" option in your repository's `Settings > Actions > General`

```yml
name: Update robots.txt

on:
  workflow_dispatch:
  schedule:
    - cron: "13 6 * * 1"

jobs:
  update-robots-txt:
    name: Update robots.txt
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Create robots.txt
        uses: s-thom/create-robots-txt-action@v1
        with:
          output-file: public/robots.txt
          append-allow-rule: true
          allowed-bot-names: |
            Chrome-Lighthouse
          cloudflare-api-token: ${{ secrets.CLOUDFLARE_RADAR_API_TOKEN }}
          cloudflare-categories: |
            YOUR BOT CATEGORIES HERE
          dark-visitors-api-token: ${{ secrets.DARK_VISITORS_API_TOKEN }}
          dark-visitors-categories: |
            YOUR BOT CATEGORIES HERE

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v7
        with:
          add-paths: |
            public/robots.txt
          commit-message: "Update robots.txt"
          branch: robots-txt
          delete-branch: true
          author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
          committer: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
          title: "Update robots.txt"
          body: |
            # Automated update of robots.txt

            Generated by [s-thom/create-robots-txt-action](https://github.com/s-thom/create-robots-txt-action)
```

## Known bot categories for each provider

### Cloudflare

To create an API token, go to [the API Tokens page](https://dash.cloudflare.com/profile/api-tokens) in your profile settings and create a new token. Use the "Read Cloudflare Radar data" template or create a custom one with the "Account > Radar > Read" permission. Create the token and add it as an actions secret in your GitHub repository.

It is worth creating a new token for this workflow even if you already have one set up in your repository. It is good practice to grant the least privilege possible to any token handed to third-party code, such as this action.

- Accessibility
- Advertising & Marketing
- Aggregator
- AI Assistant
- AI Crawler
- AI Search
- Archiver
- Feed Fetcher
- Monitoring & Analytics
- Page Preview
- Search Engine Crawler
- Search Engine Optimization
- Security
- Social Media Marketing
- Webhooks
- Other

### Dark Visitors

To find your API token, go to the Settings page for your project. The access token is visible on this page. Copy the token and add it as an actions secret in your GitHub repository.

- AI Assistants
- AI Data Scrapers
- AI Search Crawlers

> [!NOTE]
>
> Dark Visitors also defines the following categories, but its API does not include bots from them.
>
> - Archivers
> - Developer Helpers
> - Fetchers
> - Headless Browsers
> - Intelligence Gatherers
> - Scrapers
> - Search Engine Crawlers
> - SEO Crawlers
> - Uncategorized
> - Undocumented AI Agents

## Development instructions

## Initial Setup

After you've cloned the repository to your local machine or codespace, you'll
need to perform some initial setup steps before you can develop your action.

> [!NOTE]
>
> You'll need to have a reasonably modern version of
> [Node.js](https://nodejs.org) handy (20.x or later should work!). If you are
> using a version manager like [`nodenv`](https://github.com/nodenv/nodenv) or
> [`fnm`](https://github.com/Schniz/fnm), this template has a `.node-version`
> file at the root of the repository that can be used to automatically switch to
> the correct version when you `cd` into the repository. Additionally, this
> `.node-version` file is used by GitHub Actions in any `actions/setup-node`
> actions.

1. :hammer_and_wrench: Install the dependencies

   ```bash
   npm install
   ```

1. :building_construction: Package the TypeScript for distribution

   ```bash
   npm run bundle
   ```

1. :white_check_mark: Run the tests

   ```bash
   $ npm test

   PASS ./index.test.js
     ✓ throws invalid number (3ms)
     ✓ wait 500 ms (504ms)
     ✓ test runs (95ms)

   ...
   ```

## Update the Action Metadata

The [`action.yml`](action.yml) file defines metadata about your action, such as
input(s) and output(s). For details about this file, see
[Metadata syntax for GitHub Actions](https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions).

When you copy this repository, update `action.yml` with the name, description,
inputs, and outputs for your action.

## Update the Action Code

The [`src/`](./src/) directory is the heart of your action! This contains the
source code that will be run when your action is invoked. You can replace the
contents of this directory with your own code.

There are a few things to keep in mind when writing your action code:

- Most GitHub Actions toolkit and CI/CD operations are processed asynchronously.
  In `main.ts`, you will see that the action is run in an `async` function.

  ```javascript
  import * as core from "@actions/core";
  //...

  async function run() {
    try {
      //...
    } catch (error) {
      // Mark the workflow step as failed. The guard is needed because a
      // thrown value is not guaranteed to be an Error.
      if (error instanceof Error) core.setFailed(error.message);
    }
  }
  ```

For more information about the GitHub Actions toolkit, see the
[documentation](https://github.com/actions/toolkit/blob/master/README.md).

So, what are you waiting for? Go ahead and start customizing your action!

1. Create a new branch

   ```bash
   git checkout -b releases/v1
   ```

1. Replace the contents of `src/` with your action code
1. Add tests to `__tests__/` for your source code
1. Format, test, and build the action

   ```bash
   npm run all
   ```

   > This step is important! It will run [`ncc`](https://github.com/vercel/ncc)
   > to build the final JavaScript action code with all dependencies included.
   > If you do not run this step, your action will not work correctly when it is
   > used in a workflow. This step also includes the `--license` option for
   > `ncc`, which will create a license file for all of the production node
   > modules used in your project.

1. (Optional) Test your action locally

   The [`@github/local-action`](https://github.com/github/local-action) utility
   can be used to test your action locally. It is a simple command-line tool
   that "stubs" (or simulates) the GitHub Actions Toolkit. This way, you can run
   your TypeScript action locally without having to commit and push your changes
   to a repository.

   The `local-action` utility can be run in the following ways:

   - Visual Studio Code Debugger

     Make sure to review and, if needed, update
     [`.vscode/launch.json`](./.vscode/launch.json)

   - Terminal/Command Prompt

     ```bash
     # Arguments: action directory, entrypoint file, dotenv file
     npx local-action . src/main.ts .env
     ```

   You can provide a `.env` file to the `local-action` CLI to set environment
   variables used by the GitHub Actions Toolkit, such as the inputs and event
   payload data used by your action. For more information, see the example
   file, [`.env.example`](./.env.example), and the
   [GitHub Actions Documentation](https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables).

1. Commit your changes

   ```bash
   git add .
   git commit -m "My first action is ready!"
   ```

1. Push them to your repository

   ```bash
   git push -u origin releases/v1
   ```

1. Create a pull request and get feedback on your action
1. Merge the pull request into the `main` branch

Your action is now published! :rocket:

For information about versioning your action, see
[Versioning](https://github.com/actions/toolkit/blob/master/docs/action-versioning.md)
in the GitHub Actions toolkit.
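
When preparing a `.env` file for the optional local-testing step, it helps to know how an input name turns into an environment variable. The sketch below reflects our understanding of how `@actions/core` resolves inputs (prefix `INPUT_`, spaces replaced by underscores, upper-cased, hyphens left as they are); treat that as an assumption and check it against the toolkit source before relying on it. `inputEnvName` and `toDotenvLines` are hypothetical helper names, not part of any library.

```typescript
// Assumed mapping from an action input name to its environment variable:
// spaces become underscores, the name is upper-cased, and `INPUT_` is
// prefixed. Hyphens are not replaced.
function inputEnvName(inputName: string): string {
  return `INPUT_${inputName.replace(/ /g, "_").toUpperCase()}`;
}

// Hypothetical helper: render single-line input values as `.env` lines.
function toDotenvLines(inputs: Record<string, string>): string[] {
  return Object.keys(inputs).map(
    (name) => `${inputEnvName(name)}=${inputs[name]}`
  );
}

const lines = toDotenvLines({
  "output-file": "public/robots.txt",
  "append-allow-rule": "true",
});
console.log(lines.join("\n"));
```

Under these assumptions, the `output-file` input from this action would be supplied as `INPUT_OUTPUT-FILE` in the `.env` file.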

## Validate the Action

You can now validate the action by referencing it in a workflow file. For
example, [`ci.yml`](./.github/workflows/ci.yml) demonstrates how to reference an
action in the same repository.

```yaml
steps:
  - name: Checkout
    id: checkout
    uses: actions/checkout@v4

  - name: Test Local Action
    id: test-action
    uses: ./
    with:
      milliseconds: 1000

  - name: Print Output
    id: output
    run: echo "${{ steps.test-action.outputs.time }}"
```

For example workflow runs, check out the
[Actions tab](https://github.com/actions/typescript-action/actions)! :rocket:

## Usage

After testing, you can create version tag(s) that developers can use to
reference different stable versions of your action. For more information, see
[Versioning](https://github.com/actions/toolkit/blob/master/docs/action-versioning.md)
in the GitHub Actions toolkit.

To include the action in a workflow in another repository, you can use the
`uses` syntax with the `@` symbol to reference a specific branch, tag, or commit
hash.

```yaml
steps:
  - name: Checkout
    id: checkout
    uses: actions/checkout@v4

  - name: Test Local Action
    id: test-action
    uses: actions/typescript-action@v1 # Commit with the `v1` tag
    with:
      milliseconds: 1000

  - name: Print Output
    id: output
    run: echo "${{ steps.test-action.outputs.time }}"
```

## Publishing a New Release

This project includes a helper script, [`script/release`](./script/release),
designed to streamline the process of tagging and pushing new releases for
GitHub Actions.

GitHub Actions allows users to select a specific version of the action to use,
based on release tags. This script simplifies this process by performing the
following steps:

1. **Retrieving the latest release tag:** The script starts by fetching the most
   recent SemVer release tag of the current branch, by looking at the local data
   available in your repository.
1. **Prompting for a new release tag:** The user is then prompted to enter a new
   release tag. To assist with this, the script displays the tag retrieved in
   the previous step, and validates the format of the inputted tag (vX.X.X). The
   user is also reminded to update the version field in package.json.
1. **Tagging the new release:** The script then tags a new release and syncs the
   separate major tag (e.g. v1, v2) with the new release tag (e.g. v1.0.0,
   v2.1.2). When the user is creating a new major release, the script
   auto-detects this and creates a `releases/v#` branch for the previous major
   version.
1. **Pushing changes to remote:** Finally, the script pushes the necessary
   commits, tags and branches to the remote repository. From here, you will need
   to create a new release in GitHub so users can easily reference the new tags
   in their workflows.
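
The tag handling in steps 2 and 3 can be sketched in TypeScript as follows. This is an illustration of the logic described above, not a port of `script/release`: the names `RELEASE_TAG`, `ReleasePlan`, and `planRelease` are hypothetical, and the sketch assumes plain `vX.X.X` tags with no pre-release or build suffixes.

```typescript
// Matches the vX.X.X format described above (assumption: no suffixes).
const RELEASE_TAG = /^v(\d+)\.(\d+)\.(\d+)$/;

interface ReleasePlan {
  newTag: string; // full release tag, e.g. v2.0.0
  majorTag: string; // floating major tag to sync, e.g. v2
  isNewMajor: boolean;
  previousMajorBranch: string | undefined; // e.g. releases/v1 when moving to v2
}

function planRelease(latestTag: string, newTag: string): ReleasePlan {
  const latest = RELEASE_TAG.exec(latestTag);
  const next = RELEASE_TAG.exec(newTag);
  if (!latest || !next) {
    throw new Error(`Tags must look like vX.X.X (got ${latestTag}, ${newTag})`);
  }
  const latestMajor = Number(latest[1]);
  const nextMajor = Number(next[1]);
  // A higher major number means the previous major gets its own branch.
  const isNewMajor = nextMajor > latestMajor;
  return {
    newTag,
    majorTag: `v${nextMajor}`,
    isNewMajor,
    previousMajorBranch: isNewMajor ? `releases/v${latestMajor}` : undefined,
  };
}

const minor = planRelease("v1.4.2", "v1.5.0");
const major = planRelease("v1.4.2", "v2.0.0");
console.log(minor.majorTag, minor.isNewMajor);
console.log(major.majorTag, major.previousMajorBranch);
```

Keeping the floating major tag in sync is what lets a workflow pin `@v1` and still receive compatible updates.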