{"id":47662270,"url":"https://github.com/austin-weeks/miasma","last_synced_at":"2026-06-11T04:01:13.240Z","repository":{"id":345637945,"uuid":"1184927517","full_name":"austin-weeks/miasma","owner":"austin-weeks","description":"Trap AI web scrapers in an endless poison pit.","archived":false,"fork":false,"pushed_at":"2026-06-08T02:14:32.000Z","size":31096,"stargazers_count":1089,"open_issues_count":11,"forks_count":24,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-06-08T03:10:42.138Z","etag":null,"topics":["ai","anti-ai","anti-spam","free-software","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/austin-weeks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yaml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":"austinweeks","tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"thanks_dev":null,"custom":null}},"created_at":"2026-03-18T04:12:41.000Z","updated_at":"2026-06-08T01:52:56.000Z","dependencies_parsed_at":"2026-03-20T17:02:44.856Z","dependency_job_id":"58e29321-28bc-450c-b9b2-e77471b4aea0","html_url":"https://github.com/austin-weeks/miasma","commit_stats":null,"previous_names":["austin-weeks/miasma"],"tags_count":24,"template":false,"template_full_name":null,"purl":"pkg:github/austin-weeks/miasma","repository_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/austin-weeks%2Fmiasma","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/austin-weeks%2Fmiasma/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/austin-weeks%2Fmiasma/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/austin-weeks%2Fmiasma/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/austin-weeks","download_url":"https://codeload.github.com/austin-weeks/miasma/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/austin-weeks%2Fmiasma/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34181555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","anti-ai","anti-spam","free-software","web-scraping"],"created_at":"2026-04-02T11:38:57.138Z","updated_at":"2026-06-11T04:01:13.224Z","avatar_url":"https://github.com/austin-weeks.png","language":"Rust","funding_links":["https://ko-fi.com/austinweeks"],"categories":["Rust"],"sub_categories":[],"readme":"# 🌀 Miasma\n\n[![No 
AI](https://custom-icon-badges.demolab.com/badge/No%20AI-2f2f2f?logo=non-ai\u0026logoColor=white\u0026logoSize=auto)](#)\n[![crates.io](https://img.shields.io/crates/v/miasma?logo=rust)](https://crates.io/crates/miasma)\n[![downloads](https://img.shields.io/crates/dr/miasma?logo=rust)](https://crates.io/crates/miasma)\n[![Release](https://github.com/austin-weeks/miasma/actions/workflows/Release.yaml/badge.svg)](https://github.com/austin-weeks/miasma/actions/workflows/Release.yaml)\n[![GitHub commits since latest release](https://img.shields.io/github/commits-since/austin-weeks/miasma/latest?logo=github)](#)\n\n\u003cpicture\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/austin-weeks/miasma/main/.github/images/miasma-art.png\" alt=\"Web crawlers getting stuck in a cloud of poison miasma.\" title=\"Cover art by @delphoxlover334\" /\u003e\n\u003c/picture\u003e\n\nAI companies continually scrape the internet at an enormous scale, swallowing up all of its contents to use as training data for their next models. If you have a public website, _they are already stealing your work._\n\n_Miasma_ is here to help you fight back! Spin up the server and point any malicious traffic towards it. _Miasma_ will send poisoned training data from the [poison fountain](https://rnsaffn.com/poison3) alongside multiple self-referential links. It's an endless buffet of slop for the slop machines.\n\n_Miasma_ is lightning fast and has a minimal memory footprint - you should not have to waste compute resources fending off the internet's leeches.\n\n\u003e [!CAUTION]\n\u003e There is inherent risk in deploying this software. 
Please fully read [configuration](#configuration) and [disclaimer](#disclaimer) before use.\n\n## Usage\n\nYou can run _Miasma_ locally, or with the official [docker image](https://hub.docker.com/r/austinweeks/miasma).\n\nIf you would like to incorporate _Miasma_ into an existing Rust server, you may also [use _Miasma_ as a library](https://docs.rs/miasma/).\n\n### Running Locally\n\nInstall with [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) (recommended):\n\n```sh\ncargo install miasma\n```\n\nAlternatively, download a pre-built binary from [releases](https://github.com/austin-weeks/miasma/releases).\n\nCommunity-maintained packages are also available for a variety of package managers:\n\n\u003ca href=\"https://repology.org/project/miasma/versions\"\u003e\n    \u003cimg\n        src=\"https://repology.org/badge/vertical-allrepos/miasma.svg?exclude_unsupported=1\u0026minversion=0.2\"\n        alt=\"Packaging status\"\n    \u003e\n\u003c/a\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\nStart _Miasma_ with default configuration:\n\n```sh\nmiasma\n```\n\nView all available [configuration options](#configuration):\n\n```sh\nmiasma --help\n```\n\n### Running with Docker\n\nRun _Miasma_ using the official [docker image](https://hub.docker.com/r/austinweeks/miasma):\n\n```sh\ndocker run --rm -p 9999:9999 austinweeks/miasma:latest\n```\n\nPass the same [configuration flags](#configuration) you would use locally:\n\n```sh\ndocker run --rm -p 9999:9999 austinweeks/miasma:latest \\\n    --link-prefix '/naughty-bots' \\\n    --max-in-flight 30\n```\n\nOr, run within a docker compose cluster:\n\n```yaml\nservices:\n  miasma:\n    image: austinweeks/miasma:latest\n    command: [\"--link-prefix\", \"/naughty-bots\", \"--max-in-flight\", \"30\"]\n    ports:\n      - 9999:9999\n```\n\n## How to Trap Scrapers\n\nLet's walk through an example of setting up a server to trap scrapers with _Miasma_. 
We'll pick `/naughty-bots` as our server's path to direct scraper traffic. We'll be using [_Nginx_](https://nginx.org/) as our server's reverse proxy, but the same result can be achieved with many different setups.\n\nWhen we're done, scrapers will be trapped like so:\n\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercontent.com/austin-weeks/miasma/main/.github/images/flow-chart-dark.png\"\u003e\n    \u003cimg height=\"425\" src=\"https://raw.githubusercontent.com/austin-weeks/miasma/main/.github/images/flow-chart-light.png\" alt=\"Flow chart depicting cycle of trapped scrapers.\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n### Embedding Hidden Links\n\nWithin our site, we'll include a few hidden links leading to `/naughty-bots`.\n\n```html\n\u003ca\n  href=\"/naughty-bots/\"\n  style=\"display: none;\"\n  aria-hidden=\"true\"\n  tabindex=\"-1\"\n\u003e\n  Amazing high quality data here!\n\u003c/a\u003e\n```\n\nThe `style=\"display: none;\"`, `aria-hidden=\"true\"`, and `tabindex=\"-1\"` attributes ensure links are totally invisible to human visitors and will be ignored by screen readers and keyboard navigation. They will **only** be visible to scrapers.\n\n### Configuring our Nginx Proxy\n\nSince our hidden links point to `/naughty-bots/`, we'll configure this path to proxy requests to _Miasma_. 
Let's assume we're running _Miasma_ on port `9855`.\n\nWe'll also set up aggressive rate limiting based on the scraper's user agent to help ensure we don't accidentally DDoS ourselves.\n\n```nginx\nhttp {\n  # Reserve 8MB memory for tracking user agents\n  limit_req_zone $http_user_agent zone=miasma:8m rate=1r/s;\n\n  server {\n    location = /naughty-bots {\n      port_in_redirect off;\n      return 301 /naughty-bots/;\n    }\n    location /naughty-bots/ {\n      # Rate limit via the 'miasma' zone with no queueing\n      limit_req_status 429;\n      limit_req zone=miasma burst=5 nodelay;\n\n      # Proxy requests to Miasma\n      proxy_pass http://localhost:9855/;\n    }\n  }\n}\n```\n\nThis configuration will catch all variations of the `/naughty-bots` path -\u003e `/naughty-bots`, `/naughty-bots/`, `/naughty-bots/12345`, etc.\n\n### Run _Miasma_\n\nLastly, we'll start _Miasma_ and specify `/naughty-bots` as the link prefix. This instructs _Miasma_ to start links with `/naughty-bots/`, which ensures scrapers are properly routed through our _Nginx_ proxy back to _Miasma_.\n\nLet's limit the number of max in-flight connections to 50. At 50 connections, we can expect 50-60 MB peak memory usage. Note that any requests exceeding this limit will immediately receive a **429** response rather than being added to a queue.\n\nWe'll also force _Miasma_ to gzip compress all responses regardless of scrapers' `Accept-Encoding` header. Since gzipped responses are significantly smaller, this will help us cut down on egress costs.\n\nWhile we could keep scrapers trapped forever, we'll use the link count and max depth options to let scrapers go after they consume ~100K poisoned pages. 
With this setup, _Miasma_ will send around **250MB** of total data per scraper.\n\n```sh\nmiasma --link-prefix '/naughty-bots' -p 9855 -c 50 --force-gzip --link-count 5 --max-depth 8\n```\n\n### Enjoy!\n\nLet's deploy and watch as misbehaving bots greedily eat from our endless slop machine!\n\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/austin-weeks/miasma/main/.github/images/logs.gif\" /\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n### `robots.txt`\n\nBe sure to protect well-behaved bots and search engines from _Miasma_ via your [`robots.txt`](https://developers.google.com/search/docs/crawling-indexing/robots/intro)!\n\n```text\nUser-agent: *\nDisallow: /naughty-bots\n```\n\n## Metrics\n\n_Miasma_ offers the ability to track scraper request counts per unique User-Agent. This can be useful for identifying which bots are hitting your site most heavily. Metrics are written to a local SQLite database file and can be viewed at an endpoint of your choosing.\n\n## Configuration\n\n_Miasma_ can be configured via its CLI options:\n\n| Option                | Default                               | Description                                                                                                                                                                                                                                                             |\n| --------------------- | ------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `port`                | `9999`                                | The port the server should bind to.                                                                                                                           
                                                                                                          |\n| `host`                | `localhost`                           | The host address the server should bind to.                                                                                                                                                                                                                             |\n| `unix-socket`         |                                       | Bind to a Unix domain socket rather than a TCP address. _Only available on Unix-like systems._                                                                                                                                                                          |\n| `max-in-flight`       | `500`                                 | Maximum number of allowable in-flight requests. Requests received once the limit is exceeded will receive a _429_ response. **_Miasma's_ memory usage scales directly with the number of in-flight requests - set this to a lower value if memory usage is a concern.** |\n| `link-prefix`         | `/`                                   | Prefix for self-directing links. This should be the path where you host _Miasma_, e.g. `/naughty-bots`.                                                                                                                                                                 |\n| `link-count`          | `5`                                   | Number of self-directing links to include in each response page.                                                                                                                                                                                                        |\n| `max-depth`           | `none`                                | Stop generating links once the scraper reaches the specified depth. This allows you to cut off scrapers after serving a desired amount of poison. 
_Use this in tandem with `link-count` to keep the numbers of active scrapers down to a manageable level._             |\n| `force-gzip`          | `false`                               | Always gzip responses regardless of the client's _Accept-Encoding_ header. **Forcing compression can help reduce egress costs.**                                                                                                                                        |\n| `unsafe-allow-html`   | `false`                               | Don't escape HTML characters in the poison source's responses. Escaping is enabled by default to prevent unintended client-side JavaScript execution. **Use this option with care.**                                                                                    |\n| `poison-source`       | `https://rnsaffn.com/poison2/?mask=0` | Proxy source for poisoned training data.                                                                                                                                                                                                                                |\n| `metrics-db-path`     |                                       | Path to SQLite database file to store metrics data. _Miasma_ will create a database at this location if one does not already exist.                                                                                                                                     |\n| `metrics-credentials` |                                       | Basic auth credentials required to access _Miasma's_ metrics page. Must match the format `\u003cusername\u003e:\u003cpassword\u003e`.                                                                                                                                                       |\n| `metrics-endpoint`    | `/metrics`                            | Endpoint at which _Miasma's_ metrics will be served.                                                                            
                                                                                                                                        |\n\n## Disclaimer\n\n_Miasma_ is not affiliated with [the poison fountain](https://rnsaffn.com/poison3). We have no control over its responses and cannot guarantee the safety of its contents. You should **_never_** direct users towards your _Miasma_ location.\n\n_Miasma_ is not responsible for any retaliation from operators of affected scrapers. It is your responsibility to comply with applicable laws and hosting provider policies. See [LICENSE](LICENSE) (GPL-v3) for full warranty \u0026 limitation of liability details.\n\n---\n\n_Cover art by [@cerberussaturn07](https://www.instagram.com/cerberussaturn07/)_\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faustin-weeks%2Fmiasma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faustin-weeks%2Fmiasma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faustin-weeks%2Fmiasma/lists"}