{"id":27808257,"url":"https://github.com/dgl/haphash","last_synced_at":"2026-01-24T14:04:25.833Z","repository":{"id":290296318,"uuid":"973912251","full_name":"dgl/haphash","owner":"dgl","description":"Anti-scraper challenge for haproxy to stop naughty AI bots.","archived":false,"fork":false,"pushed_at":"2025-08-02T01:41:42.000Z","size":12,"stargazers_count":59,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-10-11T02:42:23.772Z","etag":null,"topics":["haproxy","waf"],"latest_commit_sha":null,"homepage":"https://dgl.cx/2025/04/using-haproxy-to-stop-scrapers","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"0bsd","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dgl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"ko_fi":"webgl"}},"created_at":"2025-04-28T01:19:13.000Z","updated_at":"2025-10-09T04:00:06.000Z","dependencies_parsed_at":"2025-08-02T07:45:16.350Z","dependency_job_id":null,"html_url":"https://github.com/dgl/haphash","commit_stats":null,"previous_names":["dgl/haphash"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dgl/haphash","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgl%2Fhaphash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgl%2Fhaphash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgl%2Fhaphash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgl%2Fhaphash/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/dgl","download_url":"https://codeload.github.com/dgl/haphash/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgl%2Fhaphash/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28729411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T10:24:43.181Z","status":"ssl_error","status_checked_at":"2026-01-24T10:24:36.112Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["haproxy","waf"],"created_at":"2025-05-01T10:25:42.927Z","updated_at":"2026-01-24T14:04:25.828Z","avatar_url":"https://github.com/dgl.png","language":"HTML","funding_links":["https://ko-fi.com/webgl"],"categories":["HTML"],"sub_categories":[],"readme":"# haphash: Anti-scraper for haproxy\n\nThis is a simple anti-scraper solution for [haproxy](https://www.haproxy.org),\nusing a similar \"hashcash\" challenge as\n[anubis](https://xeiaso.net/blog/2025/anubis/) uses. The goal is to be as\nsimple as possible, so this can be implemented alongside other haproxy rules to\ncontrol traffic.\n\n## Overview\n\nAI crawlers keep [breaking the\nweb](https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/).\nI mostly have avoided this problem by having very lightweight pages, but lately\nI've noticed some scrapers are being particularly obnoxious. 
Many solutions to\nthis problem involve adding another proxy component, but I'm already running\nhaproxy in most places, which is a perfectly fine reverse proxy and I don't\nwant to make things more complex if I can avoid it.\n\nThis uses a haproxy [\"stick\ntable\"](https://www.haproxy.com/blog/introduction-to-haproxy-stick-tables) to\nstore details of IP addresses. It is based on simply allowing IP addresses,\nrather than cookies. As the IP address is stored in memory and there's no\ncookie, this likely does not add to any GDPR obligations (this is not legal\nadvice).\n\nIt is expected this will be combined with haproxy IP based [rate\nlimiting](https://www.haproxy.com/blog/four-examples-of-haproxy-rate-limiting),\nwith the benefit that this doesn't add another component to the system.\n\nIf you want to try it out, my [contact](https://dgl.cx/contact) page is always\nprotected by it.\n\n## The moving parts\n\n[`challenge.html`](challenge.html) is the HTML served to clients, templated via\nhaproxy. (Because this is templated you can't just open it in your browser --\nnote the double percent signs.)\n\n[`haproxy.conf`](haproxy.conf) is a haproxy config snippet that makes use of\nthis. It's expected you adjust this for your implementation. The \"challenge\"\nbackend is where the majority of the logic lives and should only need tiny\nchanges.\n\nThis is small:\n\n```console\n$ wc -l haproxy.conf challenge.html\n      38 haproxy.conf\n      94 challenge.html\n     132 total\n```\n\n## Set-up\n\nCopy [`challenge.html`](challenge.html) to `/etc/haproxy/challenge.html` (or\nother suitable location).\n\nFrom [haproxy.conf](haproxy.conf) add the `challenge` backend to your haproxy\nconfiguration. 
Add the relevant lines from `frontend www` to your frontend\nsection.\n\nTo start with, it is recommended you protect a single path for testing purposes.\nRestarting haproxy will clear the stick table (configure\n[peers](https://www.haproxy.com/documentation/haproxy-configuration-tutorials/proxying-essentials/custom-rules/stick-tables/#synchronize-stick-tables-across-peers)\nto make the allowed IP addresses persist).\n\nThe difficulty is set in both the HTML and the haproxy config; it defaults to 4\n(which is pretty fast).\n\n## License\n\n©[David Leadbeater](https://一.st) 2025; [0BSD](https://dgl.cx/0bsd), see\n[COPYING](COPYING).\n\n## Alternatives\n\n* [haproxy enterprise](https://www.haproxy.com/documentation/haproxy-configuration-tutorials/security/enterprise-features/)\n* [tedu's anticrawl](https://flak.tedunangst.com/post/anticrawl)\n* [anubis](https://anubis.techaro.lol/)\n* [go-away](https://git.gammaspectra.live/git/go-away)\n* [haproxy-protection](https://gitgud.io/fatchan/haproxy-protection/)\n\n## Credits\n\n* [Xe Iaso](https://xeiaso.net/) for anubis.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgl%2Fhaphash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdgl%2Fhaphash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgl%2Fhaphash/lists"}