<h1 align="center">Krawl</h1>

<h3 align="center">
  <a name="readme-top"></a>
  <img
    src="img/krawl-svg.svg"
    height="250"
  >
</h3>
<div align="center">

<p align="center">
  A modern, customizable web honeypot server designed to detect and track malicious activity from attackers and web crawlers through deceptive web pages, fake credentials, and canary tokens.
</p>

<div align="center">
  <a href="https://github.com/blessedrebus/krawl/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/blessedrebus/krawl" alt="License">
  </a>
  <a href="https://github.com/blessedrebus/krawl/releases">
    <img src="https://img.shields.io/github/v/release/blessedrebus/krawl" alt="Release">
  </a>
</div>

<div align="center">
  <a href="https://ghcr.io/blessedrebus/krawl">
    <img src="https://img.shields.io/badge/ghcr.io-krawl-blue" alt="GitHub Container Registry">
  </a>
  <a href="https://kubernetes.io/">
    <img src="https://img.shields.io/badge/kubernetes-ready-326CE5?logo=kubernetes&logoColor=white" alt="Kubernetes">
  </a>
  <a href="https://github.com/BlessedRebuS/Krawl/pkgs/container/krawl-chart">
    <img src="https://img.shields.io/badge/helm-chart-0F1689?logo=helm&logoColor=white" alt="Helm Chart">
  </a>
</div>

<br>

<p align="center">
  <a href="#what-is-krawl">What is Krawl?</a> •
  <a href="#-installation">Installation</a> •
  <a href="#honeypot-pages">Honeypot Pages</a> •
  <a href="#dashboard">Dashboard</a> •
  <a href="./ToDo.md">Todo</a> •
  <a href="#-contributing">Contributing</a>
</p>

<br>
</div>

## Demo

Tip: crawl the `robots.txt` paths for additional fun.

- Krawl URL: [http://demo.krawlme.com](http://demo.krawlme.com)
- Dashboard: [http://demo.krawlme.com/das_dashboard](http://demo.krawlme.com/das_dashboard)

## What is Krawl?

**Krawl** is a cloud-native deception server designed to detect, delay, and analyze malicious attackers, web crawlers, and automated scanners.

It creates realistic fake web applications filled with low-hanging fruit, such as admin panels, configuration files, and exposed fake credentials, to attract and identify suspicious activity.

By wasting attacker resources, Krawl helps clearly distinguish malicious behavior from legitimate crawling.

It features:

- **Spider Trap Pages**: Infinite random links that waste crawler resources, based on the [spidertrap project](https://github.com/adhdproject/spidertrap)
- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels
- **Honeypot Paths**: Advertised in robots.txt to catch scanners
- **Fake Credentials**: Realistic-looking usernames, passwords, and API keys
- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering
- **Random Server Headers**: Confuse attacks that target a specific server type and version
- **Real-Time Dashboard**: Monitor suspicious activity
- **Customizable Wordlists**: Easy JSON-based configuration
- **Random Error Injection**: Mimic real server behavior

![dashboard](img/deception-page.png)

![geoip](img/geoip_dashboard.png)

## 🚀 Installation

### Docker Run

Run Krawl with the latest image:

```bash
docker run -d \
  -p 5000:5000 \
  -e KRAWL_PORT=5000 \
  -e KRAWL_DELAY=100 \
  -e KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" \
  -e KRAWL_DATABASE_RETENTION_DAYS=30 \
  --name krawl \
  ghcr.io/blessedrebus/krawl:latest
```

Access the server at `http://localhost:5000`.
### Docker Compose

Create a `docker-compose.yaml` file:

```yaml
services:
  krawl:
    image: ghcr.io/blessedrebus/krawl:latest
    container_name: krawl-server
    ports:
      - "5000:5000"
    environment:
      - CONFIG_LOCATION=config.yaml
      - TZ="Europe/Rome"
    volumes:
      - ./config.yaml:/app/config.yaml:ro
      - krawl-data:/app/data
    restart: unless-stopped

volumes:
  krawl-data:
```

Run with:

```bash
docker-compose up -d
```

Stop with:

```bash
docker-compose down
```

### Kubernetes

**Krawl is also available natively on Kubernetes.** Installation can be done either [via manifest](kubernetes/README.md) or [using the Helm chart](helm/README.md).

## Use Krawl to Ban Malicious IPs

Krawl uses a reputation-based system to classify attacker IP addresses. Every five minutes, Krawl exports the identified malicious IPs to a `malicious_ips.txt` file.

This file can either be mounted from the Docker container into another system or downloaded directly via `curl`:

```bash
curl https://your-krawl-instance/<DASHBOARD-PATH>/api/download/malicious_ips.txt
```

This file can be used to [update a set of firewall rules](https://www.allthingstech.ch/using-opnsense-and-ip-blocklists-to-block-malicious-traffic), for example on OPNsense or pfSense, or to feed iptables, enabling automatic blocking of malicious IPs.
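As a sketch of how the export can be consumed, the snippet below downloads the blocklist and renders one iptables rule per IP. The URL and the decision to print rules instead of applying them are illustrative assumptions, not part of Krawl itself.

```python
"""Sketch (not part of Krawl): turn the exported malicious_ips.txt
blocklist into iptables DROP rules. The instance URL is a placeholder."""
from urllib.request import urlopen


def fetch_blocklist(url: str) -> list[str]:
    """Download the export; Krawl writes one IP address per line."""
    with urlopen(url) as resp:
        text = resp.read().decode()
    return [line.strip() for line in text.splitlines() if line.strip()]


def iptables_rules(ips: list[str]) -> list[str]:
    """Render one DROP rule per malicious IP."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in ips]


# Example (requires a running instance; substitute your dashboard path):
# ips = fetch_blocklist("https://your-krawl-instance/<DASHBOARD-PATH>/api/download/malicious_ips.txt")
# print("\n".join(iptables_rules(ips)))
```

Running this from cron at the same five-minute cadence as the export keeps the firewall roughly in sync with Krawl's verdicts.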
## IP Reputation

Krawl [runs tasks that analyze recent traffic to build and continuously update an IP reputation score](src/tasks/analyze_ips.py). The analysis runs periodically and evaluates each active IP address against multiple behavioral indicators to classify it as an attacker, crawler, or regular user. Thresholds are fully customizable.

![ip reputation](img/ip-reputation.png)

The analysis includes:

- **Risky HTTP method usage** (e.g. POST, PUT, DELETE ratios)
- **robots.txt violations**
- **Request timing anomalies** (bursty or irregular patterns)
- **User-Agent consistency**
- **Attack URL detection** (e.g. SQL injection or XSS patterns)

Each signal contributes to a weighted scoring model that assigns a reputation category:

- `attacker`
- `bad_crawler`
- `good_crawler`
- `regular_user`
- `unknown` (for insufficient data)

The resulting scores and metrics are stored in the database and used by Krawl to drive dashboards, reputation tracking, and automated mitigation actions such as IP banning or firewall integration.
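The weighted scoring idea can be sketched as follows. The weights, the minimum-request cutoff, and the category boundaries below are invented for illustration; Krawl's actual values live in [src/tasks/analyze_ips.py](src/tasks/analyze_ips.py).

```python
"""Illustrative sketch of a weighted IP-reputation score. All weights and
cutoffs here are assumptions, not Krawl's real configuration."""

# One weight per behavioral indicator (assumed values, summing to 1.0).
WEIGHTS = {
    "risky_methods_ratio": 0.25,  # share of POST/PUT/DELETE requests
    "robots_violations":   0.25,  # hits on robots.txt-forbidden paths
    "timing_anomaly":      0.15,  # bursty / irregular request timing
    "user_agent_churn":    0.15,  # many distinct User-Agents from one IP
    "attack_urls":         0.20,  # SQLi / XSS patterns in requested URLs
}


def reputation_score(signals: dict) -> float:
    """Weighted sum of per-indicator signals, each clamped to 0..1."""
    return sum(WEIGHTS[name] * min(max(value, 0.0), 1.0)
               for name, value in signals.items() if name in WEIGHTS)


def classify(score: float, requests_seen: int) -> str:
    """Map a score to one of Krawl's reputation categories."""
    if requests_seen < 5:   # too little data to judge
        return "unknown"
    if score >= 0.7:
        return "attacker"
    if score >= 0.4:
        return "bad_crawler"
    if score >= 0.15:
        return "good_crawler"
    return "regular_user"
```

Keeping each indicator normalized to 0..1 before weighting makes the per-signal thresholds (like `KRAWL_HTTP_RISKY_METHODS_THRESHOLD`) easy to tune independently.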
## Forward the Server Header

If Krawl is deployed behind a proxy such as NGINX, the **Server header** should be forwarded using the following configuration in your proxy:

```nginx
location / {
    proxy_pass https://your-krawl-instance;
    proxy_pass_header Server;
}
```

## API

Krawl uses the following external APIs:

- https://iprep.lcrawl.com (IP reputation)
- https://nominatim.openstreetmap.org/reverse (reverse IP lookup)
- https://api.ipify.org (public IP discovery)
- http://ident.me (public IP discovery)
- https://ifconfig.me (public IP discovery)

## Configuration

Krawl uses a **configuration hierarchy** in which **environment variables take precedence over the configuration file**. This approach is recommended for Docker deployments and quick out-of-the-box customization.

### Configuration via Environment Variables

| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| `CONFIG_LOCATION` | Path to yaml config file | `config.yaml` |
| `KRAWL_PORT` | Server listening port | `5000` |
| `KRAWL_DELAY` | Response delay in milliseconds | `100` |
| `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` |
| `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` |
| `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` |
| `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefgh...` |
| `KRAWL_MAX_COUNTER` | Initial counter value | `10` |
| `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None |
| `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
| `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
| `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
| `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` |
| `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` |
| `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` |
| `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` |
| `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` |
| `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` |
| `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` |
| `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` |
| `KRAWL_INFINITE_PAGES_FOR_MALICIOUS` | Serve infinite pages to malicious IPs | `true` |
| `KRAWL_MAX_PAGES_LIMIT` | Maximum page limit for crawlers | `250` |
| `KRAWL_BAN_DURATION_SECONDS` | Ban duration in seconds for rate-limited IPs | `600` |
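The precedence rule can be illustrated with a tiny resolver; this is a sketch of the concept, not Krawl's actual loader, and the helper name `resolve` is an assumption for the example.

```python
"""Sketch of env-over-file precedence (illustrative only): a variable like
KRAWL_PORT beats the same key in config.yaml, which beats the default."""
import os


def resolve(key: str, file_config: dict, default=None):
    """KRAWL_<KEY> from the environment wins; then the config file; then the default."""
    env_value = os.environ.get(f"KRAWL_{key.upper()}")
    if env_value is not None:
        return env_value
    return file_config.get(key, default)


# With KRAWL_DELAY=250 exported, resolve("delay", {"delay": 100}) returns
# "250"; without it, the config.yaml value 100 is used.
```

Note that values read from the environment arrive as strings, so numeric settings need casting before use.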
For example, exporting variables in the shell:

```bash
# Set canary token
export CONFIG_LOCATION="config.yaml"
export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url"

# Set links-per-page range (min,max format)
export KRAWL_LINKS_PER_PAGE_RANGE="5,25"

# Set analyzer thresholds
export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2"
export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15"

# Set custom dashboard path
export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
```

Example of a Docker run with environment variables:

```bash
docker run -d \
  -p 5000:5000 \
  -e KRAWL_PORT=5000 \
  -e KRAWL_DELAY=100 \
  -e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \
  --name krawl \
  ghcr.io/blessedrebus/krawl:latest
```

### Configuration via config.yaml

You can use the [config.yaml](config.yaml) file for more advanced configurations, such as Docker Compose or Helm chart deployments.

## Honeypot

Below is a complete overview of the Krawl honeypot's capabilities.

### robots.txt

The actual (juicy) robots.txt configuration [is the following](src/templates/html/robots.txt).

### Honeypot pages

Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).

![admin page](img/admin-page.png)

Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with "interesting" files, each assigned a random file size to look realistic.

![directory-page](img/directory-page.png)
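A minimal sketch of such a listing generator is shown below; the file names, the size range, and the plain-text layout are assumptions for illustration, not Krawl's actual templates.

```python
"""Sketch of a fake directory listing with randomized file sizes, in the
spirit of Krawl's /backup/ and /config/ pages (details are assumptions)."""
import random


def fake_listing(files, seed=None):
    """Return an index-style page body: one '<name>  <size>' line per file."""
    rng = random.Random(seed)          # seedable for reproducible pages
    lines = []
    for name in files:
        size_kb = rng.randint(1, 4096)  # plausible 1 KB .. 4 MB
        lines.append(f"{name:<24}{size_kb} KB")
    return "\n".join(lines)


print(fake_listing(["credentials.txt", "backup.sql", "db_dump.tar.gz"], seed=42))
```

Randomizing the sizes per deployment (but keeping them stable per file) makes the decoy harder to fingerprint than a static listing.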
The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally responds with a `Content-Type` of `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.

The `/server` page displays randomly generated fake error information for each known server.

![server and env page](img/server-and-env-page.png)

The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format.

![users and secrets](img/users-and-secrets.png)

The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets in plain text.

![credentials and passwords](img/credentials-and-passwords.png)

Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.

![sql injection](img/sql_injection.png)

Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
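The randomized-error idea can be sketched like this; the error templates below are invented examples in the style of common database engines, not Krawl's actual messages.

```python
"""Sketch of randomized database errors: each request gets a different
plausible SQL error so tools like SQLMap see inconsistent signals.
The templates are invented examples, not Krawl's real templates."""
import random

_ERROR_TEMPLATES = [
    "You have an error in your SQL syntax near '{frag}' at line {line}",
    "ORA-00933: SQL command not properly ended near '{frag}'",
    "PostgreSQL ERROR: syntax error at or near \"{frag}\" LINE {line}",
    "Warning: mysqli_query(): (HY000/1064): near '{frag}' at line {line}",
]


def random_sql_error(user_input: str) -> str:
    """Pick a random engine's error format and echo a fragment of the input."""
    template = random.choice(_ERROR_TEMPLATES)
    return template.format(frag=user_input[:20], line=random.randint(1, 40))
```

Because the apparent backend (MySQL, Oracle, PostgreSQL) changes between requests, fingerprinting heuristics that rely on a consistent error signature break down.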
## Customizing the Canary Token

To create a custom canary token, visit https://canarytokens.org and generate a "Web bug" canary token.

This optional token is served once a crawler has fully traversed the site and the internal page counter reaches 0. At that point, the canary URL is returned. When that URL is requested, an email alert is sent to you, including the visitor's IP address and user agent.

To enable this feature, set the canary token URL [using the environment variable](#configuration-via-environment-variables) `KRAWL_CANARY_TOKEN_URL`.

## Customizing the Wordlist

Edit `wordlists.json` to customize fake data for your use case:

```json
{
  "usernames": {
    "prefixes": ["admin", "root", "user"],
    "suffixes": ["_prod", "_dev", "123"]
  },
  "passwords": {
    "prefixes": ["P@ssw0rd", "Admin"],
    "simple": ["test", "password"]
  },
  "directory_listing": {
    "files": ["credentials.txt", "backup.sql"],
    "directories": ["admin/", "backup/"]
  }
}
```

or edit **values.yaml** in the case of a Helm chart installation.
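As a sketch of how the prefix/suffix lists could combine into fake credentials, consider the snippet below; the combination logic is an assumption for illustration, not Krawl's actual generator.

```python
"""Sketch (assumed logic): combine wordlists.json prefixes and suffixes
into realistic-looking fake usernames."""
import json
import random


def fake_username(wordlists: dict, rng: random.Random) -> str:
    """Join one random prefix with one random suffix."""
    users = wordlists["usernames"]
    return rng.choice(users["prefixes"]) + rng.choice(users["suffixes"])


# Using the usernames section of the example wordlists.json above:
wordlists = json.loads("""
{
  "usernames": {"prefixes": ["admin", "root", "user"],
                "suffixes": ["_prod", "_dev", "123"]}
}
""")
print(fake_username(wordlists, random.Random(0)))  # one prefix+suffix pair, e.g. "root_dev"
```

Seeding the generator per visitor keeps the decoy data stable across that visitor's requests while still varying between deployments.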
## Dashboard

Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`

The dashboard shows:

- Total and unique accesses
- Suspicious activity and attack detection
- Top IPs, paths, user agents, and GeoIP localization
- Real-time monitoring

Attackers' access to the honeypot endpoints and related suspicious activity (such as failed login attempts) is logged.

Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.

![dashboard-1](img/dashboard-1.png)

The top IP addresses are shown along with the top paths and user agents.

![dashboard-2](img/dashboard-2.png)

![dashboard-3](img/dashboard-3.png)

## 🤝 Contributing

Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request (and explain your changes!)

<div align="center">

## ⚠️ Disclaimer

**This is a deception/honeypot system.**
Deploy it in isolated environments and monitor carefully for security events.
Use it responsibly and in compliance with applicable laws and regulations.

## Star History

<img src="https://api.star-history.com/svg?repos=BlessedRebuS/Krawl&type=Date" width="600" alt="Star History Chart" />

</div>