{"id":29878235,"url":"https://github.com/ctag07/sarracenia","last_synced_at":"2025-12-24T04:51:21.056Z","repository":{"id":306685083,"uuid":"1026941934","full_name":"CTAG07/Sarracenia","owner":"CTAG07","description":"A Go based anti-scraper tarpit inspired by Nepenthes.","archived":false,"fork":false,"pushed_at":"2025-07-27T02:01:01.000Z","size":63,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-27T03:42:35.737Z","etag":null,"topics":["anti-scraper","defensive-security","go","golang","html-generation","markov-chain","sqlite","tarpit","web-security"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CTAG07.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-27T00:41:53.000Z","updated_at":"2025-07-27T03:15:54.000Z","dependencies_parsed_at":"2025-07-27T03:42:37.166Z","dependency_job_id":"06056669-81b6-411f-b8f1-51cd24cc4484","html_url":"https://github.com/CTAG07/Sarracenia","commit_stats":null,"previous_names":["ctag07/sarracenia"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/CTAG07/Sarracenia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CTAG07%2FSarracenia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CTAG07%2FSarracenia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CTAG07%2FSarracenia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CTAG07%2FSarracenia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CTAG07","download_url":"https://codeload.github.com/CTAG07/Sarracenia/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CTAG07%2FSarracenia/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268003604,"owners_count":24179292,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anti-scraper","defensive-security","go","golang","html-generation","markov-chain","sqlite","tarpit","web-security"],"created_at":"2025-07-31T07:01:29.501Z","updated_at":"2025-12-24T04:51:21.048Z","avatar_url":"https://github.com/CTAG07.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sarracenia\n\n[![AGPLv3 License](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n[![Go Report Card](https://goreportcard.com/badge/github.com/CTAG07/Sarracenia)](https://goreportcard.com/report/github.com/CTAG07/Sarracenia)\n[![Go Version](https://img.shields.io/github/go-mod/go-version/CTAG07/Sarracenia)](https://golang.org)\n[![GitHub release (latest by date)](https://img.shields.io/github/v/release/CTAG07/Sarracenia)](https://github.com/CTAG07/Sarracenia/releases/latest)\n[![Docker Image](https://img.shields.io/badge/ghcr.io-ctag07/sarracenia-blue?logo=docker)](https://github.com/CTAG07/Sarracenia/pkgs/container/sarracenia)\n[![Repo size](https://img.shields.io/github/repo-size/CTAG07/Sarracenia)](https://github.com/CTAG07/Sarracenia)\n\n[![Sarracenia Test / Build / Release](https://github.com/CTAG07/Sarracenia/actions/workflows/go.yml/badge.svg)](https://github.com/CTAG07/Sarracenia/actions/workflows/go.yml)\n[![CodeQL Advanced](https://github.com/CTAG07/Sarracenia/actions/workflows/codeql.yml/badge.svg)](https://github.com/CTAG07/Sarracenia/actions/workflows/codeql.yml)\n\nA high-performance, configurable anti-scraper tarpit server written in Go.\n\nSarracenia acts as a defensive countermeasure against web scrapers by serving generated, endless, and plausibly structured web content. Its primary goal is to trap automated agents in infinite loops of fake data, preventing them from accessing legitimate resources.\n\n---\n\n## Architecture \u0026 Components\n\nSarracenia is built on a modular architecture, with its core logic separated into reusable libraries.\n\n### Database Architecture\n\nSarracenia utilizes a split SQLite architecture running in WAL (Write-Ahead Logging) mode to ensure high concurrency and stability under load.\n\n*   **Markov DB:** Stores training data and chain models.\n*   **Stats DB:** Handles high-frequency write operations for request logging and analytics.\n*   **Auth DB:** Manages API keys, whitelists, and other low-frequency configuration data.\n\nThis separation ensures that heavy background tasks, such as model training, do not block real-time statistics logging or administrative actions.\n\n### Core Libraries\n\n*   **`pkg/markov`**: A persistent Markov chain library supporting streaming generation, database-backed storage, and advanced sampling techniques.\n    *   [Documentation](./pkg/markov/README.md)\n*   **`pkg/templating`**: A dynamic HTML generation engine capable of producing complex, randomized DOM structures and executing logic-heavy templates.\n    *   [Documentation](./pkg/templating/README.md)\n\n---\n\n## Installation\n\n### 1. From Release (Recommended)\n\n1.  Download the latest binary for your OS from the [Releases Page](https://github.com/CTAG07/Sarracenia/releases/latest).\n2.  Download the Source code archive (zip/tar.gz) from the same release.\n3.  Extract the archive and copy the `example` directory contents to your working folder:\n    ```\n    /your/app/dir/\n    ├── sarracenia              # The binary\n    ├── config.json             # From example/config.json\n    └── data/                   # From example/data/\n    ```\n4.  Run the binary:\n    *   Linux/macOS: `./sarracenia`\n    *   Windows: `.\\sarracenia.exe`\n\n### 2. Docker\n\nA pre-built image is available on the GitHub Container Registry.\n\n```yaml\nservices:\n  sarracenia:\n    image: ghcr.io/ctag07/sarracenia:latest\n    container_name: sarracenia\n    restart: unless-stopped\n    ports:\n      - \"7277:7277\" # Tarpit Server\n      - \"7278:7278\" # Dashboard \u0026 API\n    volumes:\n      - ./data:/app/data\n```\n\n### 3. From Source\n\n**Prerequisites:** Go 1.24+\n\n```sh\ngit clone https://github.com/CTAG07/Sarracenia.git\ncd Sarracenia\ngo build -o sarracenia ./cmd/main\n./sarracenia\n```\n\n---\n\n## Initial Setup\n\n1.  **Access the Dashboard**\n    By default, the dashboard runs on port `:7278`. Open a browser and navigate to `http://localhost:7278`.\n\n2.  **Create Master Credentials**\n    Upon first launch, the API is unsecured to allow initialization.\n    *   Navigate to the **API Keys** page.\n    *   Create a new key. The first key created is automatically assigned the Master (`*`) scope.\n    *   **Copy this key immediately.** It will not be shown again.\n    *   Once created, the API and Dashboard are immediately secured, and you will be logged in automatically.\n\n---\n\n## Configuration\n\nConfiguration is managed via `config.json`.\n\n### Server Configuration (`server_config`)\n\n| Key                     | Description                                           | Default                                                            |\n|:------------------------|:------------------------------------------------------|:-------------------------------------------------------------------|\n| `server_addr`           | Tarpit server listener address.                       | `:7277`                                                            |\n| `api_addr`              | API/Dashboard server listener address.                | `:7278`                                                            |\n| `log_level`             | Logging verbosity (`debug`, `info`, `warn`, `error`). | `info`                                                             |\n| `data_dir`              | Base directory for data files.                        | `./data`                                                           |\n| `markov_database_path`  | Path to the Markov chain database.                    | `./data/sarracenia_markov.db?_journal_mode=WAL\u0026_busy_timeout=5000` |\n| `auth_database_path`    | Path to the Auth/Whitelist database.                  | `./data/sarracenia_auth.db?_journal_mode=WAL\u0026_busy_timeout=5000`   |\n| `stats_database_path`   | Path to the Statistics database.                      | `./data/sarracenia_stats.db?_journal_mode=WAL\u0026_busy_timeout=5000`  |\n| `dashboard_tmpl_path`   | Path to dashboard templates.                          | `./data/dashboard/templates/`                                      |\n| `dashboard_static_path` | Path to dashboard static assets.                      | `./data/dashboard/static/`                                         |\n\n### Tarpit Configuration (`tarpit_config`)\n\nControls the behavior of the tarpit response mechanism.\n\n| Key                  | Description                                                          | Default |\n|:---------------------|:---------------------------------------------------------------------|:--------|\n| `enable_drip_feed`   | If true, responses are sent in slow chunks to hold connections open. | `false` |\n| `initial_delay_ms`   | Delay before sending the first byte.                                 | `0`     |\n| `drip_feed_delay_ms` | Delay between subsequent chunks.                                     | `500`   |\n| `drip_feed_chunks`   | Total chunks to split the response into.                             | `10`    |\n\n### Statistics Configuration (`stats_config`)\n\n| Key                  | Description                                      | Default |\n|:---------------------|:-------------------------------------------------|:--------|\n| `sync_interval_sec`  | Frequency of flushing stats from memory to disk. | `30`    |\n| `forget_threshold`   | Minimum hits required to retain an IP record.    | `10`    |\n| `forget_delay_hours` | Time without activity before a record is pruned. | `24`    |\n\n### Template Configuration (`template_config`)\n\nThis object configures the templating engine. See the [full documentation here](./pkg/templating/README.md).\n\n### Threat Configuration (`threat_config`)\n\nConfigures the heuristic threat assessment system.\n\n| Key                  | Description                                     | Default |\n|:---------------------|:------------------------------------------------|:--------|\n| `base_threat`        | Initial score for any request.                  | `0`     |\n| `ip_hit_factor`      | Score added per IP hit.                         | `1.0`   |\n| `ua_hit_factor`      | Score added per User Agent hit.                 | `0.5`   |\n| `ip_hit_rate_factor` | Multiplier for IP hit rate (hits/min).          | `10.0`  |\n| `ua_hit_rate_factor` | Multiplier for UA hit rate (hits/min).          | `5.0`   |\n| `max_threat`         | Maximum possible threat score.                  | `1000`  |\n| `fallback_level`     | Default threat stage (0-4) if no threshold met. | `0`     |\n\n**Threat Stages:**\nStages define thresholds for triggering increasingly aggressive tarpit templates.\n\n| Stage     | Enabled | Threshold |\n|:----------|:--------|:----------|\n| `stage_1` | `True`  | `0`       |\n| `stage_2` | `False` | `25`      |\n| `stage_3` | `False` | `50`      |\n| `stage_4` | `False` | `75`      |\n| `stage_5` | `False` | `100`     |\n\n---\n\n## API Reference\n\n**Note:** The API is designed for internal use by the dashboard. It does not implement rate limiting. Do not expose the API port directly to the public internet.\n\nAll endpoints require the `sarr-auth` header containing a valid API key.\n\n### Authentication (`/api/auth`)\n\n| Method   | Endpoint              | Scope         | Description                                        |\n|:---------|:----------------------|:--------------|:---------------------------------------------------|\n| `GET`    | `/api/auth/me`        | *Any*         | Validates current session.                         |\n| `GET`    | `/api/auth/keys`      | `auth:manage` | Lists API keys.                                    |\n| `POST`   | `/api/auth/keys`      | `auth:manage` | Creates a new key. **First key is always Master.** |\n| `DELETE` | `/api/auth/keys/{id}` | `auth:manage` | Deletes a key.                                     |\n\n### Markov Models (`/api/markov`)\n\n**⚠️ Concurrency Warning:** Only one model can be trained at a time. Simultaneous training jobs will result in database lock errors.\n\n| Method   | Endpoint                             | Scope          | Description                       |\n|:---------|:-------------------------------------|:---------------|:----------------------------------|\n| `GET`    | `/api/markov/models`                 | `markov:read`  | Lists available models.           |\n| `POST`   | `/api/markov/models`                 | `markov:write` | Creates a new model.              |\n| `DELETE` | `/api/markov/models/{name}`          | `markov:write` | Deletes a model.                  |\n| `POST`   | `/api/markov/models/{name}/train`    | `markov:write` | Trains a model (Text/Plain body). |\n| `POST`   | `/api/markov/models/{name}/prune`    | `markov:write` | Prunes model data.                |\n| `GET`    | `/api/markov/models/{name}/export`   | `markov:read`  | Exports model as JSON.            |\n| `POST`   | `/api/markov/models/{name}/generate` | `markov:read`  | Generates text.                   |\n| `POST`   | `/api/markov/import`                 | `markov:write` | Imports a model from JSON.        |\n| `POST`   | `/api/markov/vocabulary/prune`       | `markov:write` | Global vocabulary pruning.        |\n| `GET`    | `/api/markov/training/status`        | `markov:read`  | Checks training status.           |\n\n### Server Control (`/api/server`)\n\n| Method | Endpoint               | Scope            | Description          |\n|:-------|:-----------------------|:-----------------|:---------------------|\n| `GET`  | `/api/health`          | *None*           | Health check.        |\n| `GET`  | `/api/server/version`  | `stats:read`     | Server version info. |\n| `GET`  | `/api/server/config`   | `server:config`  | Get current config.  |\n| `PUT`  | `/api/server/config`   | `server:config`  | Update config.       |\n| `POST` | `/api/server/restart`  | `server:control` | Restart server.      |\n| `POST` | `/api/server/shutdown` | `server:control` | Shutdown server.     |\n\n### Statistics (`/api/stats`)\n\n| Method   | Endpoint                     | Scope            | Description               |\n|:---------|:-----------------------------|:-----------------|:--------------------------|\n| `GET`    | `/api/stats/summary`         | `stats:read`     | Global request summary.   |\n| `GET`    | `/api/stats/top_ips`         | `stats:read`     | Top 100 IPs by hit count. |\n| `GET`    | `/api/stats/top_user_agents` | `stats:read`     | Top 100 User Agents.      |\n| `DELETE` | `/api/stats/all`             | `server:control` | **Reset all statistics.** |\n\n### Templates (`/api/templates`)\n\n| Method   | Endpoint                 | Scope             | Description                 |\n|:---------|:-------------------------|:------------------|:----------------------------|\n| `GET`    | `/api/templates`         | `templates:read`  | List all templates.         |\n| `GET`    | `/api/templates/{name}`  | `templates:read`  | Get template content.       |\n| `PUT`    | `/api/templates/{name}`  | `templates:write` | Create/Update template.     |\n| `DELETE` | `/api/templates/{name}`  | `templates:write` | Delete template.            |\n| `POST`   | `/api/templates/refresh` | `templates:write` | Reload templates from disk. |\n| `POST`   | `/api/templates/test`    | `templates:read`  | Test template syntax.       |\n| `GET`    | `/api/templates/preview` | `templates:read`  | Render template preview.    |\n\n### Whitelist (`/api/whitelist`)\n\n| Method   | Endpoint                   | Scope             | Description                       |\n|:---------|:---------------------------|:------------------|:----------------------------------|\n| `GET`    | `/api/whitelist/ip`        | `whitelist:read`  | List whitelisted IPs.             |\n| `POST`   | `/api/whitelist/ip`        | `whitelist:write` | Add IP to whitelist.              |\n| `DELETE` | `/api/whitelist/ip`        | `whitelist:write` | Remove IP from whitelist.         |\n| `GET`    | `/api/whitelist/useragent` | `whitelist:read`  | List whitelisted User Agents.     |\n| `POST`   | `/api/whitelist/useragent` | `whitelist:write` | Add User Agent to whitelist.      |\n| `DELETE` | `/api/whitelist/useragent` | `whitelist:write` | Remove User Agent from whitelist. |\n\n---\n\n## License\n\nThis project is licensed under the AGPLv3.\n\n**Alternative Licensing:**\nIf you require a permissive license (e.g., MIT) for commercial or closed-source use, please contact the maintainer at **`82781942+CTAG07@users.noreply.github.com`**.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctag07%2Fsarracenia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fctag07%2Fsarracenia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctag07%2Fsarracenia/lists"}