{"id":19174592,"url":"https://github.com/equalitie/baskervillehall","last_synced_at":"2025-10-11T11:10:59.331Z","repository":{"id":185958566,"uuid":"666464463","full_name":"equalitie/baskervillehall","owner":"equalitie","description":"Bot mitigation ","archived":false,"fork":false,"pushed_at":"2025-06-21T10:12:49.000Z","size":210,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-21T11:23:59.322Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/equalitie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-14T15:28:25.000Z","updated_at":"2025-05-26T11:32:03.000Z","dependencies_parsed_at":"2023-12-19T18:47:59.260Z","dependency_job_id":"40c5be4f-5639-48ff-b3cc-3ec6a34f4729","html_url":"https://github.com/equalitie/baskervillehall","commit_stats":null,"previous_names":["equalitie/baskervillehall"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/equalitie/baskervillehall","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fbaskervillehall","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fbaskervillehall/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fbaskervillehall/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fbaskervillehall/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/equalitie","download_url":"https://codeload.github.com/equalitie/baskervillehall/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fbaskervillehall/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262608854,"owners_count":23336580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T10:18:26.865Z","updated_at":"2025-10-11T11:10:54.307Z","avatar_url":"https://github.com/equalitie.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## What is Baskerville\n\nBaskerville is an intelligent analytics engine designed to detect and mitigate Layer 7 (application layer) DDoS attacks by analyzing web request behavior in real time. Originally developed for the Deflect platform, it enables infrastructure to respond proactively to suspicious or malicious traffic before it causes disruption.\n\nBaskerville uses machine learning to distinguish between normal and abnormal traffic patterns at the session level. It classifies traffic into categories like human users, legitimate bots, malicious bots, and AI-driven crawlers.\n\n### Inputs and Outputs\n\nBaskerville receives structured web logs as input, typically from Deflect edge nodes, Cloudflare Workers, or AWS Lambda.\n\nUsing this data, Baskerville analyzes each session and, when necessary, emits challenge commands as output. These commands can then be consumed and executed by the same platforms — Deflect, Cloudflare, or AWS Lambda — to block, rate-limit, or further inspect suspicious IPs.\n\nIn addition, Baskerville offers a user-facing dashboard for monitoring traffic, analyzing attack patterns, reviewing challenge decisions, and managing configuration policies.\n\n### Key Challenges Baskerville Solves\n\n- Fast Detection  \n  Real-time analysis ensures suspicious activity is caught early enough to prevent damage.\n\n- Traffic Adaptability  \n  Designed to scale with volatile and diverse traffic loads using technologies like Apache Kafka (and optionally Apache Spark).\n\n- Actionable Predictions  \n  For each suspicious session or IP, Baskerville issues a prediction and can trigger a challenge, such as a block, redirect, or CAPTCHA—customizable per deployment.\n\n- Human vs Bot Identification  \n  Combines heuristics and ML to reliably distinguish human visitors from automated agents.\n\n- AI Crawler Detection  \n  Tracks advanced scraping and probing behaviors characteristic of LLM-based or stealth crawlers.\n\n- Prediction Reliability  \n  A feedback loop and probation period mechanism reduce false positives and improve model accuracy over time.\n\n- Learning from Imperfect Data  \n  With limited labeled data, Baskerville relies on unsupervised anomaly detection, trained on mostly normal behavior, yet robust to minor contamination.\n\n\n\n# Baskerville – Web Request Intelligence \u0026 Bot Mitigation Engine\n\nBaskerville is an intelligent traffic analysis engine that classifies incoming web traffic into categories such as:\n\n- Normal human connections\n- Verified crawlers (Google, Bing, DuckDuckGo, etc.)\n- Malicious bots\n- AI crawlers\n\nIt enables website operators to detect and mitigate suspicious or harmful sessions in real time.\n\n\n\n## How It Works\n\n### 1. Session Grouping\nIncoming web requests are grouped into sessions based on:\n\n- Requested host\n- Client IP address\n- Session cookie\n\n### 2. Feature Extraction\nEach session is analyzed through a set of computed features, including:\n\n- Average URL path depth  \n- Number of unique queries  \n- User agent reputation score  \n- HTML-to-image content ratio  \n- ...and more\n\n### 3. Traffic Classification\nSessions are classified into:\n\n- Human traffic\n- Automated traffic\n\nThis classification uses heuristic rules, logical checks, and supervised ML models.\n\n### 4. Anomaly Detection\nSeparate unsupervised anomaly detection models are trained for each class (human vs. automated) to identify outliers.\n\n\n\n## Operating Modes\n\nBaskerville supports two primary modes of operation:\n\n- War Mode  \n  Aggressively challenges or blocks all non-verified bots, allowing only trusted crawlers (e.g., Googlebot).\n\n- Peace Mode  \n  Targets only anomalous sessions, applying challenges selectively to reduce friction for normal users.\n\n\n\n## Architecture Overview\n\n- Input Sources:\n  - Deflect platform logs\n  - Cloudflare Workers\n  - AWS Lambda\n\n- Output Actions:\n  - Challenge or block commands\n  - Sent to Banjax, Cloudflare Worker, or AWS Lambda\n\n- Deployment:\n  - Run in online mode for real-time processing and mitigation\n\n\n\n## Key Features\n\n- Session-level analysis of web traffic\n- Supervised and unsupervised ML-based classification\n- AI crawler detection and blocking\n- VPN and Tor exit node detection\n- Configurable response strategy (challenge/block/allow)\n- Custom challenge engine support\n- REST dashboard for analytics, statistics, and configuration\n\n\n## Tech Stack\n\nBaskerville is built on a modern, scalable analytics stack designed for high-throughput environments:\n\n- **Apache Kafka** – Used for distributed messaging and stream processing\n- **PostgreSQL** – Stores session metadata, model outputs, and feedback\n- **Python** – Core language for orchestration, feature extraction, and machine learning logic\n- **Scikit-learn** – Used for supervised classification models (e.g. human vs bot)\n- **TensorFlow** – Used for anomaly detection and deep learning components\n- **Kubernetes** – Baskerville services are deployed and orchestrated within a Kubernetes cluster for resilience and scalability\n\n## Dashboard\n\nBaskerville includes a web dashboard for:\n\n- Visualizing live and historical traffic statistics\n- Monitoring detected threats\n- Reviewing challenge decisions\n- Managing configuration and response policies\n\n\n## Use Cases\n- Defending high-risk websites from botnet attacks\n- Protecting content from AI web scrapers\n- Reducing false positives by classifying sessions intelligently\n- Augmenting WAFs with behavioral intelligence\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fequalitie%2Fbaskervillehall","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fequalitie%2Fbaskervillehall","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fequalitie%2Fbaskervillehall/lists"}