{"id":41985164,"url":"https://github.com/wolffcatskyy/crowdsec-blocklist-import","last_synced_at":"2026-02-17T15:03:14.544Z","repository":{"id":334614368,"uuid":"1142058328","full_name":"wolffcatskyy/crowdsec-blocklist-import","owner":"wolffcatskyy","description":"Dockerized tool to import public threat feeds into CrowdSec - 28+ free blocklists, 60k+ IPs","archived":false,"fork":false,"pushed_at":"2026-01-26T09:27:10.000Z","size":68,"stargazers_count":109,"open_issues_count":12,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-01-26T13:55:16.210Z","etag":null,"topics":["blocklist","crowdsec","cybersecurity","docker","firewall","ip-blocklist","security","threat-feeds","threat-intelligence","tor-exit-nodes"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wolffcatskyy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":null,"ko_fi":null,"buy_me_a_coffee":null}},"created_at":"2026-01-25T22:07:14.000Z","updated_at":"2026-01-26T13:46:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/wolffcatskyy/crowdsec-blocklist-import","commit_stats":null,"previous_names":["wolffcatskyy/crowdsec-blocklist-import"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/wolffcatskyy/crowdsec-blocklist-import","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/wolffcatskyy%2Fcrowdsec-blocklist-import","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolffcatskyy%2Fcrowdsec-blocklist-import/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolffcatskyy%2Fcrowdsec-blocklist-import/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolffcatskyy%2Fcrowdsec-blocklist-import/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wolffcatskyy","download_url":"https://codeload.github.com/wolffcatskyy/crowdsec-blocklist-import/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolffcatskyy%2Fcrowdsec-blocklist-import/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28970194,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T05:48:53.985Z","status":"ssl_error","status_checked_at":"2026-02-01T05:47:55.855Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blocklist","crowdsec","cybersecurity","docker","firewall","ip-blocklist","security","threat-feeds","threat-intelligence","tor-exit-nodes"],"created_at":"2026-01-26T00:11:16.678Z","updated_at":"2026-02-17T15:03:14.532Z","avatar_url":"https://github.com/wolffcatskyy.png","language":"Shell","funding_links":[],"categories":["Container Operations","Blocklist \u0026 Threat Intelligence"],"sub_categories":["Security","Other Bouncers"],"readme":"# CrowdSec Blocklist Import - Python Edition\n\nMemory-efficient Python 3.11+ implementation of [crowdsec-blocklist-import](https://github.com/wolffcatskyy/crowdsec-blocklist-import) using the LAPI HTTP API directly.\n\n## Features\n\n- **LAPI Mode Only**: Direct HTTP API calls, no Docker socket needed\n- **Memory Efficient**: Streaming downloads, line-by-line processing\n- **Batch Processing**: Configurable batch size (default 1000 IPs)\n- **Full IPv4/IPv6 Support**: Uses Python's `ipaddress` module\n- **Automatic Deduplication**: Skips existing CrowdSec decisions\n- **Retry Logic**: Exponential backoff for failed requests\n- **Type Hints**: Full type annotations for IDE support\n- **30+ Blocklists**: Same sources as the bash version\n- **Per-feed Control**: Enable/disable individual blocklist sources\n- **Prometheus Metrics**: Built-in metrics endpoint for monitoring\n\n## Quick Start\n\n### Prerequisites\n\nCrowdSec LAPI requires **machine credentials** to write decisions. 
Create them first:\n\n```bash\n# On your CrowdSec host (or docker exec crowdsec ...)\ncscli machines add blocklist-import --password 'YourSecurePassword'\n\n# Also create a bouncer key for reading existing decisions\ncscli bouncers add blocklist-import -o raw\n```\n\n### Docker Compose\n\n```yaml\nservices:\n  blocklist-import:\n    image: ghcr.io/wolffcatskyy/crowdsec-blocklist-import-python:latest\n    container_name: blocklist-import\n    restart: \"no\"\n    networks:\n      - crowdsec  # Must be on same network as CrowdSec\n    environment:\n      - CROWDSEC_LAPI_URL=http://crowdsec:8080\n      - CROWDSEC_LAPI_KEY=${CROWDSEC_LAPI_KEY}\n      - CROWDSEC_MACHINE_ID=blocklist-import\n      - CROWDSEC_MACHINE_PASSWORD=${CROWDSEC_MACHINE_PASSWORD}\n      - DECISION_DURATION=24h\n      - TZ=America/New_York\n\nnetworks:\n  crowdsec:\n    external: true\n```\n\n### Docker Compose with Secrets\n\nFor production deployments, use Docker secrets instead of environment variables:\n\n```yaml\nsecrets:\n  crowdsec_lapi_key:\n    file: ./secrets/crowdsec_lapi_key.txt\n  crowdsec_machine_password:\n    file: ./secrets/crowdsec_machine_password.txt\n\nservices:\n  blocklist-import:\n    image: ghcr.io/wolffcatskyy/crowdsec-blocklist-import-python:latest\n    container_name: blocklist-import\n    restart: \"no\"\n    networks:\n      - crowdsec\n    secrets:\n      - crowdsec_lapi_key\n      - crowdsec_machine_password\n    environment:\n      - CROWDSEC_LAPI_URL=http://crowdsec:8080\n      - CROWDSEC_LAPI_KEY_FILE=/run/secrets/crowdsec_lapi_key\n      - CROWDSEC_MACHINE_ID=blocklist-import\n      - CROWDSEC_MACHINE_PASSWORD_FILE=/run/secrets/crowdsec_machine_password\n      - DECISION_DURATION=24h\n      - TZ=America/New_York\n\nnetworks:\n  crowdsec:\n    external: true\n```\n\nCreate the secrets files:\n\n```bash\nmkdir -p ./secrets\necho \"your-bouncer-api-key\" \u003e ./secrets/crowdsec_lapi_key.txt\necho \"your-machine-password\" \u003e 
./secrets/crowdsec_machine_password.txt\nchmod 600 ./secrets/*.txt\n```\n\n### Direct Execution\n\n```bash\n# Install dependencies\npip install -r requirements.txt\n\n# Configure\ncp .env.example .env\n# Edit .env with your credentials\n\n# Run\npython blocklist_import.py\n\n# Or dry-run first\npython blocklist_import.py --dry-run\n```\n\n## CLI Options\n\n### Bash Script (import.sh)\n\n```text\nUsage: import.sh [OPTIONS]\n\nOptions:\n  --help, -h          Show help message and exit\n  --version, -v       Show version number and exit\n  --list-sources      List all available blocklist sources with their toggle variables\n  --dry-run           Run without making changes (same as DRY_RUN=true)\n\nExamples:\n  ./import.sh --version           # Show version\n  ./import.sh --help              # Show help\n  ./import.sh --list-sources      # List all 30 blocklist sources\n  ./import.sh --dry-run           # Preview what would be imported\n  ENABLE_TOR_EXIT_NODES=false ./import.sh  # Run with Tor sources disabled\n```\n\n### Python Script (blocklist_import.py)\n\n```text\nusage: blocklist_import.py [-h] [-v] [-n] [-d] [--lapi-url LAPI_URL]\n                           [--lapi-key LAPI_KEY] [--duration DURATION]\n                           [--batch-size BATCH_SIZE] [--validate]\n                           [--list-sources] [--metrics-port PORT]\n                           [--no-metrics]\n\noptions:\n  -h, --help            show this help message and exit\n  -v, --version         show version and exit\n  -n, --dry-run         don't import, just show what would be done\n  -d, --debug           enable debug logging\n  --lapi-url LAPI_URL   CrowdSec LAPI URL\n  --lapi-key LAPI_KEY   CrowdSec LAPI key (bouncer)\n  --duration DURATION   decision duration (e.g., 24h, 48h)\n  --batch-size SIZE     IPs per import batch\n  --validate            validate configuration and exit\n  --list-sources        list all available blocklist sources\n  --metrics-port PORT   port for Prometheus 
metrics (default: 9102)\n  --no-metrics          disable Prometheus metrics endpoint\n```\n\n## Environment Variable Validation\n\nThe tool validates all `ENABLE_*` environment variables at startup:\n\n1. **Value validation**: All `ENABLE_*` variables must be valid boolean strings (`true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`)\n2. **Typo detection**: Unknown `ENABLE_*` variables generate warnings with suggestions for similar valid names\n\n### Validation Examples\n\n```bash\n# Validate configuration without running import\n./blocklist_import.py --validate\n\n# List all available blocklist sources and their status\n./blocklist_import.py --list-sources\n```\n\n### Error Messages\n\nInvalid values will cause the program to exit with a clear error message:\n\n```text\n[ERROR] Configuration validation failed:\n[ERROR]\n[ERROR]   Invalid value for ENABLE_IPSUM: 'maybe'\n[ERROR]     Expected one of: true, false, 1, 0, yes, no, on, off (case-insensitive)\n[ERROR]\n[ERROR] Fix the above errors and try again.\n[ERROR] Use --list-sources to see all valid ENABLE_* variables.\n```\n\nTypos in variable names generate warnings but don't stop execution:\n\n```text\n[WARNING] Unknown environment variable: ENABLE_IPSOM=false\n[WARNING]   Did you mean: ENABLE_IPSUM?\n```\n\n## Removing all blocked IPs\n\nAll added decisions have their origin set to `blocklist-import`, so they can be cleared by running:\n\n```bash\ncscli decisions delete --origin blocklist-import\n```\n\n## Environment Variables\n\n### Required\n\n| Variable | Description |\n|----------|-------------|\n| `CROWDSEC_LAPI_URL` | CrowdSec LAPI URL (default: `http://localhost:8080`) |\n| `CROWDSEC_LAPI_KEY` or `CROWDSEC_LAPI_KEY_FILE` | Bouncer API key / key file for reading decisions |\n| `CROWDSEC_MACHINE_ID` | Machine ID for writing decisions |\n| `CROWDSEC_MACHINE_PASSWORD` or `CROWDSEC_MACHINE_PASSWORD_FILE` | Machine password / password file for authentication |\n\n### Optional\n\n| Variable | Default | 
Description |\n|----------|---------|-------------|\n| `ALLOWLIST` | `` | Comma-separated list of blocklist row data to ignore |\n| `DECISION_DURATION` | `24h` | How long decisions last |\n| `LOG_TIMESTAMPS` | `true` | Include timestamps in logs |\n| `DECISION_REASON` | `external_blocklist` | The decision identifier |\n| `DECISION_TYPE` | `ban` | The type of decision applied |\n| `DECISION_ORIGIN` | `blocklist-import` | The decision origin name |\n| `DECISION_SCENARIO` | `external/blocklist` | The decision scenario name |\n| `BATCH_SIZE` | `1000` | IPs per import batch |\n| `FETCH_TIMEOUT` | `60` | The fetch timeout in seconds |\n| `MAX_RETRIES` | `3` | How many times to retry fetching in case of error |\n| `LOG_LEVEL` | `INFO` | DEBUG, INFO, WARN, ERROR |\n| `DRY_RUN` | `false` | Set to true for dry run |\n| `TELEMETRY_ENABLED` | `true` | Anonymous usage telemetry |\n| `TELEMETRY_URL` | `https://bouncer-telemetry.ms2738.workers.dev/ping` | Anonymous usage telemetry URL |\n| `METRICS_ENABLED` | `true` | Enable Prometheus metrics endpoint |\n| `METRICS_PORT` | `9102` | Port for Prometheus metrics HTTP server |\n\n### Blocklist Toggles\n\nAll blocklists are enabled by default. 
Set to `false` to disable:\n\n| Variable | Source |\n|----------|--------|\n| `ENABLE_IPSUM` | IPsum (aggregated threat intel) |\n| `ENABLE_SPAMHAUS` | Spamhaus DROP/EDROP |\n| `ENABLE_BLOCKLIST_DE` | Blocklist.de (all feeds) |\n| `ENABLE_FIREHOL` | Firehol levels 1/2/3 |\n| `ENABLE_ABUSE_CH` | Feodo, URLhaus |\n| `ENABLE_EMERGING_THREATS` | Emerging Threats |\n| `ENABLE_BINARY_DEFENSE` | Binary Defense |\n| `ENABLE_BRUTEFORCE_BLOCKER` | Bruteforce Blocker |\n| `ENABLE_DSHIELD` | DShield |\n| `ENABLE_CI_ARMY` | CI Army |\n| `ENABLE_BOTVRIJ` | Botvrij |\n| `ENABLE_GREENSNOW` | GreenSnow |\n| `ENABLE_STOPFORUMSPAM` | StopForumSpam |\n| `ENABLE_TOR` | Tor exit nodes |\n| `ENABLE_SCANNERS` | Shodan/Censys/Maltrail |\n| `ENABLE_ABUSE_IPDB` | Abuse IPDB |\n| `ENABLE_CYBERCRIME_TRACKER` | Cybercrime tracker |\n| `ENABLE_MONTY_SECURITY_C2` | Monty Security C2 |\n| `ENABLE_VXVAULT` | VX Vault |\n\n## Authentication\n\nCrowdSec LAPI uses two types of authentication:\n\n1. **Bouncer API Key** (`X-Api-Key` header) - Read-only access to decisions\n2. **Machine Credentials** (JWT token via `/watchers/login`) - Full access including writing alerts/decisions\n\nThis tool requires both:\n\n- Bouncer key for checking existing decisions (deduplication)\n- Machine credentials for writing new decisions via the `/alerts` endpoint\n\n## Allow-lists\n\nThe `ALLOWLIST` environment variable can be used to specify block-list rows to ignore.\n\nIf the original row to ignore contains a comment, the comment should not be included in the allow-list item.\n\n## Prometheus Metrics\n\nThe blocklist importer exposes Prometheus metrics on port 9102 (configurable via `METRICS_PORT`).\n\n### Enabling Metrics\n\nMetrics are enabled by default. 
To expose them in Docker:\n\n```yaml\nservices:\n  blocklist-import:\n    image: ghcr.io/wolffcatskyy/crowdsec-blocklist-import-python:latest\n    ports:\n      - \"9102:9102\"\n    environment:\n      - METRICS_ENABLED=true\n      - METRICS_PORT=9102\n```\n\n### Available Metrics\n\n| Metric | Type | Description |\n|--------|------|-------------|\n| `blocklist_import_total_ips` | Gauge | Total number of IPs imported in the last run |\n| `blocklist_import_last_run_timestamp` | Gauge | Unix timestamp of the last import run |\n| `blocklist_import_sources_enabled` | Gauge | Number of enabled blocklist sources |\n| `blocklist_import_sources_successful` | Gauge | Number of sources successfully fetched |\n| `blocklist_import_sources_failed` | Gauge | Number of sources that failed to fetch |\n| `blocklist_import_existing_decisions` | Gauge | Number of existing CrowdSec decisions found |\n| `blocklist_import_new_ips` | Gauge | Number of new unique IPs added |\n| `blocklist_import_errors_total` | Counter | Total number of errors (labels: `error_type`) |\n| `blocklist_import_duration_seconds` | Histogram | Duration of import run in seconds |\n\n### Error Types\n\nThe `blocklist_import_errors_total` counter uses the `error_type` label:\n\n- `fetch` - Failed to download a blocklist source\n- `parse` - Failed to parse an IP address from blocklist\n- `encoding` - Character encoding errors\n- `import` - Failed to import IPs to CrowdSec LAPI\n\n### Example Prometheus Config\n\n```yaml\nscrape_configs:\n  - job_name: 'blocklist-import'\n    static_configs:\n      - targets: ['blocklist-import:9102']\n    scrape_interval: 5m\n```\n\n### Example Grafana Queries\n\n```promql\n# IPs imported over time\nblocklist_import_total_ips\n\n# Import success rate\nblocklist_import_sources_successful / blocklist_import_sources_enabled\n\n# Time since last run (for alerting on stale imports)\ntime() - blocklist_import_last_run_timestamp\n\n# Error rate by 
type\nrate(blocklist_import_errors_total[1h])\n```\n\n## Memory Efficiency\n\nThis implementation is designed to handle 500k+ IPs without memory issues:\n\n1. **Streaming Downloads**: Blocklists are processed line-by-line, never fully loaded\n2. **Batch Imports**: IPs are sent to LAPI in configurable batches\n3. **Set Deduplication**: Only unique IPs are tracked (O(1) lookup)\n\nTypical memory usage: ~50-100MB even with millions of IPs processed.\n\n## Scheduling\n\n### Cron (Linux)\n\n```cron\n# Daily at 4am\n0 4 * * * /path/to/blocklist_import.py \u003e\u003e /var/log/blocklist-import.log 2\u003e\u00261\n```\n\n### Docker Compose with Cron\n\n```yaml\nservices:\n  blocklist-import:\n    image: ghcr.io/wolffcatskyy/crowdsec-blocklist-import-python:latest\n    restart: \"no\"\n    environment:\n      - CROWDSEC_LAPI_URL=http://crowdsec:8080\n      - CROWDSEC_LAPI_KEY=${CROWDSEC_LAPI_KEY}\n      - CROWDSEC_MACHINE_ID=blocklist-import\n      - CROWDSEC_MACHINE_PASSWORD=${CROWDSEC_MACHINE_PASSWORD}\n```\n\nSchedule with:\n\n```bash\n0 4 * * * docker compose -f /path/to/compose.yaml up --abort-on-container-exit\n```\n\n## Comparison with Bash Version\n\n| Feature | Bash | Python |\n|---------|------|--------|\n| CrowdSec Access | Docker exec / Native cscli | LAPI HTTP only |\n| Memory Usage | ~200MB+ (temp files) | ~50-100MB (streaming) |\n| Dependencies | curl, awk, grep, sort | requests, python-dotenv |\n| IPv6 Support | Limited | Full (ipaddress module) |\n| Per-feed Control | No | Yes (ENABLE_* vars) |\n| Type Safety | No | Yes (type hints) |\n| Error Handling | Basic | Retry with backoff |\n| Authentication | None (uses cscli) | Machine JWT + Bouncer key |\n\n## Development\n\n```bash\n# Install dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest\n\n# Type checking\nmypy blocklist_import.py\n\n# Linting\nruff check blocklist_import.py\n```\n\n## Contributors\n\nA huge thanks to those who have helped make this project better!\n\n- 
**[@gaelj](https://github.com/gaelj)** - Major contributor to v3.3.0, implementing Docker secrets support (`_FILE` env vars), allowlist functionality, CLI enhancements, Prometheus metrics, environment validation with typo detection, and adding multiple new blocklist sources. [PR #30](https://github.com/wolffcatskyy/crowdsec-blocklist-import/pull/30)\n\nContributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n## License\n\nMIT License - see [LICENSE](LICENSE)\n\n### Allow-lists (v2.2.0)\n\nRemove specific IPs or CIDRs from blocklists before import:\n\n```bash\n# Inline allow-list\n-e ALLOWLIST=\"140.82.121.3,140.82.121.4,8.8.8.8\"\n\n# From URL\n-e ALLOWLIST_URL=\"https://example.com/my-allowlist.txt\"\n\n# From file\n-e ALLOWLIST_FILE=\"/path/to/allowlist.txt\"\n```\n\nAllow-list format: One IP or CIDR per line, `#` for comments.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwolffcatskyy%2Fcrowdsec-blocklist-import","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwolffcatskyy%2Fcrowdsec-blocklist-import","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwolffcatskyy%2Fcrowdsec-blocklist-import/lists"}