{"id":51237031,"url":"https://github.com/disposable/public-dns-crawler","last_synced_at":"2026-06-28T21:09:48.974Z","repository":{"id":349145022,"uuid":"1200870861","full_name":"disposable/public-dns-crawler","owner":"disposable","description":"DNS and DoH resolver inventory","archived":false,"fork":false,"pushed_at":"2026-06-14T19:04:29.000Z","size":556,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-14T20:23:08.936Z","etag":null,"topics":["crawler","dns","dns-over-https","doh","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/disposable.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-03T23:32:13.000Z","updated_at":"2026-06-14T19:04:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/disposable/public-dns-crawler","commit_stats":null,"previous_names":["disposable/public-dns-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/disposable/public-dns-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disposable%2Fpublic-dns-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disposable%2Fpublic-dns-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disposable%2Fpublic-dns-crawler/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disposable%2Fpublic-dns-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/disposable","download_url":"https://codeload.github.com/disposable/public-dns-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disposable%2Fpublic-dns-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34903908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-28T02:00:05.809Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","dns","dns-over-https","doh","python"],"created_at":"2026-06-28T21:09:48.402Z","updated_at":"2026-06-28T21:09:48.968Z","avatar_url":"https://github.com/disposable.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Public DNS and DoH resolver crawler\n\nAggregate, validate, score, and export public DNS and DoH resolvers.\n\n## Features\n\n- **Multi-source discovery** - plain DNS from public-dns.info, DoH from curl wiki and AdGuard provider lists, manual seed files\n- **Pre-validation filtering records** - source and normalization drops are exported as `filtered.json` with reason codes\n- **Full endpoint metadata** - DoH records preserve URL, host, port, path, TLS server name, bootstrap IPs, and 
provenance\n- **Active validation** - reachability, NXDOMAIN fidelity, latency, consistency, TLS validity\n- **Pluggable test corpus** - controlled zone, local external JSON corpus, or tiny built-in fallback\n- **Scored output** - component-based scoring (correctness, availability, performance, history) with separate confidence score, score caps, and derived metrics\n- **Multiple export formats** - JSON, plain text, dnsdist config, Unbound forward-zone\n\n## Source feeds\n\nDefault discovery sources configured in `configs/default.toml`:\n\n- `publicdns_info` (plain DNS): \u003chttps://public-dns.info/nameservers.csv\u003e\n  - default filter: `min_reliability = 0.50`\n- `curl_wiki` (DoH): \u003chttps://raw.githubusercontent.com/wiki/curl/curl/DNS-over-HTTPS.md\u003e\n- `adguard` (DoH): \u003chttps://raw.githubusercontent.com/AdguardTeam/KnowledgeBaseDNS/master/docs/general/dns-providers.md\u003e\n- `manual` seeds (local files):\n  - `configs/manual-dns.txt`\n  - `configs/manual-doh.toml`\n\n## Quick start\n\n```bash\n# Install with uv\nuv sync --group dev\n\n# Full pipeline (discover → validate → export)\nuv run resolver-inventory refresh --config configs/default.toml --output outputs/latest\n\n# Full pipeline with an external local probe corpus\nuv run resolver-inventory refresh \\\n  --config configs/default.toml \\\n  --probe-corpus tests/fixtures/probe-corpus-valid.json \\\n  --output outputs/latest\n\n# Validate a corpus file before using it\nuv run resolver-inventory validate-probe-corpus --input tests/fixtures/probe-corpus-valid.json\n\n# Inspect exported files\ncat outputs/latest/accepted.json\ncat outputs/latest/resolvers.txt\ncat outputs/latest/dnsdist.conf\n```\n\n## CLI\n\n```\nresolver-inventory discover   # gather raw candidates\nresolver-inventory validate   # run probes, emit scored records\nresolver-inventory refresh    # full pipeline (discover + validate + export)\nresolver-inventory split-candidates     # deterministic candidate 
sharding\nresolver-inventory materialize-results  # merge validated shards and export outputs\nresolver-inventory validate-probe-corpus --input FILE\nresolver-inventory generate-probe-corpus [--config FILE] [--seed-file FILE] [--output DIR]\nresolver-inventory export json     [--input FILE] [--output FILE]\nresolver-inventory export text     [--input FILE] [--output FILE]\nresolver-inventory export dnsdist  [--input FILE] [--output FILE]\nresolver-inventory export unbound  [--input FILE] [--output FILE]\n```\n\nGlobal flags: `--config FILE`, `--log-level {DEBUG,INFO,WARNING,ERROR}`.\n\nValidation commands also support `--probe-corpus FILE`, `--validation-parallelism N`, `--dns-backend {python,massdns}`, `--massdns-bin PATH`, and `--massdns-hashmap-size N`. When `--probe-corpus` is provided, the CLI sets `validation.corpus.mode = \"external\"` and loads probes from that local JSON file.\nJSON exports are written in compact form and sorted deterministically by endpoint identity to keep diffs stable.\nUse `--split-json-max-bytes N` on `refresh`, `materialize-results`, or `export json` to split large JSON outputs into `.part-XXXX` files.\n\n### Staged pipeline commands\n\nFor multi-VM flows (for example GitHub Actions matrix validation), use:\n\n1. `discover --output candidates.json --filtered-output filtered.json`\n2. `split-candidates --input candidates.json --output-dir chunks --shards 10`\n3. `validate --input chunks/chunk-XX.json --output shard-XX.json`\n4. 
`materialize-results --inputs-glob \"shards/*.json\" --filtered-input filtered.json --output outputs/latest`\n\n### Exported file meanings\n\n- `accepted.json` - resolvers with status `accepted`\n- `candidate.json` - resolvers with status `candidate`\n- `rejected.json` - resolvers with status `rejected`; only failed probes are kept, and `all_probes_failed` is set when every probe failed\n- `filtered.json` - candidates dropped before validation, including source filtering, normalization failures, duplicates, and historical quarantine\n- `resolvers.txt` - accepted plain DNS resolvers only, as `host:port`\n- `resolvers-doh.txt` - accepted DoH resolvers only, as full HTTPS endpoints\n- `dnsdist.conf` - dnsdist backends for all non-rejected resolvers\n- `unbound-forward.conf` - accepted plain DNS resolvers rendered as Unbound forward zones\n\nIf `--split-json-max-bytes` is used, large JSON outputs are written as `name.part-XXXX.json` chunks instead of a single large file.\n\n## Library API\n\n```python\nfrom resolver_inventory.sources import discover_candidates\nfrom resolver_inventory.validate import validate_candidates\nfrom resolver_inventory.export import export_dnsdist, export_json\nfrom resolver_inventory.settings import load_settings\n\nsettings = load_settings(\"configs/default.toml\")\ncandidates = discover_candidates(settings)\nresults = validate_candidates(candidates, settings)\nprint(export_json(results))\n```\n\n## Configuration\n\nCopy `configs/default.toml` and edit. 
Config format is **TOML** (stdlib `tomllib`, no extra deps):\n\n```toml\n[[sources.dns]]\ntype = \"publicdns_info\"        # fetch from public-dns.info CSV\nmin_reliability = 0.50         # drop unstable entries below this reliability score\n\n[[sources.dns]]\ntype = \"manual\"\npath = \"configs/manual-dns.txt\"\n\n[[sources.doh]]\ntype = \"curl_wiki\"             # scrape curl's DoH providers page\n\n[[sources.doh]]\ntype = \"adguard\"               # fetch AdGuard providers markdown list\n\n[[sources.doh]]\ntype = \"manual\"\npath = \"configs/manual-doh.toml\"\n\n[validation]\nrounds = 3\ntimeout_ms = 2000\nparallelism = 50\n\n[validation.dns_backend]\nkind = \"python\"                       # default backend; \"massdns\" is optional\nmassdns_bin = \"massdns\"\nhashmap_size = 2000\nprocesses = 1\nsocket_count = 1\ninterval_ms = 0\npredictable = true\nflush = true\nbatch_max_queries = 50000\nstderr_log_level = \"debug\"\nfallback_to_python_on_error = true\n\n[validation.corpus]\nmode = \"external\"                     # \"controlled\", \"fallback\", or \"external\"\nzone = \"dns-test.example.net\"         # controlled mode only\npath = \"tests/fixtures/probe-corpus-valid.json\"\nschema_version = 1\nallow_builtin_fallback = false\nstrict = true\n\n[scoring]\naccept_min_score = 80\ncandidate_min_score = 60\n\n[export]\nformats = [\"json\", \"text\", \"dnsdist\"]\noutput_dir = \"outputs/latest\"\n```\n\n`publicdns_info` accepts an optional `min_reliability` setting. Entries with a lower score are ignored before validation. 
The default is `0.50`.\n\n### Optional MassDNS backend\n\nPlain DNS validation supports two backends:\n\n- `python` (default): existing dnspython probe path\n- `massdns` (optional): batched subprocess path with streaming stdin/stdout pipes\n\nDoH validation is unchanged and never uses MassDNS.\n\nMassDNS phase-1 routing limitations:\n\n- Only `dns-udp` probes on port `53` are routed to MassDNS.\n- `dns-tcp` and non-53 plain DNS probes automatically use the `python` backend.\n- `doh` probes always use the existing DoH path.\n- `latency_ms` on MassDNS can have lower fidelity depending on output fields.\n\nInstall MassDNS before enabling it, then confirm the binary is on your `PATH`:\n\n```bash\nmassdns --help\n```\n\nStart with conservative tuning in CI. Increase `hashmap_size` (`-s`) only after validating memory and throughput tradeoffs in your environment.\n\n### Corpus modes\n\n| Mode | Description |\n|---|---|\n| `controlled` | Uses your own authoritative zone with fixed RRs. Best accuracy. Requires `zone` to be set. |\n| `external` | Loads a local JSON corpus file, validates schema/version, and converts it into validator probes. Requires `path`. |\n| `fallback` | Uses a tiny built-in low-variance fallback corpus. Intended as an emergency/dev fallback, not the main path. 
|\n\n### External corpus example\n\n```toml\n[validation.corpus]\nmode = \"external\"\npath = \"tests/fixtures/probe-corpus-valid.json\"\nschema_version = 1\nallow_builtin_fallback = false\nstrict = true\n```\n\nMinimal required external corpus shape:\n\n```json\n{\n  \"schema_version\": 1,\n  \"corpus_version\": \"test-001\",\n  \"generated_at\": \"2026-04-04T00:00:00Z\",\n  \"probes\": [\n    {\n      \"id\": \"pos-example-a\",\n      \"kind\": \"positive_consensus\",\n      \"qname\": \"example.com.\",\n      \"qtype\": \"A\",\n      \"expected_mode\": \"baseline_match\"\n    },\n    {\n      \"id\": \"neg-generated-a\",\n      \"kind\": \"negative_generated\",\n      \"qname_template\": \"{uuid}.com.\",\n      \"qtype\": \"A\",\n      \"expected_mode\": \"nxdomain\"\n    }\n  ]\n}\n```\n\n## Validation Flow\n\nThe corpus is built or loaded once per validation run, then reused for every candidate in that run.\n\n```mermaid\nflowchart TD\n    A[CLI command\u003cbr/\u003evalidate or refresh] --\u003e B[load_settings]\n    B --\u003e C{--probe-corpus provided?}\n    C --\u003e|yes| D[force corpus mode to external\u003cbr/\u003eset validation.corpus.path]\n    C --\u003e|no| E[use config as written]\n    D --\u003e F[validate_candidates]\n    E --\u003e F\n    F --\u003e G[build_corpus once per run]\n    G --\u003e H{corpus mode}\n    H --\u003e|controlled| I[build controlled corpus from zone]\n    H --\u003e|external| J[load local JSON file]\n    H --\u003e|fallback| K[build tiny built-in fallback corpus]\n    J --\u003e L[validate schema and schema_version]\n    L --\u003e M{load succeeded?}\n    M --\u003e|yes| N[convert probes to internal Corpus]\n    M --\u003e|no and allow_builtin_fallback=true| K\n    M --\u003e|no and allow_builtin_fallback=false| O[fail run]\n    I --\u003e P[reuse same Corpus for all candidates]\n    K --\u003e P\n    N --\u003e P\n    P --\u003e Q{candidate transport}\n    Q --\u003e|dns-udp or dns-tcp| R[validate_dns_candidate]\n    Q 
--\u003e|doh| S[validate_doh_candidate]\n\n    R --\u003e R1[run positive probes]\n    R1 --\u003e R2{expected_mode}\n    R2 --\u003e|exact_rrset| R3[query candidate over plain DNS\u003cbr/\u003enormalize RRset\u003cbr/\u003ecompare with expected_answers]\n    R2 --\u003e|consensus_match| R4[query candidate over plain DNS\u003cbr/\u003equery trusted baselines\u003cbr/\u003ecompare unordered normalized answers]\n    R2 --\u003e|nxdomain| R5[expand qname_template at runtime\u003cbr/\u003equery candidate over plain DNS\u003cbr/\u003erequire negative response without spoofing]\n    R3 --\u003e T[score results\u003cbr/\u003eload history\u003cbr/\u003ecompute components\u003cbr/\u003eapply caps]\n    R4 --\u003e T\n    R5 --\u003e T\n\n    S --\u003e S1[run positive probes]\n    S1 --\u003e S2{expected_mode}\n    S2 --\u003e|exact_rrset| S3[query candidate over DoH\u003cbr/\u003evalidate TLS / HTTP path\u003cbr/\u003enormalize RRset\u003cbr/\u003ecompare with expected_answers]\n    S2 --\u003e|consensus_match| S4[query candidate over DoH\u003cbr/\u003equery trusted baselines\u003cbr/\u003ecompare unordered normalized answers]\n    S2 --\u003e|nxdomain| S5[expand qname_template at runtime\u003cbr/\u003equery candidate over DoH\u003cbr/\u003erequire negative response without synthetic answers]\n    S3 --\u003e T\n    S4 --\u003e T\n    S5 --\u003e T\n\n    T --\u003e U[export outputs]\n```\n\n`validate_dns_candidate` and `validate_doh_candidate` both consume the same prebuilt corpus,\nbut they execute transport-specific query code. 
`exact_rrset` probes compare directly against\npinned answers, `consensus_match` probes compare the candidate against the configured trusted\nbaseline resolvers, and `negative_generated` probes keep the template in the corpus and expand a\nfresh query name at execution time.\n\n### Validation reason codes\n\n| Code | Meaning |\n|---|---|\n| `nxdomain_spoofing` | Resolver returned NOERROR for a nonexistent name |\n| `tls_name_mismatch` | DoH TLS certificate does not match the expected server name |\n| `timeout_rate_high` | More than 50% of probes timed out |\n| `latency_p95_high` | 95th-percentile latency exceeds 2 s |\n| `unexpected_nxdomain` | Resolver returned NXDOMAIN for a name that should exist |\n| `unexpected_rcode` | Resolver returned an unexpected RCODE |\n| `udp_only` | Only UDP probes ran (no TCP confirmation) |\n\n## Scoring System\n\nThe validation result includes a composite `score` (0-100) that reflects resolver quality, as well as a separate `confidence_score` (0-100) that reflects how certain we are about the measurement.\n\n### Score Components\n\nThe final score is a weighted sum of four components:\n\n| Component | Weight | Description |\n|-----------|--------|-------------|\n| `correctness` | 0-50 | Penalties for DNS/TLS errors, answer mismatches, NXDOMAIN spoofing |\n| `availability` | 0-20 | Based on probe success rate (100% = 20 pts, 50% = 10 pts) |\n| `performance` | 0-20 | Latency penalties for p50, p95, and jitter thresholds |\n| `history` | 0-10 | Rewards sustained stability, penalizes flapping and recent failures |\n\nComponent scores are included in the JSON export as `score_breakdown`.\n\n### Performance Penalty Thresholds\n\n**p50 (median) latency:**\n- \u003e100 ms: -3 points\n- \u003e300 ms: -8 points\n- \u003e700 ms: -18 points\n- \u003e1500 ms: -30 points\n\n**p95 (tail) latency:**\n- \u003e400 ms: -2 points\n- \u003e800 ms: -6 points\n- \u003e1500 ms: -12 points\n- \u003e2500 ms: -20 points\n\n**Jitter (p95 - p50):**\n- 
\u003e150 ms: -2 points\n- \u003e400 ms: -6 points\n- \u003e900 ms: -12 points\n\nReason codes: `latency_high`, `latency_very_high`, `latency_p95_high`, `latency_jitter_high`\n\n### Hard-Fail Correctness Issues\n\nSevere correctness problems cap the final score at ≤59 regardless of other factors:\n- `nxdomain_spoofing`\n- `tls_name_mismatch`\n- `answer_mismatch`\n- `unexpected_rcode_suspicious` (REFUSED/SERVFAIL patterns)\n\n### History-Based Caps\n\nWithout sufficient observation history, scores are capped:\n- \u003c3 runs observed: max 90\n- 3-6 runs observed: max 95\n- 7-13 runs observed: max 98\n- 14+ runs: no cap from history\n\n### Score of 100 Requirements\n\nA perfect score of 100 requires ALL of the following:\n- No correctness issues (no penalties)\n- No performance penalties (low latency)\n- 100% probe success rate\n- ≥14 observed runs in history\n- ≤2 status flaps in 30 days\n- No consecutive failure days\n- Confidence score ≥90\n\n### Confidence Score\n\nThe `confidence_score` (0-100) is computed separately and reflects measurement certainty, not resolver quality:\n- Probe count (max 30): more probes = higher confidence\n- Latency samples (max 20): more samples = higher confidence\n- Historical observations (max 35): more runs = higher confidence\n- Source metadata (max 15): reliability data present = higher confidence\n\nMissing source reliability reduces confidence but does not penalize the quality score.\n\n### Exported Score Fields\n\nJSON exports include these new fields:\n\n```json\n{\n  \"score\": 87,\n  \"score_breakdown\": {\n    \"correctness\": 50,\n    \"availability\": 18,\n    \"performance\": 12,\n    \"history\": 7\n  },\n  \"confidence_score\": 65,\n  \"score_caps_applied\": [\"insufficient_history\"],\n  \"derived_metrics\": {\n    \"p50_latency_ms\": 45.2,\n    \"p95_latency_ms\": 120.5,\n    \"jitter_ms\": 75.3,\n    \"latency_sample_count\": 10,\n    \"runs_seen_30d\": 5,\n    \"runs_seen_7d\": 3,\n    \"flaps_30d\": 0,\n    
\"consecutive_success_days\": 5,\n    \"consecutive_fail_days\": 0\n  }\n}\n```\n\n## Development\n\n```bash\n# Install dev dependencies\nuv sync --group dev\n\n# Run all tests\nuv run pytest\n\n# Run only unit tests (fast, no I/O)\nuv run pytest tests/unit\n\n# Run only integration tests (local fixtures, no public network)\nuv run pytest -m integration tests/integration\n\n# Lint\nuv run ruff check .\nuv run ruff format .\n\n# Type-check (Python 3.13.x)\nuv run pyright\n\n# Build the package\nuv build\n```\n\nAll local test commands in this repository are expected to run through `uv run ...` after\n`uv sync --group dev`. No manual `PYTHONPATH=src` bootstrap is required.\n\n## Probe Corpus Docker Flow\n\nThe probe corpus generator can run in Docker and write its artifacts into\n`outputs/probe-corpus/`.\n\n```bash\n# Build the generator image\nmake probe-corpus-build-image\n\n# Generate the corpus into outputs/probe-corpus/\nmake probe-corpus-generate\n\n# Validate the generated JSON corpus\nmake probe-corpus-validate\n\n# Run the normal resolver refresh with that generated corpus\nmake refresh-with-probe-corpus\n```\n\nEquivalent direct Docker invocation:\n\n```bash\nmkdir -p outputs/probe-corpus\ndocker build -f docker/probe-corpus.Dockerfile -t resolver-inventory-probe-corpus .\ndocker run --rm -v \"$PWD/outputs/probe-corpus:/out\" resolver-inventory-probe-corpus\nuv run resolver-inventory validate-probe-corpus \\\n  --config configs/probe-corpus.toml \\\n  --input outputs/probe-corpus/probe-corpus.json \\\n  --schema-version 2\nuv run resolver-inventory refresh \\\n  --config configs/default.toml \\\n  --probe-corpus outputs/probe-corpus/probe-corpus.json \\\n  --output outputs/latest\n```\n\n## CI\n\n- **`ci.yml`** - lint, type-check, unit tests (matrix: Linux/macOS/Windows), integration tests, build\n- **`release.yml`** - builds and publishes to PyPI via trusted publishing on `v*` tags\n- **`refresh.yml`** - scheduled/manual multi-job pipeline:\n  1. 
build the Docker probe-corpus generator\n  2. generate `outputs/probe-corpus/probe-corpus.json`\n  3. validate the generated corpus\n  4. pass `probe-corpus` to the refresh job as a workflow artifact\n  5. run `refresh --probe-corpus ...`\n  6. upload `refreshed-resolver-data`\n  7. run optional non-blocking network canaries\n\nRequired PR checks never touch public resolvers.\n\n## History and reporting scripts\n\nThese helper scripts are used by the parent data repo workflow and are intentionally separate from the main CLI:\n\n- `scripts/apply_history_quarantine.py` - drops currently quarantined plain DNS hosts from discovered candidates and appends `historical_dns_quarantine` entries to `filtered.json`\n- `scripts/update_history.py` - updates `meta/history.duckdb` from `validated.json`, `filtered.json`, and build metadata\n- `scripts/generate_stats_report.py` - regenerates the `\u003c!-- GENERATED_STATS_* --\u003e` README statistics section from history data\n- `scripts/analyze_scores.py` - analyzes score distribution from validation results and can compare before/after runs\n- `scripts/analyze_history.py` - inspects history database and prints diagnostics about resolver history coverage\n\n## History System\n\nThe crawler maintains historical data in `meta/history.duckdb` using DuckDB. The history system tracks resolver status over time to support:\n\n1. **Score history caps** - resolvers with insufficient observation history get capped scores\n2. **DNS host quarantine** - hosts rejected for 14+ consecutive days are temporarily excluded\n3. 
**Stability metrics** - streaks, flapping detection, and success rate tracking\n\n### Schema Overview\n\nThe history database uses the current schema with these tables:\n\n**runs** - Run-level metadata with unique `run_id` supporting multiple runs per day:\n- `run_id`, `run_date`, `run_started_at`, `generated_at`\n- `github_run_id`, `repo_sha`, `crawler_sha`, `run_type`\n\n**resolver_run_status** - Per-resolver status for each run (allows intra-day runs):\n- `run_id`, `resolver_key`, `host`, `transport`, `endpoint_url`, `port`, `path`\n- `status`, `reasons_signature`, `reasons_json`\n- `accepted_probe_count`, `failed_probe_count`, `total_probe_count`\n- `p50_latency_ms`, `p95_latency_ms`, `jitter_ms`\n\n**resolver_daily** - Daily rollup aggregating all runs per day:\n- `run_date`, `resolver_key`, `host`, `transport`\n- `day_status`, `reasons_signature`, `reasons_json`\n- `runs_that_day`, `successful_runs_that_day`, `failed_runs_that_day`\n- `flapped_within_day`\n- `p50_latency_ms`, `p95_latency_ms`, `jitter_ms`\n\n**dns_host_quarantine** - Host-level DNS quarantine state derived from `resolver_daily`:\n- `host`, `first_quarantined_on`, `last_quarantined_on`, `retry_after`\n- `reasons_signature`, `reasons_json`, `cycles`\n\n### Resolver Identity (resolver_key)\n\nHistory is tracked per-endpoint using a canonical `resolver_key`:\n\n- **DNS UDP**: `dns-udp|host|port` (e.g., `dns-udp|1.1.1.1|53`)\n- **DNS TCP**: `dns-tcp|host|port` (e.g., `dns-tcp|1.1.1.1|53`)\n- **DoH**: `doh|url` (e.g., `doh|https://dns.example.com/dns-query`)\n\nThis allows:\n- Same host on different transports to have independent history\n- Different DoH endpoints on same host to be tracked separately\n- Accurate per-endpoint scoring and caps\n\n### History Availability\n\nScoring distinguishes between:\n- **No history rows for a resolver**: history analysis is skipped (no history score/caps/reasons).\n- **Sparse existing history**: history-based caps and stability logic can still apply.\n- 
**Meaningful history**: full history scoring behavior applies.\n\n### Intra-day Run Support\n\nThe schema supports multiple validation runs per day:\n- Multiple entries in `runs` with same `run_date` but different `run_id`\n- Multiple entries in `resolver_run_status` per run\n- `resolver_daily` aggregates all runs for a day into one row\n- History counting uses distinct days, not raw run counts\n\nThis allows future 4x/day scheduling without inflating \"days of history\" counts.\n\n## License\n\nMIT © disposable\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdisposable%2Fpublic-dns-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdisposable%2Fpublic-dns-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdisposable%2Fpublic-dns-crawler/lists"}