{"id":50182023,"url":"https://github.com/ipanalytics/geoforge","last_synced_at":"2026-05-25T07:05:07.800Z","repository":{"id":359074394,"uuid":"1244352103","full_name":"ipanalytics/GeoForge","owner":"ipanalytics","description":"GeoForge compiles a local IPv4 GeoIP database from multiple free or low-cost data sources. The builder uses DB-IP Lite as the prefix seed, merges location candidates from MaxMind GeoLite2, IP2Location LITE, Sypex Geo, and allowlisted operator geofeeds, then enriches the result with GeoNames postal and city reference data.","archived":false,"fork":false,"pushed_at":"2026-05-20T11:29:55.000Z","size":1490,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T12:20:37.367Z","etag":null,"topics":["consensus","data-pipeline","data-quality","db-ip","geofeed","geoip","geolite2","geolocation","geonames","ip-geolocation","ip2location","maxmind","mmdb","rir","sypexgeo","whois"],"latest_commit_sha":null,"homepage":"https://ipanalytics.github.io/GeoForge/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ipanalytics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-20T07:27:26.000Z","updated_at":"2026-05-20T11:29:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ipanalytics/GeoForge","commit_stats":null,"previous_names":["ipanalytics/geoforge"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ipanalytics/GeoForge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FGeoForge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FGeoForge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FGeoForge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FGeoForge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ipanalytics","download_url":"https://codeload.github.com/ipanalytics/GeoForge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FGeoForge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33464014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T06:32:55.349Z","status":"ssl_error","status_checked_at":"2026-05-25T06:32:35.322Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["consensus","data-pipeline","data-quality","db-ip","geofeed","geoip","geolite2","geolocation","geonames","ip-geolocation","ip2location","maxmind","mmdb","rir","sypexgeo","whois"],"created_at":"2026-05-25T07:04:41.736Z","updated_at":"2026-05-25T07:05:07.781Z","avatar_url":"https://github.com/ipanalytics.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GeoForge\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/geoforge-banner.png\" alt=\"GeoForge banner\" width=\"100%\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/license-source--available-blue\" alt=\"License\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ipanalytics/GeoForge/actions\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/actions/workflow/status/ipanalytics/GeoForge/build.yml?branch=main\" alt=\"CI\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ipanalytics/GeoForge\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/last-commit/ipanalytics/GeoForge\" alt=\"Last Commit\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ipanalytics/GeoForge\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/output-mmdb-success\" alt=\"Output\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ipanalytics/GeoForge\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/geoip-consensus_engine-informational\" alt=\"Consensus Engine\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ipanalytics/GeoForge\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/go-1.24+-informational\" alt=\"Go\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n---\n\nGeoForge is a reproducible GeoIP database compiler that builds a local MaxMind-compatible MMDB from multiple independent geolocation sources.\n\nThe pipeline combines DB-IP Lite, MaxMind GeoLite2, IP2Location LITE, Sypex Geo, operator geofeeds, RIR delegated statistics, and GeoNames reference datasets into a normalized consensus-based geolocation layer.\n\nPrimary outputs:\n\n* `release/geo.mmdb`\n* `release/geo.csv`\n* `release/geo-quality-report.txt`\n\n---\n\n## Overview\n\nFree GeoIP datasets typically optimize for broad coverage, not cross-source validation.\n\nGeoForge approaches geolocation as a consensus problem:\n\n```text id=\"v6v3l1\"\nprefix seed\n    -\u003e multi-source candidate collection\n    -\u003e confidence scoring\n    -\u003e normalization\n    -\u003e conflict resolution\n    -\u003e reproducible MMDB output\n```\n\nThe builder merges independent signals, downranks inconsistent records, applies conservative normalization rules, and produces an auditable local database suitable for gateways, analytics, enrichment services, fraud systems, and infrastructure tooling.\n\nThe project is designed for:\n\n* offline local lookups\n* deterministic rebuilds\n* source transparency\n* quality regression tracking\n* operational GeoIP enrichment\n\n---\n\n## Architecture\n\n```text id=\"w6u0a6\"\n                 Source Databases\n                         │\n      ┌──────────────────┼──────────────────┐\n      │                  │                  │\n      ▼                  ▼                  ▼\n   GeoLite2         IP2Location        Sypex Geo\n      │                  │                  │\n      └──────────────┬───┴──────────────────┘\n                     ▼\n              Consensus Engine\n         scoring / merge / weighting\n                     ▼\n             GeoNames Enrichment\n          postal / city normalization\n                     ▼\n               Output Cleanup\n         timezone / precision / QA\n                     ▼\n                 MMDB Export\n```\n\n---\n\n## Repository Layout\n\n```text id=\"qzjlwm\"\ncmd/\n├── builder/         Main database compiler\n└── qualitycheck/    Post-build validation\n\ninternal/\n├── consensus/       Merge and scoring logic\n├── geofeed/         RFC 8805 parser/index\n├── geozip/          GeoNames enrichment\n├── output/          Final normalization\n├── refdata/         Country/currency metadata\n├── rirstats/        RIR delegated statistics\n└── strnorm/         String normalization\n\ndata/\nrelease/\nscripts/\n```\n\nDownloaded source datasets and generated outputs are intentionally gitignored.\n\n---\n\n## Data Sources\n\n| Source                | Role                           |\n| --------------------- | ------------------------------ |\n| DB-IP Lite            | Required prefix seed           |\n| MaxMind GeoLite2 City | Consensus baseline             |\n| IP2Location LITE DB5  | Independent geo signal         |\n| Sypex Geo City        | CIS-focused enrichment         |\n| RFC 8805 geofeeds     | Operator-published corrections |\n| RIR delegated stats   | Registry country attribution   |\n| GeoNames postal dump  | Postal enrichment              |\n| GeoNames cities1000   | City geoname resolution        |\n\nGeoForge separates source collection from output generation, allowing partial builds when some datasets are unavailable.\n\n---\n\n## Build\n\nCreate a local environment file:\n\n```bash id=\"98c9ga\"\ncp admin.env.example admin.env\n```\n\nRun the pipeline:\n\n```bash id=\"91ol1k\"\n./geo.sh\n```\n\nThe build workflow:\n\n1. Acquires a release lock\n2. Downloads updated datasets\n3. Detects source changes\n4. Runs Go tests\n5. Compiles builders\n6. Generates MMDB + CSV outputs\n7. Runs quality validation\n\nForce rebuild:\n\n```bash id=\"kt8o26\"\nFORCE_BUILD=1 ./geo.sh\n```\n\nDisable downloads:\n\n```bash id=\"dffxx7\"\nAUTO_DOWNLOAD=0 ./geo.sh\n```\n\n---\n\n## Outputs\n\n| File                             | Description                       |\n| -------------------------------- | --------------------------------- |\n| `release/geo.mmdb`               | MaxMind-compatible GeoIP database |\n| `release/geo.csv`                | CSV audit/export copy             |\n| `release/geo-quality-report.txt` | Post-build quality analysis       |\n| `release/geo.previous.csv`       | Previous snapshot for diffing     |\n\n---\n\n## Record Schema\n\nEach MMDB entry contains top-level metadata plus a nested `location` object.\n\n### Top-Level Fields\n\n| Field               | Description                       |\n| ------------------- | --------------------------------- |\n| `matched_prefix`    | CIDR written into MMDB            |\n| `confidence`        | Consensus confidence score        |\n| `source_updated_at` | UTC build timestamp               |\n| `country_metadata`  | Country/currency/calling metadata |\n| `location`          | Final geolocation object          |\n\n### Location Fields\n\n| Field                   | Description                    |\n| ----------------------- | ------------------------------ |\n| `continent_code`        | Continent code                 |\n| `country_code`          | ISO country                    |\n| `registry_country_code` | RIR-derived registry country   |\n| `subdivision_name`      | Normalized admin region        |\n| `city_geoname_id`       | GeoNames city identifier       |\n| `city_name`             | Normalized city                |\n| `postal_code`           | Consensus postal code          |\n| `latitude`              | Rounded latitude               |\n| `longitude`             | Rounded longitude              |\n| `time_zone`             | Derived timezone               |\n| `accuracy_radius_km`    | Conservative accuracy estimate |\n\n---\n\n## Example Record\n\n```json id=\"k7v7uy\"\n{\n  \"ip\": \"1.208.10.20\",\n  \"matched_prefix\": \"1.208.0.0/12\",\n  \"confidence\": 85,\n  \"source_updated_at\": \"2026-05-20T00:00:00Z\",\n  \"location\": {\n    \"country_code\": \"KR\",\n    \"country_name\": \"South Korea\",\n    \"city_name\": \"Seoul\",\n    \"postal_code\": \"04524\",\n    \"latitude\": 37.56631,\n    \"longitude\": 126.9772,\n    \"time_zone\": \"Asia/Seoul\",\n    \"accuracy_radius_km\": 20\n  }\n}\n```\n\n---\n\n## Quality Model\n\nGeoForge is designed to improve operational quality through source consensus rather than raw source replacement.\n\nExpected improvements over single-source lite datasets:\n\n* better country stability\n* improved city consistency\n* stronger CIS coverage\n* cleaner normalization\n* more conservative precision signaling\n* reduced malformed text artifacts\n\nThe builder intentionally favors stable consensus over aggressive precision claims.\n\n---\n\n## Quality Gate\n\nAfter each build, the pipeline runs a post-build validation stage and generates:\n\n```text id=\"2ifjlwm\"\nrelease/geo-quality-report.txt\n```\n\nValidation includes:\n\n* coverage statistics\n* confidence distribution\n* added/removed prefixes\n* country/city regressions\n* mojibake detection\n* MMDB smoke lookups\n\nEnable strict mode:\n\n```bash id=\"0xx2kq\"\nQUALITY_STRICT=1 ./geo.sh\n```\n\nStrict mode fails the build on large regressions or suspicious output anomalies.\n\n---\n\n## Normalization\n\nFinal records are normalized immediately before export.\n\nNormalization includes:\n\n* coordinate rounding\n* mojibake repair\n* subdivision cleanup\n* duplicate collapse\n* timezone derivation\n* conservative multilingual cleanup\n\nNormalization intentionally avoids broad transliteration or aggressive geopolitical rewriting.\n\n---\n\n## Geofeeds\n\nAllowlisted RFC 8805 feeds are configured in:\n\n```text id=\"u6u1kq\"\ndata/geofeeds/allowlist.tsv\n```\n\nSupported formats:\n\n```text id=\"s5kzzf\"\nprefix,country,region,city\nprefix,country,region,city,postal\n```\n\nDefault IPv4 floor:\n\n```text id=\"zy7t6x\"\nGEOFEED_MAX_IPV4_BITS=24\n```\n\nThis prevents excessive host-level fragmentation from narrow geofeed entries.\n\n---\n\n## Update Semantics\n\nThe downloader uses content hashing and atomic replacement semantics.\n\nTracked state files:\n\n```text id=\"dcbx3n\"\ndata/download-state.tsv\ndata/download-changed.txt\n```\n\nIf no source changed, the builder preserves the existing MMDB unless forced.\n\n---\n\n## Use Cases\n\n| Domain          | Example                  |\n| --------------- | ------------------------ |\n| Fraud Detection | Geo consistency checks   |\n| SIEM Enrichment | Country/city attribution |\n| Analytics       | Geographic aggregation   |\n| Gateways        | Local GeoIP lookups      |\n| Data Pipelines  | IP enrichment            |\n| Infrastructure  | Region-aware routing     |\n\n---\n\n## Operational Notes\n\n* City-level geolocation remains probabilistic\n* Mobile and VPN accuracy may vary substantially\n* Prefix coverage depends on DB-IP Lite seed availability\n* Postal enrichment should be treated as opportunistic\n* Geofeeds improve operator-owned allocations but are uneven globally\n\n---\n\n## Publication\n\nThe repository is designed so code can be published independently from downloaded datasets.\n\nDo not commit:\n\n* `admin.env`\n* downloaded provider databases\n* generated release artifacts\n* API credentials or license tokens\n\nReview `THIRD_PARTY_DATA.md` before redistributing derived outputs.\n\n---\n\n## Roadmap\n\nPlanned additions:\n\n* ASN-aware geo heuristics\n* confidence-weighted source tuning\n* regional regression dashboards\n* IPv6 quality scoring\n* compressed bulk exports\n* build reproducibility attestations\n\n---\n\n## License\n\nSee [`LICENSE`](./LICENSE).\n\nAdditional redistribution guidance:\n\n```text id=\"4ft7ps\"\nTHIRD_PARTY_DATA.md\n```\n\n---\n\n## Disclaimer\n\nGeoForge aggregates third-party geolocation datasets into derived operational outputs. IP geolocation should be treated as probabilistic infrastructure metadata, not physical-user attribution.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fipanalytics%2Fgeoforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fipanalytics%2Fgeoforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fipanalytics%2Fgeoforge/lists"}