{"id":48025274,"url":"https://github.com/benzsevern/infermap","last_synced_at":"2026-04-10T13:01:18.830Z","repository":{"id":347849522,"uuid":"1195454752","full_name":"benzsevern/infermap","owner":"benzsevern","description":"Inference-driven schema mapping engine. Map messy source columns to a known target schema — accurately, explainably, and with zero config.","archived":false,"fork":false,"pushed_at":"2026-03-31T00:25:01.000Z","size":143,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-04T13:54:57.897Z","etag":null,"topics":["cli","column-mapping","data-engineering","data-integration","data-pipeline","data-quality","etl","fuzzy-matching","hungarian-algorithm","open-source","polars","python","schema-inference","schema-mapping","type-inference"],"latest_commit_sha":null,"homepage":"https://benzsevern.github.io/infermap/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benzsevern.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"benzsevern"}},"created_at":"2026-03-29T17:34:58.000Z","updated_at":"2026-03-31T00:25:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"991da6a1-8399-43a2-b2a1-6920b63349af","html_url":"https://github.com/benzsevern/infermap","commit_stats":null,"previous_names":["benzsevern/infermap"],"tags_count":1,"template":false,"template_full_
name":null,"purl":"pkg:github/benzsevern/infermap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benzsevern%2Finfermap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benzsevern%2Finfermap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benzsevern%2Finfermap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benzsevern%2Finfermap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benzsevern","download_url":"https://codeload.github.com/benzsevern/infermap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benzsevern%2Finfermap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31437927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T13:13:19.330Z","status":"ssl_error","status_checked_at":"2026-04-05T13:13:17.778Z","response_time":75,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","column-mapping","data-engineering","data-integration","data-pipeline","data-quality","etl","fuzzy-matching","hungarian-algorithm","open-source","polars","python","schema-inference","schema-mapping","type-inference"],"created_at":"2026-04-04T13:49:59.426Z","updated_at":"2026-04-05T14:01:11.183Z","avatar_url":"https://github.com/benzsevern.png","language":"Python","readme":"[![PyPI](https://img.shields.io/pypi/v/infermap?color=d4a017)](https://pypi.org/project/infermap/)\n[![CI](https://github.com/benzsevern/infermap/actions/workflows/test.yml/badge.svg)](https://github.com/benzsevern/infermap/actions/workflows/test.yml)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue)](https://python.org)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n\n# infermap\n\nInference-driven schema mapping engine — automatically maps source fields to target fields using a composable scorer pipeline.\n\n## Install\n\n```bash\npip install infermap\n```\n\nInstall extras for additional database support:\n\n```bash\npip install infermap[postgres]   # psycopg2-binary\npip install infermap[mysql]      # mysql-connector-python\npip install infermap[duckdb]     # duckdb\npip install infermap[all]        # all extras\n```\n\n## Quick Start\n\n```python\nimport infermap\n\n# Map a CRM export CSV to a canonical customer schema\nresult = infermap.map(\"crm_export.csv\", \"canonical_customers.csv\")\n\nfor m in result.mappings:\n    print(f\"{m.source} -\u003e {m.target}  
({m.confidence:.0%})\")\n# fname -\u003e first_name  (97%)\n# lname -\u003e last_name   (95%)\n# email_addr -\u003e email  (91%)\n\n# Apply mappings to rename DataFrame columns\nimport polars as pl\ndf = pl.read_csv(\"crm_export.csv\")\nrenamed = result.apply(df)\n\n# Save mappings to a reusable config file\nresult.to_config(\"my_mapping.yaml\")\n\n# Reload later — no re-inference needed\nsaved = infermap.from_config(\"my_mapping.yaml\")\n```\n\n## CLI Examples\n\n```bash\n# Map two files and print a report\ninfermap map crm_export.csv canonical_customers.csv\n\n# Map and save the config\ninfermap map crm_export.csv canonical_customers.csv --save mapping.yaml\n\n# Apply a saved mapping config to a DataFrame (prints renamed column list)\ninfermap apply crm_export.csv mapping.yaml\n\n# Inspect the schema of a file or database table\ninfermap inspect crm_export.csv\ninfermap inspect sqlite:///mydb.db --table customers\n\n# Validate a mapping config file\ninfermap validate mapping.yaml\n```\n\n## How It Works\n\ninfermap runs each field pair through a pipeline of **5 scorers**. Each scorer returns a score between 0.0 and 1.0 (or abstains with `None`). The engine combines scores via weighted average (requiring at least 2 contributing scorers), then uses the Hungarian algorithm for optimal one-to-one assignment.\n\n| Scorer | Weight | What it detects |\n|---|---|---|\n| **ExactScorer** | 1.0 | Case-insensitive exact name match |\n| **AliasScorer** | 0.9 | Known field aliases (e.g. 
`fname` == `first_name`, `tel` == `phone`) |\n| **PatternTypeScorer** | 0.7 | Semantic type from sample values — email, date_iso, phone, uuid, url, zip, currency |\n| **ProfileScorer** | 0.6 | Statistical profile similarity — null rate, unique rate, value count |\n| **FuzzyNameScorer** | 0.5 | Token-level fuzzy string similarity on field names |\n\n## Features\n\n- Maps CSV, Parquet, XLSX, Polars DataFrames, Pandas DataFrames, SQLite, and schema YAML files\n- Composable scorer pipeline — disable, reweight, or add custom scorers via config or code\n- Optimal one-to-one assignment via the Hungarian algorithm\n- `required` parameter warns when critical target fields go unmapped\n- `MapResult.apply()` renames DataFrame columns in one call\n- `to_config()` / `from_config()` roundtrip for repeatable pipelines\n- CLI for quick inspection, mapping, and validation\n\n## Custom Scorers\n\nRegister a scorer function with the `@infermap.scorer` decorator:\n\n```python\nimport infermap\nfrom infermap.types import FieldInfo, ScorerResult\n\n@infermap.scorer(\"my_prefix_scorer\", weight=0.8)\ndef my_prefix_scorer(source: FieldInfo, target: FieldInfo) -\u003e ScorerResult | None:\n    src = source.name.lower()\n    tgt = target.name.lower()\n    # Abstain unless both names share the same three-character prefix\n    if src[:3] != tgt[:3]:\n        return None\n    return ScorerResult(score=0.85, reasoning=f\"Shared prefix '{src[:3]}'\")\n\nfrom infermap.engine import MapEngine\nfrom infermap.scorers import default_scorers\n\nengine = MapEngine(scorers=[*default_scorers(), my_prefix_scorer])\nresult = engine.map(\"source.csv\", \"target.csv\")\n```\n\nYou can also use a plain class with `name`, `weight`, and `score()`:\n\n```python\nclass DomainScorer:\n    name = \"DomainScorer\"\n    weight = 0.75\n\n    def score(self, source: FieldInfo, target: FieldInfo) -\u003e ScorerResult | None:\n        ...\n```\n\n## Config Reference\n\nLoad an `infermap.yaml` at engine creation to override 
scorer weights, disable scorers, or add domain aliases:\n\n```python\nengine = MapEngine(config_path=\"infermap.yaml\")\n```\n\nSee `infermap.yaml.example` for a full annotated example.\n\n## License\n\nMIT\n","funding_links":["https://github.com/sponsors/benzsevern"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenzsevern%2Finfermap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenzsevern%2Finfermap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenzsevern%2Finfermap/lists"}