{"id":18821145,"url":"https://github.com/compwright/match-columns-to-schema","last_synced_at":"2025-04-14T00:26:24.413Z","repository":{"id":28028784,"uuid":"104938352","full_name":"compwright/match-columns-to-schema","owner":"compwright","description":"Schema field matching algorithm for spreadsheet uploads","archived":false,"fork":false,"pushed_at":"2024-06-21T21:15:52.000Z","size":1505,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-27T14:47:43.533Z","etag":null,"topics":["algorithms","csv","observable","rxjs","schema"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/match-columns-to-schema","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/compwright.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"compwright"}},"created_at":"2017-09-26T21:13:27.000Z","updated_at":"2024-06-21T21:15:48.000Z","dependencies_parsed_at":"2023-01-14T07:58:45.838Z","dependency_job_id":"7f171c71-8f27-400a-9325-17dd23dd0148","html_url":"https://github.com/compwright/match-columns-to-schema","commit_stats":{"total_commits":60,"total_committers":3,"mean_commits":20.0,"dds":"0.23333333333333328","last_synced_commit":"1166fadc5e6aa707e2521a8af7a0bee45136deb5"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compwright%2Fmatch-columns-to-schema","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/compwright%2Fmatch-columns-to-schema/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compwright%2Fmatch-columns-to-schema/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compwright%2Fmatch-columns-to-schema/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/compwright","download_url":"https://codeload.github.com/compwright/match-columns-to-schema/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248691422,"owners_count":21146348,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","csv","observable","rxjs","schema"],"created_at":"2024-11-08T00:34:34.705Z","updated_at":"2025-04-14T00:26:24.362Z","avatar_url":"https://github.com/compwright.png","language":"JavaScript","funding_links":["https://github.com/sponsors/compwright"],"categories":[],"sub_categories":[],"readme":"# Column-to-Schema matching algorithm for CSV file uploads\n\n[![Build Status](https://travis-ci.org/compwright/match-columns-to-schema.png?branch=master)](https://travis-ci.org/compwright/match-columns-to-schema)\n[![Code Climate](https://codeclimate.com/github/compwright/match-columns-to-schema/badges/gpa.svg)](https://codeclimate.com/github/compwright/match-columns-to-schema)\n[![Test Coverage](https://codeclimate.com/github/compwright/match-columns-to-schema/badges/coverage.svg)](https://codeclimate.com/github/compwright/match-columns-to-schema/coverage)\n[![Dependency 
Status](https://img.shields.io/david/compwright/match-columns-to-schema.svg?style=flat-square)](https://david-dm.org/compwright/match-columns-to-schema)\n[![Download Status](https://img.shields.io/npm/dm/match-columns-to-schema.svg?style=flat-square)](https://www.npmjs.com/package/match-columns-to-schema)\n[![Sponsor on GitHub](https://img.shields.io/static/v1?label=Sponsor\u0026message=❤\u0026logo=GitHub\u0026link=https://github.com/sponsors/compwright)](https://github.com/sponsors/compwright)\n\n## About This Project\n\nImplemented using [RxJS observables](http://reactivex.io/rxjs/), because:\n\n1. I needed practice\n2. Observables are fast and efficient\n\nH/T to @jfairbank for his excellent talk at [Connect.tech 2017](https://speakerdeck.com/jfairbank/connect-dot-tech-2017-dive-into-rxjs-observables), which inspired me to start this exercise.\n\n### The Problem\n\nWe need to parse a user-supplied CSV data file, examine the column names in the first row and the contents of each column, and match each field of a predefined schema to the column that best fits it.\n\nWe define \"best match\" as follows:\n\n1. No more than 5% of the column values fail schema field validation, or 100% of the column values are blank\n2. The column name is the most similar to the schema field name (or one of its aliases)\n3. 
The column appears earliest in the file\n\n### Implementation\n\n* Uses the take(), map(), and mergeAll() operators with a ReplaySubject to stream the CSV column headers and row data\n* The ReplaySubject ensures that no items are missed when the data is subscribed to more than once\n* Because rows are represented as observables, only as many rows are read as are needed to read the header and compute each column's validation score\n* Column names are compared to the schema field name and its aliases using the Dice coefficient, which works better than Levenshtein distance for matching column names\n\n## Example\n\nTo run the demo, use `npm run demo`.\n\nList.csv:\n\n```csv\n\"HHName\",\"LastName\",\"FirstName\",\"MiddleName\",\"SuffixName\",\"PrimaryAddress1\",\"PrimaryCity\",\"PrimaryState\",\"PrimaryZip\",\"PrimaryZip4\",\"PrimaryOddEvenCode\",\"PrimaryHouseNumber\",\"PrimaryHouseHalf\",\"PrimaryStreetPre\",\"PrimaryStreetName\",\"PrimaryStreetType\",\"PrimaryStreetPost\",\"PrimaryUnit\",\"PrimaryUnitNumber\",\"PrimaryPhone\",\"TelephoneReliabilityCode\",\"HasPrimaryPhone\",\"EMail\",\"DOB\",\"AgeRange\",\"Age\",\"Gender\",\"OfficialParty\",\"CalculatedParty\",\"RegistrationDate\",\"GeneralFrequency\",\"PrimaryFrequency\",\"OverAllFrequency\",\"GeneralAbsenteeStatus\",\"PrimaryAbsenteeStatus\",\"Moved\",\"CDName\",\"LDName\",\"SDName\",\"CountyName\",\"CountyNumber\",\"PrecinctNumber\",\"PrecinctName\",\"DMA\",\"Turf\",\"CensusBlock\",\"VoterKey\",\"HHRecId\",\"HHMemberId\",\"HHCode\",\"JurisdictionalVoterId\",\"ClientId\",\"StateVoterId\",\"Latitude\",\"Longitude\",\"MD1Name\",\"MD2Name\",\"CellularPhone\",\"HomePhone\",\"OtherPhone\"\n\"MICHAEL KASHA\",\"KASHA\",\"MICHAEL\",\"C\",\"\",\"1621 S McDuffie St\",\"Anderson\",\"SC\",\"29624\",\"3367\",\"O\",\"1621\",\"\",\"S\",\"McDuffie\",\"St\",\"\",\"\",\"\",\"8642221492\",\"\",\"True\",\"\",\"6/19/1945\",\"6\",\"70\",\"M\",\"Unaffiliated\",\"2 - Weak 
Republican\",\"1/2/1996\",\"4\",\"4\",\"4\",\"\",\"\",\"False\",\"3\",\"8\",\"4\",\"Anderson\",\"4\",\"081\",\"ANDERSON 4/2\",\"GREENVLL-SPART-ASHEVLL-AND\",\"None\",\"45007000600\",\"2157948\",\"1018908\",\"1\",\"S\",\"\",\"11524251517\",\"041904888\",\"34.489912\",\"-82.645473\",\"Anderson- CC 02\",\"Anderson- SB 55\",\"\",\"8642221492\",\"\"\n\"MAY HEMBREE\",\"HEMBREE\",\"MAY\",\"B\",\"\",\"1506 White St\",\"Anderson\",\"SC\",\"29624\",\"3414\",\"E\",\"1506\",\"\",\"\",\"White\",\"St\",\"\",\"\",\"\",\"8646100148\",\"6\",\"True\",\"\",\"1/14/1961\",\"4\",\"55\",\"F\",\"Unaffiliated\",\"3 - Swing\",\"4/21/2015\",\"5\",\"5\",\"5\",\"\",\"\",\"False\",\"3\",\"8\",\"4\",\"Anderson\",\"4\",\"081\",\"ANDERSON 4/2\",\"GREENVLL-SPART-ASHEVLL-AND\",\"None\",\"45007000600\",\"2157422\",\"834964\",\"1\",\"S\",\"\",\"11524250991\",\"235723185\",\"34.491968\",\"-82.638496\",\"Anderson- CC 02\",\"Anderson- SB 55\",\"\",\"8646100148\",\"\"\n\"STEVE CHEEK\",\"CHEEK\",\"STEVE\",\"R\",\"\",\"215 Beaty Sq\",\"Anderson\",\"SC\",\"29624\",\"1101\",\"O\",\"215\",\"\",\"\",\"Beaty\",\"Sq\",\"\",\"\",\"\",\"8642269903\",\"9\",\"True\",\"\",\"2/11/1951\",\"6\",\"65\",\"M\",\"Unaffiliated\",\"1 - Hard Republican\",\"5/25/1972\",\"4\",\"3\",\"4\",\"\",\"\",\"False\",\"3\",\"8\",\"4\",\"Anderson\",\"4\",\"082\",\"ANDERSON 5/A\",\"GREENVLL-SPART-ASHEVLL-AND\",\"None\",\"45007000700\",\"2161864\",\"334231\",\"1\",\"S\",\"\",\"11524255433\",\"041239469\",\"34.502244\",\"-82.678952\",\"Anderson- CC 02\",\"Anderson- SB 55\",\"\",\"8642269903\",\"\"\n\"STEPHANIE MARZETTE\",\"MARZETTE\",\"STEPHANIE\",\"D\",\"\",\"103 Brown St\",\"Anderson\",\"SC\",\"29624\",\"1301\",\"O\",\"103\",\"\",\"\",\"Brown\",\"St\",\"\",\"\",\"\",\"\",\"\",\"False\",\"\",\"11/11/1957\",\"5\",\"58\",\"F\",\"Unaffiliated\",\"3 - Swing\",\"9/4/2015\",\"5\",\"5\",\"5\",\"\",\"\",\"False\",\"3\",\"8\",\"4\",\"Anderson\",\"4\",\"082\",\"ANDERSON 
5/A\",\"GREENVLL-SPART-ASHEVLL-AND\",\"None\",\"45007000700\",\"2183505\",\"1196504\",\"1\",\"S\",\"\",\"11524277074\",\"235791278\",\"34.503126\",\"-82.665886\",\"Anderson- CC 02\",\"Anderson- SB 55\",\"\",\"\",\"\"\n\"JESSIE FLEMING\",\"FLEMING\",\"JESSIE\",\"L\",\"\",\"106 Brown St\",\"Anderson\",\"SC\",\"29624\",\"1302\",\"E\",\"106\",\"\",\"\",\"Brown\",\"St\",\"\",\"\",\"\",\"8647600652\",\"9\",\"True\",\"\",\"8/8/1973\",\"3\",\"42\",\"M\",\"Unaffiliated\",\"1 - Hard Republican\",\"3/9/1992\",\"4\",\"3\",\"3\",\"\",\"\",\"False\",\"3\",\"8\",\"4\",\"Anderson\",\"4\",\"082\",\"ANDERSON 5/A\",\"GREENVLL-SPART-ASHEVLL-AND\",\"None\",\"45007000700\",\"2162530\",\"604089\",\"1\",\"S\",\"\",\"11524256099\",\"044216943\",\"34.503094\",\"-82.665886\",\"Anderson- CC 02\",\"Anderson- SB 55\",\"\",\"8647600652\",\"\"\n\n```\n\nOutput:\n\n```\nMatched columns to schema:\n[\n  {\n    \"field\": \"firstName\",\n    \"header\": \"FirstName\",\n    \"index\": 2\n  },\n  {\n    \"field\": \"lastName\",\n    \"header\": \"LastName\",\n    \"index\": 1\n  },\n  {\n    \"field\": \"email\",\n    \"header\": \"EMail\",\n    \"index\": 22\n  },\n  {\n    \"field\": \"latitude\",\n    \"header\": \"Latitude\",\n    \"index\": 53\n  },\n  {\n    \"field\": \"longitude\",\n    \"header\": \"Longitude\",\n    \"index\": 54\n  },\n  {\n    \"field\": \"address\",\n    \"header\": \"PrimaryAddress1\",\n    \"index\": 5\n  },\n  {\n    \"field\": \"city\",\n    \"header\": \"PrimaryCity\",\n    \"index\": 6\n  },\n  {\n    \"field\": \"state\",\n    \"header\": \"PrimaryState\",\n    \"index\": 7\n  },\n  {\n    \"field\": \"zip\",\n    \"header\": \"PrimaryZip\",\n    \"index\": 8\n  }\n]\n\nFirst three rows of data matched to schema:\n[\n  {\n    \"$index\": 1,\n    \"firstName\": \"MICHAEL\",\n    \"lastName\": \"KASHA\",\n    \"latitude\": \"34.489912\",\n    \"longitude\": \"-82.645473\",\n    \"address\": \"1621 S McDuffie St\",\n    \"city\": \"Anderson\",\n    \"state\": 
\"SC\",\n    \"zip\": \"29624\",\n    \"$unmatched\": {\n      \"HHName\": \"MICHAEL KASHA\",\n      \"MiddleName\": \"C\",\n      \"PrimaryZip4\": \"3367\",\n      \"PrimaryOddEvenCode\": \"O\",\n      \"PrimaryHouseNumber\": \"1621\",\n      \"PrimaryStreetPre\": \"S\",\n      \"PrimaryStreetName\": \"McDuffie\",\n      \"PrimaryStreetType\": \"St\",\n      \"PrimaryPhone\": \"8642221492\",\n      \"HasPrimaryPhone\": \"True\",\n      \"DOB\": \"6/19/1945\",\n      \"AgeRange\": \"6\",\n      \"Age\": \"70\",\n      \"Gender\": \"M\",\n      \"OfficialParty\": \"Unaffiliated\",\n      \"CalculatedParty\": \"2 - Weak Republican\",\n      \"RegistrationDate\": \"1/2/1996\",\n      \"GeneralFrequency\": \"4\",\n      \"PrimaryFrequency\": \"4\",\n      \"OverAllFrequency\": \"4\",\n      \"Moved\": \"False\",\n      \"CDName\": \"3\",\n      \"LDName\": \"8\",\n      \"SDName\": \"4\",\n      \"CountyName\": \"Anderson\",\n      \"CountyNumber\": \"4\",\n      \"PrecinctNumber\": \"081\",\n      \"PrecinctName\": \"ANDERSON 4/2\",\n      \"DMA\": \"GREENVLL-SPART-ASHEVLL-AND\",\n      \"Turf\": \"None\",\n      \"CensusBlock\": \"45007000600\",\n      \"VoterKey\": \"2157948\",\n      \"HHRecId\": \"1018908\",\n      \"HHMemberId\": \"1\",\n      \"HHCode\": \"S\",\n      \"ClientId\": \"11524251517\",\n      \"StateVoterId\": \"041904888\",\n      \"MD1Name\": \"Anderson- CC 02\",\n      \"MD2Name\": \"Anderson- SB 55\",\n      \"HomePhone\": \"8642221492\"\n    }\n  },\n  {\n    \"$index\": 2,\n    \"firstName\": \"MAY\",\n    \"lastName\": \"HEMBREE\",\n    \"latitude\": \"34.491968\",\n    \"longitude\": \"-82.638496\",\n    \"address\": \"1506 White St\",\n    \"city\": \"Anderson\",\n    \"state\": \"SC\",\n    \"zip\": \"29624\",\n    \"$unmatched\": {\n      \"HHName\": \"MAY HEMBREE\",\n      \"MiddleName\": \"B\",\n      \"PrimaryZip4\": \"3414\",\n      \"PrimaryOddEvenCode\": \"E\",\n      \"PrimaryHouseNumber\": \"1506\",\n      \"PrimaryStreetName\": 
\"White\",\n      \"PrimaryStreetType\": \"St\",\n      \"PrimaryPhone\": \"8646100148\",\n      \"TelephoneReliabilityCode\": \"6\",\n      \"HasPrimaryPhone\": \"True\",\n      \"DOB\": \"1/14/1961\",\n      \"AgeRange\": \"4\",\n      \"Age\": \"55\",\n      \"Gender\": \"F\",\n      \"OfficialParty\": \"Unaffiliated\",\n      \"CalculatedParty\": \"3 - Swing\",\n      \"RegistrationDate\": \"4/21/2015\",\n      \"GeneralFrequency\": \"5\",\n      \"PrimaryFrequency\": \"5\",\n      \"OverAllFrequency\": \"5\",\n      \"Moved\": \"False\",\n      \"CDName\": \"3\",\n      \"LDName\": \"8\",\n      \"SDName\": \"4\",\n      \"CountyName\": \"Anderson\",\n      \"CountyNumber\": \"4\",\n      \"PrecinctNumber\": \"081\",\n      \"PrecinctName\": \"ANDERSON 4/2\",\n      \"DMA\": \"GREENVLL-SPART-ASHEVLL-AND\",\n      \"Turf\": \"None\",\n      \"CensusBlock\": \"45007000600\",\n      \"VoterKey\": \"2157422\",\n      \"HHRecId\": \"834964\",\n      \"HHMemberId\": \"1\",\n      \"HHCode\": \"S\",\n      \"ClientId\": \"11524250991\",\n      \"StateVoterId\": \"235723185\",\n      \"MD1Name\": \"Anderson- CC 02\",\n      \"MD2Name\": \"Anderson- SB 55\",\n      \"HomePhone\": \"8646100148\"\n    }\n  },\n  {\n    \"$index\": 3,\n    \"firstName\": \"STEVE\",\n    \"lastName\": \"CHEEK\",\n    \"latitude\": \"34.502244\",\n    \"longitude\": \"-82.678952\",\n    \"address\": \"215 Beaty Sq\",\n    \"city\": \"Anderson\",\n    \"state\": \"SC\",\n    \"zip\": \"29624\",\n    \"$unmatched\": {\n      \"HHName\": \"STEVE CHEEK\",\n      \"MiddleName\": \"R\",\n      \"PrimaryZip4\": \"1101\",\n      \"PrimaryOddEvenCode\": \"O\",\n      \"PrimaryHouseNumber\": \"215\",\n      \"PrimaryStreetName\": \"Beaty\",\n      \"PrimaryStreetType\": \"Sq\",\n      \"PrimaryPhone\": \"8642269903\",\n      \"TelephoneReliabilityCode\": \"9\",\n      \"HasPrimaryPhone\": \"True\",\n      \"DOB\": \"2/11/1951\",\n      \"AgeRange\": \"6\",\n      \"Age\": \"65\",\n      \"Gender\": 
\"M\",\n      \"OfficialParty\": \"Unaffiliated\",\n      \"CalculatedParty\": \"1 - Hard Republican\",\n      \"RegistrationDate\": \"5/25/1972\",\n      \"GeneralFrequency\": \"4\",\n      \"PrimaryFrequency\": \"3\",\n      \"OverAllFrequency\": \"4\",\n      \"Moved\": \"False\",\n      \"CDName\": \"3\",\n      \"LDName\": \"8\",\n      \"SDName\": \"4\",\n      \"CountyName\": \"Anderson\",\n      \"CountyNumber\": \"4\",\n      \"PrecinctNumber\": \"082\",\n      \"PrecinctName\": \"ANDERSON 5/A\",\n      \"DMA\": \"GREENVLL-SPART-ASHEVLL-AND\",\n      \"Turf\": \"None\",\n      \"CensusBlock\": \"45007000700\",\n      \"VoterKey\": \"2161864\",\n      \"HHRecId\": \"334231\",\n      \"HHMemberId\": \"1\",\n      \"HHCode\": \"S\",\n      \"ClientId\": \"11524255433\",\n      \"StateVoterId\": \"041239469\",\n      \"MD1Name\": \"Anderson- CC 02\",\n      \"MD2Name\": \"Anderson- SB 55\",\n      \"HomePhone\": \"8642269903\"\n    }\n  }\n]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompwright%2Fmatch-columns-to-schema","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompwright%2Fmatch-columns-to-schema","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompwright%2Fmatch-columns-to-schema/lists"}