{"id":19782356,"url":"https://github.com/infinitifall/soft-string-compare","last_synced_at":"2026-05-10T12:20:49.686Z","repository":{"id":252501565,"uuid":"840302469","full_name":"Infinitifall/Soft-String-Compare","owner":"Infinitifall","description":"C++ functions to judge the similarity of strings. Originally created to correct messy human input. Also compiles to Web Assembly.","archived":false,"fork":false,"pushed_at":"2024-08-21T06:17:56.000Z","size":47,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-11T02:32:42.317Z","etag":null,"topics":["cmake","cpp","data-cleaning","string-matching","string-similarity","wasm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Infinitifall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-09T12:02:40.000Z","updated_at":"2024-08-21T06:17:59.000Z","dependencies_parsed_at":"2025-01-17T19:02:36.524Z","dependency_job_id":null,"html_url":"https://github.com/Infinitifall/Soft-String-Compare","commit_stats":null,"previous_names":["infinitifall/soft-string-compare"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitifall%2FSoft-String-Compare","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitifall%2FSoft-String-Compare/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitifall%2FSoft-String-Compare/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitifall%2FSoft-String-Compare/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Infinitifall","download_url":"https://codeload.github.com/Infinitifall/Soft-String-Compare/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241113257,"owners_count":19911857,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cmake","cpp","data-cleaning","string-matching","string-similarity","wasm"],"created_at":"2024-11-12T06:04:57.533Z","updated_at":"2026-05-10T12:20:49.605Z","avatar_url":"https://github.com/Infinitifall.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Soft String Compare\n\nC++ functions to judge the similarity of strings. Originally created to correct messy human input. 
Also compiles to WebAssembly.\n\n\n## Install\n\nYou need `cmake` and a C++ compiler (such as `g++`) installed.\n\n```bash\n# clone repo\ngit clone https://github.com/Infinitifall/Soft-String-Compare\ncd Soft-String-Compare\n\n# make\ncd build_native\ncmake ../ -DCMAKE_BUILD_TYPE=Release\n# cmake ../ -DCMAKE_BUILD_TYPE=Debug\ncmake --build ./\n```\n\n## Use\n\nThe code in [example.cpp](./example.cpp) will try to correct the messy product names in [input_list.txt](data_dummy/input_list.txt) (see below) by matching them to a catalog of products.\n\n```\niphne 13 pro maks\nnkie ar jrdan 1\nsamsng 55\" qled 4k smrt tv\ndysin v11 absloote\ninstnt pot dou plus 6 qt\nplaystaton 5 digitl editon\nfitbt versa 3 smrt wach\nkeurigk-cup coffe makr\nbose quetcomfort 45 hdphnes\nnutribullt pro - 13-pce hi-sped blndr\nroomba i3+ evo self-emtying robt vacum\nninja foodie 10-in1 xl pro ar fyer \u0026 othr\nlgo star wars milenim falcn\n\n...\n```\n\nRun the `example` executable\n\n```bash\n# run\ncd build_native\n./example\n```\n\n```\n✅ iPhone 13 Pro Max\n✅ Nike Air Jordan 1\n✅ Samsung 55\" QLED 4K Smart TV\n✅ Dyson V11 Absolute\n✅ Instant Pot Duo Plus 6 Qt\n✅ PlayStation 5 Digital Edition\n✅ Fitbit Versa 3 Smartwatch\n✅ Keurig K-Cup Coffee Maker\n✅ Bose QuietComfort 45 Headphones\n✅ NutriBullet Pro - 13-Piece High-Speed Blender\n✅ iRobot Roomba i3+ EVO Self-Emptying Robot Vacuum\n✅ Ninja Foodi 10-in-1 XL Pro Air Fryer \u0026 Other\n✅ LEGO Star Wars Millennium Falcon\n✅ Amazon Echo Dot (4th Gen) Smart Speaker\n✅ VIZIO 5.1 Surround Sound Bar System\n✅ Logitech MX Master 3S Wireless Mouse\n✅ KitchenAid Artisan Series 5 Qt. Mixer\n✅ GoPro HERO11 Black\n✅ Apple Watch Series 7 GPS + Cellular\n✅ Sonos One SL Wi-Fi Speaker\n✅ Microsoft Surface Pro 8 Laptop\n✅ LG 65\" C1 Series OLED 4K UHD Smart TV\n✅ Breville The Barista Express Espresso Machine\n✅ Garmin Forerunner 945 GPS Running Watch\n✅ Whirlpool 4.8 cu. ft. Front Load Washer\n✅ Canon EOS R6 Mirrorless Camera\n✅ Beats by Dr. 
Dre Studio3 Wireless Over-Ear Headphones\n✅ Theragun Prime Deep Tissue Massage Gun\n✅ Philips Norelco Multigroom All-in-One Trimmer\n✅ Nespresso Vertuo Next Coffee \u0026 Espresso Maker\n✅ Ring Video Doorbell Pro 2\n✅ Brita Longlast Water Filter Pitcher\n✅ Vitamix E310 Explorian Series Blender\n✅ Traeger Pro 575 Wood Pellet Grill\n✅ Oculus Quest 2 Advanced All-in-One Virtual Reality Headset\n✅ Sunbeam Osmo 3 Reverse Osmosis Water Filter System\n✅ Merax 10' Trampoline with Enclosure\n✅ Klipsch HT-G700 3.1ch Dolby Atmos Soundbar\n✅ YETI Tundra 45 Hard Cooler\n✅ RTIC UltraLight 52 Qt Cooler\n✅ HidrateSpark 3.0 32oz Insulated Water Bottle\n❌ Bose QuietComfort 45 Headphones (lumin ultra-comfortble coper-infusd matres = Leesa Original Mattress)\n✅ Anova Culinary Sous Vide Precision Cooker\n✅ AeroGarden Harvest Elite Indoor Garden\n✅ Waterpik Aquarius Water Flosser\n✅ eufy RoboVac 11S (Slim) Robot Vacuum\n✅ SKIL 1/4\" Hex Electric Screwdriver\n✅ SMOK Novo 4 Pod System Vape Kit\n✅ Anker PowerCore 10000 Portable Charger\n\n✅ count = 48\n☑️  count = 0\n❌ count = 1\n\n🎯 ratio = 97.959 %\n```\n\nThe example matches severely misspelled product names with high accuracy (`\u003e90%`); in this run, 48 of the 49 inputs (about 98%) are matched correctly.\n\nYou can also enter arbitrary strings to see how the system ranks items. 
For example, to see why `lumin ultra-comfortble coper-infusd matres` wasn't correctly matched:\n\n```\nEnter name: lumin ultra-comfortble coper-infusd matres\n```\n\n```\n...\n\n0.000 rating: Logitech MX Master 3S Wireless Mouse\n0.000 rating: VIZIO 5.1 Surround Sound Bar System\n0.360 rating: Vitamix E310 Explorian Series Blender\n0.360 rating: NutriBullet Pro - 13-Piece High-Speed Blender\n0.847 rating: Merax 10' Trampoline with Enclosure\n0.847 rating: Philips Norelco Multigroom All-in-One Trimmer\n2.357 rating: Anker PowerCore 10000 Portable Charger\n2.659 rating: Nespresso Vertuo Next Coffee \u0026 Espresso Maker\n3.701 rating: RTIC UltraLight 52 Qt Cooler\n3.924 rating: Traeger Pro 575 Wood Pellet Grill\n5.997 rating: Breville The Barista Express Espresso Machine\n7.131 rating: Leesa Original Mattress\n9.128 rating: Garmin Forerunner 945 GPS Running Watch\n14.926 rating: Bose QuietComfort 45 Headphones\n\n1: lumin ultra-comfortble coper-infusd matres\n++ ____________comfort_______________________\n++ ________________________________________es\n++ _______________________co_________________\n++ _____________________e ___________________\n-- ___e __________________________\n-- __________Co___________________\n-- _____________________________es\n-- __________Comfort______________\n2: Bose QuietComfort 45 Headphones\n```\n\nThe correct choice, `Leesa Original Mattress`, receives only the third-highest rating. This is a particularly tricky example because the correct name has very little in common with the input string `lumin ultra-comfortble coper-infusd matres`.\n\n\n## Performance\n\nIn [example.cpp](./example.cpp), in the function `int main(int argc, char** argv)`, comment out all lines except `real_run_wrapper(argc, argv);`. 
Compile with `Release` to enable optimization flags:\n\n```bash\n# alternatively, run the `cmake release` task in VS Codium\n\ncd build_native\ncmake ../ -DCMAKE_BUILD_TYPE=Release\ncmake --build ./\n```\n\n```bash\n# using bash\ntime ./example ../data_dummy/all_list.txt ../data_dummy/input_list.txt ../data_dummy/output_list.txt ../data_dummy/all_list.txt\n```\n\n```bash\nreal    0m0.059s\nuser    0m0.052s\nsys     0m0.003s\n```\n\nThis is the time taken to correct 50 product names on a machine equivalent to a Dell XPS 15 (9510, 2021).\n\n\n## Compile to WebAssembly\n\nYou need a Wasm compiler (such as `Emscripten`) and a local development server (such as `http-server`).\n\n```bash\n# make\ncd build_wasm\nemcmake cmake ../ -DCMAKE_BUILD_TYPE=Release\n# emcmake cmake ../ -DCMAKE_BUILD_TYPE=Debug\nemmake make\n\n\n# run a local development server\nnpx http-server ./ -o -p 9999\n# visit http://localhost:9999 in your web browser\n```\n\nUseful guides on C/C++ to Wasm:\n- https://developer.mozilla.org/en-US/docs/WebAssembly/C_to_Wasm\n- https://marcoselvatici.github.io/WASM_tutorial/\n\n\n## Programming in VS Codium\n\n- `clangd` extension for completions, references, errors and hints\n- `CodeLLDB` extension works great with the provided [launch.json](./.vscode/launch.json) and [tasks.json](./.vscode/tasks.json)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitifall%2Fsoft-string-compare","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinitifall%2Fsoft-string-compare","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitifall%2Fsoft-string-compare/lists"}