{"id":24862643,"url":"https://github.com/4rlm/crm_formatter","last_synced_at":"2026-04-29T07:35:04.051Z","repository":{"id":59152556,"uuid":"133171720","full_name":"4rlm/crm_formatter","owner":"4rlm","description":"Ruby Gem: CrmFormatter is perfect for curating high-volume enterprise-scale web scraping, and integrates well with Nokogiri, Mechanize, and asynchronous jobs via Delayed_job or SideKick, to name a few.","archived":false,"fork":false,"pushed_at":"2018-07-09T13:52:55.000Z","size":3761,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-03-26T23:47:44.335Z","etag":null,"topics":["4rlm","adam-booth","adam-john-booth","address","booth","crm-data","csv","curation","data-hash","database","filter","formatter","phone","phone-number","proper-strings","ruby-gem","ruby-methods","scrub","url"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4rlm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-12T18:08:21.000Z","updated_at":"2018-07-09T13:52:56.000Z","dependencies_parsed_at":"2022-09-13T10:50:28.436Z","dependency_job_id":null,"html_url":"https://github.com/4rlm/crm_formatter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/4rlm/crm_formatter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rlm%2Fcrm_formatter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rlm%2Fcrm_formatter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/4rlm%2Fcrm_formatter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rlm%2Fcrm_formatter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/4rlm","download_url":"https://codeload.github.com/4rlm/crm_formatter/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rlm%2Fcrm_formatter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32416146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["4rlm","adam-booth","adam-john-booth","address","booth","crm-data","csv","curation","data-hash","database","filter","formatter","phone","phone-number","proper-strings","ruby-gem","ruby-methods","scrub","url"],"created_at":"2025-01-31T22:59:20.689Z","updated_at":"2026-04-29T07:35:04.035Z","avatar_url":"https://github.com/4rlm.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# CrmFormatter\n\n[![Build Status](https://travis-ci.org/4rlm/crm_formatter.svg?branch=master)](https://travis-ci.org/4rlm/crm_formatter)\n[![Gem 
Version](https://badge.fury.io/rb/crm_formatter.svg)](https://badge.fury.io/rb/crm_formatter)\n[![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n\n#### Efficiently Reformat, Normalize, and Scrub CRM Contact Data, such as Addresses, Phones and URLs.\n\nCrmFormatter is perfect for curating high-volume, enterprise-scale web scraping, and it integrates well with Nokogiri, Mechanize, and asynchronous jobs via delayed_job or Sidekiq, to name a few.  Web scraping and harvesting often gather a lot of junk to sift through, presenting unexpected edge cases around every corner.  CrmFormatter has been developed and refined over the past few years specifically to improve that task.\n\nIt's also perfect for processing API data, web forms, and routine DB normalizing and scrubbing.  Not only does it reformat address, phone, and web data, it can also accept criteria lists to scrub against and then provide detailed reports on how each piece of data compares with those lists.\n\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'crm_formatter'\n```\n\nAnd then execute:\n```\n  $ bundle\n```\n\nOr install it yourself as:\n```\n  $ gem install crm_formatter\n```\n\n## Usage\n\n### I. Basic Usage\nThe basic methods, each called on the `CrmFormatter` module (e.g. `CrmFormatter.format_phones`), are:\n```ruby\nformat_addresses(array_of_addresses)\nformat_phones(array_of_phones)\nformat_proper(proper_string)\nformat_propers(array_of_propers)\nformat_urls(array_of_urls)\n```\n\n1a. 
Format Proper String:\n\nUse `format_proper` to format a string with proper nouns, such as (but not limited to):\n\n* Business Account Name (123 bmw-world =\u003e 123 BMW-World),\n* Proper Name (adam john booth =\u003e Adam John Booth),\n* Job Title (marketing director =\u003e Marketing Director),\n* Article Title (the 15 most useful ruby methods =\u003e The 15 Most Useful Ruby Methods)\n\n```\nproper_string = 'the gmc and bmw-world of AUSTIN tx'\nformatted_proper = CrmFormatter.format_proper(proper_string)\n```\n\nResult in Hash Format:\n```\nformatted_proper = {\n  :proper_status=\u003e\"formatted\",\n  :proper=\u003e\"the gmc and bmw-world of AUSTIN tx\",\n  :proper_f=\u003e\"The GMC and BMW-World of Austin TX\"\n}\n```\n\n1b. Format Array of Proper Strings:\n\nUse `format_propers` to format an array of proper strings with proper nouns:\n\n```\narray_of_propers = [\n  'the gmc and bmw-world of AUSTIN tx',\n  '123 Car-world Kia OF CHICAGO IL',\n  'Main Street Ford in DALLAS tX',\n  'broad st fiat of houston',\n  'hot-deal auto insurance',\n  'BUDGET - AUTOMOTORES ZONA \u0026 FRANCA, INC',\n  'DOWNTOWN CAR REPAIR, INC',\n  'Young Gmc Trucks',\n  'TEXAS TRAVEL, CO',\n  'youmans Chevrolet',\n  'quick auto approval, inc',\n  'yazell chevy',\n  'quick cAr LUBE',\n  'yAtEs AuTo maLL',\n  'YADKIN VALLEY COLLISION CO',\n  'XIT FORD INC'\n]\n\nformatted_proper_hashes = CrmFormatter.format_propers(array_of_propers)\n```\n\nFormatted Proper Strings:\n\n```\nformatted_proper_hashes =\n[\n  {\n    proper_status: 'formatted',\n    proper: 'the gmc and bmw-world of AUSTIN tx',\n    proper_f: 'The GMC and BMW-World of Austin TX'\n  },\n  {\n    proper_status: 'formatted',\n    proper: '123 Car-world Kia OF CHICAGO IL',\n    proper_f: '123 Car-World Kia of Chicago IL'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'Main Street Ford in DALLAS tX',\n    proper_f: 'Main Street Ford in Dallas TX'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'broad st fiat of 
houston',\n    proper_f: 'Broad St Fiat of Houston'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'hot-deal auto insurance',\n    proper_f: 'Hot-Deal Auto Insurance'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'BUDGET - AUTOMOTORES ZONA \u0026 FRANCA, INC',\n    proper_f: 'Budget - Automotores Zona \u0026 Franca, Inc'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'DOWNTOWN CAR REPAIR, INC',\n    proper_f: 'Downtown Car Repair, Inc'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'Young Gmc Trucks',\n    proper_f: 'Young GMC Trucks'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'TEXAS TRAVEL, CO',\n    proper_f: 'Texas Travel, Co'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'youmans Chevrolet',\n    proper_f: 'Youmans Chevrolet'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'quick auto approval, inc',\n    proper_f: 'Quick Auto Approval, Inc'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'yazell chevy',\n    proper_f: 'Yazell Chevy'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'quick cAr LUBE',\n    proper_f: 'Quick Car Lube'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'yAtEs AuTo maLL',\n    proper_f: 'Yates Auto Mall'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'YADKIN VALLEY COLLISION CO',\n    proper_f: 'Yadkin Valley Collision Co'\n  },\n  {\n    proper_status: 'formatted',\n    proper: 'XIT FORD INC',\n    proper_f: 'Xit Ford Inc'\n  }\n]\n```\n\n2. 
Format Array of Phone Numbers:\n\n```\narray_of_phones = %w[\n  555-457-4391\n  555-888-4391\n  555-457-4334\n  555-555\n  555.555.1234\n  not_a_number\n]\n\nformatted_phone_hashes = CrmFormatter.format_phones(array_of_phones)\n```\n\nFormatted Phone Numbers:\n\n```\nformatted_phone_hashes = [\n  {\n    phone_status: 'formatted',\n    phone: '555-457-4391',\n    phone_f: '(555) 457-4391'\n  },\n  {\n    phone_status: 'formatted',\n    phone: '555-888-4391',\n    phone_f: '(555) 888-4391'\n  },\n  {\n    phone_status: 'formatted',\n    phone: '555-457-4334',\n    phone_f: '(555) 457-4334'\n  },\n  {\n    phone_status: 'invalid',\n    phone: '555-555',\n    phone_f: nil\n  },\n  {\n    phone_status: 'formatted',\n    phone: '555.555.1234',\n    phone_f: '(555) 555-1234'\n  },\n  {\n    phone_status: 'invalid',\n    phone: 'not_a_number',\n    phone_f: nil\n  }\n]\n```\n\n3. Format Array of URLs:\n\n```\narray_of_urls = %w[\n  www.sample01.net.com\n  sample02.com\n  http://www.sample3.net\n  www.sample04.net/contact_us\n  http://sample05.net\n  www.sample06.sofake\n  www.sample07.com.sofake\n  example08.not.real\n  www.sample09.net/staff/management\n  www.www.sample10.com\n]\n\nformatted_url_hashes = CrmFormatter.format_urls(array_of_urls)\n```\n\nFormatted URLs:\n\n```\nformatted_url_hashes = [\n  {\n    web_status: 'invalid',\n    url: 'www.sample01.net.com',\n    url_f: nil,\n    url_path: nil,\n    web_neg: 'error: ext.valid \u003e 1 [com, net]'\n  },\n  {\n    web_status: 'formatted',\n    url: 'sample02.com',\n    url_f: 'http://www.sample02.com',\n    url_path: nil,\n    web_neg: nil\n  },\n  {\n    web_status: 'unchanged',\n    url: 'http://www.sample3.net',\n    url_f: 'http://www.sample3.net',\n    url_path: nil,\n    web_neg: nil\n  },\n  {\n    web_status: 'formatted',\n    url: 'www.sample04.net/contact_us',\n    url_f: 'http://www.sample04.net',\n    url_path: '/contact_us',\n    web_neg: nil\n  },\n  {\n    web_status: 'formatted',\n    url: 
'http://sample05.net',\n    url_f: 'http://www.sample05.net',\n    url_path: nil,\n    web_neg: nil\n  },\n  {\n    web_status: 'invalid',\n    url: 'www.sample06.sofake',\n    url_f: nil,\n    url_path: nil,\n    web_neg: 'error: ext.invalid [sofake]'\n  },\n  {\n    web_status: 'formatted',\n    url: 'www.sample07.com.sofake',\n    url_f: 'http://www.sample07.com',\n    url_path: nil,\n    web_neg: nil\n  },\n  {\n    web_status: 'invalid',\n    url: 'example08.not.real',\n    url_f: nil,\n    url_path: nil,\n    web_neg: 'error: ext.invalid [not, real]'\n  },\n  {\n    web_status: 'formatted',\n    url: 'www.sample09.net/staff/management',\n    url_f: 'http://www.sample09.net',\n    url_path: '/staff/management',\n    web_neg: nil\n  },\n  {\n    web_status: 'formatted',\n    url: 'www.www.sample10.com',\n    url_f: 'http://www.sample10.com',\n    url_path: nil,\n    web_neg: nil\n  }\n]\n```\n\n4. Format Array of Addresses (each as a hash):\n\n```\narray_of_addresses = [\n  { street: '1234 EAST FAIR BOULEVARD', city: 'AUSTIN', state: 'TEXAS', zip: '78734' },\n  { street: '5678 North Lake Shore Drive', city: '555-123-4567', state: 'Illinois', zip: '610' },\n  { street: '9123 West Flagler Street', city: '1233144', state: 'NotAState', zip: 'Miami' }\n]\nformatted_address_hashes = CrmFormatter.format_addresses(array_of_addresses)\n```\n\nFormatted Addresses:\n\n```\nformatted_address_hashes = [\n  {\n    address_status: 'formatted',\n    full_addr: '1234 East Fair Boulevard, Austin, Texas, 78734',\n    full_addr_f: '1234 E Fair Blvd, Austin, TX, 78734',\n    street_f: '1234 E Fair Blvd',\n    city_f: 'Austin',\n    state_f: 'TX',\n    zip_f: '78734'\n  },\n  {\n    address_status: 'formatted',\n    full_addr: '5678 North Lake Shore Drive, 555-123-4567, Illinois, 610',\n    full_addr_f: '5678 N Lake Shore Dr, IL',\n    street_f: '5678 N Lake Shore Dr',\n    city_f: nil,\n    state_f: 'IL',\n    zip_f: nil\n  },\n  {\n    address_status: 'formatted',\n    full_addr: 
'9123 West Flagler Street, 1233144, NotAState, Miami',\n    full_addr_f: '9123 W Flagler St',\n    street_f: '9123 W Flagler St',\n    city_f: nil,\n    state_f: nil,\n    zip_f: nil\n  }\n]\n```\n\n### II. Advanced Usage\nAdvanced usage can parse a CSV file or process large data sets passed in directly.  It first uses the Utf8Sanitizer gem to detect and remove any non-UTF-8 characters and extra whitespace (double spaces, new lines, new paragraphs, carriage returns, etc.).  The results include a detailed report with the line numbers of altered data, along with the before and after values for comparison.  The sanitized data is then passed to CrmFormatter's advanced formatting, which formats all parts of the CRM data together (Address, Phone, Web).\n\nAccess advanced usage via the `format_with_report` method, passing either a CSV file path (`file_path:`) or an array of data hashes (`data:`).\n\n1. Parse and Format CSV via File Path (give the path from your project root, following the syntax below)\n\n```\nformatted_csv_results = CrmFormatter.format_with_report(file_path: './path/to/your/csv.csv')\n```\n\nParsed \u0026 Formatted CSV Results:\n\n```\nformatted_csv_results = {\n  stats:\n  {\n    total_rows: 2,\n    header_row: 1,\n    valid_rows: 1,\n    error_rows: 0,\n    defective_rows: 0,\n    perfect_rows: 0,\n    encoded_rows: 1,\n    wchar_rows: 0\n  },\n  data:\n  {\n    valid_data:\n    [\n      {\n        row_id: 1,\n        act_name: 'Courtesy Ford',\n        street: '1410 West Pine Street Hattiesburg',\n        city: 'Wexford',\n        state: 'MS',\n        zip: '39401',\n        full_addr: '1410 West Pine Street Hattiesburg, Wexford, MS, 39401',\n        phone: '512-555-1212',\n        url: 'http://www.courtesyfordsales.com',\n        street_f: '1410 W Pine St Hattiesburg',\n        city_f: 'Wexford',\n        state_f: 'MS',\n        zip_f: '39401',\n        full_addr_f: '1410 W Pine St Hattiesburg, Wexford, MS, 39401',\n        phone_f: '(512) 555-1212',\n        url_f: 'http://www.courtesyfordsales.com',\n        url_path: nil,\n        web_neg: nil,\n        address_status: 'formatted',\n        phone_status: 'formatted',\n        web_status: 'unchanged',\n        utf_status: 'encoded'\n      }\n    ],\n    encoded_data:\n    [\n      {\n        row_id: 1,\n        text: \"http://www.courtesyfordsales.com,Courtesy Ford,__\\xD5\\xCB\\xEB\\x8F\\xEB__\\xD5\\xCB\\xEB\\x8F\\xEB____1410 West Pine Street Hattiesburg,Wexford,MS,39401,512-555-1212\"\n      }\n    ],\n    defective_data: [],\n    error_data: []\n  },\n  file_path: './path/to/your/csv.csv'\n}\n```\n\n2. Format Data Hashes\n\n```\ndata_hashes_array = [{ row_id: '1', url: 'abcacura.com/twitter', act_name: \"Stanley Chevrolet Kaufman\\x99_\\xCC\", street: '825 East Fair Street', city: 'Kaufman', state: 'Texas', zip: '75142', phone: \"555-457-4391\\r\\n\" }]\n\nformatted_data_hash_results = CrmFormatter.format_with_report(data: data_hashes_array)\n```\n\nFormatted Data Hash Results:\n\n```\nformatted_data_hash_results = {\n  stats:\n  {\n    total_rows: '1',\n    header_row: 1,\n    valid_rows: 1,\n    error_rows: 0,\n    defective_rows: 0,\n    perfect_rows: 0,\n    encoded_rows: 1,\n    wchar_rows: 1\n  },\n  data:\n  {\n    valid_data:\n    [\n      {\n        row_id: '1',\n        act_name: 'Stanley Chevrolet Kaufman',\n        street: '825 East Fair Street',\n        city: 'Kaufman',\n        state: 'Texas',\n        zip: '75142',\n        full_addr: '825 East Fair Street, Kaufman, Texas, 75142',\n        phone: '555-457-4391',\n        url: 'abcacura.com/twitter',\n        street_f: '825 E Fair St',\n        city_f: 'Kaufman',\n        state_f: 'TX',\n        zip_f: '75142',\n        full_addr_f: '825 E Fair St, Kaufman, TX, 75142',\n        phone_f: '(555) 457-4391',\n        url_f: 'http://www.abcacura.com',\n        url_path: '/twitter',\n        web_neg: nil,\n        address_status: 'formatted',\n        phone_status: 'formatted',\n        web_status: 'formatted',\n        utf_status: 'encoded, wchar'\n      }\n    ],\n    encoded_data:\n    [\n      {\n        row_id: '1',\n        text: \"1,abcacura.com/twitter,Stanley Chevrolet Kaufman\\x99_\\xCC,825 East Fair Street,Kaufman,Texas,75142,555-457-4391\\r\\n\"\n      }\n    ],\n    defective_data: [],\n    error_data: []\n  },\n  file_path: nil\n}\n```\n\n## Author\n\nAdam J Booth - [4rlm](https://github.com/4rlm)\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/4rlm/crm_formatter. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n\n## Code of Conduct\n\nEveryone interacting in the CrmFormatter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/4rlm/crm_formatter/blob/master/CODE_OF_CONDUCT.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4rlm%2Fcrm_formatter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4rlm%2Fcrm_formatter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4rlm%2Fcrm_formatter/lists"}