{"id":13878918,"url":"https://github.com/reidmorrison/data_cleansing","last_synced_at":"2025-04-30T09:31:10.004Z","repository":{"id":9461124,"uuid":"11343095","full_name":"reidmorrison/data_cleansing","owner":"reidmorrison","description":"Cleanse data received via Rails, APIs, files, or inside plain ruby objects.","archived":false,"fork":false,"pushed_at":"2024-09-06T00:21:38.000Z","size":66,"stargazers_count":11,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-06T00:24:32.152Z","etag":null,"topics":["cleaners","rails","ruby","transform"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/reidmorrison.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-07-11T13:50:28.000Z","updated_at":"2025-02-17T00:19:22.000Z","dependencies_parsed_at":"2024-11-19T05:01:10.629Z","dependency_job_id":null,"html_url":"https://github.com/reidmorrison/data_cleansing","commit_stats":{"total_commits":35,"total_committers":1,"mean_commits":35.0,"dds":0.0,"last_synced_commit":"86c73fc2232d4cf98a95b616acfba4d814805300"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidmorrison%2Fdata_cleansing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidmorrison%2Fdata_cleansing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidmorrison%2Fdata_cleansing/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidmorrison%2Fdata_cleansing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/reidmorrison","download_url":"https://codeload.github.com/reidmorrison/data_cleansing/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251676709,"owners_count":21626056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cleaners","rails","ruby","transform"],"created_at":"2024-08-06T08:02:04.207Z","updated_at":"2025-04-30T09:31:09.704Z","avatar_url":"https://github.com/reidmorrison.png","language":"Ruby","readme":"data_cleansing\n==============\n\nData Cleansing framework for Ruby.\n\n* http://github.com/reidmorrison/data_cleansing\n\n## Introduction\n\nIt is important to keep internal data free of unwanted escape characters, leading\nor trailing blanks and even newlines.\nSimilarly it would be useful to be able to attach a cleansing solution to a field\nin a model and have the data cleansed transparently when required.\n\nDataCleansing is a framework that allows data cleansing to be applied to\nspecific attributes or fields.\n\n## Features\n\n* Supports global cleansing definitions that can be associated with any Ruby,\n  Rails, Mongoid, or other model\n* Supports custom cleansing definitions that can be defined in-line\n* A cleansing block can access the other attributes in the model while cleansing\n  the current attribute\n* In a cleansing block other attributes in the model can be modified at the\n  same time\n* Cleansers are executed in 
the order they are defined. As a result multiple\n  cleansers can be run against the same field and the order is preserved\n* Multiple cleansers can be specified for a list of attributes at the same time\n* Inheritance is supported. The cleansers for parent classes are run before\n  the child's cleansers\n* Cleansers can be called outside of a model instance for cases where fields\n  need to be cleansed before the model is created or looked up\n* To aid troubleshooting, the before and after values of cleansed attributes\n  are logged. The level of detail is fine-tuned using the log level\n\n## ActiveRecord (ActiveModel) Features\n\n* Passes the value of the attribute before the Rails type cast so that the\n  original text can be cleansed before passing back to Rails for type conversion.\n  This is important for numeric and date fields where spaces and control characters\n  can have undesired effects\n\n## Examples\n\n### Ruby Example\n```ruby\nrequire 'data_cleansing'\n\n# Define a global cleaner\nDataCleansing.register_cleaner(:strip) {|string| string.strip}\n\nclass User\n  include DataCleansing::Cleanse\n\n  attr_accessor :first_name, :last_name\n\n  # Strip leading and trailing whitespace from first_name and last_name\n  cleanse :first_name, :last_name, :cleaner =\u003e :strip\nend\n\nu = User.new\nu.first_name = '    joe   '\nu.last_name = \"\\n  black\\n\"\nputs \"Before data cleansing #{u.inspect}\"\n# Before data cleansing #\u003cUser:0x007fc9f1081980 @first_name=\"    joe   \", @last_name=\"\\n  black\\n\"\u003e\n\nu.cleanse_attributes!\nputs \"After data cleansing #{u.inspect}\"\n# After data cleansing #\u003cUser:0x007fc9f1081980 @first_name=\"joe\", @last_name=\"black\"\u003e\n```\n\n### Rails Example\n\n```ruby\n# Define a global cleaner\nDataCleansing.register_cleaner(:strip) {|string| string.strip}\n\n# 'users' table has the following columns :first_name, :last_name, :address1, :address2\nclass User \u003c ActiveRecord::Base\n  include 
DataCleansing::Cleanse\n\n  # Use a global cleaner\n  cleanse :first_name, :last_name, :cleaner =\u003e :strip\n\n  # Define a one-off cleaner\n  cleanse :address1, :address2, :cleaner =\u003e Proc.new {|string| string.strip}\n\n  # Automatically cleanse data before validation\n  before_validation :cleanse_attributes!\nend\n\n# Create a User instance\nu = User.new(:first_name =\u003e '    joe   ', :last_name =\u003e \"\\n  black\\n\", :address1 =\u003e \"2632 Brown St   \\n\")\nputs \"Before data cleansing #{u.attributes.inspect}\"\nu.validate\nputs \"After data cleansing #{u.attributes.inspect}\"\nu.save!\n```\n\n### Advanced Ruby Example\n\n```ruby\nrequire 'data_cleansing'\n\n# Define global cleaners\nDataCleansing.register_cleaner(:strip) {|string| string.strip}\nDataCleansing.register_cleaner(:upcase) {|string| string.upcase}\n\nclass User\n  include DataCleansing::Cleanse\n\n  attr_accessor :first_name, :last_name, :title, :address1, :address2, :gender\n\n  # Use a global cleaner\n  cleanse :first_name, :last_name, :cleaner =\u003e :strip\n\n  # Define a one-off cleaner\n  cleanse :address1, :address2, :cleaner =\u003e Proc.new {|string| string.strip}\n\n  # Use multiple cleaners, and a custom block that appends a trailing period when it is missing\n  cleanse :title, :cleaner =\u003e [:strip, :upcase, Proc.new {|string| string.end_with?('.') ? string : \"#{string}.\"}]\n\n  # Change the cleansing rule based on the value of other attributes in that instance of user\n  # The 'title' is retrieved from the current instance of the user\n  cleanse :gender, :cleaner =\u003e [\n    :strip,\n    :upcase,\n    Proc.new do |gender|\n      if (gender == \"UNKNOWN\") \u0026\u0026 (title == \"MR.\")\n        \"Male\"\n      else\n        \"Female\"\n      end\n    end\n  ]\nend\n\nu = User.new\nu.first_name = '    joe   '\nu.last_name = \"\\n  black\\n\"\nu.address1 = \"2632 Brown St   \\n\"\nu.title = \"   \\nmr   \\n\"\nu.gender = \" Unknown  \"\nputs \"Before data cleansing #{u.inspect}\"\n# Before data cleansing 
#\u003cUser:0x007fdd5a83a8f8 @first_name=\"    joe   \", @last_name=\"\\n  black\\n\", @address1=\"2632 Brown St   \\n\", @title=\"   \\nmr   \\n\", @gender=\" Unknown  \"\u003e\n\nu.cleanse_attributes!\nputs \"After data cleansing #{u.inspect}\"\n# After data cleansing #\u003cUser:0x007fdd5a83a8f8 @first_name=\"joe\", @last_name=\"black\", @address1=\"2632 Brown St\", @title=\"MR.\", @gender=\"Male\"\u003e\n```\n\n## After Cleansing\n\nIt is sometimes useful to read or write multiple fields as part of cleansing, or\nto manipulate attributes automatically once they have been cleansed.\nFor this purpose, instance methods on the model can be registered for invocation once\nall the attributes have been cleansed according to their :cleanse specifications.\nMultiple methods can be registered; they are called in the order in which they are registered.\n\n```ruby\nafter_cleanse \u003cinstance_method_name\u003e, \u003cinstance_method_name\u003e, ...\n```\n\nExample:\n```ruby\n# Define a global cleaner\nDataCleansing.register_cleaner(:strip) {|string| string.strip}\n\n# 'users' table has the following columns :first_name, :last_name, :address1, :address2\nclass User \u003c ActiveRecord::Base\n  include DataCleansing::Cleanse\n\n  # Use a global cleaner\n  cleanse :first_name, :last_name, :cleaner =\u003e :strip\n\n  # Define a one-off cleaner\n  cleanse :address1, :address2, :cleaner =\u003e Proc.new {|string| string.strip}\n\n  # Once the above cleansing is complete call the instance method\n  after_cleanse :check_address\n\n  protected\n\n  # Method to be called once data cleansing is complete\n  def check_address\n    # Copy address2 into address1 if address1 is blank and address2 has a value.\n    # The explicit self. receiver is required: a bare assignment would only create a local variable.\n    self.address1 = address2 if address1.blank? 
\u0026\u0026 !address2.blank?\n  end\n\nend\n\n# Create a User instance\nu = User.new(:first_name =\u003e '    joe   ', :last_name =\u003e \"\\n  black\\n\", :address2 =\u003e \"2632 Brown St   \\n\")\nputs \"Before data cleansing #{u.attributes.inspect}\"\nu.cleanse_attributes!\nputs \"After data cleansing #{u.attributes.inspect}\"\nu.save!\n```\n\n## Recommendations\n\n:data_cleanse blocks are ideal for cleansing a single attribute and for applying any\nglobal or common cleansing algorithms.\n\nEven though multiple attributes can be read or written in a single :data_cleanse\nblock, it is recommended to use the :after_cleanse method for working with multiple\nattributes. It is much easier to read and understand the interactions between multiple\nattributes in the :after_cleanse methods.\n\n## Rails configuration\n\nWhen DataCleansing is used in a Rails environment it can be configured using the\nregular Rails configuration mechanisms. For example:\n\n```ruby\nmodule MyApplication\n  class Application \u003c Rails::Application\n\n   # Data Cleansing Configuration\n\n   # Attributes whose values are to be masked out during logging\n   config.data_cleansing.register_masked_attributes :bank_account_number, :social_security_number\n\n   # Optionally override the default log level\n   #   Set to :trace or :debug to log all fields modified\n   #   Set to :info to log only those fields which were nilled out\n   #   Set to :warn or higher to disable logging of cleansing actions\n   config.data_cleansing.logger.level = :info\n\n   # Register any global cleaners\n   config.data_cleansing.register_cleaner(:strip) {|string| string.strip}\n\n  end\nend\n```\n\n## Logging\n\nDataCleansing uses SemanticLogger for logging due to its excellent integration\nwith Rails and its ability to log data in its raw form to Mongo and to files.\n\nIf running a Rails application it is recommended to install the gem\nrails_semantic_logger, which replaces the default Rails logger. 
It is, however,\npossible to configure the semantic_logger gem to use the existing Rails logger\nin a Rails initializer as follows:\n\n```ruby\nSemanticLogger.default_level = Rails.logger.level\nSemanticLogger.add_appender(logger: Rails.logger)\n```\n\nBy changing the log level of DataCleansing itself, the type of output for data\ncleansing can be controlled:\n\n* :trace or :debug to log all fields modified\n* :info to log only those fields which were nilled out\n* :warn or higher to disable logging of cleansing actions\n\nNote:\n\n* The logging of changes made to attributes only includes attributes cleansed\n  with :data_cleanse blocks. Attributes modified within :after_cleanse methods\n  are not logged\n\n* It is not necessary to change the global log level to affect the logging detail\n  level in DataCleansing. The DataCleansing log level is changed independently\n\nTo change the log level, either use the Rails configuration approach, or set it\ndirectly:\n\n```ruby\nDataCleansing.logger.level = :info\n```\n\n## Notes\n\n* Cleaners are called in the order in which they are defined, so subsequent cleaners\n  can assume that the previous cleaners have run and can therefore access or even\n  modify previously cleaned attributes\n\n## Installation\n\n### Add to an existing Rails project\n\nAdd the following line to the Gemfile\n\n```ruby\ngem 'data_cleansing'\n```\n\nInstall the gem with Bundler\n\n    bundle install\n\n## Dependencies\n\nDataCleansing requires the following dependencies\n\n* Ruby V1.9.3, V2 or greater\n* Rails V3.2 (Active Model) or greater for Rails integration (only if Rails is being used)\n* Mongoid and MongoMapper supporting Active Model V3.2 or greater (only if Mongoid or MongoMapper is being used)\n\n## Meta\n\n* Code: `git clone https://github.com/reidmorrison/data_cleansing.git`\n* Home: \u003chttps://github.com/reidmorrison/data_cleansing\u003e\n* Issues: \u003chttp://github.com/reidmorrison/data_cleansing/issues\u003e\n* Gems: 
\u003chttp://rubygems.org/gems/data_cleansing\u003e\n\nThis project uses [Semantic Versioning](http://semver.org/).\n\n## Authors\n\nReid Morrison :: reidmo@gmail.com :: @reidmorrison\n\n## License\n\nCopyright 2013, 2014, 2015, 2016 Reid Morrison\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","funding_links":[],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freidmorrison%2Fdata_cleansing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freidmorrison%2Fdata_cleansing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freidmorrison%2Fdata_cleansing/lists"}