{"id":15642094,"url":"https://github.com/fatkodima/data_checks","last_synced_at":"2025-06-18T16:33:21.608Z","repository":{"id":46604146,"uuid":"483983873","full_name":"fatkodima/data_checks","owner":"fatkodima","description":"Regression testing for data","archived":false,"fork":false,"pushed_at":"2024-08-10T11:25:48.000Z","size":49,"stargazers_count":66,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-07T09:19:42.456Z","etag":null,"topics":["activerecord","gem","rails","regression-testing","ruby"],"latest_commit_sha":null,"homepage":"https://www.rubydoc.info/gems/data_checks","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fatkodima.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-21T09:14:58.000Z","updated_at":"2025-05-15T19:52:47.000Z","dependencies_parsed_at":"2024-08-10T12:55:22.107Z","dependency_job_id":null,"html_url":"https://github.com/fatkodima/data_checks","commit_stats":{"total_commits":23,"total_committers":1,"mean_commits":23.0,"dds":0.0,"last_synced_commit":"00d92405ff8d26841b6598027d137ef3e82fcd5c"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/fatkodima/data_checks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fdata_checks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fdata_checks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fdata_checks/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fdata_checks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fatkodima","download_url":"https://codeload.github.com/fatkodima/data_checks/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fdata_checks/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260589549,"owners_count":23032912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["activerecord","gem","rails","regression-testing","ruby"],"created_at":"2024-10-03T11:54:23.488Z","updated_at":"2025-06-18T16:33:16.484Z","avatar_url":"https://github.com/fatkodima.png","language":"Ruby","readme":"# DataChecks\n\nThis gem provides a small DSL to check your data for inconsistencies and anomalies.\n\n[![Build Status](https://github.com/fatkodima/data_checks/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/fatkodima/data_checks/actions/workflows/test.yml)\n\n## Requirements\n\n- ruby 3.0+\n- activerecord 7.0+\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem \"data_checks\"\n```\n\n    $ bundle install\n    $ bin/rails generate data_checks:install\n\n## Motivation\n\nMaking sure that data stays valid is not a trivial task. For simple requirements, like \"this column is not null\" or \"this column is unique\", you of course just use the database constraints and that's it. 
The same goes for type validation and referential integrity.\n\nHowever, things change when you want to check for something more complex. Depending on your DBMS, you can use stored procedures, but these are often harder to write, version and maintain.\n\nYou could also assume that your data will never get corrupted and that validations in the application code are enough, but that would be far too optimistic. Bugs happen all the time, and it's best to plan for the worst.\n\nThis gem doesn't aim to replace those tools. Instead, it provides something that serves a similar purpose: *ensure that you work with the data you expect*.\n\nThis gem helps you schedule checks on your data and get alerts when something unexpected happens.\n\n`data_checks` can help to catch:\n\n* 🐛 **Bugs due to race conditions** (e.g. a user accidentally double-clicks a button to delete an email and ends up with no emails at all because of a race condition in the app)\n* 🐛 **Invalid persisted data**\n* 🐛 **Unexpected changes in behavior and data** (e.g. too many or too few records are created/deleted/imported/enqueued)\n\nThis idea is nicely presented in a RailsConf talk: [RailsConf 2018: The Doctor Is In: Using checkups to find bugs in production by Ryan Laughlin](https://www.youtube.com/watch?v=gEAlhKaK2I4)\n\n## Usage\n\nThe gem provides a small DSL to express predicates and an easy way to configure notifications.\n\nYou will be notified when a check starts failing, and again when it starts passing.\n\n### Checking for inconsistencies\n\nFor example, we expect every image attachment to have previews in 3 sizes. It is possible that, when a new image is attached, some previews are not generated because of a failure. What we would like to ensure is that no image ends up without a full set of previews. 
We could write something like:\n\n```ruby\nDataChecks.configure do\n  ensure_no :users_without_emails, tag: \"minutely\" do\n    User.where.missing(:email_addresses)\n  end\n\n  ensure_no :images_without_previews, tag: \"hourly\" do\n    Attachment.images\n      .left_joins(:previews)\n      .group(:id)\n      .having(\"COUNT(previews.id) \u003c 3\")\n  end\n\n  notifier :email,\n    from: \"production@company.com\",\n    to: \"developer@company.com\"\nend\n```\n\n### Checking for anomalies\n\nThis gem can also be used to detect anomalies in the data. For example, you may expect a certain number of new orders to be placed within a given period of time. If there are fewer, this can hint at a bug in the order placement flow that is worth investigating.\n\n```ruby\nensure_more :new_orders_per_hour, than: 10, tag: \"hourly\" do\n  Order.where(\"created_at \u003e= ?\", 1.hour.ago).count\nend\n```\n\n## Configuration\n\nCustom configurations should be placed in a `data_checks.rb` initializer.\n\n```ruby\n# config/initializers/data_checks.rb\n\nDataChecks.configure do\n  # ...\nend\n```\n\n### Notifiers\n\nCurrently, the following notifiers are supported:\n\n- `:email`: Uses `ActionMailer` to send emails. You can pass it any `ActionMailer` options.\n- `:slack`: Sends notifications to Slack. Accepts the following options:\n  - `webhook_url`: The webhook URL to send notifications to\n- `:logger`: Uses `Logger` to output notifications to the log. Accepts the following options:\n  - `logdev`: The log device. This is a filename (String) or an IO object (typically STDOUT, STDERR, or an open file).\n  - `level`: Logging severity threshold (e.g. 
Logger::INFO)\n\nEach notifier accepts a `formatter_class` option to customize the formatter used when generating a notification.\n\nYou can create custom notifiers by subclassing [Notifier](https://github.com/fatkodima/data_checks/blob/master/lib/data_checks/notifiers/notifier.rb).\n\nCreate a notifier:\n\n```ruby\nnotifier :email,\n  from: \"production@company.com\",\n  to: \"developer@company.com\"\n```\n\nCreate multiple notifiers of the same type:\n\n```ruby\nnotifier \"developers\",\n  type: :email,\n  from: \"production@company.com\",\n  to: [\"developer1@company.com\", \"developer2@company.com\"]\n\nnotifier \"tester\",\n  type: :email,\n  from: \"production@company.com\",\n  to: \"tester@company.com\"\n\nensure_no :images_without_previews, notify: \"developers\" do # notify only developers\n  # ...\nend\n```\n\n### Checks\n\n* `ensure_no` will check that the result of a given block is `zero?`, `empty?` or `false`\n* `ensure_any` will check that the result of a given block is `\u003e 0`\n* `ensure_more` will check that the result of a given block is greater than (`\u003e`) a given number or that it contains more than a given number of items\n* `ensure_less` will check that the result of a given block is less than (`\u003c`) a given number or that it contains fewer than a given number of items\n* `ensure_equal` will check that the result of a given block is equal (`==`) to a given number or that it contains exactly a given number of items\n\n```ruby\nensure_no :images_without_previews do\n  # ...\nend\n\nensure_any :facebook_logins_per_hour do\n  # ...\nend\n\nensure_more :new_orders_per_hour, than: 10 do\n  # ...\nend\n```\n\n### Customizing the error handler\n\nExceptions raised while a check runs are rescued and information about the error is persisted in the database.\n\nIf you want to integrate with an exception monitoring service (e.g. 
Bugsnag), you can define an error handler:\n\n```ruby\n# config/initializers/data_checks.rb\n\nDataChecks.config.error_handler = -\u003e(error, check_context) do\n  Bugsnag.notify(error) do |notification|\n    notification.add_metadata(:data_checks, check_context)\n  end\nend\n```\n\nThe error handler should be a lambda that accepts 2 arguments:\n\n* `error`: The exception that was raised.\n* `check_context`: A hash with additional information about the check:\n  * `check_name`: The name of the check that errored\n  * `ran_at`: The time when the check ran\n\n### Customizing the backtrace cleaner\n\n`DataChecks.config.backtrace_cleaner` can be set to the backtrace cleaner that is used to clean the backtrace of a check that raised an error before the backtrace is persisted. It should be an `ActiveSupport::BacktraceCleaner`.\n\n```ruby\n# config/initializers/data_checks.rb\n\ncleaner = ActiveSupport::BacktraceCleaner.new\ncleaner.add_silencer { |line| line =~ /ignore_this_dir/ }\n\nDataChecks.config.backtrace_cleaner = cleaner\n```\n\nIf none is specified, `Rails.backtrace_cleaner` is used by default.\n\n### Schedule checks\n\nSchedule checks to run (with cron, [Heroku Scheduler](https://elements.heroku.com/addons/scheduler), etc.).\n\n```sh\nrake data_checks:run_checks TAG=\"5 minutes\"  # run checks with tag=\"5 minutes\"\nrake data_checks:run_checks TAG=\"hourly\"     # run checks with tag=\"hourly\"\nrake data_checks:run_checks TAG=\"daily\"      # run checks with tag=\"daily\"\nrake data_checks:run_checks                  # run all checks\n```\n\nHere's what it looks like with cron:\n\n```\n*/5 * * * * rake data_checks:run_checks TAG=\"5 minutes\"\n0   * * * * rake data_checks:run_checks TAG=\"hourly\"\n30  7 * * * rake data_checks:run_checks TAG=\"daily\"\n```\n\nYou can also check the status of all checks manually by running:\n\n```sh\nrake data_checks:status\n```\n\n## Credits\n\nThanks to [checker_jobs gem](https://github.com/drivy/checker_jobs) for 
the original idea.\n\n## Development\n\nAfter checking out the repo, run `bundle install` to install dependencies. Then, run `rake test` to run the tests.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/data_checks.\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffatkodima%2Fdata_checks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffatkodima%2Fdata_checks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffatkodima%2Fdata_checks/lists"}