{"id":16955567,"url":"https://github.com/rcdexta/databender","last_synced_at":"2025-04-14T06:11:41.486Z","repository":{"id":62556742,"uuid":"105425429","full_name":"rcdexta/databender","owner":"rcdexta","description":"Generate database subset using a configuration based rule-engine","archived":false,"fork":false,"pushed_at":"2017-10-02T10:49:27.000Z","size":1866,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-27T19:54:32.805Z","etag":null,"topics":["database","mysql","scripts","test-tools","testdata"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rcdexta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-01T07:09:50.000Z","updated_at":"2018-10-25T00:37:36.000Z","dependencies_parsed_at":"2022-11-03T06:00:50.393Z","dependency_job_id":null,"html_url":"https://github.com/rcdexta/databender","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdexta%2Fdatabender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdexta%2Fdatabender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdexta%2Fdatabender/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdexta%2Fdatabender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rcdexta","download_url":"https://codeload.github.com/rcdexta/databender/tar.gz/refs/heads/master","host":{"name":"GitHub","ur
l":"https://github.com","kind":"github","repositories_count":248830395,"owners_count":21168272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","mysql","scripts","test-tools","testdata"],"created_at":"2024-10-13T22:12:40.661Z","updated_at":"2025-04-14T06:11:41.460Z","avatar_url":"https://github.com/rcdexta.png","language":"Ruby","readme":"# Databender\n\nRuby script to generate a database subset, driven by a configuration-based rule engine\n\n[![Gem Version](https://badge.fury.io/rb/databender.svg)](http://badge.fury.io/rb/databender)\n\n#### Demo\n\n![alt tag](https://github.com/rcdexta/databender/raw/master/assets/demo.gif)\n\n#### Why\n\nIf you have to quickly boot up a micro-service or any application on your local machine and you are stuck because the service depends on seed data that must be present in the database before it starts, you have a couple of options:\n\n* automate data generation using tools like [bobcat](https://github.com/ThoughtWorksStudios/bobcat)\n* use the fixtures that power your testing suite to generate the seed data\n* generate a subset of the data from one of the working environments (staging, uat)\n\nDatabender aims to offer an easy and seamless solution for the last option.\n\n#### Features\n\n* configuration-driven rule engine\n* can add filters at table level or globally at column level\n* can resolve the sequence of tables to import based on referential integrity (foreign key dependencies)\n\n#### Installation\n\nInstall the gem to get the command-line interface\n\n```bash\n$ gem install databender\n```\n\nand then type 
\n\n```shell\n$ databender --help\n```\n\nto see the list of available commands.\n\n#### Usage\n\nFirst, initialise the configuration for the database you would like to take a subset of\n\n```shell\n$ databender init --db-name=employees\n```\n\n\u003e Note: I use the public MySQL dataset available at https://github.com/datacharmer/test_db as the sample dataset to illustrate the gem\n\nThis should create a `config` folder and a `database.yml` file. Specify the connection parameters for the source database in the `database.yml` file. Inspect `filters/employees.yml` to specify the rules for generating the subset. The comments in the file should serve as good documentation for specifying the table and column filters. Find a sample filter configuration below.\n\n```yaml\ntables:\n  # Tables with fewer rows than min_row_count will be fully imported with no filters applied\n  min_row_count: 20\n\n  # For tables with no filters, the maximum number of rows to import\n  max_row_count: 1000\n\n  # specify table specific filters here\n  filters:\n    employees: hire_date \u003e= '1994-01-01'\n    departments: dept_no in ('d004', 'd005')\n\ncolumns:\n  # specify column filters applicable to all tables that contain that column\n  filters:\n    birth_date: birth_date \u003e= '1950-01-01'\n\n```\n\nNow you can run the generator using\n\n```shell\n$ databender generate --db-name=employees\n```\n\nThis should generate another database called `employees_subset` containing the subset data and also create a gzipped dump of it.\n\n\n#### License\n\nMIT\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcdexta%2Fdatabender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frcdexta%2Fdatabender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcdexta%2Fdatabender/lists"}