{"id":23023198,"url":"https://github.com/kibitan/masking","last_synced_at":"2025-05-16T12:10:02.789Z","repository":{"id":52300092,"uuid":"129277743","full_name":"kibitan/masking","owner":"kibitan","description":"Command line tool for generating anonymized database for MySQL/MariaDB","archived":false,"fork":false,"pushed_at":"2024-12-30T14:12:18.000Z","size":508,"stargazers_count":119,"open_issues_count":24,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-02T06:09:23.879Z","etag":null,"topics":["command-line-tool","database","gdpr","mariadb","mysql","rdbms","ruby"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kibitan.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["kibitan"]}},"created_at":"2018-04-12T16:04:56.000Z","updated_at":"2025-03-11T07:35:29.000Z","dependencies_parsed_at":"2023-11-19T07:24:10.273Z","dependency_job_id":"d175376a-1e29-4fe2-9673-a4ff69106c9a","html_url":"https://github.com/kibitan/masking","commit_stats":{"total_commits":235,"total_committers":3,"mean_commits":78.33333333333333,"dds":0.008510638297872353,"last_synced_commit":"e7870bb9d9e896c9dd8cfbe26b904bfc68cc3c74"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibitan%2Fmasking","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibitan%2Fmasking/tags","releases_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibitan%2Fmasking/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibitan%2Fmasking/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kibitan","download_url":"https://codeload.github.com/kibitan/masking/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247999859,"owners_count":21031046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","database","gdpr","mariadb","mysql","rdbms","ruby"],"created_at":"2024-12-15T13:12:47.982Z","updated_at":"2025-04-09T08:02:44.255Z","avatar_url":"https://github.com/kibitan.png","language":"Ruby","readme":"# MasKING🤴\n\n[![CircleCI](https://circleci.com/gh/kibitan/masking/tree/main.svg?style=svg)](https://circleci.com/gh/kibitan/masking/tree/main)\n[![Acceptance Test MySQL Status](https://github.com/kibitan/masking/workflows/Acceptance%20Test%20MySQL/badge.svg?branch=main)](https://github.com/kibitan/masking/actions?query=workflow%3A%22Acceptance+Test+MySQL%22+branch%3Amain)\n[![Acceptance Test MariaDB 
Status](https://github.com/kibitan/masking/workflows/Acceptance%20Test%20MariaDB/badge.svg?branch=main)](https://github.com/kibitan/masking/actions?query=workflow%3A%22Acceptance+Test+MariaDB%22+branch%3Amain)\n\n[![codecov](https://codecov.io/gh/kibitan/masking/branch/main/graph/badge.svg)](https://codecov.io/gh/kibitan/masking)\n[![Maintainability](https://api.codeclimate.com/v1/badges/290b3005ecc193a3d138/maintainability)](https://codeclimate.com/github/kibitan/masking/maintainability)\n[![CodeScene Code Health](https://codescene.io/projects/38627/status-badges/code-health)](https://codescene.io/projects/38627)\n[![Gem Version](https://badge.fury.io/rb/masking.svg)](https://badge.fury.io/rb/masking)\n\u003c!--\n[![CodeScene System Mastery](https://codescene.io/projects/38627/status-badges/system-mastery)](https://codescene.io/projects/38627)\n[![CodeScene Missed Goals](https://codescene.io/projects/38627/status-badges/missed-goals)](https://codescene.io/projects/38627)\n--\u003e\n\nA command line tool for anonymizing database records: it parses a SQL dump file and builds a new SQL dump file with sensitive/credential data masked.\n\n## Design Concept\n\n### KISS ~ keep it simple, stupid ~\n\nNo connection to the database, no file handling, only stdin/stdout. ~ Do One Thing and Do It Well ~ inspired by Unix Philosophy.\n\n### No External Dependency\n\nDepends only on the language's standard libraries, no external libraries.\n\n### Quality of Code\n\nHeavily inspired by TDD. 
Please see the details in the presentation below.\n\n## Presentation / Demo\n\n[![presentation](https://img.youtube.com/vi/tnGLUhmHclI/0.jpg)](https://www.youtube.com/watch?v=oml7dcDo_jo)\n\n[demo](https://www.youtube.com/watch?v=tnGLUhmHclI) / slide: [Generate anonymised database with MasKING](https://speakerdeck.com/kibitan/generate-anonymised-database-with-masking-2023-dot-09-dot-21-euruko-unconference-talk)\n\n\n## Installation\n\n```bash\ngem install masking\n```\n\n## Requirement\n\n* Ruby 2.6/2.7/3.0/3.1/3.2/3.3\n\n## Supporting RDBMS\n\n* MySQL: 5.7, 8.0, 8.1\n* MariaDB: 10.2\u003csup\u003e[1](#footnote1)\u003c/sup\u003e, 10.3\u003csup\u003e[1](#footnote1)\u003c/sup\u003e, 10.4, 10.5, 10.6, 10.7\u003csup\u003e[1](#footnote1)\u003c/sup\u003e, 10.8\u003csup\u003e[1](#footnote1)\u003c/sup\u003e, 10.9\u003csup\u003e[1](#footnote1)\u003c/sup\u003e, 10.10, 10.11, 11.0, 11.1\n\n## Usage\n\n1. Set up the configuration for the target tables/columns to anonymize in `masking.yml`\n\n    *NOTE: columns not mentioned here will NOT be anonymized; they stay as they are.*\n\n    ```yaml\n    # table_name:\n    #   column_name: masked_value\n\n    users:\n      string: anonymized string\n      email: anonymized+%{n}@example.com # %{n} will be replaced with a sequential number\n      integer: 12345\n      float: 123.45\n      boolean: true\n      null_column: null\n      date: 2018-08-24\n      time: 2018-08-24 15:54:06\n      binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html\n        R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5\n        OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+\n        +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC\n        AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=\n      # When a column name is suffixed with `?`, the original NULL value will not be anonymized.\n      # This option can be beneficial for simulating SQL 
execution that closely resembles the original data.\n      nullable_string?: anonymized nullable %{n} string\n    ```\n\n    A value will be implicitly converted to a compatible type. If you prefer explicit conversion, you can use a tag as defined in [YAML Version 1.1](http://yaml.org/spec/current.html#id2503753)\n\n    ```yaml\n    not-date: !!str 2002-04-28\n    ```\n\n    String values should be compatible with [MySQL String Type](https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html). Integer/Float values should be compatible with [MySQL Numeric Type](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-overview.html). Date/Time values should be compatible with [MySQL Date and Time Type](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-type-overview.html).\n\n    *NOTE: MasKING doesn't check the actual schema types in the dump. If you specify an incompatible value, it will cause an error when restoring to the database.*\n\n1. Dump the database with anonymization\n\n    MasKING works with `mysqldump --complete-insert`\n\n    ```bash\n      mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking \u003e anonymized_dump.sql\n    ```\n\n1. Restore from the anonymized dump file\n\n    ```bash\n      mysql -u USERNAME ANONYMIZED_DATABASE_NAME \u003c anonymized_dump.sql\n    ```\n\n    Tip: If you don't need an anonymized dump file, you can insert directly from the stream. It can be faster because it involves less IO.\n\n      ```bash\n        mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking | mysql -u USERNAME ANONYMIZED_DATABASE_NAME\n      ```\n\n### options\n\n```bash\n$ masking -h\nUsage: masking [options]\n    -c, --config=FILE_PATH           specify config file. 
default: masking.yml\n    -v, --version                    version\n```\n\n## Use case of anonymized (production) database\n\n* Analyzing production databases for BI, machine learning, and troubleshooting while respecting GDPR\n* Stress test / Integration test\n* Performance optimization for slow queries\n\n  Analyzing a slow query often needs a number of records and cardinality similar to production; the anonymized database helps to analyze and tune the slow query.\n\n* Simulating database migration\n\n  Some schema migrations lock tables, which causes trouble during execution. With a smaller dataset, the migration finishes quickly and the problem is easy to overlook. With the anonymized production database, it is easy to simulate the migration as it would run in the real release and to find the problem.\n\n* Better feature development flow\n\n  Using data similar to the production database makes for a better development experience. It makes it easy to find the things that should be changed or fixed. Also, some bugs are related to unexpected data in production, and it makes those easy to find too.\n\n* And… your idea here!\n\n## Development\n\n```bash\ngit clone git@github.com:kibitan/masking.git\nbin/setup\n```\n\nYou can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`.\n\n### boot\n\n```bash\n  bundle exec exe/masking\n```\n\n### Run test \u0026 rubocop \u0026 notes\n\n```bash\n  bundle exec rake\n```\n\n#### acceptance test\n\n```bash\n./acceptance/run_test.sh\n```\n\nAvailable options via environment variables:\n\n* `MYSQL_HOST`: database host (default: `localhost`)\n* `MYSQL_USER`: mysql user name (default: `mysqluser`)\n* `MYSQL_PASSWORD`: password for user (default: `password`)\n* `MYSQL_DBNAME`: database name (default: `mydb`)\n\nNOTE: running with `TRACE=1` will show debug output. 
For the CI, the `TRACE` environment variable can be set in the [setting field in the repository](https://github.com/kibitan/masking/settings/variables/actions/TRACE)\n\n##### with docker-compose\n\n```bash\ndocker-compose -f docker-compose.yml -f docker-compose/mysql80.yml run -e MYSQL_HOST=mysql80 app acceptance/run_test.sh\n```\n\nor\n\n```bash\ndocker-compose/acceptance_test.sh mysql80\n```\n\nTo test against another database version, specify the corresponding docker-compose file:\n\n* MySQL 8.1: [`docker-compose/mysql81.yml`](./docker-compose/mysql81.yml)\n* MySQL 8.0: [`docker-compose/mysql80.yml`](./docker-compose/mysql80.yml)\n* MySQL 5.7: [`docker-compose/mysql57.yml`](./docker-compose/mysql57.yml)\n* MariaDB 11.1: [`docker-compose/mariadb111.yml`](./docker-compose/mariadb111.yml)\n* MariaDB 11.0: [`docker-compose/mariadb110.yml`](./docker-compose/mariadb110.yml)\n* MariaDB 10.11: [`docker-compose/mariadb1011.yml`](./docker-compose/mariadb1011.yml)\n* MariaDB 10.10: [`docker-compose/mariadb1010.yml`](./docker-compose/mariadb1010.yml)\n* MariaDB 10.9\u003csup\u003e[1](#footnote1)\u003c/sup\u003e: [`docker-compose/mariadb109.yml`](./docker-compose/mariadb109.yml)\n* MariaDB 10.8\u003csup\u003e[1](#footnote1)\u003c/sup\u003e: [`docker-compose/mariadb108.yml`](./docker-compose/mariadb108.yml)\n* MariaDB 10.7\u003csup\u003e[1](#footnote1)\u003c/sup\u003e: [`docker-compose/mariadb107.yml`](./docker-compose/mariadb107.yml)\n* MariaDB 10.6: [`docker-compose/mariadb106.yml`](./docker-compose/mariadb106.yml)\n* MariaDB 10.5: [`docker-compose/mariadb105.yml`](./docker-compose/mariadb105.yml)\n* MariaDB 10.4: [`docker-compose/mariadb104.yml`](./docker-compose/mariadb104.yml)\n* MariaDB 10.3\u003csup\u003e[1](#footnote1)\u003c/sup\u003e: [`docker-compose/mariadb103.yml`](./docker-compose/mariadb103.yml)\n* MariaDB 10.2\u003csup\u003e[1](#footnote1)\u003c/sup\u003e: [`docker-compose/mariadb102.yml`](./docker-compose/mariadb102.yml)\n\n#### [Markdown lint](https://github.com/markdownlint/markdownlint)\n\n```bash\nbundle exec mdl *.md\n```\n\n## Development with 
Docker\n\n```bash\ndocker build . -t masking\necho \"sample stdout\" | docker run -i masking\ndocker run masking -v\ndocker run --entrypoint sh -it masking # inside of docker container\n```\n\n## Profiling\n\nUse `bin/masking_profile`:\n\n```bash\n $ cat your_sample.sql | bin/masking_profile\nflat result is saved at /your/repo/profile/flat.txt\ngraph result is saved at /your/repo/profile/graph.txt\ngraph html is saved at /your/repo/profile/graph.html\n\n $ open profile/flat.txt\n```\n\nSee also: [ruby-prof/ruby-prof: ruby-prof: a code profiler for MRI rubies](https://github.com/ruby-prof/ruby-prof)\n\n### Benchmark\n\nUse `benchmark/run.rb`:\n\n```bash\n$ benchmark/run.rb\n       user     system      total        real\n   1.103012   0.009460   1.112472 (  1.123093)\n```\n\n## Future Todo\n\n* Pluggable/customizable masking methods, e.g. integration with [Faker](https://github.com/stympy/faker)\n* Parse the schema information and validate the target columns' values\n* Performance optimization\n  * Write in a streaming fashion\n  * Rewrite in another language?\n  * Establish a benchmark\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at [https://github.com/kibitan/masking](https://github.com/kibitan/masking).\nThis project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n\n## Code of Conduct\n\nEveryone interacting in the Masking project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/kibitan/masking/blob/main/CODE_OF_CONDUCT.md).\n\n\u003ca name=\"footnote1\"\u003e1\u003c/a\u003e: \u003csmall\u003e MariaDB 10.2, 10.3, 10.7, 10.8, 10.9 are no longer supported by 
[official](https://mariadb.org/about/maintenance-policy/)\u003c/small\u003e\n","funding_links":["https://github.com/sponsors/kibitan"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkibitan%2Fmasking","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkibitan%2Fmasking","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkibitan%2Fmasking/lists"}