{"id":16380576,"url":"https://github.com/edyan/neuralyzer","last_synced_at":"2025-04-04T08:09:32.073Z","repository":{"id":47168924,"uuid":"49074522","full_name":"edyan/neuralyzer","owner":"edyan","description":"Neuralyzer is a library and a command line tool to anonymize databases (by updating existing data or populating a table with fake data)","archived":false,"fork":false,"pushed_at":"2024-11-06T19:07:13.000Z","size":59564,"stargazers_count":51,"open_issues_count":3,"forks_count":11,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-28T07:09:20.835Z","etag":null,"topics":["anonymisation","anonymization","anonymize","data-generation","data-generator","data-privacy","database","dgpr","private-life","rgpd"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/edyan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-01-05T15:32:32.000Z","updated_at":"2025-02-04T14:21:07.000Z","dependencies_parsed_at":"2024-11-18T01:47:03.306Z","dependency_job_id":null,"html_url":"https://github.com/edyan/neuralyzer","commit_stats":{"total_commits":249,"total_committers":11,"mean_commits":"22.636363636363637","dds":0.4257028112449799,"last_synced_commit":"c994f01388e918d5724b653f866c0afe87f2f0cf"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edyan%2Fneuralyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edyan%2Fneuralyzer/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edyan%2Fneuralyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edyan%2Fneuralyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/edyan","download_url":"https://codeload.github.com/edyan/neuralyzer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247142074,"owners_count":20890653,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymisation","anonymization","anonymize","data-generation","data-generator","data-privacy","database","dgpr","private-life","rgpd"],"created_at":"2024-10-11T03:51:48.274Z","updated_at":"2025-04-04T08:09:32.056Z","avatar_url":"https://github.com/edyan.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/edyan/neuralyzer/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/edyan/neuralyzer/?branch=master)\n[![Code Coverage](https://scrutinizer-ci.com/g/edyan/neuralyzer/badges/coverage.png?b=master)](https://scrutinizer-ci.com/g/edyan/neuralyzer/?branch=master)\n[![Build Status](https://scrutinizer-ci.com/g/edyan/neuralyzer/badges/build.png?b=master)](https://scrutinizer-ci.com/g/edyan/neuralyzer/build-status/master)\n[![Build Status](https://travis-ci.com/edyan/neuralyzer.svg?branch=master)](https://travis-ci.com/edyan/neuralyzer)\n\n\n\nedyan/neuralyzer\n=====\n\n## Summary\nThis project is a library and a command line tool that **anonymizes** a 
database by updating data\nor generating fake data (update vs insert). It uses [Faker](https://github.com/fakerphp/faker)\nto generate data from rules defined in a configuration file.\n\nBecause it can work row by row or in batches, you can load tables with\ntens of millions of fake records.\n\nIt uses [Doctrine DBAL](https://github.com/doctrine/dbal) to abstract interactions with\ndatabases, so it should work with any database type.\nCurrently it works (tested extensively) with MySQL, PostgreSQL and SQL Server.\n\n~~Neuralyzer has an option to clean tables by injecting a `DELETE FROM` with a `WHERE` criteria\nbefore launching the anonymization (see the config parameters `delete` and `delete_where`).~~\n\nNeuralyzer had an option to clean tables, but this is now managed by pre and post actions:\n```yaml\nentities:\n    books:\n        cols:\n            title: { method: sentence, params: [8], unique: true }\n        action: update\n        pre_actions:\n            - db.query(\"DELETE FROM books\")\n        post_actions:\n            - db.query(\"DELETE FROM books WHERE title LIKE '%war%'\")\n\n```\n\n\n## Installation as a library\n```bash\ncomposer require edyan/neuralyzer\n```\n\n\n## Installation as an executable\nYou can also download the executable directly (example with v4.0):\n```bash\n$ wget https://github.com/edyan/neuralyzer/raw/v4.0/neuralyzer.phar\n$ sudo mv neuralyzer.phar /usr/local/bin/neuralyzer\n$ sudo chmod +x /usr/local/bin/neuralyzer\n$ neuralyzer\n```\n\n\n## Usage\nThe easiest way to use this tool is to start with the command line tool.\nAfter cloning the project and running a `composer install`, try:\n```bash\n$ bin/neuralyzer\n```\n\n\n### Generate the configuration automatically\nNeuralyzer is able to read a database and generate the configuration for you.\nThe command `config:generate` accepts the following options:\n```\nOptions:\n    -D, --driver=DRIVER              Driver (check Doctrine documentation to have the list) [default: 
\"pdo_mysql\"]\n    -H, --host=HOST                  Host [default: \"127.0.0.1\"]\n    -d, --db=DB                      Database Name\n    -u, --user=USER                  User Name [default: \"www-data\"]\n    -p, --password=PASSWORD          Password (or it'll be prompted)\n    -f, --file=FILE                  File [default: \"neuralyzer.yml\"]\n        --protect                    Protect IDs and other fields\n        --ignore-table=IGNORE-TABLE  Table to ignore. Can be repeated (multiple values allowed)\n        --ignore-field=IGNORE-FIELD  Field to ignore. Regexp in the form \"table.field\". Can be repeated (multiple values allowed)\n```\n\n#### Example\n```bash\nbin/neuralyzer config:generate --db test_db -u root -p root --ignore-table config --ignore-field \".*\\.id.*\"\n```\n\nThat produces a file which looks like:\n```yaml\nentities:\n    authors:\n        cols:\n            first_name: { method: firstName, unique: false }\n            last_name: { method: lastName, unique: false }\n        action: update # Will update existing data, \"insert\" would create new data\n        pre_actions: {  }\n        post_actions: {  }\n\n    books:\n        cols:\n            name: { method: sentence, params: [8] }\n            date_modified: { method: date, params: ['Y-m-d H:i:s', now] }\n        action: update\n        pre_actions: {  }\n        post_actions: {  }\n\nguesser: Edyan\\Neuralyzer\\Guesser\nguesser_version: '3.0'\nlanguage: en_US\n```\n\nYou have to modify the file to change its configuration. For example, if you need to remove data\nwhile anonymizing and change the language\n(see [Faker's doc](https://fakerphp.github.io/) for available languages), do :\n\n```yaml\n# be careful that some languages have only a few methods.\n# Example : https://github.com/FakerPHP/Faker/tree/v1.14.1/src/Faker/Provider/fr_FR\nlanguage: fr_FR\n```\n\n**INFO**: You can also use delete in standalone, without anonymizing anything. 
That will delete everything in books:\n```yaml\nentities:\n    authors:\n        cols:\n            first_name: { method: firstName, unique: false }\n            last_name: { method: lastName, unique: false }\n        action: update\n    books:\n        pre_actions:\n            - db.query(\"DELETE FROM books\")\n```\n\nIf you wanted to delete everything then insert 1000 new books:\n```yaml\nguesser_version: '3.0'\nentities:\n    authors:\n        cols:\n            first_name: { method: firstName, unique: false }\n            last_name: { method: lastName, unique: false }\n        action: update\n    books:\n        cols:\n            name: { method: sentence, params: [8] }\n        action: insert\n        pre_actions:\n            - db.query(\"DELETE FROM books\")\n        limit: 1000\n```\n\n\n### Run the anonymizer\nTo run the anonymizer, the command is simply \"run\" and expects:\n```\nOptions:\n    -D, --driver=DRIVER      Driver (check Doctrine documentation to have the list) [default: \"pdo_mysql\"]\n    -H, --host=HOST          Host [default: \"127.0.0.1\"]\n    -d, --db=DB              Database Name\n    -u, --user=USER          User Name [default: \"www-data\"]\n    -p, --password=PASSWORD  Password (or prompted)\n    -c, --config=CONFIG      Configuration File [default: \"neuralyzer.yml\"]\n    -t, --table=TABLE        Do a single table\n        --pretend            Don't run the queries\n    -s, --sql                Display the SQL\n\n    -m, --mode=MODE          Set the mode : batch or queries [default: \"batch\"]\n```\n#### Example\n```bash\nbin/neuralyzer run --db test_db -u root -p root\n```\n\nThat produces that kind of output:\n```bash\nAnonymizing authors\n 2/2 [============================] 100%\n\nQueries:\nUPDATE authors SET first_name = 'Don', last_name = 'Wisoky' WHERE id = '1'\nUPDATE authors SET first_name = 'Sasha', last_name = 'Denesik' WHERE id = '2'\n\n....\n```\n\n**WARNING**: On a huge table, `--sql` will produce a HUGE output. 
Use it for debugging purposes only.\n\n\n## Library\nThe library is made to be integrated with any tool, such as a CLI tool. It contains:\n* A Configuration Reader and a Configuration Writer\n* A Guesser\n* A DB Anonymizer\n\n\n### Guesser\nThe guesser is the central piece of the config generator.\nIt guesses, according to the field name or field type, which Faker method to apply.\n\nIt can be extended very easily, as it has to be injected into the Writer.\n\n\n### Configuration Writer\nThe writer generates a YAML file that contains all tables and fields from a DB. A basic usage could be the following:\n\n```php\n\u003c?php\n\nrequire_once 'vendor/autoload.php';\n\n// Create a container\n$container = Edyan\\Neuralyzer\\ContainerFactory::createContainer();\n// Configure DB Utils, required\n$dbUtils = $container-\u003eget('Edyan\\Neuralyzer\\Utils\\DBUtils');\n// See Doctrine DBAL configuration :\n// https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html\n$dbUtils-\u003econfigure([\n    'driver' =\u003e 'pdo_mysql',\n    'host' =\u003e '127.0.0.1',\n    'dbname' =\u003e 'test_db',\n    'user' =\u003e 'root',\n    'password' =\u003e 'root',\n]);\n\n$writer = new \\Edyan\\Neuralyzer\\Configuration\\Writer;\n$data = $writer-\u003egenerateConfFromDB($dbUtils, new \\Edyan\\Neuralyzer\\Guesser);\n$writer-\u003esave($data, 'neuralyzer.yml');\n```\n\n\nIf needed, you can protect some cols (with regexp) or tables:\n```php\n\u003c?php\n// ...\n$writer = new \\Edyan\\Neuralyzer\\Configuration\\Writer;\n$writer-\u003eprotectCols(true); // will protect primary keys\n// define cols to protect (must be prefixed with the table name)\n$writer-\u003esetProtectedCols([\n    '.*\\.id',\n    '.*\\..*_id',\n    '.*\\.date_modified',\n    '.*\\.date_entered',\n    '.*\\.date_created',\n    '.*\\.deleted',\n]);\n// Define tables to ignore, also with regexp\n$writer-\u003esetIgnoredTables([\n    'acl_.*',\n    'config',\n    
'email_cache',\n]);\n// Write the configuration\n$data = $writer-\u003egenerateConfFromDB($dbUtils, new \\Edyan\\Neuralyzer\\Guesser);\n$writer-\u003esave($data, 'neuralyzer.yml');\n```\n\n\n### Configuration Reader\nThe configuration Reader is the exact opposite of the Writer. Its main job is to validate that the configuration\nof the YAML file is correct, then to provide methods to access its parameters. Example:\n```php\n\u003c?php\nrequire_once 'vendor/autoload.php';\n\n// will throw an exception if it's not valid\n$reader = new Edyan\\Neuralyzer\\Configuration\\Reader('neuralyzer.yml');\n$tables = $reader-\u003egetEntities();\n```\n\n\n### DB Anonymizer\nThe only anonymizer currently available is the DB one. It expects an `Expression` utility, a configured `DBUtils` instance and a Configuration Reader object:\n```php\n\u003c?php\n\nrequire_once 'vendor/autoload.php';\n\n// Create a container\n$container = Edyan\\Neuralyzer\\ContainerFactory::createContainer();\n$expression = $container-\u003eget('Edyan\\Neuralyzer\\Utils\\Expression');\n// Configure DB Utils, required\n$dbUtils = $container-\u003eget('Edyan\\Neuralyzer\\Utils\\DBUtils');\n// See Doctrine DBAL configuration :\n// https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html\n$dbUtils-\u003econfigure([\n    'driver' =\u003e 'pdo_mysql',\n    'host' =\u003e '127.0.0.1',\n    'dbname' =\u003e 'test_db',\n    'user' =\u003e 'root',\n    'password' =\u003e 'root',\n]);\n\n$db = new \\Edyan\\Neuralyzer\\Anonymizer\\DB($expression, $dbUtils);\n$db-\u003esetConfiguration(\n    new \\Edyan\\Neuralyzer\\Configuration\\Reader('neuralyzer.yml')\n);\n\n```\n\n\nOnce initialized, the method that anonymizes a table is the following:\n```php\n\u003c?php\npublic function processEntity(string $entity, callable $callback = null): array;\n```\n\nParameters:\n* `entity` (string / required): the entity to process, such as a table name\n* `callback` (callable / optional): for example, to drive a progress bar\n\nA few options can be set by calling:\n```php\n\u003c?php\n// 
Limit of fake generated records for updates and creates.\n// Default : 0 = everything to update / nothing to insert\npublic function setLimit(int $limit);\n// Don't do anything, default true\npublic function setPretend(bool $pretend);\n// Return or not a result, default false\npublic function setReturnRes(bool $returnRes);\n```\n\n\nFull Example:\n```php\n\u003c?php\n\nrequire_once 'vendor/autoload.php';\n\n// Create a container\n$container = Edyan\\Neuralyzer\\ContainerFactory::createContainer();\n$expression = $container-\u003eget('Edyan\\Neuralyzer\\Utils\\Expression');\n// Configure DB Utils, required\n$dbUtils = $container-\u003eget('Edyan\\Neuralyzer\\Utils\\DBUtils');\n// See Doctrine DBAL configuration :\n// https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html\n$dbUtils-\u003econfigure([\n    'driver' =\u003e 'pdo_mysql',\n    'host' =\u003e 'mysql',\n    'dbname' =\u003e 'test_db',\n    'user' =\u003e 'root',\n    'password' =\u003e 'root',\n]);\n\n$reader = new \\Edyan\\Neuralyzer\\Configuration\\Reader('neuralyzer.yml');\n\n$db = new \\Edyan\\Neuralyzer\\Anonymizer\\DB($expression, $dbUtils);\n$db-\u003esetConfiguration($reader);\n$db-\u003esetPretend(false);\n// Get tables\n$tables = $reader-\u003egetEntities();\nforeach ($tables as $table) {\n    $total = $dbUtils-\u003ecountResults($table);\n\n    if ($total === 0) {\n        fwrite(STDOUT, \"$table is empty\" . PHP_EOL);\n        continue;\n    }\n    fwrite(STDOUT, \"$table anonymized\" . PHP_EOL);\n\n    $db-\u003eprocessEntity($table);\n}\n\n```\n\n\n## Pre and Post Actions\nYou can set an array of `pre_actions` and `post_actions` that will be\nexecuted *before* and *after* neuralyzer starts to anonymize an entity.\n\nThese actions are actually symfony expressions (see [Symfony Expression Language](https://))\nthat rely on *Services*. 
These Services are loaded from the `Service/` directory.\n\nFor now there is only one service : `Database` that contains a method `query` usable like that :\n`db.query(\"DELETE FROM table\")`.\n\n\n## Configuration Reference\n`bin/neuralyzer config:example` provides a default configuration with all parameters explained :\n```yaml\nconfig:\n\n    # Set the guesser class\n    guesser:              Edyan\\Neuralyzer\\Guesser\n\n    # Set the version of the guesser the conf has been written with\n    guesser_version:      '3.0'\n\n    # Faker's language, make sure all your methods have a translation\n    language:             en_US\n\n    # List all entities, theirs cols and actions\n    entities:             # Required, Example: people\n\n        # Prototype\n        -\n\n            # Either \"update\" or \"insert\" data\n            action:               update\n\n            # Should we delete data with what is defined in \"delete_where\" ?\n            delete:               ~ # Deprecated (delete and delete_where have been deprecated. Use now pre and post_actions)\n\n            # Condition applied in a WHERE if delete is set to \"true\"\n            delete_where:         ~ # Deprecated (delete and delete_where have been deprecated. 
Use now pre and post_actions), Example: '1 = 1'\n            cols:\n\n                # Examples:\n                first_name:\n                    method:              firstName\n                last_name:\n                    method:              lastName\n\n                # Prototype\n                -\n\n                    # Faker method to use, see doc : https://fakerphp.github.io/\n                    method:               ~ # Required\n\n                    # Set this option to true to generate unique values for that field (see faker-\u003eunique() generator)\n                    unique:               false\n\n                    # Faker's parameters, see Faker's doc\n                    params:               []\n\n            # Limit the number of written records (update or insert). 100 by default for insert\n            limit:                0\n\n            # The list of expressions language actions to executed before neuralyzing. Be careful that \"pretend\" has no effect here.\n            pre_actions:          []\n\n            # The list of expressions language actions to executed after neuralyzing. 
Be careful that \"pretend\" has no effect here.\n            post_actions:         []\n\n```\n\n## Custom application logic\n\nWhen using custom Doctrine types, Doctrine will produce an error saying that the type is not known.\nThis can be solved by providing a bootstrap file to register the custom Doctrine type.\n\nbootstrap.php\n```php\n\u003c?php\n\nrequire_once '../vendor/autoload.php';\n\n\\Doctrine\\DBAL\\Types\\Type::addType('custom_type', 'Namespace\\Of\\The\\Custom\\Type');\n```\n\nThen provide the bootstrap file to the run command:\n\n```bash\nbin/neuralyzer run --db test_db -u root -p root -b bootstrap.php\n```\n\n\n\n## Development\nNeuralyzer uses [Robo](https://robo.li) to run its tests (via Docker) and build its phar.\n\nClone the project, run `composer install` then...\n\n### Run the tests\n* Change the `--wait` option if you have a lot of errors because the DB is not ready.\n* Change the `--php` option for `7.2` or `7.4`\n* Set `--no-coverage` if you want to disable PHPUnit Code Coverage.\n\n#### With MySQL\n```bash\n$ vendor/bin/robo test --php 7.2 --wait 10 --db mysql --db-version 5\n$ vendor/bin/robo test --php 7.3 --wait 10 --db mysql --db-version 8\n$ vendor/bin/robo test --php 7.4 --wait 10 --db mysql --db-version 8\n$ vendor/bin/robo test --php 8.0 --wait 10 --db mysql --db-version 8\n```\n#### With PostgreSQL 9, 10 and 11 (12 also works)\n```bash\n$ vendor/bin/robo test --php 7.2 --wait 10 --db pgsql --db-version 10\n$ vendor/bin/robo test --php 7.3 --wait 10 --db pgsql --db-version 11\n$ vendor/bin/robo test --php 7.4 --wait 10 --db pgsql --db-version 12\n$ vendor/bin/robo test --php 8.0 --wait 10 --db pgsql --db-version 13\n```\n#### With SQL Server\n**Warning**: 2 tests *fail*, because of strange behaviors of SQL Server ... or Doctrine DBAL. 
PHPUnit can't compare 2 Datasets because the fields are not in the same order.\n```bash\n$ vendor/bin/robo test --php 7.2 --wait 15 --db sqlsrv\n$ vendor/bin/robo test --php 7.3 --wait 15 --db sqlsrv\n$ vendor/bin/robo test --php 7.4 --wait 15 --db sqlsrv\n$ vendor/bin/robo test --php 8.0 --wait 15 --db sqlsrv\n```\n\n### Build a release (with a phar and a git tag)\n```bash\n$ php -d phar.readonly=0 vendor/bin/robo release\n```\n\n### Build the phar only\n```bash\n$ php -d phar.readonly=0 vendor/bin/robo phar\n```\n\n\n### Improve code quality with phpinsights\n```bash\ndocker run -it --rm -v $(pwd):/app nunomaduro/phpinsights analyse --fix\n```\n\n### Update dependencies to make sure it'll work with PHP 7.2\n```bash\nvendor/bin/robo composer:update\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedyan%2Fneuralyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fedyan%2Fneuralyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedyan%2Fneuralyzer/lists"}