{"id":29880447,"url":"https://github.com/itsolutionsfactory/dbcut","last_synced_at":"2025-07-31T09:42:49.462Z","repository":{"id":57417969,"uuid":"223444123","full_name":"itsolutionsfactory/dbcut","owner":"itsolutionsfactory","description":"Extract a lightweight subset of your relational production database for development and testing purpose.","archived":false,"fork":false,"pushed_at":"2021-08-10T09:26:49.000Z","size":1115,"stargazers_count":22,"open_issues_count":2,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-12T02:11:13.368Z","etag":null,"topics":["database","datadumper","development","productivity","reproducibility"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itsolutionsfactory.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-22T16:34:19.000Z","updated_at":"2025-01-29T11:30:05.000Z","dependencies_parsed_at":"2022-09-03T08:52:10.165Z","dependency_job_id":null,"html_url":"https://github.com/itsolutionsfactory/dbcut","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/itsolutionsfactory/dbcut","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsolutionsfactory%2Fdbcut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsolutionsfactory%2Fdbcut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsolutionsfactory%2Fdbcut/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsolutionsfactory%2Fdbcut/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itsolutionsfactory","download_url":"https://codeload.github.com/itsolutionsfactory/dbcut/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsolutionsfactory%2Fdbcut/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268017357,"owners_count":24181669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","datadumper","development","productivity","reproducibility"],"created_at":"2025-07-31T09:42:47.024Z","updated_at":"2025-07-31T09:42:49.219Z","avatar_url":"https://github.com/itsolutionsfactory.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"DBcut\n=====\n\n.. image:: https://img.shields.io/pypi/v/dbcut.svg\n    :target: https://pypi.python.org/pypi/dbcut\n\n.. image:: https://travis-ci.org/itsolutionsfactory/dbcut.svg?branch=master\n    :target: https://travis-ci.org/itsolutionsfactory/dbcut\n    :alt: CI Status\n\n\n.. 
image:: docs/db-cute-small.png\n   :alt: DBcut logo\n   :align: center\n\n\nDBcut extracts a lightweight subset of a relational production database for development and testing\npurposes.\n\n\nTable of Contents\n-----------------\n\n-  `Overview \u003c#overview\u003e`__\n\n   -  `Usage \u003c#usage\u003e`__\n\n-  `Getting started \u003c#getting-started\u003e`__\n-  `Under The Hood \u003c#under-the-hood\u003e`__\n\n   -  `Database Reflection and Loading\n      Stategy \u003c#database-reflection-and-loading-stategy\u003e`__\n   -  `SQL from YAML \u003c#sql-from-yaml\u003e`__\n   -  `Extraction Graph \u003c#extraction-graph\u003e`__\n\nOverview\n--------\n\nIts main features are:\n\n-  Extract data from large databases.\n-  Reinject data into another database.\n-  Target and source databases can be based on different DBMSs (e.g., MySQL -\u003e PostgreSQL/SQLite).\n-  Simplified extraction queries written in YAML.\n-  Support for nested associations.\n-  JSON and plain SQL export.\n-  Reasonable performance.\n-  Caching of extractions to accelerate future extractions.\n\nUsage\n~~~~~\n\n.. code:: shell\n\n   Usage: dbcut [OPTIONS] COMMAND1 [ARGS]... 
[COMMAND2 [ARGS]...]...\n\n     Extract a lightweight subset of your production DB for development and\n     testing purpose.\n\n   Options:\n     -c, --config PATH    Configuration file\n     --version            Show the version and exit.\n     -y, --force-yes      Never prompts for user intervention\n     -i, --interactive    Prompts for user intervention.\n     --quiet, --no-quiet  Suppresses most warning and diagnostic messages.\n     --debug              Enables debug mode.\n     --verbose            Enables verbose output.\n     -h, --help           Show this message and exit.\n\n   Commands:\n     load        Extract and load data to the target database.\n     flush       Remove ALL TABLES from the target database and recreate them\n     inspect     Check databases content.\n     dumpsql     Dump all SQL insert queries.\n     dumpjson    Export data to json.\n     clear       Remove all data (only) from the target database\n     purgecache  Remove all cached queries.\n\nGetting started\n---------------\n\nLet's take the following example database:\n\n.. image:: docs/example-simple-db.png\n   :alt: Simple Database\n\nWe want to extract some users, with all their related data, into our development database.\n\nFirst, we edit the extraction file ``dbcut.yml`` as follows:\n\n.. code:: yaml\n\n   # dbcut.yml\n   databases:\n     source_uri: mysql://foo:bar@db-host/prod\n     destination_uri: sqlite:///small-dev-database.db\n\n   queries:\n     - from: user\n       limit: 2\n\nHere, we set the limit to two users; the default limit is 10.\n\nAfter that, we launch the extraction with the ``load`` command:\n\n.. 
code:: shell\n\n   $ dbcut load\n    ---\u003e Reflecting database schema from mysql://foo:***@db-host/prod\n    ---\u003e Creating new sqlite:///small-dev-database.db database\n    ---\u003e Creating all tables and relations on sqlite:///small-dev-database.db\n\n   Query 1/1 :\n\n       from: user\n       limit: 2\n       backref_limit: 10\n       backref_depth: 5\n       join_depth: 5\n       exclude: []\n       include: []\n\n\n        ┌─ⁿ─comment\n        ├─ⁿ─vote\n    user┤\n        └─ⁿ─user_group┐\n                      └─¹─group┐\n                               └─¹─role┐\n                                       └─ⁿ─role_permission┐\n                                                          └─¹─permission\n\n\n   8 tables loaded\n\n    ---\u003e Cache key : 4a468c3555074890b7c342c0a575f29d47145821\n    ---\u003e Executing query\n    ---\u003e Fetching objects\n    ---\u003e Inserting 31 rows\n\nWe can check the data in our new database:\n\n.. code:: shell\n\n   $ ls\n   dbcut.yml  small-dev-database.db\n   $ sqlite3 small-dev-database.db\n\n.. code:: sql\n\n   sqlite\u003e SELECT id, login FROM user;\n   3|jerome\n   4|julien\n\n.. code:: sql\n\n   sqlite\u003e SELECT * from comment;\n   8|comment jerome 1|3\n   9|comment jerome 2|3\n   10|comment jerome 3|3\n\nIn the following example, we retrieve roles with their related groups and permissions. To obtain the\nbest extraction graph, we use the ``include`` keyword, which tells dbcut that we want to minimize\nthe number of associated tables (nested associations).\n\n.. code:: yaml\n\n   queries:\n     - from: user\n       limit: 2\n\n     - from: role\n       include:\n         - group\n         - permission\n\nIt is possible to empty the local database before beginning the extraction with the ``clear`` command.\n\n.. 
code:: shell\n\n   $ dbcut -y clear load\n    ---\u003e Removing all data from sqlite:///small-dev-database.db database\n    ---\u003e Reflecting database schema from mysql://foo:***@db-host/prod?charset=utf8\n    ---\u003e Creating all tables and relations on sqlite:///small-dev-database.db\n\n   Query 1/2 :\n\n       from: user\n       limit: 2\n       backref_limit: 10\n       backref_depth: 5\n       join_depth: 5\n       exclude: []\n       include: []\n\n\n        ┌─ⁿ─comment\n        ├─ⁿ─vote\n    user┤\n        └─ⁿ─user_group┐\n                      └─¹─group┐\n                               └─¹─role┐\n                                       └─ⁿ─role_permission┐\n                                                          └─¹─permission\n\n\n   8 tables loaded\n\n    ---\u003e Cache key : 4a468c3555074890b7c342c0a575f29d47145821\n    ---\u003e Using cache (2 elements)\n    ---\u003e Fetching objects\n    ---\u003e Inserting 31 rows\n\n   Query 2/2 :\n\n       from: role\n       limit: 10\n       backref_limit: 10\n       backref_depth: null\n       join_depth: null\n       exclude: []\n       include:\n       - group\n       - permission\n\n\n        ┌─ⁿ─group\n    role┤\n        └─ⁿ─role_permission┐\n                           └─¹─permission\n\n\n   4 tables loaded\n\n    ---\u003e Cache key : 5029d84dbb2bc75a7df898dd94df93b395e91e44\n    ---\u003e Executing query\n    ---\u003e Fetching objects\n    ---\u003e Inserting 22 rows\n\nAs you can see in the first query, the cache was used and there was thus no interaction with the source database.\n\nThis query allowed the extraction of all roles:\n\n.. 
code:: sql\n\n   sqlite\u003e SELECT * from role;\n   1|admin\n   2|moderator\n   3|user\n\nIf we had not used the ``include`` keyword, all tables would have been extracted:\n\n::\n\n           ┌─ⁿ─role_permission┐\n           │                  └─¹─permission\n       role┤\n           └─ⁿ─group┐\n                    └─ⁿ─user_group┐\n                                  │       ┌─ⁿ─comment\n                                  └─¹─user┤\n                                          └─ⁿ─vote\n\nTo narrow our extraction further, we now limit it to roles that can delete a user.\n\n.. code:: yaml\n\n   queries:\n     - from: user\n       limit: 2\n\n     - from: role\n       include:\n         - group\n         - permission\n       where:\n         permission.codename: 'delete_user'\n\nWith the ``--last-only`` option, only the last extraction rule is relaunched.\n\n.. code:: shell\n\n   $ dbcut -y clear load --last-only\n   ...\n    ---\u003e Cache key : ffb664a2e69c88fa48db2680daf71d30408bd207\n    ---\u003e Executing query\n    ---\u003e Fetching objects\n    ---\u003e Inserting 14 rows\n\nThis time, only the 'admin' role is retrieved:\n\n.. code:: sql\n\n   sqlite\u003e SELECT * from role;\n   1|admin\n\nPlease note that the filter applies here only to the ``role`` table (``from``) and not to the ``permission`` table.\n\n.. code:: sql\n\n   sqlite\u003e SELECT * FROM permission;\n   1|delete_comment\n   2|delete_vote\n   3|delete_user\n   4|create_comment\n   5|create_vote\n   6|create_user\n\nIndeed, we filter the roles based on a value from the permission table, but we still retrieve all permissions associated\nwith this role.\n\nIn the above example, it makes sense that the admin role has all permissions.\n\nLast but not least, we can also retrieve data in JSON or raw SQL format!\n\n.. code:: shell\n\n   $ dbcut dumpjson|dumpsql\n\n.. 
code:: json\n\n   [\n     {\n       \"password\": \"julien\",\n       \"vote_collection\": [\n         {\n           \"user_id\": 4,\n           \"comment_id\": 1,\n           \"id\": 3,\n           \"rating\": 4\n         },\n         {\n           \"user_id\": 4,\n           \"comment_id\": 3,\n           \"id\": 6,\n           \"rating\": 10\n         },\n         {\n           \"user_id\": 4,\n           \"comment_id\": 6,\n           \"id\": 13,\n           \"rating\": 10\n         }\n       ],\n       \"comment_collection\": [],\n       \"id\": 4,\n       \"login\": \"julien\",\n       \"user_group_collection\": [\n         {\n           \"user_id\": 4,\n           \"group\": {\n             \"name\": \"Utilisateur\",\n             \"role\": {\n               \"id\": 3,\n               \"role_permission_collection\": [\n                 {\n                   \"permission\": {\n                     \"id\": 4,\n                     \"codename\": \"create_comment\",\n                     \"role_permission_collection\": []\n                   },\n\n.. 
code:: sql\n\n   PRAGMA foreign_keys = OFF;\n\n   BEGIN;\n   INSERT OR IGNORE INTO permission (id, codename) VALUES (4, 'create_comment');\n   INSERT OR IGNORE INTO permission (id, codename) VALUES (5, 'create_vote');\n   INSERT OR IGNORE INTO permission (id, codename) VALUES (1, 'delete_comment');\n   INSERT OR IGNORE INTO permission (id, codename) VALUES (2, 'delete_vote');\n   INSERT OR IGNORE INTO role (id, name) VALUES (3, 'user');\n   INSERT OR IGNORE INTO role (id, name) VALUES (2, 'moderator');\n   INSERT OR IGNORE INTO user (id, login, password) VALUES (4, 'julien', 'julien');\n   INSERT OR IGNORE INTO user (id, login, password) VALUES (3, 'jerome', 'jerome');\n   INSERT OR IGNORE INTO \"group\" (id, name, role_id) VALUES (3, 'Utilisateur', 3);\n   INSERT OR IGNORE INTO \"group\" (id, name, role_id) VALUES (2, 'Moderateur', 2);\n   INSERT OR IGNORE INTO comment (id, content, user_id) VALUES (8, 'comment jerome 1', 3);\n   INSERT OR IGNORE INTO comment (id, content, user_id) VALUES (9, 'comment jerome 2', 3);\n   INSERT OR IGNORE INTO comment (id, content, user_id) VALUES (10, 'comment jerome 3', 3);\n   INSERT OR IGNORE INTO role_permission (id, role_id, permission_id) VALUES (12, 3, 4);\n   INSERT OR IGNORE INTO role_permission (id, role_id, permission_id) VALUES (13, 3, 5);\n   INSERT OR IGNORE INTO role_permission (id, role_id, permission_id) VALUES (7, 2, 4);\n   INSERT OR IGNORE INTO role_permission (id, role_id, permission_id) VALUES (8, 2, 5);\n   INSERT OR IGNORE INTO role_permission (id, role_id, permission_id) VALUES (10, 2, 1);\n   INSERT OR IGNORE INTO role_permission (id, role_id, permission_id) VALUES (11, 2, 2);\n   INSERT OR IGNORE INTO user_group (id, user_id, group_id) VALUES (4, 4, 3);\n   INSERT OR IGNORE INTO user_group (id, user_id, group_id) VALUES (3, 3, 2);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (3, 4, 4, 1);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (6, 10, 4, 3);\n   
INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (13, 10, 4, 6);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (2, 5, 3, 1);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (5, 1, 3, 2);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (7, 10, 3, 3);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (10, 6, 3, 1);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (11, 5, 3, 5);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (12, 6, 3, 6);\n   INSERT OR IGNORE INTO vote (id, rating, user_id, comment_id) VALUES (19, 10, 3, 10);\n   COMMIT;\n\nUnder The Hood\n--------------\n\nDatabase Reflection and Loading Stategy\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDBcut relies heavily on SQLAlchemy, the SQL toolkit and Object Relational Mapper for Python. The ORM frees us from\nmanipulating SQL directly, but that is not all. SQLAlchemy offers a range of toolkits that enable\nus to programmatically build all SQL queries useful to DBcut. This includes both the schema creation with all of its\nproperties and the select, join and insert queries… no matter which DBMS is used (PostgreSQL, MySQL, SQLite, Oracle, etc.).\n\nOne of the most important features of DBcut is that the user does not need to know or provide the source database\nschema to use it. First of all, DBcut will inspect the source database and retrieve all metadata. This action is what we\ncall *Database Reflection*.\n\n.. image:: docs/database_reflection.png\n   :alt: Database Reflection\n\n\nThe MetaData object stores the whole collection of metadata entities. DBcut will alter this MetaData object to make it\ncompatible with most DBMSs. For example, the names of indexes or foreign keys can be too long for SQLite but not for\nMySQL. 
Sometimes, it also changes column types to match what is expected in the target database\n(for example, ``mysql.TINYINT`` becomes ``SMALLINT`` in SQLite and PostgreSQL).\n\nOnce the MetaData object is complete, we can create the new database, which is almost identical to the source database\n(except for some compatibility adjustments).\n\nDBcut will generate and launch extraction requests on the source database. The data thus obtained will be detached from\nthe first SQLAlchemy session and attached to the new session in the target database. This is where the SQLAlchemy\nmagic happens: the same request will be used to extract data from the source database and to load it into the target\ndatabase. Indeed, in the first case (query/fetch), it will be translated into SQL ``SELECT`` queries and in the second\ncase (load), into SQL ``INSERT`` statements.\n\nSQL from YAML\n~~~~~~~~~~~~~\n\nOne of the goals of DBcut is to allow quick writing of extraction requests. Most of the time, to write an extraction\nrequest, not much information is needed: only the main table name, in the hope of retrieving as much related\ndata as possible.\n\nThe idea was to find a sufficiently concise syntax that allows us to build the most complete extraction requests with\nthe minimum effort.\n\nYAML came to us naturally, as it is pleasant to read and easy for humans to understand and edit.\n\nThe ``dbcut.yml`` file is used both to configure DBcut and to write\nextraction requests.\n\n.. code:: yaml\n\n   databases:\n     source_uri: mysql://chinook:chinook@192.168.66.66/chinook\n     destination_uri: sqlite:///chinook.db\n\n   queries:\n     - from: customer_customer\n\nTo write an extraction request, only the keyword ``from`` is mandatory. However, other keywords can be added to reduce\nthe size of the data to retrieve.\n\n.. 
code:: yaml\n\n     - from: contracts_customer\n       where:\n         brand: 2\n       limit: 100\n       backref_limit: 500\n       backref_depth: 2\n       join_depth: 5\n       exclude:\n       - django_admin_log\n       - django_session\n       include: []\n\nUnlike a plain SQL query, an extraction request using DBcut automatically and recursively loads all associated relations\n(see `Extraction Graph \u003c#extraction-graph\u003e`__). All these options filter and reduce the extracted data, which keeps the\nextraction process from slowing down.\n\nFinally, to make the extraction requests as compact as possible, we can set default values for most of\nthese options:\n\n.. code:: yaml\n\n   default_limit: 100\n   default_backref_limit: 500\n\n   default_backref_depth: 2\n   default_join_depth: 5\n\n   global_exclude:\n     - django_admin_log\n     - django_session\n\nExtraction Graph\n~~~~~~~~~~~~~~~~\n\nTo build an extraction request, we first build its extraction graph.\n\nAn extraction graph is a subset of the complete graph of database relations. Every node represents a table, and each\nlink represents a relation between two tables. The link direction is defined by the foreign key.\n\nTo build this graph, we use the ``MetaData`` object (see `Database Reflection and Loading Stategy\n\u003c#database-reflection-and-loading-stategy\u003e`__).\n\nLet's use the following database schema:\n\n\n.. image:: docs/chinook_schema.png\n   :alt: Database chinook schema\n\n\nThe metadata retrieved during database reflection is used to build\nthe following complete graph of relations:\n\n\n.. image:: docs/chinook_uml_graph.png\n   :alt: Complete graph of relations\n\n\nTo build the extraction graph, we browse the complete graph starting\nfrom the table used in the ``from`` instruction. 
Browsing stops only\nif:\n\n-  the link has already been browsed\n-  the table is explicitly excluded\n-  the maximum depth is reached\n\nFor the following request:\n\n.. code:: yaml\n\n   queries:\n     - from: customer_customer\n\nThe generated extraction graph is:\n\n\n.. image:: docs/dbcut-load-chinook.png\n   :alt: Generated extraction graph\n\n\nPlease note that we handle two types of relations: one-to-many relations (denoted ``1`` in the extraction graph) and\nmany-to-many relations (denoted ``n``).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitsolutionsfactory%2Fdbcut","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitsolutionsfactory%2Fdbcut","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitsolutionsfactory%2Fdbcut/lists"}