{"id":16301650,"url":"https://github.com/bbkr/uprooted","last_synced_at":"2025-07-18T01:03:17.898Z","repository":{"id":37589748,"uuid":"227689645","full_name":"bbkr/UpRooted","owner":"bbkr","description":"Extract subtrees of data from relational databases.","archived":false,"fork":false,"pushed_at":"2023-02-18T20:46:24.000Z","size":145,"stargazers_count":3,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-10T01:18:08.633Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bbkr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-12T20:22:18.000Z","updated_at":"2023-12-11T12:01:45.000Z","dependencies_parsed_at":"2024-11-05T22:01:39.818Z","dependency_job_id":null,"html_url":"https://github.com/bbkr/UpRooted","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bbkr/UpRooted","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FUpRooted","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FUpRooted/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FUpRooted/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FUpRooted/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bbkr","download_url":"https://codeload.github.com/bbkr/UpRooted/tar.gz/refs/heads/master","sbom_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FUpRooted/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265686538,"owners_count":23811208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T20:55:16.511Z","updated_at":"2025-07-18T01:03:17.804Z","avatar_url":"https://github.com/bbkr.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Extract subtrees of data from relational databases in [Raku](https://www.raku.org) language.\n\n[![test](https://github.com/bbkr/UpRooted/actions/workflows/test.yml/badge.svg)](https://github.com/bbkr/UpRooted/actions/workflows/test.yml)\n\n## DESCRIPTION\n\nThis module allows you to extract a tree of data from a relational database and process it.\nA tree of data may be, for example, a user account and all records in related tables that belong to that user.\nThis is useful for cases like:\n\n* Transferring users between database shards for better load balancing.\n* Cloning user data from the production environment to a development environment to debug issues in an isolated manner.\n* Saving user state for backup or legal purposes.\n\nThis module is NOT a continuous replication tool (like, for example, Debezium).\n\n## SYNOPSIS\n\nLet's say you have a MySQL database and want to save the user with `id = 1` from the `users` table, along with all of their data from other tables, to a `.sql` file.\n\n```raku\nuse DBIish;\nmy $connection = DBIish.connect( 'mysql', host =\u003e ..., port =\u003e ..., ... 
);\n\nuse UpRooted::Schema::MySQL;\nmy $schema = UpRooted::Schema::MySQL.new( :$connection );\n\nuse UpRooted::Tree;\nmy $tree = UpRooted::Tree.new( root-table =\u003e $schema.table( 'users' ) );\n\nuse UpRooted::Reader::MySQL;\nmy $reader = UpRooted::Reader::MySQL.new( :$connection, :$tree );\n\nuse UpRooted::Writer::MySQLFile;\nmy $writer = UpRooted::Writer::MySQLFile.new;\n\n# save the tree rooted at the users row with id = 1\n$writer.write( $reader, id =\u003e 1 );\n```\n\nYour user will be saved to the `out.sql` file.\n\n## DOCKER\n\nIf you do not have Raku installed, UpRooted is also available as a [Docker image](https://hub.docker.com/r/bbkr2/uprooted).\n\n## MODULES\n\nThis section explains the role of every module in the `UpRooted` stack and lists which variants of each module are available.\n\n### UpRooted::Schema\n\n`UpRooted::Schema` describes relations between `UpRooted::Table`s.\n\nIt can be discovered automatically by plugins like:\n\n* `UpRooted::Schema::MySQL`\n* `UpRooted::Schema::PostgreSQL`\n\nIn rare cases you may need to construct or fine-tune `UpRooted::Schema` manually, for example if you use the MySQL MyISAM engine or MySQL partitioning, or if you use PostgreSQL foreign keys relying on unique keys instead of unique constraints. Without proper foreign keys, relations between `UpRooted::Table`s cannot be auto-discovered and must be defined by hand. 
There is a [separate manual](docs/Schema.md) describing this process.\n\nCreating an `UpRooted::Schema` needs to be done only once per schema.\n\n### UpRooted::Tree\n\n`UpRooted::Tree` knows how to reach each leaf `UpRooted::Table` from the chosen root `UpRooted::Table`.\nIt also resolves the order of `UpRooted::Table`s to satisfy foreign key constraints, which is important, for example, when writing a data tree to an online database.\n\nYou can derive many `UpRooted::Tree`s from a single `UpRooted::Schema`, depending on which root `UpRooted::Table` is used.\n\nCreating an `UpRooted::Tree` needs to be done only once per root `UpRooted::Table`.\n\n### UpRooted::Reader\n\n`UpRooted::Reader` transforms an `UpRooted::Tree` into a series of queries that extract the data belonging to a given row in the root `UpRooted::Table`. This is always database engine specific.\n\nAvailable variants:\n\n* `UpRooted::Reader::MySQL`\n* `UpRooted::Reader::PostgreSQL`\n\nCreating an `UpRooted::Reader` needs to be done only once per `UpRooted::Tree`.\n\n### UpRooted::Writer\n\n`UpRooted::Writer` writes data provided by an `UpRooted::Reader`.\n\nAvailable variants:\n\n* `UpRooted::Writer::MySQL` - Write directly to another MySQL database.\n* `UpRooted::Writer::MySQLFile` - Write to a `.sql` file compatible with MySQL.\n* `UpRooted::Writer::PostgreSQL` - Write directly to another PostgreSQL database.\n* `UpRooted::Writer::PostgreSQLFile` - Write to a `.sql` file compatible with PostgreSQL.\n\nNote that `UpRooted::Reader` and `UpRooted::Writer` are independent. 
You can read from a MySQL database and write directly to a PostgreSQL database if needed.\n\nOmitting the `UpRooted::Schema` name from fully qualified names in queries may be useful, for example, when you need to write data to whatever schema is (or will be) used by the connection.\nTo do so, pass a flag to the constructor like this: `UpRooted::Writer::MySQL.new( :!use-schema-name )`.\nTo find other options accepted by each `UpRooted::Writer`, call `p6doc` on the chosen module.\n\nNote that not every `UpRooted::Writer` can save every data type provided by `UpRooted::Reader`. For example, MySQL does not support PostgreSQL array types, so they will be saved as joined strings.\n\n## CACHING\n\nCreating instances of the modules mentioned above is a heavy operation, especially on large schemas. You can reuse all of them for a significant speed improvement.\n\nFor example, if you need to save multiple users, you need to create the `UpRooted::Schema`, `UpRooted::Tree`, `UpRooted::Reader` and `UpRooted::Writer` only once.\n\n```raku\nuse UpRooted::Writer::MySQLFile;\n\n# created once, as shown in SYNOPSIS\nmy $schema = ...;\nmy $tree = ...;\nmy $reader = ...;\nmy $writer = UpRooted::Writer::MySQLFile.new(\n    # name generator to avoid file name conflicts\n    file-naming =\u003e sub ( $tree, %conditions ) {\n        %conditions{ 'id' } ~ '.sql'\n    }\n);\n\n# reuse the same instances for every user\n$writer.write( $reader, id =\u003e 1 );\n$writer.write( $reader, id =\u003e 2 );\n$writer.write( $reader, id =\u003e 3 );\n```\n\nThis will create `1.sql`, `2.sql` and `3.sql` files without rediscovering everything each time.\n\n## EXTENSIONS\n\n`UpRooted` is written with extensibility in mind.\n\nThe base version focuses on MySQL and PostgreSQL databases, as those are the two most common open source ones.\n\nHowever, if you need, for example, to create an `UpRooted::Schema` from Red ORM, make an `UpRooted::Writer` that saves a set of CSV files, or even implement the whole `UpRooted` stack for MS SQL, do not be afraid to implement it. 
The interfaces are simple and well documented, and checking the existing MySQL / PostgreSQL code will give you an idea of how little is actually needed to extend `UpRooted` capabilities.\n\n## SCHEMA DESIGN ISSUES\n\nRelational databases are extremely flexible in terms of schema design.\nHowever, there are a few rules you must follow if you want to work with data trees.\n\nIf you use `UpRooted` but do not get the expected results, please go through this list carefully before creating an issue on GitHub.\n\n### Data is transformed (extra rows, counters are too high) when written to another database\n\n**Cause:** There are `ON INSERT` triggers changing data.\n\nThis one is very easy to overlook.\nTriggers are a convenient, reliable and cheap way of managing entangled data state.\nFor example, when a row is inserted into the `orders` table, a trigger increases a counter in the `order_stats` table by `1`.\n\nBut those triggers will get in the way of copying / moving data trees verbatim between databases, because they will \"replay\" their logic when the data tree is written to another database.\n\n**Fix:** When designing a schema with data tree reading / writing in mind, use triggers only to verify constraints.\n\n### Data tree is not writable to another database without disabling foreign key constraints\n\n**Cause 1:** There are self-looped tables, usually implementing tree logic.\n\nBy default `UpRooted` resolves dependencies, and the whole purpose of `UpRooted::Tree` is to provide data in the correct order. 
However, this may not be possible if a table directly references itself.\n\n```\n    +----------+\n    | users    |\n    +----------+\n    | id       |----------------------+\n    | login    |                      |\n    | password |                      |\n    +----------+                      |\n                                      |\n              +--------------------+  |\n              | albums             |  |\n              +--------------------+  |\n          +---| id                 |  |\n          |   | user_id            |\u003e-+\n          +--\u003c| parent_album_id    |\n              | name               |\n              +--------------------+\n```\n\nFor example, a user has the album with `id = 2` as a subcategory of the album with `id = 1`. Then the user rearranges the collection so that the album with `id = 2` is on top. In such a scenario, if the database returned data tree rows in primary key order, it would not be possible to insert the album with `id = 1`, because it requires the presence of the album with `id = 2`.\n\n**Hint 1:**\n\nYou can check if an `UpRooted::Tree` has loops by running:\n\n```raku\nmy $tree = ...;\nsay $tree.paths.grep: *.is-looped;\n```\n\n**Fix 1:** Have a separate table that establishes the tree hierarchy between rows:\n\n```\n    +----------+\n    | users    |\n    +----------+\n    | id       |----------------------+\n    | login    |                      |\n    | password |                      |\n    +----------+                      |\n                                      |\n              +--------------------+  |\n              | albums             |  |\n              +--------------------+  |\n      +-+=====| id                 |  |\n      | |     | user_id            |\u003e-+\n      | |     | name               |\n      | |     +--------------------+\n      | |\n      | |    +------------------+\n      | |    | album_hierarchy  |\n      | |    +------------------+\n      | +---\u003c| parent_album_id  |\n      +-----\u003c| child_album_id   |\n         
    +------------------+ \n```\n\n**Cause 2:** Jailbreak. Data from one tree links to data from another tree, usually implementing relations or activities between users.\n\n```\n      +----------+        +------------+\n      | users    |        | blog_posts |\n      +----------+        +------------+\n    +-| id       |---+    | id         |---+\n    | | login    |   +---\u003c| user_id    |   |\n    | | password |        | text       |   |\n    | +----------+        +------------+   |\n    |                                      |\n    |    +--------------+                  |\n    |    | comments     |                  |\n    |    +--------------+                  |\n    |    | blog_post_id |\u003e-----------------+\n    +---\u003c| user_id      |\n         | text         |\n         +--------------+\n```\n\nFor example, the user with `id = 1` created a blog post that was commented on by the user with `id = 2`. Now the record in the `comments` table has two owners: one direct (it belongs to the user with `id = 2`) and one indirect (it belongs to a post written by the user with `id = 1`).\n\n**Fix 2:** Unfortunately, the only way to detach two data trees in such a case is to remove one of the foreign key constraints.\n\n### Some rows from a table are missing\n\n**Cause:** Ambiguity in the correct way of reaching rows in a table. 
Only multiple nullable relation paths lead to this table.\n\n```\n                  +----------+\n                  | users    |\n                  +----------+\n    +-------------| id       |-------------+\n    |             | login    |             |\n    |             | password |             |\n    |             +----------+             |\n    |                                      |\n    |  +-----------+        +-----------+  |\n    |  | time      |        | distance  |  |\n    |  +-----------+        +-----------+  |\n    |  | id        |--+  +--| id        |  |\n    +-\u003c| user_id   |  |  |  | user_id   |\u003e-+\n       | amount    |  |  |  | amount    |\n       +-----------+  |  |  +-----------+\n                      |  |\n                   (nullable)\n                      |  |\n                      |  |\n             +--------+  +---------+\n             |                     |\n             |   +-------------+   |\n             |   | parts       |   |\n             |   +-------------+   |\n             +--\u003c| time_id     |   |\n                 | distance_id |\u003e--+\n                 | name        |\n                 +-------------+\n```\n\nThis time our product is an application that helps you with a car maintenance schedule. Our user's car has 4 tires that must be replaced after 10 years or 100000 km, and 4 spark plugs that must be replaced after 100000 km. So 4 indistinguishable rows for tires are added to the `parts` table (they reference both `time` and `distance`), and 4 indistinguishable rows are added for spark plugs (they reference only `distance`).\n\nNow, to extract the user to a shard, we have to find which rows from the `parts` table they own. By following relations through the `time` table we get 4 tires. But because this path is nullable at some point, we are not sure we found all records. And indeed, by following relations through the `distance` table we find 4 tires and 4 spark plugs. 
Since this path is also nullable at some point, we are not sure we found all records either. So we must combine the results from the `time` and `distance` paths, which gives us... 8 tires and 4 spark plugs? Well, that looks wrong. Maybe let's group them by `time` and `distance` pair, which gives us... 1 tire and 1 spark plug? So depending on how you combine indistinguishable rows from many nullable paths into the final row set, you may suffer either data duplication or data loss.\n\nTo understand this issue better, consider two answers to the question `How many legs does a horse have?`:\n\n* `Eight. Two front, two rear, two left and two right.` This answer is incorrect because each leg is counted multiple times through different nullable relations.\n* `Four. Those attached to it.` This answer is correct because each leg is counted exactly once through a not nullable relation.\n\n**Hint:**\n\nYou can check if an `UpRooted::Tree` has ambiguities by running:\n\n```raku\nmy $tree = ...;\nsay $tree.paths.grep: *.is-ambiguous;\n```\n\n**Fix:** Redesign the schema so that at least one not nullable relation path leads to every table.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbkr%2Fuprooted","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbbkr%2Fuprooted","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbkr%2Fuprooted/lists"}