{"id":15454670,"url":"https://github.com/hooopo/shadow","last_synced_at":"2025-10-23T22:22:42.936Z","repository":{"id":139402335,"uuid":"232551787","full_name":"hooopo/shadow","owner":"hooopo","description":"shadow table.","archived":false,"fork":false,"pushed_at":"2020-01-10T08:18:19.000Z","size":12,"stargazers_count":12,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-28T09:44:13.980Z","etag":null,"topics":["audit","cdc","history","pit","point-in-time","shadow","temporal"],"latest_commit_sha":null,"homepage":null,"language":"PLpgSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hooopo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-08T11:50:16.000Z","updated_at":"2021-04-28T03:47:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"d0c3e5e8-eaa2-41c9-b75a-045651454b89","html_url":"https://github.com/hooopo/shadow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hooopo%2Fshadow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hooopo%2Fshadow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hooopo%2Fshadow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hooopo%2Fshadow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hooopo","download_url":"https://codeload.github.com/hooopo/shadow/tar.gz/refs/heads/master","
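host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248966392,"owners_count":21190763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audit","cdc","history","pit","point-in-time","shadow","temporal"],"created_at":"2024-10-01T22:04:44.946Z","updated_at":"2025-10-23T22:22:37.884Z","avatar_url":"https://github.com/hooopo.png","language":"PLpgSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shadow Table for Postgres\n\n\u003ca name=\"mW3mg\"\u003e\u003c/a\u003e\n## Why\nSometimes we need to see the state of a record at some point in the past, which is a [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension) problem. However, most OLTP data models ignore history by design: rows are simply deleted or updated in place. Some scenarios nevertheless require the state of a record at an arbitrary point in time:\n\n- Auditing or security requirements\n- Actual business requirements, such as the salary or position change history of an employee\n- Analytics requirements, where a data warehouse runs analysis and mining on historical state\n- Disaster recovery, when developers ship buggy code that wrongly modifies or deletes important data and it has to be restored to a correct state\n\n\u003ca name=\"Cabg1\"\u003e\u003c/a\u003e\n## Existing implementations\n\nIn Rails, plugins such as [paranoia](https://github.com/radar/paranoia) and Audited cover part of the requirements above, but they have several problems:\n\n- Soft-delete plugins like paranoia turn destroy into an update, which conflicts with other plugins that need to hook after_destroy.\n- paranoia and audited only hook the application layer, so they only take effect for operations on individual model records. If a developer writes raw SQL, nothing is recorded.\n- Besides application SQL that bypasses the models, in practice developers or DBAs also run statements directly against the database to update data, and paranoia and audited cannot help there either.\n\nSo the best place to solve this problem is the database layer. Existing PostgreSQL solutions include [https://github.com/arkhipov/temporal_tables](https://github.com/arkhipov/temporal_tables) and others, but they have problems of their own.\n\n\u003ca name=\"wmc3x\"\u003e\u003c/a\u003e\n## The ideal solution\n\nAn ideal solution should meet the following conditions:\n\n1. Works at the database layer rather than the application layer, so no change is missed in any scenario\n1. Easy to install and use. temporal_tables does not meet this, because it depends on a C extension and cannot be used in various cloud service environments\n1. 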
Can integrate application-level information, such as the acting user. Some database plugins are feature-rich but fail this condition: they record only the database account, not the application user, and for a Rails project that is always the same database user.\n1. Friendly to UI and analysis. Some solutions store change records directly as JSON, which takes a lot of extra work to use. In a Rails project, for example, the logic for displaying a record and the logic for displaying its history are hard to share.\n\n\u003ca name=\"mz4tD\"\u003e\u003c/a\u003e\n## Shadow Table with static copy\n\n[https://github.com/hooopo/shadow/blob/master/sql/shadow.sql](https://github.com/hooopo/shadow/blob/master/sql/shadow.sql)\n\nFor a target table users, generate a shadow table shadow.users with the same structure. Changes are applied directly to users, and the values before the change are written to shadow.users, along with session_user, current_query, the operation time, the operation type, and so on. A demonstration:\n\nCreate the users table:\n```sql\ncreate database test_shadow;\n\\c test_shadow;\ncreate table users (id integer primary key, name varchar,  age integer default 20);\ninsert into users values (1, 'name1', 30);\ninsert into users values (2, 'name2', 35);\ninsert into users values (3, 'name3', 35);\n\n-- Note: the output below includes sys_period, so it appears to have been captured after shadow.setup was run\nselect * from users;\n id | name  | age |             sys_period\n----+-------+-----+------------------------------------\n  1 | name1 |  30 | [\"2020-01-10 14:57:20.756974+08\",)\n  2 | name2 |  35 | [\"2020-01-10 14:57:20.756974+08\",)\n  3 | name3 |  35 | [\"2020-01-10 14:57:20.756974+08\",)\n```\n\nLoad shadow.sql:\n\n```sql\n\\i ~/w/shadow/sql/shadow.sql\n\nselect shadow.setup('users', 'users');\n\n-- What actually runs: add a sys_period column to the users table\nINFO:  EXECUTE SQL: ALTER TABLE users\n    ADD COLUMN sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null);\n-- Statically copy the structure of users, dropping constraints and keeping defaults\nINFO:  EXECUTE SQL: CREATE TABLE shadow.users (\n      LIKE users INCLUDING DEFAULTS EXCLUDING CONSTRAINTS EXCLUDING INDEXES INCLUDING COMMENTS\n    )\n-- Create a trigger that fires before insert, update, or delete\nINFO:  EXECUTE SQL: CREATE TRIGGER zzz_users_shadow_trigger\n      BEFORE INSERT OR UPDATE OR DELETE ON users\n      FOR EACH ROW EXECUTE PROCEDURE shadow.versioning(\n        'sys_period', 'shadow.users', true\n      )\n```\n\nThe result:\n\n```sql\nupdate users set name = 'hello' where id = 1;\ndelete from users where id = 2;\n\n-- users 
table behaves exactly as before; the only change is that sys_period is recorded\nselect * from users;\n id | name  | age |             sys_period\n----+-------+-----+------------------------------------\n  3 | name3 |  35 | [\"2020-01-10 14:57:20.756974+08\",)\n  1 | hello |  30 | [\"2020-01-10 14:59:11.48809+08\",)\n\n -- After the update and the delete above, two history rows exist: snapshots of the rows before the change, together with the SQL statement that was executed.\n select * from shadow.users;\n-[ RECORD 1 ]-------+------------------------------------------------------------------\nid                  | 1\nname                | name1\nage                 | 30\nsys_period          | [\"2020-01-10 14:57:20.756974+08\",\"2020-01-10 14:59:11.48809+08\")\nop                  | U\nop_query            | update users set name = 'hello' where id = 1;\ndb_session_user     | hooopo\napp_session_user_id | (null)\n-[ RECORD 2 ]-------+------------------------------------------------------------------\nid                  | 2\nname                | name2\nage                 | 35\nsys_period          | [\"2020-01-10 14:57:20.756974+08\",\"2020-01-10 14:59:28.137144+08\")\nop                  | D\nop_query            | delete from users where id = 2;\ndb_session_user     | hooopo\napp_session_user_id | (null)\n```\n\nThis simple demo already meets the four requirements listed above. Its one shortcoming is that the history table is created as a static copy of the target table's structure: when columns are later added, changed, or dropped on the target table, the developer has to keep the shadow table's structure in sync by hand. One workaround is an event trigger. PostgreSQL supports triggers on DDL statements, so the shadow table could be refreshed after the target table is altered. However, the DDL command that was actually executed, pg_ddl_command, cannot be read outside a C extension, so without a C extension this is as far as this approach can go.\n\n\n\u003ca name=\"OA4xS\"\u003e\u003c/a\u003e\n## Shadow Table with json\n\n[https://github.com/hooopo/shadow/blob/master/sql/shadow_jsonb.sql](https://github.com/hooopo/shadow/blob/master/sql/shadow_jsonb.sql)\n\nIf you do not want to deal with keeping the shadow table and the target table in sync, you can store the history in a schemaless structure such as JSON. This also avoids problems such as incompatible type changes. For example, if a column is changed from char(10) to char(5), the first approach requires removing the already-stored values longer than 5 from the shadow table before the structure sync can succeed.\n\nA demonstration of the JSON variant:\n\n```sql\ncreate database test_shadow_jsonb;\n\\c test_shadow_jsonb;\ncreate table users (id integer primary key, name varchar,  age integer default 20);\ninsert into users values 
(1, 'name1', 30);\ninsert into users values (2, 'name2', 35);\ninsert into users values (3, 'name3', 35);\n\n-- Note: the output below includes sys_period, so it appears to have been captured after shadow.setup_jsonb was run\nselect * from users;\n id | name  | age |             sys_period\n----+-------+-----+------------------------------------\n  1 | name1 |  30 | [\"2020-01-10 14:57:20.756974+08\",)\n  2 | name2 |  35 | [\"2020-01-10 14:57:20.756974+08\",)\n  3 | name3 |  35 | [\"2020-01-10 14:57:20.756974+08\",)\n```\n\nLoad shadow_jsonb.sql:\n\n```sql\n\\i ~/w/shadow/sql/shadow_jsonb.sql\n\nselect shadow.setup_jsonb('users', 'users');\nINFO:  EXECUTE SQL: ALTER TABLE users\n    ADD COLUMN sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null);\nINFO:  EXECUTE SQL: CREATE TABLE shadow.users ()\n\n-- The target table needs a primary key; if it is not id, it can be specified\nINFO:  EXECUTE SQL: CREATE TRIGGER zzz_users_shadow_trigger\n      BEFORE INSERT OR UPDATE OR DELETE ON users\n      FOR EACH ROW EXECUTE PROCEDURE shadow.versioning(\n        'sys_period', 'shadow.users', 'id', true\n      )\n setup_jsonb\n \n -- Structure of shadow.users\n \\d shadow.users\n                                                     Table \"shadow.users\"\n       Column        |       Type        | Collation | Nullable |                           Default\n---------------------+-------------------+-----------+----------+--------------------------------------------------------------\n id                  | character varying |           |          |\n shadow_data         | jsonb             |           |          | '{}'::jsonb\n op                  | character(1)      |           |          | 'U'::bpchar\n op_query            | character varying |           |          |\n db_session_user     | character varying |           |          |\n sys_period          | tstzrange         |           | not null | tstzrange(CURRENT_TIMESTAMP, NULL::timestamp with time zone)\n app_session_user_id | character varying |           |          |\nIndexes:\n    \"users_id_idx\" btree (id)\n```\n\nThe result:\n\n```sql\nupdate users set name = 'hello' where id = 
1;\ndelete from users where id = 2;\n\nselect * from shadow.users;\n-[ RECORD 1 ]-------+------------------------------------------------------------------\nid                  | 1\nshadow_data         | {\"id\": 1, \"age\": 30, \"name\": \"name1\"}\nop                  | U\nop_query            | update users set name = 'hello' where id = 1;\ndb_session_user     | hooopo\nsys_period          | [\"2020-01-10 15:35:37.185797+08\",\"2020-01-10 15:40:29.22283+08\")\napp_session_user_id | (null)\n-[ RECORD 2 ]-------+------------------------------------------------------------------\nid                  | 2\nshadow_data         | {\"id\": 2, \"age\": 35, \"name\": \"name2\"}\nop                  | D\nop_query            | delete from users where id = 2;\ndb_session_user     | hooopo\nsys_period          | [\"2020-01-10 15:35:37.185797+08\",\"2020-01-10 15:40:30.265188+08\")\napp_session_user_id | (null)\n```\n\nThe app_session_user_id column stores application-level user information, such as current_user.id in Rails.\n\nIt can be set in a Rails before_action:\n\n```ruby\nActiveRecord::Base.connection.execute(\"select set_config('app.session_user_id', '#{current_user\u0026.id}', false);\")\n```\n\nRelated documentation: https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SET-TABLE\n\nOf course, this approach does not meet the fourth condition above, because users and shadow.users have different structures, so the display logic has to be written twice. One possible solution, not yet tried, is to load shadow_data into ActiveRecord attributes to emulate the same interface as the users model.\n\n\u003ca name=\"XfQuS\"\u003e\u003c/a\u003e\n## Shadow Table with updatable view\n\nThis approach is fairly complicated to operate. It mainly solves the manual maintenance caused by copying the structure in the first approach. It still builds on approach 1: since a statically copied structure needs maintenance, PostgreSQL table inheritance can be used instead to produce a table structure that stays consistent with the target table.\n\n```sql\ncreate table shadow.users_v2(op char(1)) inherits (users);\n```\n\nBut this introduces a new problem: select * from users also returns the rows in shadow.users_v2, which is how inheritance works.\n\nTo query only the parent table, use the only keyword: select * from only users returns just the rows of users.\n\nSo we can create a view:\n\n```sql\ncreate view only_users as (select * from only users);\n```\n\n```ruby\nclass User \u003c ActiveRecord::Base\n  self.table_name = 'only_users'\nend\n```\n\nWhat about updates? Since PG 
9.3, views are updatable: [https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-auto-updatable-views/](https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-auto-updatable-views/)\u003cbr /\u003eBut there are restrictions:\n\n\u003e - The view must have exactly one entry in its FROM list, which must be a table or another updatable view.\n\u003e - The view definition must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses at the top level.\n\u003e - The view definition must not contain set operations (UNION, INTERSECT or EXCEPT) at the top level.\n\u003e - The view’s select list must not contain any aggregates, window functions, or set-returning functions.\n\n\nThe only_users view above meets these conditions, so updates are covered as well. The one shortcoming is that developers can still bypass the ActiveRecord definition and write raw SQL such as select * from users.\n\nSo none of the approaches is perfect. If your table structure is stable, choose approach 1; if you do not care about view-layer presentation, choose approach 2; if you do not mind running into pitfalls, try approach 3.\n\nIt is not perfect yet, but it works well as a replacement for [paranoia](https://github.com/radar/paranoia) + Audited.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhooopo%2Fshadow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhooopo%2Fshadow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhooopo%2Fshadow/lists"}