{"id":13707087,"url":"https://github.com/alextanhongpin/database-design","last_synced_at":"2025-05-16T00:00:44.285Z","repository":{"id":41050529,"uuid":"159698051","full_name":"alextanhongpin/database-design","owner":"alextanhongpin","description":"Ideas on better database design","archived":false,"fork":false,"pushed_at":"2024-11-12T17:29:57.000Z","size":634,"stargazers_count":472,"open_issues_count":0,"forks_count":42,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-01T09:28:39.565Z","etag":null,"topics":["database","design","golang","mysql","nodejs"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alextanhongpin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-29T16:45:31.000Z","updated_at":"2025-03-31T10:08:15.000Z","dependencies_parsed_at":"2023-10-11T03:45:27.420Z","dependency_job_id":"7c9bcf5e-c33c-49e5-a4e7-be7ae446bcaa","html_url":"https://github.com/alextanhongpin/database-design","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fdatabase-design","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fdatabase-design/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fdatabase-design/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fdatabase-design/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alextanhongpin","download_url":"https://codeload.github.com/alextanhongpin/database-design/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247824406,"owners_count":21002267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","design","golang","mysql","nodejs"],"created_at":"2024-08-02T22:01:18.409Z","updated_at":"2025-04-08T10:33:26.482Z","avatar_url":"https://github.com/alextanhongpin.png","language":"Go","funding_links":["https://buymeacoffee.com/alextan2205"],"categories":["Go","Schema"],"sub_categories":["Design"],"readme":"# Database Design\n\nI'm trying to spend more time to organize and write higher quality content. 
It would be great if you could support me!\n\n[![\"Buy Me A Coffee\"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://buymeacoffee.com/alextan2205)\n\n## Postgres 15\n\n- `MERGE` command\n- the `public` schema no longer grants `CREATE` to all users by default; use a dedicated schema (namespace) instead?\n- use a BRIN index for the `created_at` column?\n\n## Postgres 14 in 2023\n\n- use `identity` columns instead of `serial`\n\n- use `generated columns` instead of triggers if you need a column whose value is computed from other columns in the same row\n\n- `tstzrange` and `tstzmultirange` become a superpower for time-based applications (appointments, scheduling, temporal data, slowly changing dimensions, etc.)\n\n- array/jsonb columns violate 1NF, but they make it easier to evolve the db schema\n\n- triggers are not evil - they are a superpower for those who master them\n\n- custom domains and types should be the norm, preferred over jsonb (except for highly unstructured data)\n\n## About\n\nUseful tips for designing a robust database schema. This guide is more of a decision reference (1) for people who want to design better database schemas for startups (2).\n\n- Decision reference: You will probably come across one of the problems below when designing a database schema (e.g. should I use `JSONB`? How do I design a `tagging` schema? How do I keep historical records?) and be presented with different options and trade-offs. If two conventions are equally good (should table names be singular or plural?), it's up to you to pick one, make it the standard, and keep it consistent. Rather than giving you an `it depends` answer, this guide shares the `what-ifs`: the decision I made, and the outcome. Yes, the code was actually written and used in different applications I built. You don't have to agree with the approaches - I used to take some of the more complex approaches (for the sake of _best practice_), but over time I realised that they are unnecessary. 
Simple is best.\n- Startups: A lot of startups start by using ORMs or frameworks that provide a lot of convenience when it comes to dealing with the database. If you are thinking _hey, we need to be agile, that's why we are okay introducing some technical debt_, let me tell you something: you can be both fast and produce quality work (without compromising on database design). Most of the time, when new features are introduced or new situations arise (e.g. analytics are inaccurate because we did not store timestamps with time zones in the database, or we used integer ids instead of uuids and people start _hacking_ your system), the design just won't cut it anymore. The last thing I want to live with is a poorly designed database schema. You can switch languages and frameworks for your frontend/backend servers, but if you design your database wrongly, you have to live with it. I mention `startups` because my knowledge of database design is limited to a scale of `\u003c5m` users. The approach taken by larger companies might vary, and the technology they use might differ, as they focus heavily on performance, reliability and running databases across the world. For this guide, we are talking specifically about `mysql` and `postgres`, probably running on cloud providers like AWS/Google Cloud.\n\n## HELP ME IMPROVE THIS GUIDE :)\n\nThere are a few things you could help me with:\n\n- provide feedback (is the topic relevant? did you find another way to approach it? did you find edge cases that are not covered?)\n- give me an opportunity to work on something 🙃\n- help me with writing (I don't have a structured way of writing 😞, and I want to improve on it)\n\n# Principles\n\n- use singular nouns for table names\n- ~~use uuid if possible~~\n- use ordered UUIDs (e.g. UUID v1 with its timestamp fields rearranged; the newer UUID v7 is time-ordered by design), stored as BINARY(16), when the data is dynamically generated. 
For reference tables, stick to auto-incremented primary keys, since the values are static and rarely change. If the number of items fits within [a-z0-9] (36 values), use char(1) as the primary key, since it is more descriptive than an int (`m` for male, `f` for female, `o` for others, etc.)\n- one issue with auto-incremented ids is that they need to be parsed into the correct type (int64/uint64) in the application to prevent users from submitting non-numeric strings. The same complexity applies to UUIDs, which need validation too. This is only necessary if we want to avoid a round trip to the db. Note that casting to an 8-bit type (e.g. Go's `int8`) means only ids up to `127` are supported!\n- use soft delete\n- no null fields, except dates (why? When using a strongly typed language as the database client, dealing with null (or nil pointers) is a pain. It is easier to go with sane default values. Also, when you start working with reporting tools, database `NULL` adds extra logic and edge cases that you need to handle)\n\n# Notes\n\n- inner joins are faster than subqueries most of the time\n- apply logic in the database if you are going to have many different applications\n- use ordered (optimized) uuids for faster querying\n- include the default auto incremented id\n- PostgreSQL automatically creates indexes on primary keys and unique constraints, but not on the referencing side of foreign key relationships, so consider indexing foreign key columns yourself.\n- put shared logic in template databases - they are like your `common` folders\n- (postgres) don't use `serial`; use `generated always as identity` for primary keys if non-uuid keys are required\n- (postgres) use `text`: in postgres, `text` and `varchar` have no performance difference, unlike in `mysql` (where `TEXT` columns need a prefix length to be indexed)\n- for reference tables, use the naming convention `entity_type`, e.g. 
notification_type, role_type, and use identity keys\n- you can use a custom function as a column default; this is useful when a row must first be inserted into a different table to obtain the foreign key (e.g. the party relationship)\n- schema `views` are the database equivalent of an `api`\n- use `schema` to split migrations and functionality, e.g. an `auth` schema contains all auth-related operations\n- use `extra` or `other` as the column name for jsonb\n- use `valid_through` for tstzrange\n- use `\u003centities\u003e_count` for counts\n- use `id`, `created_at`, `updated_at`, `deleted_at`\n- be careful when using auto incremented ids. Some applications use the username as a slug; if usernames are not validated and a user picks an integer as their name, a lookup that accepts either an id or a slug will resolve to the wrong row\n- (postgres) a new column added to an existing table always ends up in the last position. In MySQL, you can place it first or after a specific column with `FIRST`/`AFTER`.\n\n## Search Path\n\n```sql\nSET search_path=onetsoc,public;\nSHOW search_path;\nSET search_path TO default;\n```\n\n## Styleguides\n\nYes, there are style guides (and perhaps linters) for everything.\n\nhttps://www.sqlstyle.guide/\n\n# Migration file naming convention\n\nReference the naming convention from Active Record:\n\nhttps://edgeguides.rubyonrails.org/active_record_migrations.html#using-the-change-method\n\n## Useful Statements\n\n```sql\n-- Sets a default created date\ncreated_at datetime     NOT NULL DEFAULT CURRENT_TIMESTAMP\n\n-- (MySQL) Sets a default updated date, and updates it whenever a row is updated\nupdated_at datetime     NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP\n\n-- Adds a new constraint that checks if the first user id is smaller than the second.\nCONSTRAINT check_one_way CHECK (user_id1 \u003c user_id2)\n\n-- Adds a constraint that checks if the combination of both columns is unique.\nCONSTRAINT uq_user_id_1_user_id_2 UNIQUE (user_id1, user_id2)\n\n-- Sets a foreign key constraint, and updates the foreign key when the primary key 
changes, or deletes the row when the referenced row is deleted.\nFOREIGN KEY (user_id1) REFERENCES user (id) ON UPDATE CASCADE ON DELETE CASCADE\n\n-- Sets a foreign key constraint, and updates the foreign key when the primary key changes, or sets the foreign key to null when the referenced row is deleted.\nFOREIGN KEY (relationship) REFERENCES ref_relationship (status) ON UPDATE CASCADE ON DELETE SET NULL\n\n-- Creates a composite primary key from two columns.\nPRIMARY KEY (user_id1, user_id2)\n```\n\nSorting:\n\n```sql\n-- (MySQL) Sorts integer strings in numeric order\nSELECT * FROM \u003ctable\u003e ORDER BY CAST(\u003ccolumn\u003e AS unsigned)\n```\n\n## Data Type: IPv4 and IPv6\n\nFor an IPv4 address, you have two options:\n\n- a `varchar(15)`, if you want to store the IP address as a string, e.g. `192.128.0.15`\n- an `integer (4 bytes)`, if you convert the IP address to an integer, e.g. `3229614095` for the same IP\n\n```sql\n`ipv4` INT UNSIGNED\nINSERT INTO `table` (`ipv4`) VALUES (INET_ATON(\"127.0.0.1\"));\nSELECT INET_NTOA(`ipv4`) FROM `table`;\n```\n\n```sql\n`ipv6` VARBINARY(16)\nINSERT INTO `table` (`ipv6`) VALUES (INET6_ATON(\"::1\"));\nSELECT INET6_NTOA(`ipv6`) FROM `table`;\n```\n\nTo use a single column for both IPv4 and IPv6:\n\n```sql\nCREATE TABLE `sensor` (\n  `ip` varbinary(16) NOT NULL DEFAULT ''\n);\n-- Insert IPv6.\ninsert into sensor (ip) values (INET6_ATON(\"2001:0db8:85a3:0000:0000:8a2e:0370:7334\"));\n\n-- Insert IPv4.\ninsert into sensor (ip) values (INET6_ATON(\"255.255.255.0\"));\n\nselect INET6_NTOA(ip) from sensor;\n+------------------------------+\n| INET6_NTOA(ip)               |\n+------------------------------+\n| 2001:db8:85a3::8a2e:370:7334 |\n| 255.255.255.0                |\n+------------------------------+\n2 rows in set (0.00 sec)\n```\n\nSample query from `ip2nation`:\n\n```mysql\nSELECT c.country \nFROM ip2nationCountries c, ip2nation i \nWHERE i.ip \u003c INET_ATON('your_ip_address') \nAND c.code = i.country \nORDER BY 
i.ip DESC \nLIMIT 0,1;\n```\n\nReferences:\n\n- http://www.ip2nation.com/\n\n## Data Type: Country\n\nAl Jumahiriyah al Arabiyah al Libiyah ash Shabiyah al Ishtirakiyah al Uzma, also known as Libya, is the world's longest country name at 74 characters with spaces and 63 characters without.\n\n```sql\ncountry varchar(74) NOT NULL DEFAULT ''\n```\n\n## Data Type: Address\n\n```sql\naddress_line_1 VARCHAR(255) NOT NULL DEFAULT '',\naddress_line_2 VARCHAR(255) NOT NULL DEFAULT '',\n\n-- Longest city name: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch (58 chars.)\ncity VARCHAR(58) NOT NULL DEFAULT '',\n\n-- Longest state name: The State of Rhode Island and Providence Plantations (52 chars.)\nstate VARCHAR(56) NOT NULL DEFAULT '',\n-- There is no maximum size for a postcode. Currently, the longest postal codes are 10 chars: Iran has 10 digits, and the US ZIP+4 format has 5 and 4 digits separated by a hyphen. Brazil is 9 and Canada is 7, I think.\npostal_code VARCHAR(16) NOT NULL DEFAULT '',\ncountry VARCHAR(74) NOT NULL DEFAULT '',\n```\n\nAn alternative, based on the [OpenID AddressClaim](https://openid.net/specs/openid-connect-core-1_0.html#AddressClaim):\n\n```sql\nstreet_address VARCHAR(255) NOT NULL DEFAULT '' COMMENT 'Full street address component, which MAY include house number, street name, Post Office Box, and multi-line extended street address information. This field MAY contain multiple lines, separated by newlines. 
Newlines can be represented either as a carriage return/line feed pair (\"\\r\\n\") or as a single line feed character (\"\\n\").',\nlocality VARCHAR(58) NOT NULL DEFAULT '' COMMENT 'City or locality component.',\nregion VARCHAR(56) NOT NULL DEFAULT '' COMMENT 'State, province, prefecture or region component.',\npostal_code VARCHAR(16) NOT NULL DEFAULT '' COMMENT 'Zip code or postal code component.',\ncountry VARCHAR(74) NOT NULL DEFAULT '' COMMENT 'Country name component.',\n-- latitude (See below)\n-- longitude\n```\n\nReferences:\n\n- postal code length https://stackoverflow.com/questions/325041/i-need-to-store-postal-codes-in-a-database-how-big-should-the-column-be\n\n## Data Type: URL\n\n```sql\nurl varchar(2083) NOT NULL DEFAULT '';\n```\n\nChecking can be done on the application side. If a URL exceeds 2083 characters, warn the client or suggest using a URL shortener.\n\nReferences: https://stackoverflow.com/questions/219569/best-database-field-type-for-a-url\n\n## Data Type: Email\n\nhttps://stackoverflow.com/questions/8242567/acceptable-field-type-and-size-for-email-address\n\n```sql\nemail VARCHAR(255) NOT NULL UNIQUE\n```\n\nAlso, consider using `citext` (postgres) for email, since email lookups should be case-insensitive.\n\n## Data Type: Geolocation\n\nWith `MySQL \u003c8.0`:\n\n```sql\nCREATE TABLE locations (\n  lat DECIMAL(10,8) NOT NULL, \n  lng DECIMAL(11,8) NOT NULL\n);\n```\n\nWith `MySQL \u003e=8.0`:\n\n```sql\nCREATE TABLE locations (\n    location POINT SRID 4326 NOT NULL,\n    SPATIAL INDEX (location)\n);\n```\n\nTo insert:\n\n```sql\nINSERT INTO locations (location) VALUES (ST_PointFromText('Point(1 1)', 4326));\n```\n\nTo select:\n\n```sql\nSELECT ST_AsText(location) FROM locations;\n```\n\nReferences:\n\n- https://medium.com/maatwebsite/the-best-way-to-locate-in-mysql-8-e47a59892443\n\n## Data Type: TZ\n\nMax length of 32; the longest is `America/Argentina/ComodRivadavia`:\n\n```sql\nzoneinfo VARCHAR(32) COMMENT \"String from zoneinfo [zoneinfo] time zone 
database representing the End-User's time zone. For example, Europe/Paris or America/Los_Angeles\"\n```\n\nReferences: \n\n- https://stackoverflow.com/questions/12546312/max-length-of-tzname-field-timezone-identifier-name\n- https://www.iana.org/time-zones\n\n## Data Type: Locale\n\nBCP47/RFC5646 section 4.4.1 recommends a 35-character tag length:\n\n```sql\nlocale VARCHAR(35) NOT NULL DEFAULT '' COMMENT \"End-User's locale, represented as a BCP47 [RFC5646] language tag. This is typically an ISO 639-1 Alpha-2 [ISO639-1] language code in lowercase and an ISO 3166-1 Alpha-2 [ISO3166-1] country code in uppercase, separated by a dash. For example, en-US or fr-CA. As a compatibility note, some implementations have used an underscore as the separator rather than a dash, for example, en_US\",\n```\n\nReferences:\n\n- https://stackoverflow.com/questions/17848070/what-data-type-should-i-use-for-ietf-language-codes\n- https://openid.net/specs/openid-connect-core-1_0.html#zoneinfo\n\n## Data Type: Phone number\n\nPhone numbers are usually stored in E.164 format.\n\n```sql\nphone_number VARCHAR(32) NOT NULL DEFAULT '',\nphone_number_verified BOOLEAN NOT NULL DEFAULT FALSE,\n```\n\nTL;DR: don't store phone numbers as bigint, as leading zeros (and the leading `+`) will be lost.\n\nReferences:\n\n- https://en.wikipedia.org/wiki/Telephone_numbering_plan\n- https://boards.straightdope.com/sdmb/showthread.php?t=417024\n- https://stackoverflow.com/questions/723587/whats-the-longest-possible-worldwide-phone-number-i-should-consider-in-sql-varc\n- [Google: Falsehoods Programmers Believe About Phone Numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md)\n- https://dba.stackexchange.com/questions/164796/how-do-i-store-phone-numbers-in-postgresql\n- https://www.mayerdan.com/programming/2017/06/26/db_phone_types\n- [Twilio: What is E.164?](https://www.google.com/search?q=twillio+e164\u0026oq=twillio+e164\u0026aqs=chrome..69i57j0i13.3516j0j4\u0026sourceid=chrome\u0026ie=UTF-8)\n\n## Data Type: 
Name\n\nLongest name (225 characters):\n\n```\nBarnaby Marmaduke Aloysius Benjy Cobweb Dartagnan Egbert Felix Gaspar Humbert Ignatius Jayden Kasper Leroy Maximilian Neddy Obiajulu Pepin Quilliam Rosencrantz Sexton Teddy Upwood Vivatma Wayland Xylon Yardley Zachary Usansky\n```\n\nReferences:\n\n- http://www.worldrecordacademy.com/society/longest_name_Barnaby_Marmaduke_sets_world_record_112063.html\n\n## Data Type: Gender\n\nThe column can be named `sex`, or alternatively `gender`:\n\n```sql\n-- Probably the best bet, but needs to be validated. When in doubt, use this.\nsex char(1)\ninsert into \u003ctable\u003e (sex) values (IF(LEFT(?,1) in ('m', 'f', 'x', 'o'), LOWER(LEFT(?,1)), ''));\n-- We can also just take the first character of the string with the LEFT function.\ninsert into \u003ctable\u003e (sex) values (LEFT('female', 1));\n\n-- With enum. Allows only 'm', 'f', 'M', or 'F'. Don't use enum - changing its list of values can rebuild the whole table.\nsex enum('m','f') DEFAULT 'm'\n\n-- With set. Allows '', 'm', 'M', 'f', 'F', or 'm,f'.\nsex set('m', 'f')\n```\n\nReferences:\n\n- http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/\n- http://download.nust.na/pub6/mysql/tech-resources/articles/mysql-set-datatype.html\n- https://ocelot.ca/blog/blog/2013/09/16/representing-sex-in-databases/\n\nNote: We could have used a check constraint, but MySQL ignores check constraints before 8.0.16.\n\n## Data Type: Currency\n\nCompliant with Generally Accepted Accounting Principles (GAAP):\n\n```sql\ncurrency DECIMAL(13,4)\n```\n\nFor percentages:\n\n```sql\n-- To store 0.00% to 100.00% as a fraction (0.0000 to 1.0000), use decimal(5,4).\ngst DECIMAL(5, 4)\n\n-- To store whole percentages (0% to 100%) as a fraction (0.00 to 1.00), use decimal(3,2).\ndiscount DECIMAL(3, 2)\n```\n\n## Data Type: Stock Ticker\n\nTickers on the NYSE range from one to five characters long, with five-character tickers typically used for mutual funds and ETFs (VFIAX is the symbol for the Vanguard 500 Index Fund).\n\n```sql\nsymbol VARCHAR(5)\n```\n\n## Marital Status\n\n| Code 
| Description  | Definition                                                                                                                                              |\n| ---- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| 1    | Single       | This refers to a person who has never been married.                                                                                                     |\n| 2    | Married      | This refers to a person who is recognised as married under the marriage laws in Singapore. It includes a person who has remarried.                      |\n| 3    | Widowed      | This refers to a person whose spouse(s) is/are deceased and who has not remarried.                                                                      |\n| 4    | Separated    | This refers to a person who has been legally separated or estranged from his/her spouse(s) and who has not remarried.                                   |\n| 5    | Divorced     | This refers to a person whose marriage(s) has/have been legally dissolved and who has not remarried.                                                    
|\n| x    | Not Reported | This includes instances where the marital status is unknown, not reported or where there is no/insufficient information available on the marital status |\n\n```sql\nmarital_status ENUM('single', 'married', 'widowed', 'separated', 'divorced', 'not reported');\n```\n\nReferences:\n\n- https://www.singstat.gov.sg/-/media/files/standards_and_classifications/scms.pdf\n\n## References\n\n- Gender X: https://www.lifesitenews.com/news/generation-x-germany-to-allow-third-blank-gender-for-birth-certificates\n\n## One-to-One Relationship\n\nExample of a 1-to-1 relationship between the `user` and `preference` tables (MySQL):\n\n```sql\nCREATE TABLE IF NOT EXISTS user (\n  name VARCHAR(255),\n  id INT UNSIGNED AUTO_INCREMENT,\n  PRIMARY KEY (id)\n);\n\nCREATE TABLE IF NOT EXISTS preference (\n  -- The primary key is also the foreign key, so each user has at most one preference row.\n  user_id INT UNSIGNED NOT NULL,\n  interest TEXT,\n  -- ...other fields\n  PRIMARY KEY (user_id),\n  FOREIGN KEY (user_id) REFERENCES user(id)\n);\n```\n\n## Country Table\n\nhttps://www.ip2location.com/free/country-multilingual\n\n## Issues\n\nThere was a bug where User A ended up logged into User B's account, for a very simple reason. User A's user `id` was `2`, and their JWT had not expired. The db was then cleared and two new users were created, with user ids `1` and `2`; User B got id `2`. Since the JWT only stores the user id `2`, when User A called the API, User A could view User B's profile. This is one pitfall of integer ids.\n\n## Thoughts\n\n- Should I create a different table for user profile and password? 
No.\n  - https://stackoverflow.com/questions/17683571/should-i-create-2-tables-first-for-usernames-and-passwords-and-other-for-user\n  - https://www.quora.com/Should-we-keep-the-user-name-and-password-in-the-same-table-where-the-other-personal-information-is\n  - https://dba.stackexchange.com/questions/148909/is-it-a-good-practice-to-isolate-login-information-username-password-in-a-sep\n\n## Useful features for postgres\n\n- lateral join\n- conditional sum\n- `FILTER` clause on aggregates\n- RANK, DENSE_RANK, ROW_NUMBER\n- partial index\n\n```sql\n-- Ranks the products in one category by several DENSE_RANK() scores and orders them by the sum of the ranks (lower is better).\nselect id, name, \n    review_count_rank,\n    recent_review_count_rank,\n    rating_rank,\n    word_count_rank,\n    review_count_rank + recent_review_count_rank + rating_rank + word_count_rank as total\nfrom (\n    select \n        product_items.id as id,\n        product_items.name,\n\n    --    product_items.cached_reviews_count,\n        DENSE_RANK() OVER (order by product_items.cached_reviews_count desc) review_count_rank,\n\n    --    COALESCE(tmp.review_count, 0) as recent_review_count,\n        DENSE_RANK() OVER (order by COALESCE(tmp.review_count, 0) desc) recent_review_count_rank,\n\n    --    product_items.cached_rating,\n        DENSE_RANK() OVER (order by product_items.cached_rating desc) rating_rank,\n\n    --    COALESCE(tmp.word_count, 0) as word_count,\n        DENSE_RANK() OVER (order by COALESCE(tmp.word_count,0) desc) word_count_rank\n    from product_items \n    left join (\n        select \n            pir.item_id as item_id, \n            count(*) as review_count,\n            sum(array_length(regexp_split_to_array(pir.text, '\\s'),1)) as word_count\n        from product_items pi\n        left join product_item_reviews pir\n            on (pi.id = pir.item_id)\n        where pir.deleted_at is null\n            and pi.deleted_at is null\n            and pir.created_at \u003e= current_timestamp - interval '30 day'\n        group by pir.item_id\n    ) tmp on (tmp.item_id = product_items.id)\n    where category_id = 
6\n        and product_items.deleted_at is null\n    order by review_count_rank, \n        rating_rank,\n        recent_review_count_rank,\n        word_count_rank\n    ) tmp\norder by total;\n```\n\n## Null\n\nAdvantages of null fields (or when to use null):\n\n- a nullable column can have a unique constraint and still hold many rows without a value, because NULLs are not considered equal to each other (whereas duplicate empty strings would violate the constraint)\n- it depends on the domain you are working in. NULL means the absence of a value (i.e. there is no value), while an empty string is a string value of zero length.\n\n## Ways to sort an array alphabetically in postgres\n\nThis is one interesting problem that I faced when designing a friendship table: the same friendship can be stored as a (user_id, friend_id) pair in either order. Querying becomes complex, as querying for a pair now requires a union (and indices on both columns). One way to solve it is to add another column containing a hash of both ids, sorted. The idea is to create a trigger that sorts both ids, hashes them with md5, and stores the result in that column.\n\n```sql\n-- Sorts the two ids of each row into an array.\nselect (select array(select unnest (ARRAY[user_id, friend_id]) as x ORDER BY x)  as j) from relationship;\n-- Hashes the sorted ids; ORDER BY inside array_agg guarantees the order.\nselect md5(array_to_string(array_agg(id ORDER BY id), '')) \nfrom (\n    select * \n    from (values ('6769d922-ac68-11ea-8c70-9b8806d7aa41'), ('6769d922-ac68-11ea-8c70-9b8806d7aa41')) \n    as f(id) \n    order by f\n) tmp;\n```\n\nAlternative way:\n\n```sql\nselect \n    MD5(row(\n        case \n            when user_id \u003c friend_id \n            then (user_id, friend_id)\n            else (friend_id, user_id)\n        end\n    )::text)\nfrom (\n    values \n    ('7d7849d0-b94f-11ea-92be-43016fd48059', '8175c79c-b94f-11ea-92be-ab6d21fe7fb3'),\n    ('8175c79c-b94f-11ea-92be-ab6d21fe7fb3', '7d7849d0-b94f-11ea-92be-43016fd48059')\n) as f(user_id, friend_id);\n```\n\n## Finding missing indexes on foreign keys\n\nhttps://stackoverflow.com/questions/970562/postgres-and-indexes-on-foreign-keys-and-primary-keys\n\n## Using 
Identity Column (Postgres)\n\n^ All postgres related topics should be tagged.\n\nAn identity column is the recommended approach over serial.\n\n```diff\nCREATE TABLE IF NOT EXISTS world (\n-    id serial PRIMARY KEY,\n+    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n    name text\n);\n```\n\nOne advantage is that the id can't be overridden accidentally: with `GENERATED ALWAYS`, an insert that supplies an id is rejected unless it explicitly uses `OVERRIDING SYSTEM VALUE`:\n\n```sql\nINSERT INTO world (name) VALUES('will produce id 1');\nINSERT INTO world (id, name) OVERRIDING SYSTEM VALUE VALUES(10, 'will produce id 10');\nINSERT INTO world (name) VALUES('will produce id 2');\n```\n\n## Using a custom function as the default key (Postgres)\n\nWe can use custom functions to generate the default key in Postgres. The example below uses a `party` table and an `organization` table:\n\n- we always have to create a party first before creating a `person` or `organization`, and then reference its id\n- this can be simplified by using a function as the column default\n\n```sql\nCREATE TABLE IF NOT EXISTS party(\n    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),\n    type text not null check (type in ('person', 'organization')),\n    -- Required so that (id, type) can be the target of a composite foreign key.\n    unique (id, type)\n);\n\nCREATE OR REPLACE FUNCTION gen_party_id(_type text) \nRETURNS uuid AS $$\n    INSERT INTO party (type) VALUES (_type)\n    RETURNING id;\n$$ LANGUAGE SQL VOLATILE;\n\nCREATE TABLE IF NOT EXISTS organization (\n    id uuid PRIMARY KEY NOT NULL DEFAULT gen_party_id('organization'),\n    type text NOT NULL DEFAULT 'organization' CHECK (type = 'organization'),\n    name text,\n    foreign key (id, type) references party(id, type)\n);\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falextanhongpin%2Fdatabase-design","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falextanhongpin%2Fdatabase-design","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falextanhongpin%2Fdatabase-design/lists"}