# SQL tips and tricks

[![Stand With Ukraine](https://raw.githubusercontent.com/vshymanskyy/StandWithUkraine/main/badges/StandWithUkraine.svg)](https://stand-with-ukraine.pp.ua)

[![Ceasefire Now](https://badge.techforpalestine.org/default)](https://techforpalestine.org/learn-more)

A (somewhat opinionated) list of SQL tips and tricks that I've picked up over the years.

There's so much you can do with SQL, but I've focused on what I find most useful in my day-to-day work as a data analyst and what I wish I had known when I first started writing SQL.

Please note that some of these tips might not be relevant for every RDBMS.

## Table of contents

### Formatting/readability

1) [Use a leading comma to separate fields](#use-a-leading-comma-to-separate-fields)
2) [Use a dummy value in the WHERE clause](#use-a-dummy-value-in-the-where-clause)
3) [Indent your code](#indent-your-code)
4) [Consider CTEs when writing complex queries](#consider-ctes-when-writing-complex-queries)

### Useful features

5) [Anti-joins will return rows from one table that have no match in another table](#anti-joins-will-return-rows-from-one-table-that-have-no-match-in-another-table)
6) [`NOT EXISTS` is faster than `NOT IN` if your column allows `NULL`](#not-exists-is-faster-than-not-in-if-your-column-allows-null)
7) [Use `QUALIFY` to filter window functions](#use-qualify-to-filter-window-functions)
8) [You can (but shouldn't always) `GROUP BY` column position](#you-can-but-shouldnt-always-group-by-column-position)
9) [You can create a grand total with `GROUP BY ROLLUP`](#you-can-create-a-grand-total-with-group-by-rollup)
10) [Use `EXCEPT` to find the difference between two tables](#use-except-to-find-the-difference-between-two-tables)

### Avoid pitfalls

11) [Be aware of how `NOT IN` behaves with `NULL` values](#be-aware-of-how-not-in-behaves-with-null-values)
12) [Avoid ambiguity when naming calculated fields](#avoid-ambiguity-when-naming-calculated-fields)
13) [Implicit casting will slow down (or break) your query](#implicit-casting-will-slow-down-or-break-your-query)
14) [Always specify which column belongs to which table](#always-specify-which-column-belongs-to-which-table)
15) [Understand the order of execution](#understand-the-order-of-execution)
16) [Comment your code!](#comment-your-code)
17) [Read the documentation (in full)](#read-the-documentation-in-full)
18) [Use descriptive names for your saved queries](#use-descriptive-names-for-your-saved-queries)


## Formatting/readability
### Use a leading comma to separate fields

Use a leading comma to separate fields in the `SELECT` clause rather than a trailing comma.

- It clearly signals that this is a new column, as opposed to code that has wrapped onto multiple lines.

- It gives a visual cue that makes a missing comma easy to spot; with trailing commas, varying line lengths make this harder to see.
 
```SQL
SELECT
employee_id
, employee_name
, job
, salary
FROM employees
;
```

- Also use a leading `AND` in the `WHERE` clause, for the same reasons (the following tip demonstrates this).

-----

### Use a dummy value in the WHERE clause
Use a dummy value in the `WHERE` clause so you can easily comment out conditions when testing or tweaking a query.

```SQL
/*
If I want to comment out the job
condition, the following query
will break:
*/
SELECT *
FROM employees
WHERE
--job IN ('Clerk', 'Manager')
AND dept_no != 5
;

/*
With a dummy value there's no issue.
I can comment out all the conditions
and 1=1 will ensure the query still runs:
*/
SELECT *
FROM employees
WHERE 1=1
-- AND job IN ('Clerk', 'Manager')
AND dept_no != 5
;
```

-----

### Indent your code
Indent your code to make it more readable to colleagues and your future self.

Opinions will vary on what this looks like, so be sure to follow your company/team's guidelines or, if none exist, go with whatever works for you.

You can also use an online formatter like [poorsql](https://poorsql.com/) or a linter like [sqlfluff](https://github.com/sqlfluff/sqlfluff).

```SQL
-- Bad:
SELECT
vc.video_id
, CASE WHEN meta.GENRE IN ('Drama', 'Comedy') THEN 'Entertainment' ELSE meta.GENRE END as content_type
FROM video_content AS vc
INNER JOIN metadata AS meta ON vc.video_id = meta.video_id
;

-- Good:
SELECT 
vc.video_id
, CASE 
    WHEN meta.GENRE IN ('Drama', 'Comedy') THEN 'Entertainment' 
    ELSE meta.GENRE 
END AS content_type
FROM video_content AS vc
INNER JOIN metadata AS meta
    ON vc.video_id = meta.video_id
;
```
-----

### Consider CTEs when writing complex queries
For longer than I'd care to admit, I would nest inline views, which led to
queries that were hard to understand, particularly when revisited after a few weeks.

If you find yourself nesting inline views more than 2 or 3 levels deep, 
consider using common table expressions, which can help you keep your code more organised and readable.

```SQL
-- Using inline views:
SELECT 
vhs.movie
, vhs.vhs_revenue
, cs.cinema_revenue
FROM 
    (
    SELECT
    movie_id
    , SUM(ticket_sales) AS cinema_revenue
    FROM tickets
    GROUP BY movie_id
    ) AS cs
    INNER JOIN 
        (
        SELECT 
        movie
        , movie_id
        , SUM(revenue) AS vhs_revenue
        FROM blockbuster
        GROUP BY movie, movie_id
        ) AS vhs
        ON cs.movie_id = vhs.movie_id
;

-- Using CTEs:
WITH cinema_sales AS 
    (
        SELECT 
        movie_id
        , SUM(ticket_sales) AS cinema_revenue
        FROM tickets
        GROUP BY movie_id
    ),
    vhs_sales AS
    (
        SELECT 
        movie
        , movie_id
        , SUM(revenue) AS vhs_revenue
        FROM blockbuster
        GROUP BY movie, movie_id
    )
SELECT 
vhs.movie
, vhs.vhs_revenue
, cs.cinema_revenue
FROM cinema_sales AS cs
    INNER JOIN vhs_sales AS vhs
    ON cs.movie_id = vhs.movie_id
;
```

## Useful features 

### Anti-joins will return rows from one table that have no match in another table

Use anti-joins when you want to return rows from one table that don't have a match in another table.

For example, you might want only the video IDs of content that hasn't been archived.

There are multiple ways to do an anti-join:

```SQL 
-- Using a LEFT JOIN:
SELECT 
vc.video_id
FROM video_content AS vc
    LEFT JOIN archive
    ON vc.video_id = archive.video_id
WHERE 1=1
AND archive.video_id IS NULL -- Any rows with no match will have a NULL value.
;

-- Using NOT IN/subquery:
SELECT 
video_id
FROM video_content
WHERE 1=1
AND video_id NOT IN (SELECT video_id FROM archive) -- Be mindful of NULL values.
;

-- Using NOT EXISTS/correlated subquery:
SELECT 
video_id
FROM video_content AS vc
WHERE 1=1
AND NOT EXISTS (
        SELECT 1
        FROM archive AS a
        WHERE a.video_id = vc.video_id
        )
;
```

Note that I advise against using `NOT IN`; see the following tip.

-----
### `NOT EXISTS` is faster than `NOT IN` if your column allows `NULL`

`NOT IN` is usually slower than `NOT EXISTS` if the values/column you're comparing against allows `NULL`.

I've experienced this when using Snowflake, and the PostgreSQL Wiki explicitly [calls this out](https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN):

*"...NOT IN (SELECT ...) does not optimize very well."*

Aside from being slow, `NOT IN` will not work as intended if there is a `NULL` in the values being compared against; see [tip 11](#be-aware-of-how-not-in-behaves-with-null-values).

Why include this tip if `NOT IN` doesn't work with `NULL` values anyway?

Well, just because a column allows `NULL` values doesn't mean there **are** any `NULL` values present. If you're working with a table that you cannot alter, you'll want to use `NOT EXISTS` to speed up your query.

-----
### Use `QUALIFY` to filter window functions

`QUALIFY` lets you filter the results of a query based on a window function, so you don't need
an inline view to filter your result set, which reduces the amount of code needed.

For example, if I want to return the top 10 markets per product, I can use
`QUALIFY` rather than an inline view:

```SQL
-- Using QUALIFY:
SELECT 
product
, market
, SUM(revenue) AS market_revenue 
FROM sales
GROUP BY product, market
QUALIFY DENSE_RANK() OVER (PARTITION BY product ORDER BY SUM(revenue) DESC) <= 10
ORDER BY product, market_revenue
;

-- Without QUALIFY:
SELECT 
product
, market
, market_revenue 
FROM
(
SELECT 
product
, market
, SUM(revenue) AS market_revenue
, DENSE_RANK() OVER (PARTITION BY product ORDER BY SUM(revenue) DESC) AS market_rank
FROM sales
GROUP BY product, market
) AS ranked_markets -- Some databases (e.g. older PostgreSQL, MySQL) require an alias on a derived table.
WHERE market_rank <= 10
ORDER BY product, market_revenue
;
```

Unfortunately, `QUALIFY` isn't part of standard SQL and isn't supported everywhere; it's mainly found in data warehouses such as Snowflake, Amazon Redshift and Google BigQuery. I had to include it anyway because it's so useful.

-----
### You can (but shouldn't always) `GROUP BY` column position

Instead of using the column name, you can `GROUP BY` or `ORDER BY` using the
column position.

- This can be useful for ad-hoc/one-off queries, but for production code
you should always refer to a column by its name, because adding or reordering columns in the `SELECT` clause silently changes what the positions refer to.

```SQL
SELECT 
dept_no
, SUM(salary) AS dept_salary
FROM employees
GROUP BY 1 -- dept_no is the first column in the SELECT clause.
ORDER BY 2 DESC
;
```

-----
### You can create a grand total with `GROUP BY ROLLUP`
Creating a grand total (or sub-totals) is possible thanks to `GROUP BY ROLLUP`.

For example, if you've aggregated a company's employee salaries per department, you 
can use `GROUP BY ROLLUP` to add a grand total row that sums up the aggregated
`dept_salary` column.

```SQL
SELECT 
COALESCE(CAST(dept_no AS VARCHAR), 'Total') AS dept_label -- Cast so the numeric dept_no and the 'Total' label share a data type.
, SUM(salary) AS dept_salary
FROM employees
GROUP BY ROLLUP(dept_no)
ORDER BY dept_salary -- Be sure to order by this column to ensure the Total appears last/at the bottom of the result set.
;
```

-----
### Use `EXCEPT` to find the difference between two tables

`EXCEPT` returns rows from the first query's result set that don't appear in the second query's result set.

```SQL
/*
Miles Davis will be returned from
this query.
*/
SELECT artist_name
FROM artist
WHERE artist_name = 'Miles Davis'
EXCEPT 
SELECT artist_name
FROM artist
WHERE artist_name = 'Nirvana'
;

/*
Nothing will be returned from this
query as 'Miles Davis' appears in
both queries' result sets.
*/
SELECT artist_name
FROM artist
WHERE artist_name = 'Miles Davis'
EXCEPT 
SELECT artist_name
FROM artist
WHERE artist_name = 'Miles Davis'
;
```

You can also combine `EXCEPT` with `UNION ALL` to verify whether two tables have the same data.

If no rows are returned, the tables contain the same distinct rows; otherwise, the rows returned are the ones causing the difference. Note that `EXCEPT` removes duplicates, so a difference in how many times a row is repeated won't show up:

```SQL
/* 
The first query will return rows from
employees that aren't present in
department.

The second query will return rows from
department that aren't present in employees.

The UNION ALL ensures that the
final result set combines all of
these rows so you know which rows
are causing the difference.
*/
(
SELECT 
id
, employee_name
FROM employees
EXCEPT 
SELECT 
id
, employee_name
FROM department
)
UNION ALL 
(
SELECT 
id
, employee_name
FROM department
EXCEPT
SELECT 
id
, employee_name
FROM employees
)
;
```

## Avoid pitfalls

### Be aware of how `NOT IN` behaves with `NULL` values

`NOT IN` returns no rows if `NULL` is present in the values being checked against. Because `NULL` represents an unknown value, the SQL engine can't verify that the value being checked is not in the list, so the condition evaluates to unknown rather than true.
- Instead, use `NOT EXISTS`.

```SQL
INSERT INTO departments (id)
VALUES (1), (2), (NULL);

-- Returns no rows because of the NULL:
SELECT * 
FROM employees 
WHERE department_id NOT IN (SELECT DISTINCT id FROM departments)
;

-- Solution:
SELECT * 
FROM employees e
WHERE NOT EXISTS (
    SELECT 1 
    FROM departments d 
    WHERE d.id = e.department_id
)
;
```

-----
### Avoid ambiguity when naming calculated fields

When creating a calculated field, naming it the same as an existing column can lead to unexpected behaviour.

Note [Snowflake's documentation](https://docs.snowflake.com/en/sql-reference/constructs/group-by) on the topic:

*"If a GROUP BY clause contains a name that matches both a column name and an alias, then the GROUP BY clause uses the column name."*

For example, you might expect the following to return 2 rows, but what's actually returned is 3 rows:

```SQL
CREATE TABLE products (
    product VARCHAR(50) NOT NULL,
    revenue INT NOT NULL
)
;

INSERT INTO products (product, revenue)
VALUES 
    ('Shark', 100),
    ('Robot', 150),
    ('Racecar', 90);

SELECT 
LEFT(product, 1) AS product -- Returns the first letter of the product value.
, MAX(revenue) AS max_revenue
FROM products
GROUP BY product
;
```

|PRODUCT|MAX_REVENUE|
|-------|-----------|
|S|100|
|R|150|
|R|90|

What's happened is that the `GROUP BY` grouped on the original `product` column, and the `LEFT` function was only applied afterwards, to each already-aggregated group.

The solution is to use a unique alias or be more explicit in the `GROUP BY` clause: 

```SQL
-- Solution option 1:
SELECT 
LEFT(product, 1) AS product_letter
, MAX(revenue) AS max_revenue
FROM products
GROUP BY product_letter
;

-- Solution option 2:
SELECT 
LEFT(product, 1) AS product
, MAX(revenue) AS max_revenue
FROM products
GROUP BY LEFT(product, 1)
;
```

Result (option 1):

|PRODUCT_LETTER|MAX_REVENUE|
|--------------|-----------|
|S|100|
|R|150|


Assigning an alias to a calculated field can also be problematic when it comes to window functions.

In this example, the window function orders by the original `revenue` column rather than the output of the `CASE` expression, so the adjustment isn't reflected in the ranking:

```SQL
/*
The window function will rank the 'Robot' product as 1 when it should be 3.
*/
SELECT 
product
, CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS revenue
, RANK() OVER (ORDER BY revenue DESC)
FROM products
;
```

Our earlier solutions apply:

```SQL
/*
Solution option 1 (note this might not work in all RDBMS, in which case use the other solution):
*/
SELECT 
product
, CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS updated_revenue
, RANK() OVER (ORDER BY updated_revenue DESC)
FROM products
;

-- Solution option 2:
SELECT 
product
, CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS revenue
, RANK() OVER (ORDER BY CASE product WHEN 'Robot' THEN 0 ELSE revenue END DESC)
FROM products
;
```

My advice: use a unique alias whenever possible to avoid confusion.

-----
### Implicit casting will slow down (or break) your query

If you specify a value with a different data type from a column's, your database may automatically (implicitly) convert the value or the column so the types match.

For example, let's say the `video_id` column has a string data type and you specify an integer in the `WHERE` clause:

```SQL
SELECT video_name
FROM video_content 
-- Behind the scenes the database may implicitly attempt to convert the video_id column to an integer:
WHERE video_id = 200050
;
```

There are a couple of problems with relying on implicit casting:

1) An error may be thrown if the implicit conversion isn't possible; for example, if one of the video IDs has a string value of _'abc2000'_.

2) \*Your query will likely be slower, due to the additional work of converting each value to the specified data type.

Instead, use the same data type as the column you're operating on (`WHERE video_id = '200050'`) or, to avoid errors, use a function like [`TRY_TO_NUMBER`](https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal) that 
attempts the conversion and returns `NULL` instead of raising an error when it fails:

```SQL
SELECT video_name
FROM video_content 
-- This won't result in an error; values like 'abc2000' simply become NULL:
WHERE TRY_TO_NUMBER(video_id) = 200050
;
```

\* Note that this depends on the size of the dataset being operated on. 

-----
### Always specify which column belongs to which table

When you have complex queries with multiple joins, it pays to be able to 
trace an issue with a value back to its source. 

Additionally, your RDBMS might raise an error if two tables share the same
column name and you don't specify which column you are using.

```SQL
SELECT 
vc.video_id
, vc.series_name
, metadata.season
, metadata.episode_number
FROM video_content AS vc 
    INNER JOIN video_metadata AS metadata
    ON vc.video_id = metadata.video_id
;
```

-----
### Understand the order of execution
If I had to give one piece of advice to someone learning SQL, it'd be to understand the order in which 
clauses are executed. It will completely change how you write queries. In broad terms, the logical order is `FROM`/`JOIN`, `WHERE`, `GROUP BY`, `HAVING`, `SELECT` (including window functions), `DISTINCT`, `ORDER BY` and finally `LIMIT`, which is why, in most databases, a column alias defined in `SELECT` can't be referenced in `WHERE`. This [blog post](https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/) is a fantastic resource for learning more.

-----
### Comment your code!
In the moment you know why you did something, but if you revisit
the code weeks, months or years later you might not remember.
- In general, you should strive to write comments that explain why you did something, not how.
- Your colleagues and future self will thank you!

```SQL
SELECT 
video_content.*
FROM video_content
    LEFT JOIN archive -- New CMS cannot process archive video formats. 
    ON video_content.video_id = archive.video_id
WHERE 1=1
AND archive.video_id IS NULL
;
```

-----
### Read the documentation (in full)
Using Snowflake, I once needed to return the latest date from a list of columns, 
so I decided to use `GREATEST()`.

What I didn't realise was that if one of the
arguments is `NULL`, the function returns `NULL`. 

If I'd read the documentation in full, I'd have known! In many cases it takes a minute or less to scan
the documentation, and it will save you the headache of having to work
out why something isn't working the way you expected:

```SQL
/*
If I'd read the documentation
further I'd also have realised
that my solution to the NULL
problem with GREATEST()...
*/

SELECT COALESCE(GREATEST(signup_date, consumption_date), signup_date, consumption_date)
FROM users -- 'users' is a hypothetical table name.
;

/*
... could have been solved with the
following function:
*/
SELECT GREATEST_IGNORE_NULLS(signup_date, consumption_date)
FROM users -- 'users' is a hypothetical table name.
;
```

-----
### Use descriptive names for your saved queries

There's almost nothing worse than not being able to find a query you need to re-run or refer back to.

Use a descriptive name when saving your queries so you can easily find what you're looking for.

I usually include the subject of the query, the month the query was run and the name of the requester (if there is one).
For example: `Lapsed users analysis - 2023-09-01 - Olivia Roberts`