{"id":21703465,"url":"https://github.com/noopur-phadkar/sql-cheatsheet","last_synced_at":"2025-03-20T16:46:47.977Z","repository":{"id":251813878,"uuid":"838522827","full_name":"noopur-phadkar/SQL-Cheatsheet","owner":"noopur-phadkar","description":"I am using this repository to store my SQL cheatsheet and save some complicated SQL questions I come across.","archived":false,"fork":false,"pushed_at":"2024-08-17T03:32:32.000Z","size":61,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-25T15:33:30.194Z","etag":null,"topics":["join","nested-queries","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/noopur-phadkar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-05T20:16:51.000Z","updated_at":"2024-08-17T03:32:35.000Z","dependencies_parsed_at":"2024-08-05T23:44:03.451Z","dependency_job_id":"0006d403-7820-46dd-a3e9-614a22649905","html_url":"https://github.com/noopur-phadkar/SQL-Cheatsheet","commit_stats":null,"previous_names":["noopur-phadkar/sql-cheatsheet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noopur-phadkar%2FSQL-Cheatsheet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noopur-phadkar%2FSQL-Cheatsheet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noopur-phadkar%2FSQL-Cheatsheet/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/noopur-phadkar%2FSQL-Cheatsheet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/noopur-phadkar","download_url":"https://codeload.github.com/noopur-phadkar/SQL-Cheatsheet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244656073,"owners_count":20488635,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["join","nested-queries","sql"],"created_at":"2024-11-25T21:32:32.893Z","updated_at":"2025-03-20T16:46:47.937Z","avatar_url":"https://github.com/noopur-phadkar.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# SQL-Cheatsheet\n\n# What is SQL?\n\nSQL, or Structured Query Language, is a language designed to allow both technical and non-technical users to query, manipulate, and transform data from a relational database. Due to its simplicity, SQL databases provide safe and scalable storage for millions of websites and mobile applications.\n\n# What is a Relational Database?\n\nA relational database is a type of database that stores and provides access to data points that are related to one another. Relational [databases](https://www.oracle.com/database/what-is-database/) are based on the relational model, an intuitive, straightforward way of representing data in tables. In a relational database, each row in the table is a record with a unique ID called the key. 
The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.\n\n\u003caside\u003e\n💡 [https://www.oracle.com/database/what-is-a-relational-database/](https://www.oracle.com/database/what-is-a-relational-database/)\n\n\u003c/aside\u003e\n\nThe relational model means that the logical data structures (the data tables, views, and indexes) are separate from the physical storage structures. This separation means that database administrators can manage physical data storage without affecting access to that data as a logical structure. For example, renaming a database file does not rename the tables stored within it.\n\n# Types of SQL Queries\n\nIn the realm of relational databases, SQL (Structured Query Language) serves as the fundamental tool for interacting with and managing data. SQL commands are broadly categorized into the following groups:\n\n## Data Definition Language (DDL) - shape the database structure\n\n- **CREATE**: Used to create database objects like tables, views, or indexes.\n- **ALTER**: Modifies the structure of existing database objects.\n- **DROP**: Deletes database objects, such as tables or views.\n- **TRUNCATE**: Removes all records from a table but retains the table structure.\n\n## Data Manipulation Language (DML) - handle data modification\n\n- **SELECT**: Retrieves data from one or more tables.\n- **INSERT**: Adds new records into a table.\n- **UPDATE**: Modifies existing records in a table.\n- **DELETE**: Removes records from a table.\n\n## Data Control Language (DCL) - manage access and permissions\n\n- **GRANT**: Provides specific privileges to database users.\n- **REVOKE**: Withdraws previously granted privileges.\n\n## Transaction Control Language (TCL) - ensure transaction integrity\n\n- **COMMIT**: Finalizes a transaction, making all changes permanent.\n- **ROLLBACK**: Reverts the database to its state before the beginning of a transaction.\n- 
**SAVEPOINT**: Sets points within transactions to which you can later roll back.\n\n## Data Query Language (DQL) - facilitate data retrieval\n\n- **SELECT**: Primarily used for querying the database to retrieve specific information.\n\n# Basic Queries\n\n## SELECT\n\n```sql\nSELECT column, another_column, … \n\tFROM mytable;\n```\n\n## WHERE - Queries with constraints\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tWHERE condition\n\t    AND/OR another_condition\n\t    AND/OR …;\n```\n\nInteger Operators -\n\n| Operator | Condition | SQL Example |\n| --- | --- | --- |\n| =, !=, \u003c, \u003c=, \u003e, \u003e= | Standard numerical operators | col_name != 4 |\n| BETWEEN … AND … | Number is within range of two values (inclusive) | col_name BETWEEN 1.5 AND 10.5 |\n| NOT BETWEEN … AND … | Number is not within range of two values (inclusive) | col_name NOT BETWEEN 1 AND 10 |\n| IN (…) | Number exists in a list | col_name IN (2, 4, 6) |\n| NOT IN (…) | Number does not exist in a list | col_name NOT IN (1, 3, 5) |\n\nString Operators -\n\n| Operator | Condition | Example |\n| --- | --- | --- |\n| = | Case sensitive exact string comparison (notice the single equals) | col_name = \"abc\" |\n| != or \u003c\u003e | Case sensitive exact string inequality comparison | col_name != \"abcd\" |\n| LIKE | Case insensitive exact string comparison | col_name LIKE \"ABC\" |\n| NOT LIKE | Case insensitive exact string inequality comparison | col_name NOT LIKE \"ABCD\" |\n| % | Used anywhere in a string to match a sequence of zero or more characters (only with LIKE or NOT LIKE) | col_name LIKE \"%AT%\" (matches \"AT\", \"ATTIC\", \"CAT\" or even \"BATS\") |\n| _ | Used anywhere in a string to match a single character (only with LIKE or NOT LIKE) | col_name LIKE \"AN_\" (matches \"AND\", but not \"AN\") |\n| IN (…) | String exists in a list | col_name IN (\"A\", \"B\", \"C\") |\n| NOT IN (…) | String does not exist in a list | col_name NOT IN (\"D\", \"E\", \"F\") |\n\n## 
DISTINCT - discard rows that have a duplicate column value\n\n```sql\nSELECT DISTINCT column, another_column, …\n\tFROM mytable\n\t\tWHERE condition(s);\n```\n\n## ORDER BY - sort alpha-numerically based on the specified column's value\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tWHERE condition(s)\n\t\t\tORDER BY column ASC/DESC;\n```\n\n## LIMIT and OFFSET\n\n**`LIMIT`** will reduce the number of rows to return, and the optional **`OFFSET`** will specify where to begin counting the number of rows from.\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tWHERE condition(s)\n\t\t\tORDER BY column ASC/DESC\n\t\t\t\tLIMIT num_limit OFFSET num_offset;\n```\n\nExample - List the first five movies sorted alphabetically\n\n```sql\nSELECT * \n\tFROM movies \n\t\tORDER BY title ASC \n\t\t\tLIMIT 5;\n```\n\nExample - List the next five movies sorted alphabetically\n\n```sql\nSELECT * \n\tFROM movies \n\t\tORDER BY title ASC \n\t\t\tLIMIT 5 OFFSET 5;\n```\n\n## Exercise\n\nThe table below contains information about a few of the most populous cities of North America[[1]](http://en.wikipedia.org/wiki/List_of_North_American_cities_by_population) including their population and geo-spatial location in the world.\n\nPositive latitudes correspond to the northern hemisphere, and positive longitudes correspond to the eastern hemisphere. 
Since North America is north of the equator and west of the prime meridian, all of the cities in the list have positive latitudes and negative longitudes.\n\nTable Subset: north_american_cities\n\n| City | Country | Population | Latitude | Longitude |\n| --- | --- | --- | --- | --- |\n| Guadalajara | Mexico | 1500800 | 20.659699 | -103.349609 |\n| Toronto | Canada | 2795060 | 43.653226 | -79.383184 |\n| Houston | United States | 2195914 | 29.760427 | -95.369803 |\n| New York | United States | 8405837 | 40.712784 | -74.005941 |\n\n### List all the Canadian cities and their populations\n\n```sql\nSELECT * \n\tFROM north_american_cities \n\t\tWHERE Country = \"Canada\";\n```\n\n### Order all the cities in the United States by their latitude from north to south\n\n```sql\nSELECT * \n\tFROM north_american_cities \n\t\tWHERE Country = \"United States\" \n\t\t\tORDER BY Latitude DESC;\n```\n\n### List all the cities west of Chicago, ordered from west to east\n\n```sql\nSELECT * \n\tFROM north_american_cities \n\t\tWHERE Longitude \u003c -87.629798 \n\t\t\tORDER BY Longitude ASC;\n```\n\n### List the two largest cities in Mexico (by population)\n\n```sql\nSELECT * \n\tFROM north_american_cities \n\t\tWHERE Country = \"Mexico\" \n\t\t\tORDER BY Population DESC \n\t\t\t\tLIMIT 2;\n```\n\n### List the third and fourth largest cities (by population) in the United States and their population\n\n```sql\nSELECT * \n\tFROM north_american_cities \n\t\tWHERE Country = \"United States\" \n\t\t\tORDER BY Population DESC \n\t\t\t\tLIMIT 2 OFFSET 2;\n```\n\n# Normalization\n\n\u003caside\u003e\n💡 Process of breaking down entity data into pieces and stored across multiple tables.\n\n\u003c/aside\u003e\n\nDatabase normalization is useful because it minimizes duplicate data in any single table, and allows for data in the database to grow independently of each other (ie. Types of car engines can grow independent of each type of car). 
As a trade-off, queries get slightly more complex since they have to be able to find data from different parts of the database, and performance issues can arise when working with many large tables.\n\nIn order to answer questions about an entity that has data spanning multiple tables in a normalized database, we need to learn how to write a query that can combine all that data and pull out exactly the information we need.\n\n# **Types of Keys in Relational Model**\n\n## Candidate Key\n\nThe minimal set of attributes that can uniquely identify a tuple is known as a candidate key.\n\n- It is a minimal super key.\n- A super key with no redundant attributes is called a candidate key.\n- It is the minimal set of attributes that can uniquely identify a record.\n- It must contain unique values.\n- It can contain NULL values.\n- Every table must have at least a single candidate key.\n- A table can have multiple candidate keys but only one primary key.\n- The value of the Candidate Key is unique and may be null for a tuple.\n- There can be more than one candidate key in a relation.\n\n## Primary Key\n\n- It is a unique key.\n- It can identify only one tuple (a record) at a time.\n- It has no duplicate values; every value is unique.\n- It cannot be NULL.\n- A primary key does not have to be a single column; a combination of columns can also be the primary key for a table.\n- One common primary key type is an auto-incrementing integer (because they are space efficient), but it can also be a string or a hashed value, as long as it is unique.\n\n## Super Key\n\nA super key is a group of single or multiple keys that identifies rows in a table. 
It supports NULL values.\n\n- Adding zero or more attributes to the candidate key generates the super key.\n- A candidate key is a super key but vice versa is not true.\n- Super Key values may also be NULL.\n\n## Alternate Key\n\nA candidate key other than the primary key is called an alternate key.\n\n- All the candidate keys which are not the primary key are called alternate keys.\n- It is a secondary key.\n- It contains two or more fields to identify two or more records.\n- These values are repeated.\n\n![Primary-key-alternative-key-in-dbms](https://github.com/user-attachments/assets/38718d28-e42a-4230-8e02-31d04add31e9)\n\n## Foreign Key\n\nIf an attribute can only take the values which are present as values of some other attribute, it will be a [**foreign key**](https://www.geeksforgeeks.org/foreign-key-constraint-in-sql/) to the attribute to which it refers. The relation which is being referenced is called the referenced relation and the corresponding attribute is called the referenced attribute. The referenced attribute of the referenced relation should be the primary key to it.\n\n- It is a key that acts as a primary key in one table and as a secondary key in another table.\n- It combines two or more relations (tables) at a time.\n- They act as a cross-reference between the tables.\n- For example, DNO is a primary key in the DEPT table and a non-key in EMP.\n- They can be NULL.\n- They may contain duplicate values, i.e. they need not follow the uniqueness constraint.\n\n![Foreign-keys](https://github.com/user-attachments/assets/9a5f70fd-6e8a-4bec-b5ce-a344d974f4c3)\n\n## Composite Key\n\nSometimes, a table might not have a single column/attribute that uniquely identifies all the records of a table. To uniquely identify rows of a table, a combination of two or more columns/attributes can be used. It still can give duplicate values in rare cases. 
So, we need to find the optimal set of attributes that can uniquely identify rows in a table.\n\n- It acts as a primary key if there is no primary key in a table.\n- Two or more attributes are used together to make a [**composite key**](https://www.geeksforgeeks.org/composite-key-in-sql/).\n- Different combinations of attributes may give different accuracy in terms of identifying the rows uniquely.\n\n![Different-types-of-keys](https://github.com/user-attachments/assets/08044c24-2f75-4020-8103-09092345ac85)\n\n\u003caside\u003e\n💡 Candidate keys allow for distinct identification, the Primary key serves as the chosen identifier, Alternate keys offer other choices, and Foreign keys create vital linkages that guarantee data integrity between tables. The creation of strong and effective relational databases requires the thoughtful application of these keys.\n\n\u003c/aside\u003e\n\n# Joins\n\n\u003caside\u003e\n💡 Tables that share information about a single entity need to have a *primary key* that identifies that entity *uniquely* across the database.\n\n\u003c/aside\u003e\n\n## Inner Join\n\n```sql\nSELECT column, another_table_column, …\n\tFROM mytable\n\t\tINNER JOIN another_table \n\t    ON mytable.id = another_table.id\n\t\t\t\tWHERE condition(s)\n\t\t\t\t\tORDER BY column, … ASC/DESC\n\t\t\t\t\t\tLIMIT num_limit OFFSET num_offset;\n```\n\nThe **`INNER JOIN`** is a process that matches rows from the first table and the second table which have the same key (as defined by the **`ON`** constraint) to create a result row with the combined columns from both tables.\n\n\u003cimg width=\"500\" alt=\"inner-join-operation\" src=\"https://github.com/user-attachments/assets/d6ca0061-b6d4-4747-8dab-8053e785d737\"\u003e\n\n```sql\nSELECT * \n\tFROM left_table \n\t\tINNER JOIN right_table \n\t\t\tON left_table.countryID = right_table.ID;\n```\n\n## Full Join\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tFULL JOIN another_table \n\t    ON mytable.id = 
another_table.matching_id\n\t\t\t\tWHERE condition(s)\n\t\t\t\t\tORDER BY column, … ASC/DESC\n\t\t\t\t\t\tLIMIT num_limit OFFSET num_offset;\n```\n\nThis is an outer join, also known as **`FULL OUTER JOIN`**.\n\nA **`FULL JOIN`** simply means that rows from both tables are kept, regardless of whether a matching row exists in the other table.\n\n\u003cimg width=\"500\" alt=\"full-outer-join-operation\" src=\"https://github.com/user-attachments/assets/3d2d77cd-1753-4f05-aae1-4016111b2b90\"\u003e\n\n\n```sql\nSELECT * \n\tFROM left_table \n\t\tFULL JOIN right_table \n\t\t\tON left_table.countryID = right_table.ID;\n```\n\n## Left Join\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tLEFT JOIN another_table \n\t    ON mytable.id = another_table.matching_id\n\t\t\t\tWHERE condition(s)\n\t\t\t\t\tORDER BY column, … ASC/DESC\n\t\t\t\t\t\tLIMIT num_limit OFFSET num_offset;\n```\n\nThis is an outer join, also known as **`LEFT OUTER JOIN`**.\n\nA **`LEFT JOIN`** simply includes rows from the left table (A) regardless of whether a matching row is found in the right table (B).\n\n\u003cimg width=\"500\" alt=\"left-outer-join-operation\" src=\"https://github.com/user-attachments/assets/063178f8-5606-47d9-afa1-702d82518a4c\"\u003e\n\n```sql\nSELECT * \n\tFROM left_table \n\t\tLEFT JOIN right_table \n\t\t\tON left_table.countryID = right_table.ID;\n```\n\n## Right Join\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tRIGHT JOIN another_table \n\t    ON mytable.id = another_table.matching_id\n\t\t\t\tWHERE condition(s)\n\t\t\t\t\tORDER BY column, … ASC/DESC\n\t\t\t\t\t\tLIMIT num_limit OFFSET num_offset;\n```\n\nThis is an outer join, also known as **`RIGHT OUTER JOIN`**.\n\nThe `RIGHT JOIN` returns all records from the right table (B), and the matching records from the left table (A). 
If there is no match, the columns from the left side are NULL.\n\n![right-outer-join-operation](https://github.com/user-attachments/assets/03296995-41ca-429b-888b-9194a6e1373c)\n\n```sql\nSELECT * \n\tFROM left_table \n\t\tRIGHT JOIN right_table \n\t\t\tON left_table.countryID = right_table.ID;\n```\n\n## Exercise 1\n\nTable: Movies\n\n| Id | Title | Director | Year | Length_minutes |\n| --- | --- | --- | --- | --- |\n| 1 | Toy Story | John Lasseter | 1995 | 81 |\n| 2 | A Bug's Life | John Lasseter | 1998 | 95 |\n| 3 | Toy Story 2 | John Lasseter | 1999 | 93 |\n| 4 | Monsters, Inc. | Pete Docter | 2001 | 92 |\n| 5 | Finding Nemo | Andrew Stanton | 2003 | 107 |\n| 6 | The Incredibles | Brad Bird | 2004 | 116 |\n\nTable: Boxoffice\n\n| Movie_id | Rating | Domestic_sales | International_sales |\n| --- | --- | --- | --- |\n| 5 | null | 380843261 | 555900000 |\n| 14 | 7.4 | 268492764 | 475066843 |\n| 8 | 8 | 206445654 | 417277164 |\n| 12 | 6.4 | 191452396 | 368400000 |\n| 3 | null | 245852179 | 239163000 |\n| 6 | 8 | 261441092 | 370001000 |\n\n### Find the domestic and international sales for each movie\n\n```sql\nSELECT * \n\tFROM movies \n\t\tINNER JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id;\n```\n\n### Show the sales numbers for each movie that did better internationally rather than domestically\n\n```sql\nSELECT title, boxoffice.domestic_sales, boxoffice.international_sales \n\tFROM movies \n\t\tINNER JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id \n\t\t\t\tWHERE boxoffice.international_sales \u003e boxoffice.domestic_sales;\n```\n\n### List all the movies by their ratings in descending order\n\n```sql\nSELECT * \n\tFROM movies \n\t\tINNER JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id \n\t\t\t\tORDER BY boxoffice.rating DESC;\n```\n\n### Find all information for each movie and sort by title\n\n```sql\nSELECT * \n\tFROM movies\n\t\tFULL OUTER JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id\n\t\t\t\tORDER BY 
movies.title ASC;\n```\n\n## Exercise 2\n\nTable: Customers\n\n| CustomerID | CustomerName | ContactName | Address |\n| --- | --- | --- | --- |\n| 1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 |\n| 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de 222 |\n| 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 |\n\nTable: Orders\n\n| OrderID | CustomerID | EmployeeID | OrderDate | ShipperID |\n| --- | --- | --- | --- | --- |\n| 10308 | 2 | 7 | 1996-09-18 | 3 |\n| 10309 | 37 | 3 | 1996-09-19 | 1 |\n| 10310 | 77 | 8 | 1996-09-20 | 2 |\n\n### Display all customer names and the IDs of any orders they might have:\n\n```sql\nSELECT customers.CustomerName, orders.OrderID \n\tFROM customers\n\t\tLEFT JOIN orders \n\t\t\tON customers.CustomerID = orders.CustomerID;\n```\n\n### Display all order IDs and the name of the customer who placed each order, if any:\n\n```sql\nSELECT customers.CustomerName, orders.OrderID \n\tFROM customers\n\t\tRIGHT JOIN orders \n\t\t\tON customers.CustomerID = orders.CustomerID;\n```\n\n## Exercise 3\n\nTable: Buildings (Read-Only)\n\n| Building_name | Capacity |\n| --- | --- |\n| 1e | 24 |\n| 1w | 32 |\n| 2e | 16 |\n| 2w | 20 |\n\nTable: Employees (Read-Only)\n\n| Role | Name | Building | Years_employed |\n| --- | --- | --- | --- |\n| Engineer | Becky A. | 1e | 4 |\n| Engineer | Dan B. | 1e | 2 |\n| Engineer | Sharon F. | 1e | 6 |\n| Engineer | Dan M. 
| 1e | 4 |\n\n### Find the list of all buildings that have employees\n\n```sql\nSELECT DISTINCT building_name \n\tFROM buildings \n\t\tLEFT JOIN employees \n\t\t\tON buildings.building_name = employees.building \n\t\t\t\tWHERE employees.building IS NOT NULL;\n```\n\n### Find the list of all buildings and their capacity\n\n```sql\nSELECT DISTINCT building_name, capacity \n\tFROM buildings;\n```\n\n### List all buildings and the distinct employee roles in each building (including empty buildings)\n\n```sql\nSELECT DISTINCT employees.role, buildings.building_name \n\tFROM buildings \n\t\tLEFT JOIN employees \n\t\t\tON buildings.building_name = employees.building;\n```\n\n# Nulls\n\nIt's always good to reduce the possibility of **`NULL`** values in databases because they require special attention when constructing queries and constraints (certain functions behave differently with null values) and when processing the results.\n\nAn alternative to **`NULL`** values in your database is to have ***data-type appropriate default values***, like 0 for numerical data, empty strings for text data, etc.\n\nBut if your database needs to store incomplete data, then **`NULL`** values can be appropriate if the default values will skew later analysis (for example, when taking averages of numerical data).\n\n```sql\nSELECT column, another_column, …\n\tFROM mytable\n\t\tWHERE column IS/IS NOT NULL\n\t\t\tAND/OR another_condition\n\t\t\tAND/OR …;\n```\n\n# **Queries with expressions**\n\nHere, we use **`expressions`** to write more complex logic on column values in a query. 
These expressions can use mathematical and string functions along with basic arithmetic to transform values when the query is executed.\n\nIn addition to expressions, regular columns and even tables can also have aliases to make them easier to reference in the output and as a part of simplifying more complex queries.\n\n```sql\nSELECT col_expression AS expr_description, …\n\tFROM mytable;\n```\n\nThe use of expressions can save time and extra post-processing of the result data, but can also make the query harder to read, so when expressions are used in the **SELECT** part of the query, it is recommended that they also be given a descriptive *alias* using the **AS** keyword.\n\n```sql\nSELECT column AS better_column_name, …\n\tFROM a_long_widgets_table_name AS mywidgets\n\t\tINNER JOIN widget_sales\n\t\t  ON mywidgets.id = widget_sales.widget_id;\n```\n\n## Example\n\n```sql\nSELECT particle_speed / 2.0 AS half_particle_speed\n\tFROM physics_data\n\t\tWHERE ABS(particle_position) * 10.0 \u003e 500;\n```\n\n## Exercise\n\nTable: Movies (Read-Only)\n\n| Id | Title | Director | Year | Length_minutes |\n| --- | --- | --- | --- | --- |\n| 1 | Toy Story | John Lasseter | 1995 | 81 |\n| 2 | A Bug's Life | John Lasseter | 1998 | 95 |\n| 3 | Toy Story 2 | John Lasseter | 1999 | 93 |\n| 4 | Monsters, Inc. 
| Pete Docter | 2001 | 92 |\n| 5 | Finding Nemo | Andrew Stanton | 2003 | 107 |\n| 6 | The Incredibles | Brad Bird | 2004 | 116 |\n\nTable: Boxoffice (Read-Only)\n\n| Movie_id | Rating | Domestic_sales | International_sales |\n| --- | --- | --- | --- |\n| 5 | 8.2 | 380843261 | 555900000 |\n| 14 | 7.4 | 268492764 | 475066843 |\n| 8 | 8 | 206445654 | 417277164 |\n| 12 | 6.4 | 191452396 | 368400000 |\n| 3 | 7.9 | 245852179 | 239163000 |\n| 6 | 8 | 261441092 | 370001000 |\n\n### List all movies and their combined sales in **millions** of dollars\n\n```sql\nSELECT title, (boxoffice.domestic_sales + boxoffice.international_sales) / 1000000 AS total_sales \n\tFROM movies \n\t\tLEFT JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id;\n```\n\n### List all movies and their ratings **in percent**\n\n```sql\nSELECT title, boxoffice.rating * 10 AS rating_percent \n\tFROM movies \n\t\tLEFT JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id;\n```\n\n### List all movies that were released on even number years\n\n```sql\nSELECT title \n\tFROM movies \n\t\tLEFT JOIN boxoffice \n\t\t\tON movies.id = boxoffice.movie_id \n\t\t\t\tWHERE movies.year % 2 = 0;\n```\n\n# Queries with Aggregates\n\nSQL supports the use of aggregate expressions (or functions) that allow you to summarize information about a group of rows of data.\n\n```sql\nSELECT AGG_FUNC(column_or_expression) AS aggregate_description, …\n\tFROM mytable\n\t\tWHERE constraint_expression;\n```\n\nWithout a specified grouping, each aggregate function is going to run on the whole set of result rows and return a single value.\n\n| Function | Description |\n| --- | --- |\n| COUNT(*), COUNT(column) | A common function used to count the number of rows in the group if no column name is specified. Otherwise, count the number of rows in the group with non-NULL values in the specified column. |\n| MIN(column) | Finds the smallest numerical value in the specified column for all rows in the group. 
|\n| MAX(column) | Finds the largest numerical value in the specified column for all rows in the group. |\n| AVG(column) | Finds the average numerical value in the specified column for all rows in the group. |\n| SUM(column) | Finds the sum of all numerical values in the specified column for the rows in the group. |\n\n## Grouped Aggregate Functions\n\nIn addition to aggregating across all the rows, you can instead apply the aggregate functions to individual groups of data within that group (e.g. box office sales for Comedies vs Action movies).\n\nThis would then create as many results as there are unique groups, as defined by the **`GROUP BY`** clause.\n\nThe **`GROUP BY`** clause works by grouping rows that have the same value in the column specified.\n\n```sql\nSELECT AGG_FUNC(column_or_expression) AS aggregate_description, …\n\tFROM mytable\n\t\tWHERE constraint_expression\n\t\t\tGROUP BY column;\n```\n\n## Exercise\n\nTable: Employees\n\n| Role | Name | Building | Years_employed |\n| --- | --- | --- | --- |\n| Engineer | Becky A. | 1e | 4 |\n| Engineer | Dan B. | 1e | 2 |\n| Engineer | Sharon F. | 1e | 6 |\n| Engineer | Dan M. | 1e | 4 |\n| Engineer | Malcom S. 
| 1e | 1 |\n\n### Find the longest time that an employee has been at the studio\n\n```sql\nSELECT MAX(years_employed) \n\tFROM employees;\n```\n\n### For each role, find the average number of years employed by employees in that role\n\n```sql\nSELECT role, AVG(years_employed) \n\tFROM employees \n\t\tGROUP BY role;\n```\n\n### Find the total number of employee years worked in each building\n\n```sql\nSELECT building, SUM(years_employed) \n\tFROM employees \n\t\tGROUP BY building;\n```\n\n## HAVING Clause\n\nSince the **`GROUP BY`** clause is executed after the **`WHERE`** clause (which filters the rows that are to be grouped), how exactly do we filter the grouped rows?\n\nSQL allows us to do this by adding an additional **`HAVING`** clause which is used specifically with the **`GROUP BY`** clause to allow us to filter grouped rows from the result set.\n\n```sql\nSELECT group_by_column, AGG_FUNC(column_expression) AS aggregate_result_alias, …\n\tFROM mytable\n\t\tWHERE condition\n\t\t\tGROUP BY column\n\t\t\t\tHAVING group_condition;\n```\n\nIf you aren't using the `GROUP BY` clause, a simple `WHERE` clause will suffice for this purpose.\n\n## Exercise\n\nTable: Employees\n\n| Role | Name | Building | Years_employed |\n| --- | --- | --- | --- |\n| Engineer | Becky A. | 1e | 4 |\n| Engineer | Dan B. | 1e | 2 |\n| Engineer | Sharon F. | 1e | 6 |\n| Engineer | Dan M. | 1e | 4 |\n| Engineer | Malcom S. 
| 1e | 1 |\n\n### Find the number of Artists in the studio (without a **HAVING** clause)\n\n```sql\nSELECT COUNT(*) \n\tFROM employees \n\t\tWHERE role = \"Artist\";\n```\n\n### Find the number of Employees of each role in the studio\n\n```sql\nSELECT role, COUNT(*)\n\tFROM employees\n\t\tGROUP BY role;\n```\n\n### Find the total number of years employed by all Engineers\n\n```sql\nSELECT SUM(years_employed)\n\tFROM employees\n\t\tGROUP BY role\n\t\t\tHAVING role = \"Engineer\";\n```\n\n# Order Of Execution\n\nComplete Query:\n\n```sql\nSELECT DISTINCT column, AGG_FUNC(column_or_expression), …\nFROM mytable\n    JOIN another_table\n      ON mytable.column = another_table.column\n    WHERE constraint_expression\n    GROUP BY column\n    HAVING constraint_expression\n    ORDER BY column ASC/DESC\n    LIMIT num_limit OFFSET num_offset;\n```\n\nEach query begins with finding the data that we need in a database, and then filtering that data down into something that can be processed and understood as quickly as possible.\n\n## 1. **`FROM`** and **`JOIN`**s\n\nThe **`FROM`** clause, and subsequent **`JOIN`**s are first executed to determine the total working set of data that is being queried. This includes subqueries in this clause, and can cause temporary tables to be created under the hood containing all the columns and rows of the tables being joined.\n\n## 2. **`WHERE`**\n\nOnce we have the total working set of data, the first-pass **`WHERE`** constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded. Each of the constraints can only access columns directly from the tables requested in the **`FROM`** clause. Aliases in the **`SELECT`** part of the query are not accessible in most databases since they may include expressions dependent on parts of the query that have not yet executed.\n\n## 3. 
**`GROUP BY`**\n\nThe remaining rows after the **`WHERE`** constraints are applied are then grouped based on common values in the column specified in the **`GROUP BY`** clause. As a result of the grouping, there will only be as many rows as there are unique values in that column. Implicitly, this means that you should only need to use this when you have aggregate functions in your query.\n\n## 4. **`HAVING`**\n\nIf the query has a **`GROUP BY`** clause, then the constraints in the **`HAVING`** clause are applied to the grouped rows, discarding the grouped rows that don't satisfy the constraint. Like the **`WHERE`** clause, aliases are also not accessible from this step in most databases.\n\n## 5. **`SELECT`**\n\nAny expressions in the **`SELECT`** part of the query are finally computed.\n\n## 6. **`DISTINCT`**\n\nOf the remaining rows, rows with duplicate values in the column marked as **`DISTINCT`** will be discarded.\n\n## 7. **`ORDER BY`**\n\nIf an order is specified by the **`ORDER BY`** clause, the rows are then sorted by the specified data in either ascending or descending order. Since all the expressions in the **`SELECT`** part of the query have been computed, you can reference aliases in this clause.\n\n## 8. **`LIMIT`** / **`OFFSET`**\n\nFinally, the rows that fall outside the range specified by the **`LIMIT`** and **`OFFSET`** are discarded, leaving the final set of rows to be returned from the query.\n\n## Exercise\n\nTable: Movies (Read-Only)\n\n| Id | Title | Director | Year | Length_minutes |\n| --- | --- | --- | --- | --- |\n| 1 | Toy Story | John Lasseter | 1995 | 81 |\n| 2 | A Bug's Life | John Lasseter | 1998 | 95 |\n| 3 | Toy Story 2 | John Lasseter | 1999 | 93 |\n| 4 | Monsters, Inc. 
| Pete Docter | 2001 | 92 |\n| 5 | Finding Nemo | Andrew Stanton | 2003 | 107 |\n\nTable: Boxoffice (Read-Only)\n\n| Movie_id | Rating | Domestic_sales | International_sales |\n| --- | --- | --- | --- |\n| 5 | 8.2 | 380843261 | 555900000 |\n| 14 | 7.4 | 268492764 | 475066843 |\n| 8 | 8 | 206445654 | 417277164 |\n| 12 | 6.4 | 191452396 | 368400000 |\n| 3 | 7.9 | 245852179 | 239163000 |\n| 6 | 8 | 261441092 | 370001000 |\n\n### Find the total domestic and international sales that can be attributed to each director\n\n```sql\nSELECT director, SUM(boxoffice.domestic_sales + boxoffice.international_sales) AS total_sales \n    FROM movies\n\t    LEFT JOIN boxoffice\n\t\t    ON movies.id=boxoffice.movie_id\n\t\t\t    GROUP BY director;\n```\n\n# Create Table\n\n```sql\nCREATE TABLE IF NOT EXISTS mytable (\n    column DataType TableConstraint DEFAULT default_value,\n    another_column DataType TableConstraint DEFAULT default_value,\n    …\n);\n```\n\n## DataTypes\n\n| Data type | Description |\n| --- | --- |\n| INTEGER, BOOLEAN | The integer datatypes can store whole integer values like the count of a number or an age. In some implementations, the boolean value is just represented as an integer value of just 0 or 1. |\n| FLOAT, DOUBLE, REAL | The floating point datatypes can store more precise numerical data like measurements or fractional values. Different types can be used depending on the floating point precision required for that value. |\n| CHARACTER(num_chars), VARCHAR(num_chars), TEXT | The text based datatypes can store strings and text in all sorts of locales. The distinction between the various types generally amounts to the underlying efficiency of the database when working with these columns. Both the CHARACTER and VARCHAR (variable character) types are specified with the max number of characters that they can store (longer values may be truncated), so they can be more efficient to store and query with big tables. 
|\n| DATE, DATETIME | SQL can also store date and time stamps to keep track of time series and event data. They can be tricky to work with, especially when manipulating data across timezones. |\n| BLOB | Finally, SQL can store binary data in blobs right in the database. These values are often opaque to the database, so you usually have to store them with the right metadata to requery them. |\n\n## Table Constraints\n\n| Constraint | Description |\n| --- | --- |\n| PRIMARY KEY | This means that the values in this column are unique, and each value can be used to identify a single row in this table. |\n| AUTOINCREMENT | For integer values, this means that the value is automatically filled in and incremented with each row insertion. Not supported in all databases. |\n| UNIQUE | This means that the values in this column have to be unique, so you can't insert another row with the same value in this column as another row in the table. Differs from the `PRIMARY KEY` in that it doesn't have to be a key for a row in the table. |\n| NOT NULL | This means that the inserted value cannot be `NULL`. |\n| CHECK (expression) | This allows you to run a more complex expression to test whether the values inserted are valid. For example, you can check that values are positive, or greater than a specific size, or start with a certain prefix, etc. |\n| FOREIGN KEY | This is a consistency check which ensures that each value in this column corresponds to another value in a column in another table. For example, if there are two tables, one listing all Employees by ID, and another listing their payroll information, the `FOREIGN KEY` can ensure that every row in the payroll table corresponds to a valid employee in the master Employee list. 
|\n\n# Insert Row\n\n```sql\nINSERT INTO mytable\n(column, another_column, …)\nVALUES (value_or_expr, another_value_or_expr, …),\n      (value_or_expr_2, another_value_or_expr_2, …),\n      …;\n```\n\n# Update Row\n\n```sql\nUPDATE mytable\nSET column = value_or_expr, \n    other_column = another_value_or_expr, \n    …\nWHERE condition;\n```\n\n# Delete Row\n\n```sql\nDELETE FROM mytable\nWHERE condition;\n```\n\n# Alter Table\n\n## Adding Columns\n\n```sql\nALTER TABLE mytable\nADD column_name DataType OptionalTableConstraint \n    DEFAULT default_value;\n```\n\n## Removing Columns\n\n```sql\nALTER TABLE mytable\nDROP column_to_be_deleted;\n```\n\n## Renaming Table\n\n```sql\nALTER TABLE mytable\nRENAME TO new_table_name;\n```\n\n# Drop Table\n\n```sql\nDROP TABLE IF EXISTS mytable;\n```\n\n\u003caside\u003e\n💡  If another table depends on columns in the table you are removing (for example, with a **`FOREIGN KEY`** dependency), then you will have to either update all dependent tables first to remove the dependent rows or remove those tables entirely.\n\n\u003c/aside\u003e\n\n# Nested Queries / Subqueries\n\n## Example\n\nLet's say your company has a list of all Sales Associates, with data on the revenue that each Associate brings in, and their individual salary. Times are tight, and you now want to find out which of your Associates are costing the company more than the average revenue brought in per Associate.\n\nFirst, you would need to calculate the average revenue all the Associates are generating:\n\n```sql\nSELECT AVG(revenue_generated)\nFROM sales_associates;\n```\n\nUsing that result, we can then compare the costs of each of the Associates against that value. 
To use it as a subquery, we can just write it straight into the **`WHERE`** clause of the query:\n\n```sql\nSELECT *\nFROM sales_associates\nWHERE salary \u003e \n   (SELECT AVG(revenue_generated)\n    FROM sales_associates);\n```\n\nAs the constraint is executed, each Associate's salary will be tested against the value queried from the inner subquery.\n\n## Correlated subqueries\n\nA more powerful type of subquery is the *correlated subquery* in which the inner query references, and is dependent on, a column or alias from the outer query. Unlike the subqueries above, each of these inner queries needs to be run for each of the rows in the outer query, since the inner query is dependent on the current outer query row.\n\n## Example\n\nInstead of the list of just Sales Associates above, imagine you have a general list of Employees, their departments (engineering, sales, etc.), revenue, and salary. This time, you are looking across the company to find the employees who perform worse than average in their department.\n\nFor each employee, you would need to calculate their cost relative to the average revenue generated by all people in their department. To take the average for the department, the subquery will need to know what department each employee is in:\n\n```sql\nSELECT *\nFROM employees\nWHERE salary \u003e \n   (SELECT AVG(revenue_generated)\n    FROM employees AS dept_employees\n    WHERE dept_employees.department = employees.department);\n```\n\n## Existence tests - Subquery with IN\n\n```sql\nSELECT *, …\nFROM mytable\nWHERE column\n    IN/NOT IN (SELECT another_column\n               FROM another_table);\n```\n\n# TODO\n\n1. Grant\n2. Revoke\n3. Commit\n4. Rollback\n5. Savepoint\n6. Union\n7. Intersection\n8. Exceptions ??\n9. Anti-joins\n\n# Resources\n\n1. https://sqlbolt.com/lesson/introduction\n2. https://www.techagilist.com/mainframe/db2/db2-join-inner-joins-and-outer-joins/\n3. https://www.w3schools.com/sql/default.asp\n4. 
https://learn.microsoft.com/en-us/power-query/merge-queries-left-outer\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnoopur-phadkar%2Fsql-cheatsheet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnoopur-phadkar%2Fsql-cheatsheet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnoopur-phadkar%2Fsql-cheatsheet/lists"}