https://github.com/newking9088/sql-guide-to-solve-complex-data-science-problems
This is a comprehensive SQL guide for both MySQL users and PostgreSQL users, covering topics from basic `SELECT` statements to advanced window functions. My SQL learning journey and the suggestions from my mentees, colleagues and seniors to document this were the motivations to write this document.
https://github.com/newking9088/sql-guide-to-solve-complex-data-science-problems
dataanalysis datascience-machinelearning mysql postgresql sql
Last synced: 2 months ago
JSON representation
This is a comprehensive SQL guide for both MySQL users and PostgreSQL users, covering topics from basic `SELECT` statements to advanced window functions. My SQL learning journey and the suggestions from my mentees, colleagues and seniors to document this were the motivations to write this document.
- Host: GitHub
- URL: https://github.com/newking9088/sql-guide-to-solve-complex-data-science-problems
- Owner: newking9088
- License: mit
- Created: 2024-09-17T16:11:05.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-03T23:08:05.000Z (3 months ago)
- Last Synced: 2025-02-28T06:03:37.330Z (3 months ago)
- Topics: dataanalysis, datascience-machinelearning, mysql, postgresql, sql
- Homepage:
- Size: 882 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# sql-guide-to-solve-complex-data-science-problems
This comprehensive SQL guide caters to both MySQL and PostgreSQL users, covering everything from basic SELECT statements to advanced window functions. My own SQL learning journey, along with encouragement from my mentees, colleagues, and seniors, inspired me to create this document.## My Journey with SQL
When I started as Experimental Data Scientist at the National High Magnetic Field Laboratory, I had no prior experience with SQL. My background was primarily in Fortran, Matlab, and Python. I was new to data science and quickly realized that managing and analyzing large datasets was a crucial part of my role. One particular challenge involved merging data from multiple data warehouses to extract all the features needed for magnetic strength calculation and predictive modeling. This is when I discovered the power of SQL.
Initially, learning SQL felt overwhelming. The concepts were scattered, and I struggled to find a coherent learning path. Some colleagues used PostgreSQL, others preferred MySQL, and each had its own set of functionalities and quirks. I had to decide which one to focus on, and this decision was not easy.
As a Data Science Analyst at Iontra Inc and a Machine Learning Fellow at NYC Data Science Academy, I often met colleagues who, like myself once, had no prior experience with SQL. They often felt overwhelmed by the material and struggled to grasp the concepts. This motivated me to compile this document, aiming to provide a comprehensive SQL guide with real-world problems, helping users with both MySQL and PostgreSQL. There are only some differences between them; usually, PostgreSQL has more functionalities available, but MySQL is more flexible. This guide aims to bridge those gaps and provide a cohesive learning experience. This guide includes e-commerce, finance, and tech-focused examples, along with 50 LeetCode problems ranging from Medium to Hard difficulty. Each problem comes with a detailed solution and a breakdown of the problem and query.By understanding and solving these problems, you will advance your SQL knowledge to an advanced or expert level as a data scientist.
## Key Five SQL Concepts for Data Science
### Basic SQL Statements
First, we start with basic SQL concepts such as `SELECT`, `CREATE`, `INSERT`, `UPDATE`, and `WHERE` operations. These foundational commands allow you to fetch data from the database, filter results, create new tables, and modify or update existing tables. Mastering these basics is essential for efficiently managing and manipulating data in any relational database.### Grouping Data
One of the most important topics in data science is grouping. Understanding how to use the `GROUP BY` clause was a game changer for me. It allows you to aggregate data and perform calculations on groups of rows, which is essential for summarizing data and gaining insights. For example, calculating the average sales per region or the total revenue per product category becomes straightforward with `GROUP BY`.
### Joins
Another crucial concept is `JOIN`. I often encountered questions about when to use `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, or other types of joins. Joins are fundamental for combining data from multiple tables based on related columns. Understanding the differences and applications of each type of join is vital for merging datasets and ensuring data integrity.
### Common Table Expressions (CTEs)
Sometimes, SQL queries can become long and complex, especially when dealing with nested subqueries. This can make the code difficult to read and maintain. Learning about Common Table Expressions (CTEs) was a breakthrough for me. CTEs allow you to break down complex queries into simpler, more manageable parts, improving readability and maintainability.
### Window Functions
In data science, calculating running totals, cumulative sums, and identifying top performers are common tasks. Window functions are incredibly powerful for these purposes. They allow you to perform calculations across a set of table rows that are related to the current row. Functions like `ROW_NUMBER()`, `RANK()`, and `SUM()` over a window partition are invaluable for advanced data analysis.
Investing time in learning these SQL concepts is a pathway to success in data science. Mastering `GROUP BY`, joins, CTEs, and window functions will significantly enhance your ability to manage and analyze data efficiently. These skills not only make your work easier but also enable you to derive deeper insights and make more informed decisions.
**Enjoy your learning journey and best of luck! Please follow along with the `how_to_sql.md` guide.**