https://github.com/juliargubolin/sql-for-data-analysis

This repository collects the documents, files and notes I took while learning SQL and data analysis through "SQL for Data Analysis: Advanced Techniques for Transforming Data Into Insights" by Cathy Tanimura (O'Reilly).

advanced data-analysis data-science sql


Host: GitHub
URL: https://github.com/juliargubolin/sql-for-data-analysis
Owner: JuliarGubolin
Created: 2024-09-19T21:41:34.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-10-17T22:50:26.000Z (over 1 year ago)
Last Synced: 2025-02-05T21:59:16.626Z (over 1 year ago)
Topics: advanced, data-analysis, data-science, sql
Homepage:
Size: 1.95 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.MD


# SQL INTERMEDIATE/ADVANCED PRACTICE

This repository presents the practical queries and charts I write while studying. The SQL content presented here comes from **"*SQL for Data Analysis*, written by Cathy Tanimura (O'Reilly). Copyright 2021 Cathy Tanimura, 978-1-492-08878-3".**

The topics follow the content of each chapter.

I practiced in BigQuery, using datasets from **Kaggle** and from **basededados** (a Brazilian team that provides clean databases for analysis, free of charge).

## CHAPTER 1 AND CHAPTER 2: INTRODUCTION AND PREPARING DATA FOR ANALYSIS

To practice the examples from these chapters, I downloaded a dataset from Kaggle with information about job salaries in the Data Science domain. You can check it [here](https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries/discussion/344701). The author is **Ruchi Bhatia**, and the dataset contains data from two years ago.

The content I am going to practice is binning and window functions. [Link](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sdias-de-codigo-alura!2ssalaries_datascience_domain).
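As a rough illustration of the two techniques, the sketch below bins salaries with a `CASE` expression and with `NTILE`, then ranks rows inside each experience level with a window function. It assumes the same table and the `job_title`, `experience_level` and `salary_in_usd` columns that appear in the duplicate check in this section. The bin thresholds (50,000 and 100,000) and the number of buckets are arbitrary values picked for the example, not figures from the book or the dataset.

~~~~sql
SELECT
  job_title,
  experience_level,
  salary_in_usd,
  -- Binning with fixed ranges (thresholds are illustrative assumptions)
  CASE
    WHEN salary_in_usd < 50000 THEN 'under 50k'
    WHEN salary_in_usd < 100000 THEN '50k to under 100k'
    ELSE '100k and above'
  END AS salary_bin,
  -- Binning into 4 buckets with roughly equal row counts
  NTILE(4) OVER (ORDER BY salary_in_usd) AS salary_quartile,
  -- Window function: rank salaries within each experience level
  RANK() OVER (
    PARTITION BY experience_level
    ORDER BY salary_in_usd DESC
  ) AS rank_in_level
FROM `dias-de-codigo-alura.salaries_datascience_domain.salaries_datascience`
ORDER BY experience_level, rank_in_level;
~~~~

`CASE` gives control over where each bin starts and ends, while `NTILE` lets the data decide the boundaries so that every bucket holds a similar number of rows.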

Before starting the analysis, I searched for duplicates and null values. I found 53 duplicate rows and deleted them.

- **FIND DUPLICATES:** This query returns an integer: the number of distinct rows that appear more than once in the dataset. After I deleted all duplicated rows, the result was 0.

~~~~
-- Count the distinct row values that occur more than once
SELECT COUNT(*) AS duplicated_rows
FROM
(
  -- Group by every column so identical rows collapse into one group
  SELECT cod_id, work_year, experience_level, employment_type,
  job_title, salary, salary_currency, salary_in_usd, employee_residence,
  remote_ratio, company_location, company_size,
  COUNT(*) AS records
  FROM `dias-de-codigo-alura.salaries_datascience_domain.salaries_datascience`
  GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
) a
WHERE records > 1;
~~~~
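These notes do not show how the duplicate rows were deleted. One possible approach in BigQuery, offered only as a sketch, is to rebuild the table from its distinct rows. The destination name `salaries_datascience_dedup` is hypothetical; writing to a new table leaves the original untouched until the result has been checked.

~~~~sql
-- Assumption: `salaries_datascience_dedup` is a hypothetical destination table.
-- SELECT DISTINCT * keeps one copy of each fully identical row.
CREATE OR REPLACE TABLE
  `dias-de-codigo-alura.salaries_datascience_domain.salaries_datascience_dedup` AS
SELECT DISTINCT *
FROM `dias-de-codigo-alura.salaries_datascience_domain.salaries_datascience`;
~~~~

If the twelve columns listed in the duplicate check are all the columns in the table, running that check against the new table should return 0.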

There were no null values, and the column types were already clean.
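A null check of this kind can be sketched with `COUNTIF`, which counts the null values per column in a single pass over the table. Only a few of the columns are shown; the others would follow the same pattern. This is an illustration, not necessarily the query used for these notes.

~~~~sql
SELECT
  -- Each COUNTIF returns how many rows have a NULL in that column
  COUNTIF(work_year IS NULL) AS null_work_year,
  COUNTIF(experience_level IS NULL) AS null_experience_level,
  COUNTIF(job_title IS NULL) AS null_job_title,
  COUNTIF(salary_in_usd IS NULL) AS null_salary_in_usd,
  COUNTIF(company_location IS NULL) AS null_company_location
FROM `dias-de-codigo-alura.salaries_datascience_domain.salaries_datascience`;
~~~~

A result of 0 in every column means no null values were found in the columns checked.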
