{"id":25030184,"url":"https://github.com/juliargubolin/sql-for-data-analysis","last_synced_at":"2026-01-11T02:41:46.405Z","repository":{"id":258719508,"uuid":"860125806","full_name":"JuliarGubolin/sql-for-data-analysis","owner":"JuliarGubolin","description":"This repository was created in order to insert all the documents, files and notes I took while learning SQL and data analysis through \"SQL for Data Analysis: Advanced Techniques for Transforming Data Into Insights\" by Cathy Tanimura (O'Reilly).","archived":false,"fork":false,"pushed_at":"2024-10-17T22:50:26.000Z","size":2,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-05T21:59:16.626Z","etag":null,"topics":["advanced","data-analysis","data-science","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuliarGubolin.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-19T21:41:34.000Z","updated_at":"2025-01-18T04:33:55.000Z","dependencies_parsed_at":"2024-10-20T10:07:00.167Z","dependency_job_id":null,"html_url":"https://github.com/JuliarGubolin/sql-for-data-analysis","commit_stats":null,"previous_names":["juliargubolin/sql-for-data-analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliarGubolin%2Fsql-for-data-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliarGubolin%2Fsql-for-data-analysis/tags","releases_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliarGubolin%2Fsql-for-data-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliarGubolin%2Fsql-for-data-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuliarGubolin","download_url":"https://codeload.github.com/JuliarGubolin/sql-for-data-analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246367496,"owners_count":20765891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["advanced","data-analysis","data-science","sql"],"created_at":"2025-02-05T21:57:03.295Z","updated_at":"2026-01-11T02:41:46.399Z","avatar_url":"https://github.com/JuliarGubolin.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# SQL INTERMEDIATE/ADVANCED PRACTICING\n\nThis repository presents the practical queries and charts I write while studying. The SQL content presented here was learned from **\"*SQL for Data Analysis*, written by Cathy Tanimura (O'Reilly). Copyright 2021 Cathy Tanimura, 978-1-492-08878-3\".**\n\nThe topics follow the content of each chapter.\n\nI practiced in BigQuery, using datasets from **Kaggle** and from **basededados** (a Brazilian team that provides clean databases for analysis for free).\n\n\n## CHAPTER 1 AND CHAPTER 2: INTRODUCTION AND PREPARING DATA FOR ANALYSIS\n\nTo practice the examples from these chapters, I downloaded a dataset from Kaggle with information about job salaries in the Data Science domain. 
You can find it [here](https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries/discussion/344701). The author is **Ruchi Bhatia**, and the dataset contains data from two years ago. \n\nThe topics I practice here are binning and window functions. [Link](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sdias-de-codigo-alura!2ssalaries_datascience_domain).\n\nBefore starting the analysis, I searched for duplicates and null values. I found 53 duplicate rows and deleted them.\n\n- **FIND DUPLICATES:** This query returns an integer: the number of distinct rows that appear more than once in the dataset. After I deleted all duplicated rows, the result was 0.\n\n~~~~\nSELECT COUNT(*) AS duplicated_rows\nFROM\n(\n  -- Group on every column so that identical rows collapse into one group\n  SELECT cod_id, work_year, experience_level, employment_type,\n  job_title, salary, salary_currency, salary_in_usd, employee_residence,\n  remote_ratio, company_location, company_size,\n  COUNT(*) AS records\n  FROM `dias-de-codigo-alura.salaries_datascience_domain.salaries_datascience`\n  GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\n) a\n-- Keep only the groups that occur more than once\nWHERE records \u003e 1;\n~~~~\n\nThere were no null values, and the column types were already clean.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliargubolin%2Fsql-for-data-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuliargubolin%2Fsql-for-data-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliargubolin%2Fsql-for-data-analysis/lists"}