{"id":15002669,"url":"https://github.com/ifrankxue/project_sql_job_data_analyst","last_synced_at":"2026-03-11T08:01:50.243Z","repository":{"id":251570039,"uuid":"837780657","full_name":"iFrankXue/project_sql_job_data_analyst","owner":"iFrankXue","description":"Project following the tutorial course created by Luke Barousse, who is an amazing data instructor.","archived":false,"fork":false,"pushed_at":"2024-08-04T22:09:16.000Z","size":299,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-18T12:16:29.737Z","etag":null,"topics":["git","github","postgresql","sql","sqlite","sqlserver","vscode"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iFrankXue.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-04T02:34:53.000Z","updated_at":"2024-08-09T05:15:50.000Z","dependencies_parsed_at":"2024-08-04T19:13:25.235Z","dependency_job_id":"d8a9fb1b-5a37-41fb-b477-894b6e01aa74","html_url":"https://github.com/iFrankXue/project_sql_job_data_analyst","commit_stats":null,"previous_names":["ifrankxue/project_tutorial_sql","ifrankxue/project_sql_job_data_analyst"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iFrankXue%2Fproject_sql_job_data_analyst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iFrankXue%2Fproject_sql_job_data_analyst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/iFrankXue%2Fproject_sql_job_data_analyst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iFrankXue%2Fproject_sql_job_data_analyst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iFrankXue","download_url":"https://codeload.github.com/iFrankXue/project_sql_job_data_analyst/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243146939,"owners_count":20243742,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["git","github","postgresql","sql","sqlite","sqlserver","vscode"],"created_at":"2024-09-24T18:51:43.384Z","updated_at":"2025-12-16T07:23:02.350Z","avatar_url":"https://github.com/iFrankXue.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\n\nThis is my first 📊 data analysis project with SQL, following the steps of the tutorial video \"[SQL for Data Analytics - Learn SQL in 4 Hours](https://www.youtube.com/watch?v=7mz73uXD9DA)\" uploaded by 👨[Luke Barousse](https://github.com/lukebarousse). 👍 Thanks to this fantastic tutorial video, I walked through the whole process of a data analysis project. Meanwhile, I also worked with SQLite 📊 and PostgreSQL 🐘 for the first time.\n\nThis project mainly focuses on job market datasets. 
Keeping an eye on data analyst roles, it explores 💰 top-paying jobs, 🔥 in-demand skills, and where 📈 high demand meets high salary in data analytics.\n\nFor the detailed SQL files, please visit 🔍 here: [Project sql folder](/project_sql/).\n\n# Background\n\nThis project analyzes job market data 📊, focusing on data analyst positions. It aims to identify top-paying jobs, valuable skills, and high-demand qualifications to help job seekers, employers, and educators.\n\nThe dataset of this project comes from [Stack Overflow](https://stackoverflow.com)'s [2023 Developer Survey](https://survey.stackoverflow.co/2023/). The CSV files 📃 are available on my [Google Drive](https://drive.google.com/drive/folders/1XGe4dxWJeZyD6lN8oWD-rC6VJ9Pyb2OT?usp=share_link).\n\n# Tools I Used\n\nFollowing this tutorial, I strengthened some existing skills and picked up several new ones. The following tools were used in this project:\n\n- **SQL**: The backbone of this project. SQL was the main tool for every step of the analysis, from basic queries to advanced techniques.\n- **PostgreSQL**: As shown in the \"[2023 Developer Survey](https://survey.stackoverflow.co/2023/)\", PostgreSQL is the most popular database. I used PostgreSQL to store and query the job posting data.\n- **Visual Studio Code**: One of the most widely used code editors in recent years. With the SQLTools extension, VS Code makes writing and running queries much easier.\n- **Git \u0026 GitHub**: Version control is essential during development, and Git and GitHub meet this need by tracking changes and sharing the project.\n\n# The Analysis\n\nEach query in this project investigates a specific aspect of the data analyst job market.\nHere's how I approached each question:\n\n### 1. Top Paying Data Analyst Jobs\n\nThis step identifies the highest-paying roles. I filtered data analyst positions by average yearly salary and location, focusing on remote jobs. 
This query highlights the highest-paying opportunities in the field.\n\n```sql\n-- Top 10 highest-paying remote Data Analyst postings with a known salary\nSELECT\n    job_id,\n    job_title,\n    job_location,\n    job_schedule_type,\n    salary_year_avg,\n    job_posted_date::DATE,\n    name AS company_name\nFROM\n    job_postings_fact\nLEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id\nWHERE\n    job_title_short = 'Data Analyst' AND\n    job_location = 'Anywhere' AND\n    salary_year_avg IS NOT NULL\nORDER BY\n    salary_year_avg DESC\nLIMIT 10;\n```\n\nHere's the breakdown of the top data analyst jobs in 2023:\n- **High Compensation for Leadership Roles**: **Director of Analytics, Associate Director - Data Insights**: Leadership positions command higher salaries, reflecting the value placed on strategic oversight and decision-making in analytics.\n- **Remote and Flexible Work Arrangements**: Every role in this list is posted as \"Anywhere\", so all of these top salaries are open to remote workers, indicating a growing trend towards flexibility in work locations, especially in the tech and data fields.\n- **Diverse Industry Applications**: Companies like **Meta, AT\u0026T, Pinterest, and SmartAsset** highlight that data analysis is a critical function across various industries, from social media and telecommunications to financial services and marketing.\n\n![Top Paying Roles](assets/1_top_paying_jobs.png)\n*Bar graph visualizing the top 10 salaries for data analysts; ChartGPT generated this graph from my SQL query results.*\n\n### 2. Top Paying Job Skills\n\nThe purpose of this query is to find which skills are required for the top-paying Data Analyst jobs. To do this, I used the result of query 1 as a CTE (Common Table Expression), and then used `INNER JOIN` with `skills_job_dim` and `skills_dim`. 
Finally, each top-paying job is listed alongside every skill it requires.\n\n```sql\n-- Reuse query 1 as a CTE: the 10 highest-paying remote Data Analyst jobs\nWITH top_paying_jobs AS (\n    SELECT\n        job_id,\n        job_title,\n        salary_year_avg,\n        name AS company_name\n    FROM\n        job_postings_fact\n    LEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id\n    WHERE\n        job_title_short = 'Data Analyst' AND\n        job_location = 'Anywhere' AND\n        salary_year_avg IS NOT NULL\n    ORDER BY\n        salary_year_avg DESC\n    LIMIT 10\n)\n\n-- One row per (job, skill) pair; jobs with no listed skills drop out of the INNER JOIN\nSELECT\n    top_paying_jobs.*,\n    skills\nFROM\n    top_paying_jobs\nINNER JOIN skills_job_dim ON top_paying_jobs.job_id = skills_job_dim.job_id\nINNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id\nORDER BY\n    salary_year_avg DESC;\n```\n\nHere's the breakdown of the most demanded skills for the top 10 highest-paying data analyst jobs in 2023:\n- **SQL**: The most frequent skill among the top-paying jobs, with a count of 14.\n- **Python**: Highly valued for data analysis and scripting, with a count of 11.\n- **Tableau**: Important for data visualization and highly recommended for data analysts, with a count of 8.\n\nOther skills like ***R***, ***Snowflake***, ***Pandas***, and ***Excel*** show varying degrees of demand.\n\n![Top 10 Paying Job Skills](/assets/2_top_paying_jobs_skills.png)\n*Bar graph visualizing the count of skills for the top 10 paying jobs for data analysts; ChartGPT generated this graph from my SQL query results.*\n\n### 3. 
In-Demand Skills for Data Analysts\n\nThis query identifies the skills most frequently requested in remote data analyst job postings, directing focus to areas with high demand.\n\n```sql\n-- Count how many remote Data Analyst postings list each skill\nSELECT\n    skills,\n    COUNT(skills_job_dim.job_id) AS demand_count\nFROM job_postings_fact\nINNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id\nINNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id\nWHERE\n    job_title_short = 'Data Analyst'\n    AND job_work_from_home = TRUE  -- alternative filter: job_location = 'Anywhere'\nGROUP BY\n    skills\nORDER BY\n    demand_count DESC\nLIMIT 5;\n```\n\nHere's the breakdown of the most demanded skills for remote data analyst roles in 2023:\n- **SQL** and **Excel** remain fundamental, emphasizing the need for strong foundational skills in data processing and spreadsheet manipulation.\n- Programming and visualization tools like **Python**, **Tableau**, and **Power BI** are essential, pointing towards the increasing importance of technical skills in data storytelling and decision support.\n\n| Skills    | Demand Count |\n|-----------|--------------|\n| SQL       | 7291         |\n| Excel     | 4611         |\n| Python    | 4330         |\n| Tableau   | 3745         |\n| Power BI  | 2609         |\n\n*Table of the demand for the top 5 skills in remote data analyst job postings.*\n\n### 4. 
Skills Based on Salary\n\nExploring the average salaries associated with different skills revealed which skills pay the most.\n\n```sql\n-- Average yearly salary of remote Data Analyst postings, grouped by skill\nSELECT\n    skills,\n    ROUND(AVG(salary_year_avg), 2) AS avg_salary\nFROM job_postings_fact\nINNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id\nINNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id\nWHERE\n    job_title_short = 'Data Analyst'\n    AND salary_year_avg IS NOT NULL\n    AND job_work_from_home = TRUE\nGROUP BY\n    skills\nORDER BY\n    avg_salary DESC\nLIMIT 25;\n```\n\nHere's the breakdown of the highest-paying skills for remote data analyst roles in 2023:\n- **Big Data and AI Tools Dominate**: High-paying skills include big data technologies (PySpark, Databricks) and AI platforms (Watson, DataRobot), highlighting a demand for expertise in managing large datasets and AI-driven solutions.\n- **DevOps and Cloud Proficiency**: Tools like Bitbucket, GitLab, Jenkins, and Kubernetes are crucial, indicating a need for strong DevOps practices and cloud infrastructure knowledge.\n- **Programming and Data Science Skills**: Python libraries (Pandas, NumPy, Scikit-learn) and languages like Swift and Golang are highly valued, reflecting the importance of coding and data manipulation in the field.\n\n| Skills        | Average Salary ($) |\n|---------------|--------------------|\n| pyspark       | 208,172.25         |\n| bitbucket     | 189,154.50         |\n| couchbase     | 160,515.00         |\n| watson        | 160,515.00         |\n| datarobot     | 155,485.50         |\n| gitlab        | 154,500.00         |\n| swift         | 153,750.00         |\n| jupyter       | 152,776.50         |\n| pandas        | 151,821.33         |\n| elasticsearch | 145,000.00         |\n\n*Table of the top 10 skills based on average salary in remote data analyst job postings.*\n\n### 5. 
Most Optimal Skills to Learn\n\nThis query combines demand and salary to find the optimal skills to learn: those that are both in high demand and associated with top-paying opportunities, offering a strategic focus for skill development.\n\n```sql\n-- Skills listed in more than 10 remote Data Analyst postings that include a salary,\n-- ranked by average salary first and demand second\nSELECT\n    skills_dim.skill_id,\n    skills_dim.skills,\n    COUNT(skills_job_dim.job_id) AS demand_count,\n    ROUND(AVG(job_postings_fact.salary_year_avg), 2) AS avg_salary\nFROM job_postings_fact\nINNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id\nINNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id\nWHERE\n    job_title_short = 'Data Analyst'\n    AND job_work_from_home = TRUE\n    AND salary_year_avg IS NOT NULL\nGROUP BY\n    skills_dim.skill_id,\n    skills_dim.skills\nHAVING\n    COUNT(skills_job_dim.job_id) \u003e 10\nORDER BY\n    avg_salary DESC,\n    demand_count DESC\nLIMIT 25;\n```\n\nHere are some quick insights into the trends from the list of optimal skills for data analysts based on demand and average salary:\n- **Cloud Platforms and Data Warehousing**: **Snowflake, Azure, AWS, BigQuery, Redshift**: High demand for cloud-based data warehousing and cloud platforms indicates a trend towards cloud computing and data storage solutions.\n- **Data Visualization and BI Tools**: **Tableau, Looker, Qlik**: The prevalence of these tools reflects the importance of data visualization and business intelligence in translating data insights into actionable business strategies.\n- **Core Programming and Data Technologies**: **Python, R, SQL Server, Java**: Core programming languages and data management technologies are in high demand, emphasizing the need for foundational skills in data manipulation, analysis, and database management.\n\n| Skill ID | Skills     | Demand Count | Average Salary ($) |\n|----------|------------|--------------|---------------------|\n| 8        | go         | 27           | 115,319.89          |\n| 234      | confluence | 11           | 114,209.91          |\n| 97       | hadoop     | 22           | 
113,192.57          |\n| 80       | snowflake  | 37           | 112,947.97          |\n| 74       | azure      | 34           | 111,225.10          |\n| 77       | bigquery   | 13           | 109,653.85          |\n| 76       | aws        | 32           | 108,317.30          |\n| 4        | java       | 17           | 106,906.44          |\n| 194      | ssis       | 12           | 106,683.33          |\n| 233      | jira       | 20           | 104,917.90          |\n\n*Table of the top 10 optimal skills for data analysts, ranked by average salary among skills listed in more than 10 remote job postings.*\n\n# What I Learned\n\nOver more than 20 hours of learning about job market data analysis, I significantly improved my analytical thinking with datasets and strengthened my skills in SQL, VS Code, GitHub, and more.\n\n✅ Firstly, I learned how to approach complex datasets methodically, understanding the importance of data quality, cleaning, and preprocessing. I also gained insight into various analytical techniques and tools for deriving meaningful conclusions from data.\n\n✅ Secondly, I developed a comprehensive understanding of data collection processes. 
This experience has equipped me with the ability to manage data projects from inception to conclusion, supporting data-driven decision-making and actionable outcomes.\n\n✅ Thirdly, I acquired a range of technical skills, including SQL, PostgreSQL, VS Code, and GitHub, which prepare me to tackle more challenging projects in the future.\n\n# Conclusions\n\n### Insights\n\nFrom this data analysis process, several general insights emerged:\n\n- ***Top-Paying Data Analyst Jobs***: The top-paying remote data analyst jobs span a wide range of salaries, with the highest at $650,000.00!\n- ***Skills for Top-Paying Jobs***: The results of query 2 show that high-paying data analyst jobs require advanced proficiency in SQL, suggesting it's a critical skill for earning a top salary.\n- ***Most In-Demand Skills***: SQL is also the most demanded skill in the data analyst job market, making it essential for job seekers.\n- ***Skills with Higher Salaries***: Specialized skills, such as PySpark and Databricks in data technologies and Watson and DataRobot in AI platforms, are associated with the highest average salaries, indicating a premium on niche expertise.\n- ***Optimal Skills for Job Market Value***: SQL leads in demand and offers a high average salary, positioning it as one of the most optimal skills for data analysts to learn to maximize their market value.\n\n### Closing Thoughts\n\nHonestly, I learned a lot by working through this project step by step, and I now feel more confident tackling more challenging problems.\n\nThanks again to Luke for generously offering so many hours of tutorial videos for 
free.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fifrankxue%2Fproject_sql_job_data_analyst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fifrankxue%2Fproject_sql_job_data_analyst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fifrankxue%2Fproject_sql_job_data_analyst/lists"}