https://github.com/alieymsxxn/sql_project_data_job_analysis
This project explores top-paying jobs, in-demand skills, and where high demand meets high salary in data analytics.
data-analysis postgresql sql sqlite
- Host: GitHub
- URL: https://github.com/alieymsxxn/sql_project_data_job_analysis
- Owner: alieymsxxn
- Created: 2024-09-28T08:09:20.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-12T15:16:27.000Z (about 2 months ago)
- Last Synced: 2024-11-12T15:39:01.388Z (about 2 months ago)
- Topics: data-analysis, postgresql, sql, sqlite
- Homepage:
- Size: 206 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Introduction
Explore the data job market! This project focuses on data analyst roles, uncovering the highest-paying positions, most sought-after skills, and the intersection of high demand and top salaries in data analytics.

SQL queries? Check them out here: [queries folder](/queries/)

Docker compose files? Check them out here: [docker folder](/docker/)
# Background
Driven by a quest to navigate the data analyst job market more effectively, this project was born from a desire to pinpoint top-paid and in-demand skills, streamlining others' work to find optimal jobs. It's packed with insights on job titles, salaries, locations, and essential skills.
# Setting Up PostgreSQL and pgAdmin locally using Docker Compose
To explore this project's database locally, I've provided Docker configurations for easy setup of PostgreSQL and pgAdmin. Here's how to get started:
1. **Set Up Environment Variables**
- Navigate to the `/docker` directory
- Copy `.env_sample` to create your own `.env` file:
```bash
cp .env_sample .env
```
- Modify the `.env` file with your preferred credentials:
```
POSTGRES_USER=your_postgres_user
POSTGRES_PASSWORD=your_postgres_password
POSTGRES_DB=your_database_name
PGADMIN_DEFAULT_EMAIL=your_email@example.com
PGADMIN_DEFAULT_PASSWORD=your_pgadmin_password
LOCAL_CSV_DIR=/path/to/your/csv/files
```
2. **Start the Services**
- From the `/docker` directory, run:
```bash
docker-compose up -d
```
This will start both PostgreSQL and pgAdmin containers in detached mode.
3. **Access pgAdmin**
- Open your browser and navigate to `http://localhost:5050`
- Log in using the email and password you set in the `.env` file
- Connect to the PostgreSQL server using the credentials from your `.env` file
- Host: `postgres`
- Port: `5432`
- Username: Your `POSTGRES_USER`
- Password: Your `POSTGRES_PASSWORD`

Now you're ready to run the analysis queries and explore the data job market insights!
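If the pgAdmin page does not load, a quick check is whether anything is accepting connections on the port yet. The snippet below is a minimal sketch that uses only Python's standard library; the helper name `port_open` is hypothetical, and it assumes pgAdmin is published on `localhost:5050` as described above.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 5050 is the pgAdmin port used in the steps above.
    print("pgAdmin reachable:", port_open("localhost", 5050))
```

A `False` result right after `docker-compose up -d` may only mean the container is still starting, so it is worth retrying after a few seconds.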
# The questions I wanted to answer through my SQL queries were:
1. What are the top-paying data analyst jobs?
2. What skills are required for these top-paying jobs?
3. What skills are most in demand for data analysts?
4. Which skills are associated with higher salaries?
5. What are the most optimal skills to learn?

# Tools I Used
For my deep dive into the data analyst job market, I harnessed the power of several key tools:

- **SQL:** The backbone of my analysis, allowing me to query the database and unearth critical insights.
- **PostgreSQL:** The chosen database management system, ideal for handling the job posting data.
- **Visual Studio Code:** My go-to for database management and executing SQL queries.
- **Git & GitHub:** Essential for version control and sharing my SQL scripts and analysis, ensuring collaboration and project tracking.

# The Analysis
Each query for this project aimed at investigating specific aspects of the data analyst job market. Here's how I approached each question:

### 1. Top Paying Data Analyst Jobs
To identify the highest-paying roles, I filtered data analyst positions by average yearly salary and location, focusing on remote jobs. This query highlights the high-paying opportunities in the field.

```sql
SELECT
jpf.job_id id,
jpf.job_title title,
jpf.job_title_short short_title,
jpf.job_location location,
cd.name company_name,
jpf.salary_year_avg avg_yearly_salary,
jpf.job_schedule_type schedule_type,
jpf.job_posted_date::DATE post_date
FROM job_postings_fact jpf
INNER JOIN company_dim cd ON jpf.company_id = cd.company_id
WHERE
jpf.job_location = 'Anywhere'
AND jpf.job_title_short LIKE '%Data%Analyst%'
AND jpf.salary_year_avg IS NOT NULL
ORDER BY jpf.salary_year_avg DESC
LIMIT 10;
```
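The `LIKE '%Data%Analyst%'` filter above is broader than the `= 'Data Analyst'` equality used in the later queries. The sketch below illustrates the difference on a few hypothetical titles (not taken from the project's dataset), using Python's built-in `sqlite3` as a stand-in for PostgreSQL. One caveat: SQLite's `LIKE` is case-insensitive for ASCII by default, while PostgreSQL's `LIKE` is case-sensitive, so results can differ on mixed-case data.

```python
import sqlite3

# Hypothetical sample titles for illustration only.
titles = [
    "Data Analyst",
    "Senior Data Analyst",
    "Data Scientist",
    "Business Analyst",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_postings_fact (job_title_short TEXT)")
conn.executemany(
    "INSERT INTO job_postings_fact VALUES (?)", [(t,) for t in titles]
)


def matching_titles(where_clause):
    """Return the titles kept by the given WHERE clause, sorted alphabetically."""
    rows = conn.execute(
        "SELECT job_title_short FROM job_postings_fact "
        f"WHERE {where_clause} ORDER BY job_title_short"
    )
    return [row[0] for row in rows]


pattern_matches = matching_titles("job_title_short LIKE '%Data%Analyst%'")
exact_matches = matching_titles("job_title_short = 'Data Analyst'")

print(pattern_matches)  # ['Data Analyst', 'Senior Data Analyst']
print(exact_matches)    # ['Data Analyst']
```

The pattern keeps any title containing "Data" followed later by "Analyst", which is why senior or otherwise qualified analyst titles can appear in the top-paying list.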
Here's the breakdown of the top data analyst jobs in 2023:
- **Wide Salary Range:** Top 10 paying data analyst roles span from $184,000 to $650,000, indicating significant salary potential in the field.
- **Diverse Employers:** Companies like SmartAsset, Meta, and AT&T are among those offering high salaries, showing a broad interest across different industries.
- **Job Title Variety:** There's a high diversity in job titles, from Data Analyst to Director of Analytics, reflecting varied roles and specializations within data analytics.

![Top Paying Roles](assets/1_top_paying_roles.png)
*Bar graph visualizing the salaries of the top 10 highest-paying data analyst roles; ChatGPT generated this graph from my SQL query results*

### 2. Skills for Top Paying Jobs
To understand what skills are required for the top-paying jobs, I joined the job postings with the skills data, providing insights into what employers value for high-compensation roles.
```sql
WITH top_10_jobs AS (
SELECT
jpf.job_id,
jpf.job_title_short job_title,
cd.name company,
jpf.job_location location,
jpf.job_posted_date::DATE post_date,
jpf.salary_year_avg salary_avg
FROM job_postings_fact jpf
INNER JOIN company_dim cd ON jpf.company_id = cd.company_id
WHERE
jpf.job_title_short LIKE '%Data%Analyst%'
AND jpf.salary_year_avg IS NOT NULL
AND jpf.job_location = 'Anywhere'
ORDER BY jpf.salary_year_avg DESC
LIMIT 10
)
SELECT
sd.skills,
t10j.*
FROM top_10_jobs t10j
INNER JOIN skills_job_dim sjd ON t10j.job_id = sjd.job_id
INNER JOIN skills_dim sd ON sjd.skill_id = sd.skill_id
ORDER BY t10j.salary_avg DESC;
```
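The query above returns one row per job and skill pair, so the per-skill counts quoted in the breakdown need one more aggregation step. The sketch below shows one way that step could look, using Python's built-in `sqlite3` with hypothetical rows and a reduced set of columns. It illustrates the technique and is not the author's exact method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_postings_fact (job_id INTEGER PRIMARY KEY, salary_year_avg REAL);
CREATE TABLE skills_dim (skill_id INTEGER PRIMARY KEY, skills TEXT);
CREATE TABLE skills_job_dim (job_id INTEGER, skill_id INTEGER);

-- Hypothetical rows for illustration only.
INSERT INTO job_postings_fact VALUES (1, 200000), (2, 180000), (3, 150000), (4, 90000);
INSERT INTO skills_dim VALUES (1, 'sql'), (2, 'python'), (3, 'tableau');
INSERT INTO skills_job_dim VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2), (4, 3);
""")

TOP_N = 3  # the query above uses 10; 3 keeps the toy data small

skill_counts = conn.execute(
    """
    WITH top_jobs AS (
        SELECT job_id
        FROM job_postings_fact
        WHERE salary_year_avg IS NOT NULL
        ORDER BY salary_year_avg DESC
        LIMIT ?
    )
    SELECT sd.skills, COUNT(*) AS postings
    FROM top_jobs tj
    INNER JOIN skills_job_dim sjd ON tj.job_id = sjd.job_id
    INNER JOIN skills_dim sd ON sjd.skill_id = sd.skill_id
    GROUP BY sd.skills
    ORDER BY postings DESC, sd.skills
    """,
    (TOP_N,),
).fetchall()

print(skill_counts)  # [('sql', 3), ('python', 2), ('tableau', 1)]
```

Job 4 falls outside the top three salaries, so its `tableau` row is not counted.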
Here's the breakdown of the most demanded skills for the top 10 highest paying data analyst jobs in 2023:
- **SQL** leads with a count of 8.
- **Python** follows closely with a count of 7.
- **Tableau** is also highly sought after, with a count of 6.

Other skills like **R**, **Snowflake**, **Pandas**, and **Excel** show varying degrees of demand.

![Top Paying Skills](assets/2_top_paying_roles_skills.png)
*Bar graph visualizing the count of skills for the top 10 paying jobs for data analysts; ChatGPT generated this graph from my SQL query results*

### 3. In-Demand Skills for Data Analysts
This query helped identify the skills most frequently requested in job postings, directing focus to areas with high demand.
```sql
SELECT
sd.skills,
COUNT(jpwc.id) jobs
FROM (
SELECT
jpf.job_id id,
jpf.job_title_short title,
jpf.salary_year_avg avg_salary
FROM job_postings_fact jpf
WHERE job_title_short = 'Data Analyst'
) jpwc
INNER JOIN skills_job_dim sjd ON jpwc.id = sjd.job_id
INNER JOIN skills_dim sd ON sjd.skill_id = sd.skill_id
GROUP BY sd.skills
ORDER BY jobs DESC
LIMIT 5;
```
Here's the breakdown of the most demanded skills for data analysts in 2023:
- **SQL** and **Excel** remain fundamental, emphasizing the need for strong foundational skills in data processing and spreadsheet manipulation.
- **Programming** and **Visualization Tools** like **Python**, **Tableau**, and **Power BI** are essential, pointing towards the increasing importance of technical skills in data storytelling and decision support.

| Skills   | Demand Count |
|----------|--------------|
| SQL      | 92628        |
| Excel    | 67031        |
| Python   | 57326        |
| Tableau  | 46554        |
| Power BI | 39468        |

*Table of the demand for the top 5 skills in data analyst job postings*
### 4. Skills Based on Salary
Exploring the average salaries associated with different skills revealed which skills are the highest paying.
```sql
SELECT
sd.skills skill,
ROUND(AVG(jpf.salary_year_avg), 2) avg_salary
FROM job_postings_fact jpf
INNER JOIN skills_job_dim sjd ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim sd ON sjd.skill_id = sd.skill_id
WHERE
jpf.job_title_short = 'Data Analyst'
AND jpf.salary_year_avg IS NOT NULL
AND jpf.job_work_from_home = True
GROUP BY sd.skills
ORDER BY AVG(jpf.salary_year_avg) DESC
LIMIT 10;
```
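An average on its own can be driven by very few postings. The sketch below reports the posting count next to each average, using Python's built-in `sqlite3` and hypothetical rows. The single `skill_postings` table is an assumed stand-in for the three-table join in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, denormalized stand-in for the joined tables.
conn.execute("CREATE TABLE skill_postings (skills TEXT, salary_year_avg REAL)")
conn.executemany(
    "INSERT INTO skill_postings VALUES (?, ?)",
    [
        ("pyspark", 200000),
        ("sql", 90000),
        ("sql", 100000),
        ("sql", 110000),
        ("sql", None),  # posting without a salary, excluded by the WHERE clause
    ],
)

rows = conn.execute(
    """
    SELECT skills,
           COUNT(*) AS postings,
           ROUND(AVG(salary_year_avg), 2) AS avg_salary
    FROM skill_postings
    WHERE salary_year_avg IS NOT NULL
    GROUP BY skills
    ORDER BY AVG(salary_year_avg) DESC
    """
).fetchall()

print(rows)  # [('pyspark', 1, 200000.0), ('sql', 3, 100000.0)]
```

In this toy data `pyspark` tops the ranking on the strength of a single posting, which is the kind of result a minimum-count filter is meant to screen out.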
Here's a breakdown of the results for top paying skills for Data Analysts:
- **High Demand for Big Data & ML Skills:** Top salaries are commanded by analysts skilled in big data technologies (PySpark, Couchbase), machine learning tools (DataRobot, Jupyter), and Python libraries (Pandas, NumPy), reflecting the industry's high valuation of data processing and predictive modeling capabilities.
- **Software Development & Deployment Proficiency:** Knowledge in development and deployment tools (GitLab, Kubernetes, Airflow) indicates a lucrative crossover between data analysis and engineering, with a premium on skills that facilitate automation and efficient data pipeline management.
- **Cloud Computing Expertise:** Familiarity with cloud and data engineering tools (Elasticsearch, Databricks, GCP) underscores the growing importance of cloud-based analytics environments, suggesting that cloud proficiency significantly boosts earning potential in data analytics.

| Skills        | Average Salary ($) |
|---------------|-------------------:|
| pyspark       |            208,172 |
| bitbucket     |            189,155 |
| couchbase     |            160,515 |
| watson        |            160,515 |
| datarobot     |            155,486 |
| gitlab        |            154,500 |
| swift         |            153,750 |
| jupyter       |            152,777 |
| pandas        |            151,821 |
| elasticsearch |            145,000 |

*Table of the average salary for the top 10 paying skills for data analysts*
### 5. Most Optimal Skills to Learn
Combining insights from demand and salary data, this query aimed to pinpoint skills that are both in high demand and have high salaries, offering a strategic focus for skill development.
```sql
SELECT
skills_dim.skill_id,
skills_dim.skills,
COUNT(skills_job_dim.job_id) AS demand_count,
ROUND(AVG(job_postings_fact.salary_year_avg), 0) AS avg_salary
FROM job_postings_fact
INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id
WHERE
job_title_short = 'Data Analyst'
AND salary_year_avg IS NOT NULL
AND job_work_from_home = True
GROUP BY
skills_dim.skill_id
HAVING
COUNT(skills_job_dim.job_id) > 10
ORDER BY
avg_salary DESC,
demand_count DESC
LIMIT 25;
```

| Skill ID | Skills     | Demand Count | Average Salary ($) |
|----------|------------|--------------|-------------------:|
| 8        | go         | 27           |            115,320 |
| 234      | confluence | 11           |            114,210 |
| 97       | hadoop     | 22           |            113,193 |
| 80       | snowflake  | 37           |            112,948 |
| 74       | azure      | 34           |            111,225 |
| 77       | bigquery   | 13           |            109,654 |
| 76       | aws        | 32           |            108,317 |
| 4        | java       | 17           |            106,906 |
| 194      | ssis       | 12           |            106,683 |
| 233      | jira       | 20           |            104,918 |

*Table of the most optimal skills for data analyst sorted by salary*
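The `HAVING` threshold and the two-key `ORDER BY` in the query above can be sketched on toy data as follows. This uses Python's built-in `sqlite3`. The rows, the `skill_postings` stand-in table, and the lowered threshold `MIN_POSTINGS` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, denormalized stand-in for the joined tables.
conn.execute("CREATE TABLE skill_postings (skills TEXT, salary_year_avg REAL)")
conn.executemany(
    "INSERT INTO skill_postings VALUES (?, ?)",
    [
        ("swift", 150000),
        ("go", 110000), ("go", 115000), ("go", 120000),
        ("sql", 110000), ("sql", 115000), ("sql", 120000),
        ("sql", 70000), ("sql", 70000),
    ],
)

MIN_POSTINGS = 2  # hypothetical threshold; the query above uses 10

optimal_skills = conn.execute(
    """
    SELECT skills,
           COUNT(*) AS demand_count,
           ROUND(AVG(salary_year_avg), 0) AS avg_salary
    FROM skill_postings
    GROUP BY skills
    HAVING COUNT(*) > ?
    ORDER BY avg_salary DESC, demand_count DESC
    """,
    (MIN_POSTINGS,),
).fetchall()

print(optimal_skills)  # [('go', 3, 115000.0), ('sql', 5, 97000.0)]
```

`swift` has the highest average but only one posting, so the threshold removes it. Among the remaining skills, salary is the primary sort key and demand only breaks ties, which is why the table above is ordered by salary and not by demand.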
Here's a breakdown of the most optimal skills for Data Analysts in 2023:
- **High-Demand Programming Languages:** Python and R stand out for their high demand, with demand counts of 236 and 148 respectively. Despite their high demand, their average salaries are around $101,397 for Python and $100,499 for R, indicating that proficiency in these languages is highly valued but also widely available.
- **Cloud Tools and Technologies:** Skills in specialized technologies such as Snowflake, Azure, AWS, and BigQuery show significant demand with relatively high average salaries, pointing towards the growing importance of cloud platforms and big data technologies in data analysis.
- **Business Intelligence and Visualization Tools:** Tableau and Looker, with demand counts of 230 and 49 respectively, and average salaries around $99,288 and $103,795, highlight the critical role of data visualization and business intelligence in deriving actionable insights from data.
- **Database Technologies:** The demand for skills in traditional and NoSQL databases (Oracle, SQL Server, NoSQL), with average salaries ranging from $97,786 to $104,534, reflects the enduring need for data storage, retrieval, and management expertise.

# What I Learned
Throughout this adventure, I've turbocharged my SQL toolkit with some serious firepower:
- **Complex Query Crafting:** Mastered the art of advanced SQL, merging tables like a pro and wielding WITH clauses for ninja-level temp table maneuvers.
- **Data Aggregation:** Got cozy with GROUP BY and turned aggregate functions like COUNT() and AVG() into my data-summarizing sidekicks.
- **Analytical Wizardry:** Leveled up my real-world puzzle-solving skills, turning questions into actionable, insightful SQL queries.

# Conclusions
### Insights
From the analysis, several general insights emerged:

1. **Top-Paying Data Analyst Jobs**: The highest-paying jobs for data analysts that allow remote work offer a wide range of salaries, the highest at $650,000!
2. **Skills for Top-Paying Jobs**: High-paying data analyst jobs require advanced proficiency in SQL, suggesting it's a critical skill for earning a top salary.
3. **Most In-Demand Skills**: SQL is also the most demanded skill in the data analyst job market, thus making it essential for job seekers.
4. **Skills with Higher Salaries**: Specialized skills, such as PySpark and Bitbucket, are associated with the highest average salaries, indicating a premium on niche expertise.
5. **Optimal Skills for Job Market Value**: SQL leads in demand and offers a high average salary, positioning it as one of the most optimal skills for data analysts to learn to maximize their market value.

### Closing Thoughts
This project enhanced my SQL skills and provided valuable insights into the data analyst job market. The findings from the analysis serve as a guide to prioritizing skill development and job search efforts. Aspiring data analysts can better position themselves in a competitive job market by focusing on high-demand, high-salary skills. This exploration highlights the importance of continuous learning and adaptation to emerging trends in the field of data analytics.