https://github.com/rizkipragustono/data_analysis_spark
Exploration: Data Analysis using Spark
apache-spark data-analysis pyspark python spark-sql sql
- Host: GitHub
- URL: https://github.com/rizkipragustono/data_analysis_spark
- Owner: rizkipragustono
- Created: 2025-05-21T07:20:34.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-21T07:26:33.000Z (about 1 year ago)
- Last Synced: 2025-09-10T10:26:53.204Z (10 months ago)
- Topics: apache-spark, data-analysis, pyspark, python, spark-sql, sql
- Language: Jupyter Notebook
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Analysis using Spark
## Scenario
The HR department of a company has asked you to build a data pipeline that ingests employee data in CSV format. Your responsibilities are to analyze the data, apply any required transformations, and make it easy to extract insights from the processed data.
As the data engineer on this project, you are expected to use Apache Spark components to accomplish these tasks.
## Project Overview
Load employee data from a CSV file into a DataFrame, then apply transformations and actions using Spark SQL. The work is broken into the following tasks:
- Task 1: Generate DataFrame from CSV data.
- Task 2: Define a schema for the data.
- Task 3: Display schema of DataFrame.
- Task 4: Create a temporary view.
- Task 5: Execute an SQL query.
- Task 6: Calculate Average Salary by Department.
- Task 7: Filter and Display IT Department Employees.
- Task 8: Add 10% Bonus to Salaries.
- Task 9: Find Maximum Salary by Age.
- Task 10: Self-Join on Employee Data.
- Task 11: Calculate Average Employee Age.
- Task 12: Calculate Total Salary by Department.
- Task 13: Sort Data by Age and Salary.
- Task 14: Count Employees in Each Department.
- Task 15: Filter Employees with the Letter `o` in the Name.
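
The tasks above could be sketched in PySpark roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Emp_No`, `Emp_Name`, `Salary`, `Age`, `Department`), the view name `employees`, the Task 5 query, the ascending sort order, and the interpretation of the bonus as `Salary * 1.10` are all assumptions, since the README does not specify them. Every task from 5 onward is written as a Spark SQL query for brevity; the same results can be obtained with the DataFrame API.

```python
# Hypothetical schema as a DDL string; adjust names and types to the real CSV header.
SCHEMA_DDL = "Emp_No INT, Emp_Name STRING, Salary INT, Age INT, Department STRING"

# One Spark SQL query per task (Tasks 5-15). All column names are assumptions.
QUERIES = {
    # Task 5: an example query (the actual query is not given in the README)
    "older_than_30": "SELECT * FROM employees WHERE Age > 30",
    # Task 6: average salary by department
    "avg_salary_by_dept": "SELECT Department, AVG(Salary) AS avg_salary "
                          "FROM employees GROUP BY Department",
    # Task 7: employees in the IT department
    "it_employees": "SELECT * FROM employees WHERE Department = 'IT'",
    # Task 8: salary with a 10% bonus added (one possible interpretation)
    "salary_with_bonus": "SELECT *, Salary * 1.10 AS salary_with_bonus FROM employees",
    # Task 9: maximum salary for each age
    "max_salary_by_age": "SELECT Age, MAX(Salary) AS max_salary "
                         "FROM employees GROUP BY Age",
    # Task 10: self-join on the employee number
    "self_join": "SELECT a.Emp_No, a.Emp_Name, b.Department "
                 "FROM employees a JOIN employees b ON a.Emp_No = b.Emp_No",
    # Task 11: average employee age
    "avg_age": "SELECT AVG(Age) AS avg_age FROM employees",
    # Task 12: total salary by department
    "total_salary_by_dept": "SELECT Department, SUM(Salary) AS total_salary "
                            "FROM employees GROUP BY Department",
    # Task 13: sort by age, then salary (ascending order assumed)
    "sorted_by_age_salary": "SELECT * FROM employees ORDER BY Age, Salary",
    # Task 14: number of employees in each department
    "count_by_dept": "SELECT Department, COUNT(*) AS employee_count "
                     "FROM employees GROUP BY Department",
    # Task 15: names containing a lowercase 'o' (LIKE is case-sensitive in Spark SQL)
    "name_contains_o": "SELECT * FROM employees WHERE Emp_Name LIKE '%o%'",
}


def run_pipeline(csv_path):
    """Load the CSV, register a temporary view, and run every query in QUERIES."""
    # Imported here so the queries above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("employee-analysis").getOrCreate()

    # Tasks 1-2: read the CSV with an explicit schema instead of inferring it.
    df = spark.read.csv(csv_path, schema=SCHEMA_DDL, header=True)

    # Task 3: display the schema.
    df.printSchema()

    # Task 4: expose the DataFrame to Spark SQL under the name "employees".
    df.createOrReplaceTempView("employees")

    # Tasks 5-15: run each query and show its result.
    results = {}
    for name, sql in QUERIES.items():
        results[name] = spark.sql(sql)
        results[name].show()
    return results
```

With PySpark installed, calling `run_pipeline("employees.csv")` (the file name is a placeholder) prints the schema and the result of each query, and returns the resulting DataFrames keyed by query name. Passing an explicit schema avoids the extra pass over the file that schema inference requires and keeps the column types predictable.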