https://github.com/dmarks84/coursework_project_data-analysis-apache-spark
Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in data, wrote to SQL database and performed queries, performed statistical analysis and issued reports
- Host: GitHub
- URL: https://github.com/dmarks84/coursework_project_data-analysis-apache-spark
- Owner: dmarks84
- License: bsd-3-clause
- Created: 2024-01-17T17:08:24.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-17T23:06:23.000Z (over 2 years ago)
- Last Synced: 2025-02-15T10:32:08.234Z (over 1 year ago)
- Topics: apache-sprk, automation, dag, data-modeling, eda, elt, etl, numpy, pandas, pipelines, python, sql, statistics, visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Project (Project_Data-Analysis-Apache-Spark)
### Part of the Coursera series: IBM Data Engineering & Python
## Summary
Unfortunately, my Jupyter notebook did not save correctly and my work has been deleted, but the outline of the project reflects the steps I undertook. Primarily, we completed these tasks:
- Task 1: Generate DataFrame from CSV data.
- Task 2: Define a schema for the data.
- Task 3: Display schema of DataFrame.
- Task 4: Create a temporary view.
- Task 5: Execute an SQL query.
- Task 6: Calculate Average Salary by Department.
- Task 7: Filter and Display IT Department Employees.
- Task 8: Add 10% Bonus to Salaries.
- Task 9: Find Maximum Salary by Age.
- Task 10: Self-Join on Employee Data.
- Task 11: Calculate Average Employee Age.
- Task 12: Calculate Total Salary by Department.
- Task 13: Sort Data by Age and Salary.
- Task 14: Count Employees in Each Department.
- Task 15: Filter Employees with the Letter "o" in the Name.
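
Since the original notebook is gone, the following is only a minimal sketch of what the query-style tasks (5 through 15) could look like as SQL. It is not the code from the course. It uses Python's standard-library `sqlite3` as a stand-in so it runs without a Spark installation; in Spark the same query text would typically be passed to `spark.sql(...)` after registering the DataFrame with `createOrReplaceTempView`. The table name `employees`, the column names, the sample rows, the sort directions and the self-join key are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows (Emp_No, Emp_Name, Salary, Age, Department).
# The real CSV from the course is not part of this README.
ROWS = [
    (1, "John", 5000.0, 30, "IT"),
    (2, "Maria", 6200.0, 41, "HR"),
    (3, "Leo", 4800.0, 30, "IT"),
    (4, "Anna", 7000.0, 41, "Finance"),
    (5, "Tom", 5500.0, 35, "HR"),
]


def build_connection():
    """Create an in-memory table standing in for the Spark temporary view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "Emp_No INTEGER, Emp_Name TEXT, Salary REAL, Age INTEGER, Department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# One query per task; the keys are illustrative labels, not names from the course.
QUERIES = {
    # Task 6
    "avg_salary_by_department":
        "SELECT Department, AVG(Salary) AS avg_salary FROM employees "
        "GROUP BY Department ORDER BY Department",
    # Task 7
    "it_employees":
        "SELECT Emp_Name FROM employees WHERE Department = 'IT' ORDER BY Emp_No",
    # Task 8
    "salary_with_bonus":
        "SELECT Emp_Name, Salary * 1.1 AS salary_after_bonus FROM employees "
        "ORDER BY Emp_No",
    # Task 9
    "max_salary_by_age":
        "SELECT Age, MAX(Salary) AS max_salary FROM employees "
        "GROUP BY Age ORDER BY Age",
    # Task 10 (joining on Emp_No is an assumed key)
    "self_join":
        "SELECT a.Emp_Name AS left_name, b.Emp_Name AS right_name "
        "FROM employees AS a JOIN employees AS b ON a.Emp_No = b.Emp_No "
        "ORDER BY a.Emp_No",
    # Task 11
    "average_age": "SELECT AVG(Age) AS avg_age FROM employees",
    # Task 12
    "total_salary_by_department":
        "SELECT Department, SUM(Salary) AS total_salary FROM employees "
        "GROUP BY Department ORDER BY Department",
    # Task 13 (ascending age, then descending salary, is an assumed ordering)
    "sorted_by_age_and_salary":
        "SELECT Emp_Name, Age, Salary FROM employees ORDER BY Age ASC, Salary DESC",
    # Task 14
    "count_by_department":
        "SELECT Department, COUNT(*) AS employee_count FROM employees "
        "GROUP BY Department ORDER BY Department",
    # Task 15 (LOWER makes the match case-insensitive in either engine)
    "names_containing_o":
        "SELECT Emp_Name FROM employees WHERE LOWER(Emp_Name) LIKE '%o%' "
        "ORDER BY Emp_No",
}


def run_all(conn):
    """Run every query and return the fetched rows keyed by label."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}


results = run_all(build_connection())
for name, rows in results.items():
    print(name, rows)
```

Keeping each task as a plain SQL string makes it straightforward to try the same statements against a Spark temporary view, although engine differences (for example in how `LIKE` treats letter case) may call for small adjustments.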
## Skills (Developed & Applied)
Programming, Python, RDBMS & SQL, Databases, Statistics, NumPy, Pandas, DataFrames, ETL/ELT & Data Pipelines, DAGs, Apache Spark, Automation, Data Modeling, EDA, Data Visualization, Data Summarization