https://github.com/dmarks84/coursework_project_data-analysis-apache-spark

Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in data, wrote to SQL database and performed queries, performed statistical analysis and issued reports

apache-sprk automation dag data-modeling eda elt etl numpy pandas pipelines python sql statistics visualization

## Project (Project_Data-Analysis-Apache-Spark)
### Part of the Coursera series: IBM Data Engineering & Python

## Summary
Unfortunately, my Jupyter notebook did not save correctly and my work was lost, but the outline of the project reflects the steps I undertook. Primarily, we completed these tasks:
- Task 1: Generate DataFrame from CSV data.
- Task 2: Define a schema for the data.
- Task 3: Display schema of DataFrame.
- Task 4: Create a temporary view.
- Task 5: Execute an SQL query.
- Task 6: Calculate Average Salary by Department.
- Task 7: Filter and Display IT Department Employees.
- Task 8: Add 10% Bonus to Salaries.
- Task 9: Find Maximum Salary by Age.
- Task 10: Self-Join on Employee Data.
- Task 11: Calculate Average Employee Age.
- Task 12: Calculate Total Salary by Department.
- Task 13: Sort Data by Age and Salary.
- Task 14: Count Employees in Each Department.
- Task 15: Filter Employees with the letter "o" in the Name.

## Skills (Developed & Applied)
Programming, Python, RDBMS & SQL, Databases, Statistics, NumPy, Pandas, DataFrames, ETL/ELT & Data Pipelines, DAGs, Apache Spark, Automation, Data Modeling, EDA, Data Visualization, Data Summarization