https://github.com/dmarks84/coursework_project_data-analysis-apache-spark

Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in data, wrote to SQL database and performed queries, performed statistical analysis and issued reports
apache-sprk automation dag data-modeling eda elt etl numpy pandas pipelines python sql statistics visualization

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/dmarks84/coursework_project_data-analysis-apache-spark
Owner: dmarks84
License: bsd-3-clause
Created: 2024-01-17T17:08:24.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-01-17T23:06:23.000Z (over 2 years ago)
Last Synced: 2025-02-15T10:32:08.234Z (over 1 year ago)
Topics: apache-sprk, automation, dag, data-modeling, eda, elt, etl, numpy, pandas, pipelines, python, sql, statistics, visualization
Language: Jupyter Notebook
Homepage:
Size: 6.84 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## Project (Project_Data-Analysis-Apache-Spark)
### Part of the Coursera series: IBM Data Engineering & Python

## Summary
Unfortunately, my Jupyter notebook did not save correctly and my work was lost, but the outline of the project reflects the steps I undertook. Primarily, we completed these tasks:
Task 1: Generate DataFrame from CSV data.
Task 2: Define a schema for the data.
Task 3: Display schema of DataFrame.
Task 4: Create a temporary view.
Task 5: Execute an SQL query.
Task 6: Calculate Average Salary by Department.
Task 7: Filter and Display IT Department Employees.
Task 8: Add 10% Bonus to Salaries.
Task 9: Find Maximum Salary by Age.
Task 10: Self-Join on Employee Data.
Task 11: Calculate Average Employee Age.
Task 12: Calculate Total Salary by Department.
Task 13: Sort Data by Age and Salary.
Task 14: Count Employees in Each Department.
Task 15: Filter Employees with the Letter "o" in the Name.

## Skills (Developed & Applied)
Programming, Python, RDBMS & SQL, Databases, Statistics, NumPy, pandas, DataFrames, ETL/ELT & Data Pipelines, DAGs, Apache Spark, Automation, Data Modeling, EDA, Data Visualization, Data Summarization
