Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/leftcoastnerdgirl/extract_transform_load
This mini project introduces data cleaning through ETL
- Host: GitHub
- URL: https://github.com/leftcoastnerdgirl/extract_transform_load
- Owner: LeftCoastNerdGirl
- Created: 2024-01-20T21:47:31.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-07-27T20:53:14.000Z (4 months ago)
- Last Synced: 2024-07-28T20:00:43.992Z (4 months ago)
- Topics: data-cleaning, etl, extract-transform-load, json, merge-sort, numpy, pandas-dataframe, pandas-python
- Language: Jupyter Notebook
- Homepage: https://extension.berkeley.edu/search/publicCourseSearchDetails.do?method=load&courseId=35106003
- Size: 620 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Using an Extract, Transform, Load (ETL) process to clean up crowdfunding campaign data
This mini project required a fair amount of data cleaning and prep. This was Group Project #2, though some 'groups' were individuals. I completed this project as a 'group' of one.
First I worked on the crowdfunding spreadsheet:
- Imported it as a pandas DataFrame.
- Split the 'category & sub-category' column into two separate columns.
- Created variables to store lists of the category and subcategory names.
- Created category and subcategory IDs and paired them with the name lists from the previous step.
- Used those pairs to build category and subcategory tables and exported them as CSV files.

Campaign data
- Made a new DataFrame from the crowdfunding info.
- Renamed columns, changed data types, and formatted data as needed.
- Merged it with the category and subcategory files.
- Removed unwanted columns to simplify the result.
- Exported the result as a CSV file.

Contacts list
- Used pandas to import the contact info.
- The contact ID, name, and email were all in a single cell, so I split them into separate columns.
- Split the name into two columns: first name and last name.
- Reordered the columns.
- Exported the result as a CSV file.
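
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the column names other than 'category & sub-category', the "/" and ", " delimiters, the `cat1`/`subcat1` ID format, and the `build_lookup` helper are all assumptions made up for the example.

```python
import pandas as pd

# Made-up sample rows; the real spreadsheet's columns and values may differ.
crowdfunding = pd.DataFrame({
    "cf_id": [147, 1621, 1812],
    "contact_id": [4661, 3765, 4187],
    "category & sub-category": ["food/food trucks", "music/rock", "music/rock"],
})

# Split the combined column into two columns (assumes a "/" delimiter).
crowdfunding[["category", "subcategory"]] = (
    crowdfunding["category & sub-category"].str.split("/", n=1, expand=True)
)


def build_lookup(names, id_col, name_col, prefix):
    """Pair each unique name with a generated ID such as 'cat1' (assumed format)."""
    unique_names = sorted(names.unique())
    ids = [f"{prefix}{i}" for i in range(1, len(unique_names) + 1)]
    return pd.DataFrame({id_col: ids, name_col: unique_names})


category_df = build_lookup(
    crowdfunding["category"], "category_id", "category", "cat"
)
subcategory_df = build_lookup(
    crowdfunding["subcategory"], "subcategory_id", "subcategory", "subcat"
)

# Merge the IDs back in, then drop the text columns they replace.
campaign = (
    crowdfunding
    .merge(category_df, on="category", how="left")
    .merge(subcategory_df, on="subcategory", how="left")
    .drop(columns=["category & sub-category", "category", "subcategory"])
)

# Contacts: assume each cell holds "id, first last, email" separated by ", ".
contacts_raw = pd.DataFrame({"contact_info": [
    "4661, Jane Doe, jane.doe@example.com",
    "3765, John Roe, john.roe@example.com",
]})
contacts = contacts_raw["contact_info"].str.split(", ", expand=True)
contacts.columns = ["contact_id", "name", "email"]
contacts["contact_id"] = contacts["contact_id"].astype("int64")

# Split the full name into first and last name, then reorder the columns.
contacts[["first_name", "last_name"]] = contacts["name"].str.split(
    " ", n=1, expand=True
)
contacts = contacts[["contact_id", "first_name", "last_name", "email"]]

# Export step, commented out so the sketch writes no files (file names are hypothetical).
# category_df.to_csv("category.csv", index=False)
# campaign.to_csv("campaign.csv", index=False)
# contacts.to_csv("contacts.csv", index=False)
```

Keeping the category and subcategory names in their own lookup tables means the campaign table only carries short IDs, which is why the text columns can be dropped after the merge.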