Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andrewjmack/crowdfunding_etl
DU Data Analytics Project 2 (May 2024)
- Host: GitHub
- URL: https://github.com/andrewjmack/crowdfunding_etl
- Owner: andrewjmack
- Created: 2024-05-09T03:44:01.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-05-29T01:46:02.000Z (6 months ago)
- Last Synced: 2024-05-29T15:37:13.726Z (6 months ago)
- Language: Jupyter Notebook
- Size: 3.95 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Crowdfunding_ETL
#### Peter Lovberg & Andrew Mack
#### DU Data Analytics Project 2 | May 2024

## Purpose
A collaborative extract-transform-load ("ETL") mini project that builds an ETL pipeline with Python, Pandas, and PostgreSQL. Excel data was transformed in a Jupyter Notebook to produce four CSV files, which formed the basis for an Entity Relationship Diagram ("ERD") and table schema. Finally, the CSV data was loaded into a PostgreSQL database (see the screenshots of the database tables).

## Summary of Instructions
- Create the Category and Subcategory DataFrames
- Create the Campaign DataFrame
- Create the Contacts DataFrame
- Create the Crowdfunding Database

## Contents
- README.md
- ETL_Mini_Project_PLovberg_AMack.ipynb
- Database
  - ERD
  - crowdfunding_db_schema.sql
  - campaign screenshot
  - category screenshot
  - contacts screenshot
  - subcategory screenshot
- Resources
  - campaign.csv
  - category.csv
  - contacts.csv
  - subcategory.csv
  - contacts.xlsx
  - crowdfunding.xlsx

## Resources
- Class instruction and activities
- Starter content: initial Jupyter Notebook and .xlsx data
- Bootcampspot instructions and hints
- Reference for merging three DataFrames:
  https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns

## Detailed Project Requirements
- A Category DataFrame is Created
  - The DataFrame contains a "category_id" column with entries running sequentially from "cat1" to "catn", where n is the number of unique categories
  - The DataFrame has a "category" column that contains only the category titles
  - The category DataFrame is exported as category.csv
- A Subcategory DataFrame is Created
  - The DataFrame contains a "subcategory_id" column with entries running sequentially from "subcat1" to "subcatn", where n is the number of unique subcategories
  - The DataFrame contains a "subcategory" column that contains only the subcategory titles
  - The subcategory DataFrame is exported as subcategory.csv
- A Campaign DataFrame is Created
  - The DataFrame has the following columns:
    - A "cf_id" column
    - A "contact_id" column
    - A "company_name" column
    - A "description" column
    - A "goal" column that is a float data type
    - A "pledged" column that is a float data type
    - An "outcome" column
    - A "backers_count" column
    - A "country" column
    - A "currency" column
    - A "launch_date" column with the date formatted as "YYYY-MM-DD"
    - An "end_date" column with the date formatted as "YYYY-MM-DD"
    - A "category_id" column that contains the unique identifiers matching those in the "category_id" column of the category DataFrame
    - A "subcategory_id" column that contains the unique identifiers matching those in the "subcategory_id" column of the subcategory DataFrame
  - The campaign DataFrame is exported as campaign.csv
- A Contacts DataFrame is Created
  - The DataFrame has the following columns:
    - A "contact_id" column
    - A "first_name" column
    - A "last_name" column
    - An "email" column
  - The contacts DataFrame is exported as contacts.csv
- A Crowdfunding Database is Created
  - A database schema file named crowdfunding_db_schema.sql is created
  - A crowdfunding_db database is created using the crowdfunding_db_schema.sql file
  - The database has the appropriate primary and foreign keys and relationships
  - Each CSV file is imported into the appropriate table without errors
  - The data from each table is displayed using a SELECT statement
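
The category, subcategory, and campaign requirements above could be met along the following lines. This is a minimal sketch, not the notebook's actual code: the sample rows are invented, and the raw column names (`blurb`, `launched_at`, `deadline`, `category & sub-category`), the Unix-second timestamps, and the `make_lookup` helper are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of crowdfunding.xlsx.
# Raw column names and values are assumptions, not the project's data.
raw = pd.DataFrame({
    "cf_id": [101, 102, 103],
    "contact_id": [9001, 9002, 9003],
    "company_name": ["Acme Co", "Birch LLC", "Cedar Inc"],
    "blurb": ["First pitch", "Second pitch", "Third pitch"],
    "goal": [100, 1400, 108400],
    "pledged": [0, 14560, 142523],
    "outcome": ["failed", "successful", "successful"],
    "backers_count": [0, 158, 1425],
    "country": ["CA", "US", "AU"],
    "currency": ["CAD", "USD", "AUD"],
    "launched_at": [1609459200, 1612137600, 1614556800],  # Unix seconds (assumed)
    "deadline": [1612137600, 1614556800, 1617235200],     # Unix seconds (assumed)
    "category & sub-category": ["food/food trucks", "music/rock", "music/jazz"],
})

# Split the combined column into separate category and subcategory titles.
raw[["category", "subcategory"]] = raw["category & sub-category"].str.split(
    "/", n=1, expand=True
)


def make_lookup(values, id_col, name_col, prefix):
    """Build a lookup table with sequential IDs such as cat1..catn."""
    names = values.drop_duplicates().tolist()  # order of first appearance
    ids = [f"{prefix}{i}" for i in range(1, len(names) + 1)]
    return pd.DataFrame({id_col: ids, name_col: names})


category_df = make_lookup(raw["category"], "category_id", "category", "cat")
subcategory_df = make_lookup(
    raw["subcategory"], "subcategory_id", "subcategory", "subcat"
)

# Rename raw columns to the required names and cast the money columns to float.
campaign_df = raw.rename(
    columns={"blurb": "description", "launched_at": "launch_date", "deadline": "end_date"}
).astype({"goal": float, "pledged": float})

# Convert the timestamps to "YYYY-MM-DD" strings.
for col in ["launch_date", "end_date"]:
    campaign_df[col] = pd.to_datetime(campaign_df[col], unit="s").dt.strftime("%Y-%m-%d")

# Attach the ID columns by merging with both lookup tables, then keep
# only the required columns in the required order.
campaign_df = campaign_df.merge(category_df, on="category", how="left").merge(
    subcategory_df, on="subcategory", how="left"
)
campaign_df = campaign_df[[
    "cf_id", "contact_id", "company_name", "description", "goal", "pledged",
    "outcome", "backers_count", "country", "currency", "launch_date",
    "end_date", "category_id", "subcategory_id",
]]

# With no path, to_csv returns the CSV text; pass a path such as
# "campaign.csv" to write the file instead.
campaign_csv = campaign_df.to_csv(index=False)
```

Building the lookup tables first and merging them back keeps the ID assignment in one place, so the campaign table's "category_id" and "subcategory_id" values necessarily match those in category.csv and subcategory.csv.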