{"id":27646176,"url":"https://github.com/fbarffmann/project2","last_synced_at":"2026-05-05T15:34:42.345Z","repository":{"id":287749712,"uuid":"832901994","full_name":"fbarffmann/Project2","owner":"fbarffmann","description":"Built an ETL pipeline to clean and load crowdfunding campaign data into a PostgreSQL database using Python and SQL.","archived":false,"fork":false,"pushed_at":"2025-04-13T17:23:07.000Z","size":296,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T18:34:21.830Z","etag":null,"topics":["crowdfunding","data-cleaning","data-engineering","database-design","erd","etl","pandas","postgresql","python","sql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fbarffmann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-07-24T00:54:53.000Z","updated_at":"2025-04-13T17:23:10.000Z","dependencies_parsed_at":"2025-04-13T18:46:09.083Z","dependency_job_id":null,"html_url":"https://github.com/fbarffmann/Project2","commit_stats":null,"previous_names":["fbarffmann/project2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fbarffmann%2FProject2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fbarffmann%2FProject2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fbarffmann%2FProject2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fbarffmann%2FProject2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fbarffmann","download_url":"https://codeload.github.com/fbarffmann/Project2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250540924,"owners_count":21447428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crowdfunding","data-cleaning","data-engineering","database-design","erd","etl","pandas","postgresql","python","sql"],"created_at":"2025-04-24T01:17:20.279Z","updated_at":"2026-05-05T15:34:42.315Z","avatar_url":"https://github.com/fbarffmann.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crowdfunding_ETL\n\n## Team Members:\n- Finn Arffmann\n- Parisha Gupta\n- Sanem Gingery\n- John McNamara\n- Jackson Whited\n\n## Class:\nNorthwestern Data Bootcamp\n\n## Date:\n7/23/2024\n\n## Project Files:\n- `ETL_Mini_Project_FArffmann_PGupta_SGingery_JMcNamara_JWhited.ipynb`\n- `QuickDBD-ETL_Mini_Project_ERD.png`\n- `crowdfunding_db_schema.sql`\n- `campaign.csv`\n- `category.csv`\n- `contacts.csv`\n- `subcategory.csv`\n\n## Project Overview:\nThis ETL (Extract, Transform, Load) Mini Project is a collaborative effort aimed at creating a database from a set of Excel files related to crowdfunding campaigns. The project is divided into four main tasks: creating Category and Subcategory DataFrames, creating the Campaign DataFrame, creating the Contacts DataFrame, and setting up the Crowdfunding Database. Below are the detailed instructions and steps we followed to accomplish each task.\n\n## Tasks and Steps:\n\n### 1. Create the Category and Subcategory DataFrames:\n\n#### a) Category DataFrame\nWe began by extracting and transforming data from the `crowdfunding.xlsx` file to create a Category DataFrame. This DataFrame includes:\n- `category_id`: Sequentially generated entries from \"cat1\" to \"cat*n*\", where *n* represents the number of unique categories.\n- `category`: Titles of the categories.\n\nThe resulting DataFrame was exported as `category.csv` and saved to our GitHub repository.\n\n#### b) Subcategory DataFrame\nSimilarly, we extracted and transformed the data to create a Subcategory DataFrame, which includes:\n- `subcategory_id`: Sequentially generated entries from \"subcat1\" to \"subcat*n*\", where *n* represents the number of unique subcategories.\n- `subcategory`: Titles of the subcategories.\n\nThis DataFrame was exported as `subcategory.csv` and saved to our GitHub repository.\n\n### 2. Create the Campaign DataFrame:\nWe extracted and transformed the data from `crowdfunding.xlsx` to create a Campaign DataFrame with the following columns:\n- `cf_id`\n- `contact_id`\n- `company_name`\n- `description` (renamed from \"blurb\")\n- `goal` (converted to float)\n- `pledged` (converted to float)\n- `outcome`\n- `backers_count`\n- `country`\n- `currency`\n- `launch_date` (renamed from \"launched_at\" and converted to datetime)\n- `end_date` (renamed from \"deadline\" and converted to datetime)\n- `category_id` (matched with the Category DataFrame)\n- `subcategory_id` (matched with the Subcategory DataFrame)\n\nThis DataFrame was exported as `campaign.csv` and saved to our GitHub repository.\n\n### 3. Create the Contacts DataFrame:\nWe imported data from the `contacts.xlsx` file and extracted the `contact_id`, `name`, and `email` columns using regular expressions. The `contact_id` column was converted to integer type, and the `name` column was split into `first_name` and `last_name`. The cleaned DataFrame was exported as `contacts.csv` and saved to our GitHub repository.\n\n### 4. Create the Crowdfunding Database:\n\n#### a) ERD and Table Schema:\nWe inspected the four CSV files and created an Entity-Relationship Diagram (ERD) using QuickDBD. Using the ERD, we defined the table schema, including data types, primary keys, foreign keys, and other constraints. The schema was saved as `crowdfunding_db_schema.sql`. The ERD was saved as `QuickDBD-ETL_Mini_Project_ERD.png`. Both were uploaded to our GitHub repository.\n\n#### b) Database Creation:\nWe created a new Postgres database named `crowdfunding_db` and used the schema to create the tables in the correct order to manage foreign keys. Each table was verified with a `SELECT` statement to ensure successful creation.\n\n#### c) Data Import:\nFinally, we imported each CSV file into its corresponding SQL table and verified the data with `SELECT` statements to ensure the correctness of the data.\n\n### Conclusion:\nThis ETL Mini Project demonstrates our ability to extract, transform, and load data into a structured database, following best practices for data handling and database management. The project files, including the transformed data and database schema, are available in this GitHub repository for further reference and use.\n\n\n### Sources:\n- [MongoDB Documentation](https://docs.mongodb.com/)\n- [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/)\n- [QuickDBD](https://app.quickdatabasediagrams.com/#/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffbarffmann%2Fproject2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffbarffmann%2Fproject2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffbarffmann%2Fproject2/lists"}