https://github.com/k-bloch/data-generator

This is a python script that generates fake data in CSV format for data analysis projects.

# data-generator
This Python script generates fake donation data in CSV format, simulating contributions to a charity or non-profit organization. It's designed for data analysis projects that need structured datasets to explore metrics like donor acquisition, retention, and donation patterns over time.

## 📖 Data Dictionary

- **donor_id**: Unique ID
- **donor_type**: Individual or Organization
- **donation_dates**: Comma-separated donation dates
- **donation_amounts**: Comma-separated donation amounts
- **acquisition channel**: Source channel (Direct Mail, Online Event, etc.)
- **age, gender, location**: Donor demographics

## Functions
### generate_unique_dates
This function takes three arguments: start_year, end_year, and total_dates. The first two come from global variables that the user can modify. total_dates is a random integer between 1 and 10 that determines how many unique dates are created for a given donor_id. Each date is built by randomly selecting a year, month, and day, and this selection is repeated total_dates times.
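
The description above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the `rng` parameter, the use of a set to enforce uniqueness, and the sorted return value are assumptions made for the example.

```python
import calendar
import random
from datetime import date


def generate_unique_dates(start_year, end_year, total_dates, rng=random):
    """Return `total_dates` distinct dates between start_year and end_year.

    Assumes total_dates is small (1 to 10) relative to the number of
    days in the range, so the loop always terminates.
    """
    dates = set()
    while len(dates) < total_dates:
        year = rng.randint(start_year, end_year)
        month = rng.randint(1, 12)
        # monthrange returns (weekday of first day, days in month),
        # which avoids invalid days such as February 30.
        last_day = calendar.monthrange(year, month)[1]
        day = rng.randint(1, last_day)
        dates.add(date(year, month, day))  # duplicates are dropped by the set
    return sorted(dates)


# total_dates is drawn at random between 1 and 10, as described above.
total_dates = random.randint(1, 10)
donor_dates = generate_unique_dates(2015, 2023, total_dates)
```

Using a set means a repeated draw is simply retried, so the function always returns exactly the requested number of unique dates.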

### generate_donor_data
This function takes three inputs: num_donors, earliest_year, and latest_year. The last two use the same global variables as generate_unique_dates. Each donor_id has a single row in the data table, with donation details stored in two columns: the donation dates and the donation amounts (donation_dates and donation_amounts in the data dictionary). Each column holds multiple comma-separated values per donor. The values are positionally paired: the first date corresponds to the first amount, and so on, so the order must be preserved when splitting them into separate rows. The attribute distributions are meant to resemble real-world data; for example, 30% of donors are organizations and 70% are individuals.
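
As an illustration of splitting the paired columns while keeping dates and amounts aligned, here is a minimal sketch using only the standard library. The sample rows, the ISO date format, and the helper name `explode_donations` are assumptions for the example; the column names follow the data dictionary and may need adjusting to match the generated file.

```python
import csv
import io

# Hypothetical sample in the shape described above: one row per donor,
# with comma-separated dates and amounts stored in quoted fields.
sample = io.StringIO(
    "donor_id,donation_dates,donation_amounts\n"
    '1,"2021-03-04,2022-07-19","25.00,40.50"\n'
    '2,"2020-11-30","100.00"\n'
)


def explode_donations(rows):
    """Yield one (donor_id, date, amount) tuple per donation, in order."""
    for row in rows:
        dates = [d.strip() for d in row["donation_dates"].split(",")]
        amounts = [a.strip() for a in row["donation_amounts"].split(",")]
        # A length mismatch means the positional pairing is broken.
        if len(dates) != len(amounts):
            raise ValueError(f"donor {row['donor_id']}: mismatched lengths")
        # zip pairs the i-th date with the i-th amount.
        for d, a in zip(dates, amounts):
            yield row["donor_id"], d, float(a)


donations = list(explode_donations(csv.DictReader(sample)))
```

Checking the lengths before zipping guards against silently dropping values, since `zip` stops at the shorter list.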

## Notes
There's room to improve this script, for example by adding weights to acquisition channels to better mimic real data. Even so, it's a solid foundation for practicing data cleaning, handling date formats, and creating visuals, which makes it a practical starting point for analysis projects.
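
One possible way to add channel weights is `random.choices`, sketched below. The weights and the "Referral" channel are hypothetical values chosen for illustration, not figures from the script or from real donor data.

```python
import random

# Hypothetical weights; adjust to match the distribution you want to mimic.
CHANNEL_WEIGHTS = {
    "Direct Mail": 0.5,
    "Online Event": 0.3,
    "Referral": 0.2,  # assumed channel, not listed in the README
}


def pick_channel(rng=random):
    """Pick one acquisition channel according to CHANNEL_WEIGHTS."""
    channels = list(CHANNEL_WEIGHTS)
    weights = list(CHANNEL_WEIGHTS.values())
    # choices returns a list of k items; take the single pick.
    return rng.choices(channels, weights=weights, k=1)[0]


channel = pick_channel()
```

The same pattern could drive other attributes, such as the 70/30 split between individual and organization donors.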