Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/k-bloch/data-generator
This is a Python script that generates fake data in CSV format for data analysis projects.
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/k-bloch/data-generator
- Owner: K-Bloch
- Created: 2024-11-03T17:00:11.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-03T17:27:03.000Z (2 months ago)
- Last Synced: 2024-11-03T18:24:55.242Z (2 months ago)
- Language: Python
- Size: 247 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# data-generator
This Python script generates fake donation data in CSV format, simulating contributions to a charity or non-profit organization. It's designed for data analysis projects that need structured datasets to explore metrics like donor acquisition, retention, and donation patterns over time.

## 📖 Data Dictionary
- **donor_id**: Unique ID
- **donor_type**: Individual or Organization
- **donation_dates**: Comma-separated donation dates
- **donation_amounts**: Comma-separated donation amounts
- **acquisition channel**: Source channel (Direct Mail, Online Event, etc.)
- **age, gender, location**: Donor demographics

## Functions
### generate_unique_dates
This function takes three arguments: start_year, end_year, and total_dates. The first two are global variables that the user can modify. total_dates is a random number between 1 and 10 that determines how many unique dates are created for a given donor_id. Each date is built by randomly selecting a year, month, and day, and the selection repeats total_dates times.

### generate_donor_data
This function takes three inputs: num_donors, earliest_year, and latest_year. The last two use the same global variables as generate_unique_dates. Each donor_id has a single row in the data table, with donation details held in two columns: donation_dates and donation_amounts. Each column holds multiple comma-separated values for the donor. The values are positional: the first date in donation_dates corresponds to the first amount in donation_amounts, and so on, so the order of the values must be preserved when splitting the data. The distribution of attributes is meant to reflect real life; for example, 30% of donors are organizations and 70% are individuals.

## Notes
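Because the dates and amounts for a donor are paired by position, it can help to split both columns together rather than one at a time. The following is a minimal sketch of that idea, not part of the script itself. It assumes the column names from the Data Dictionary (donor_id, donation_dates, donation_amounts), ISO-style date strings, and amounts that parse as floats; the helper name explode_donations and the sample rows are hypothetical, so adjust them to match the actual CSV header and formats.

```python
import csv
import io


def explode_donations(rows):
    """Yield one (donor_id, date, amount) tuple per donation.

    Hypothetical helper: assumes each row is a dict with the columns
    donor_id, donation_dates, and donation_amounts.
    """
    for row in rows:
        dates = [d.strip() for d in row["donation_dates"].split(",")]
        amounts = [a.strip() for a in row["donation_amounts"].split(",")]
        # The two lists are paired by position, so a length mismatch
        # means the row cannot be split safely.
        if len(dates) != len(amounts):
            raise ValueError(
                f"donor {row['donor_id']}: {len(dates)} dates "
                f"but {len(amounts)} amounts"
            )
        for date, amount in zip(dates, amounts):
            yield row["donor_id"], date, float(amount)


# Made-up sample data in the assumed layout (not real script output).
sample = io.StringIO(
    "donor_id,donation_dates,donation_amounts\n"
    '1,"2021-03-14,2022-07-01","50.0,120.5"\n'
    '2,"2020-11-30","25.0"\n'
)
records = list(explode_donations(csv.DictReader(sample)))
```

Pairing the values with zip before any sorting or filtering keeps each amount attached to its date, and the length check surfaces malformed rows early.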
There’s definitely room to improve this script, like adding weights to acquisition channels to better mimic real data. But it’s a solid foundation for exploring data cleaning, handling date formats, and creating visuals, making it a practical starting point for analysis projects.
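One way the channel weighting mentioned above might look is sketched here using random.choices from the standard library. This is an illustration under stated assumptions, not the script's current behavior: the channel names beyond Direct Mail and Online Event, the weight values, and the pick_channel helper are all hypothetical.

```python
import random

# Hypothetical channels and weights, for illustration only.
# "Direct Mail" and "Online Event" come from the data dictionary;
# the other names and all of the numbers are assumptions.
CHANNEL_WEIGHTS = {
    "Direct Mail": 0.4,
    "Online Event": 0.3,
    "Social Media": 0.2,
    "Referral": 0.1,
}


def pick_channel(rng):
    """Return one acquisition channel, drawn according to CHANNEL_WEIGHTS."""
    channels = list(CHANNEL_WEIGHTS)
    weights = list(CHANNEL_WEIGHTS.values())
    # random.choices returns a list of k items; take the single draw.
    return rng.choices(channels, weights=weights, k=1)[0]


# A seeded generator makes the fake dataset reproducible between runs.
rng = random.Random(42)
sample_channels = [pick_channel(rng) for _ in range(1000)]
```

The same approach could cover other attributes with a known split, such as the 70/30 ratio of individuals to organizations.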