https://github.com/ssaishruthi/insight_donation
This repo contains the program given for the Insight data engineer program.
- Host: GitHub
- URL: https://github.com/ssaishruthi/insight_donation
- Owner: SSaishruthi
- Created: 2018-02-13T05:12:31.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-02-13T07:23:12.000Z (over 8 years ago)
- Last Synced: 2025-03-23T18:52:19.047Z (about 1 year ago)
- Topics: src
- Language: Jupyter Notebook
- Size: 18.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Insight_donation
This repo contains the program given for the Insight data engineer program.
Problem Statement
Identify repeat donors and calculate the following for each combination of recipient, zip code, and calendar year:
1. Total dollars received
2. Total number of contributions received
3. Donation amount at a given percentile
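
The three outputs above can be sketched in plain Python as follows. This is only an illustration, not the code in `src`. It assumes the contributions passed in have already been narrowed to repeat donors, and it assumes the nearest-rank method for the percentile, which this README does not specify. The function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(amounts, percentile):
    """Return the amount at `percentile` using the nearest-rank method (an assumption)."""
    ordered = sorted(amounts)
    # Nearest-rank: 1-indexed ordinal rank n = ceil(P * N / 100), at least 1.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(contributions, percentile):
    """Group (recipient, zip_code, year, amount) tuples and compute the three outputs."""
    groups = defaultdict(list)
    for recipient, zip_code, year, amount in contributions:
        groups[(recipient, zip_code, year)].append(amount)
    return {
        key: {
            "total": sum(amounts),        # 1. total dollars received
            "count": len(amounts),        # 2. number of contributions received
            "percentile_amount": nearest_rank_percentile(amounts, percentile),  # 3.
        }
        for key, amounts in groups.items()
    }


# Hypothetical sample data: (recipient, zip code, year, amount)
sample = [
    ("C001", "02895", 2018, 100),
    ("C001", "02895", 2018, 300),
    ("C001", "02895", 2018, 200),
    ("C002", "30004", 2018, 50),
]
summary = summarize(sample, 30)
```

Here `summary` maps each `(recipient, zip_code, year)` key to its total, count, and percentile amount. The percentile value (30 in this sketch) would come from the percentile file.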
Input
1. Contribution file
2. Percentile file
Language - Python
Libraries needed
1. pandas
2. NumPy
3. re
4. csv
5. os
6. datetime
7. decimal
Input filter conditions
A record is eliminated if it satisfies any of the following conditions
1. Zip code field is empty or contains fewer than 5 digits (only the first 5 digits of the zip code are kept)
2. Other_id has a value
3. Committee ID is empty
4. Amount field is empty
5. Date field is malformed or falls outside the specified year range
6. Name field is malformed
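
The filter conditions above could be expressed as a single cleaning function. This is a minimal sketch, not the code in `src`. The dictionary keys, the `MMDDYYYY` date format, the name pattern, and the reading of condition 5 as "date outside the start/end year range" are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed pattern for a well-formed name: starts with a letter, then letters
# and common punctuation. The README does not define "improper".
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z ,.'\-]*$")


def clean_record(record, start_year, end_year):
    """Return a cleaned copy of `record`, or None if it should be eliminated."""
    # 1. Keep only the leading digits of the zip code; need at least 5 of them.
    zip_digits = re.match(r"\d*", record.get("zip_code", "").strip()).group()
    if len(zip_digits) < 5:
        return None
    # 2. Other_id has a value.
    if record.get("other_id", "").strip():
        return None
    # 3. Committee ID is empty.
    if not record.get("committee_id", "").strip():
        return None
    # 4. Amount field is empty.
    if not record.get("amount", "").strip():
        return None
    # 5. Date must be a valid MMDDYYYY string (assumed format) within the year range.
    raw_date = record.get("date", "").strip()
    if len(raw_date) != 8 or not raw_date.isdigit():
        return None
    try:
        date = datetime.strptime(raw_date, "%m%d%Y")
    except ValueError:
        return None
    if not start_year <= date.year <= end_year:
        return None
    # 6. Name must match the assumed pattern.
    name = record.get("name", "").strip()
    if not NAME_PATTERN.match(name):
        return None
    return {**record, "zip_code": zip_digits[:5], "name": name, "year": date.year}


# Hypothetical record for illustration
good = {
    "committee_id": "C00384516",
    "name": "SMITH, JOHN",
    "zip_code": "028956146",
    "date": "01122018",
    "amount": "250",
    "other_id": "",
}
cleaned = clean_record(good, 2015, 2018)
```

Returning `None` for rejected records lets the caller drop them with a simple filter while keeping the truncated zip code and parsed year on the records that pass.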
Input needed
1. Start and end year
2. Current year for which the final output is to be produced
*** Paths of the input and output files need to be updated before running ***
*** Both the Jupyter notebook and the .py file are in the src folder ***