{"id":16509354,"url":"https://github.com/gitjeff05/nih-grant-awards","last_synced_at":"2026-05-06T19:40:38.586Z","repository":{"id":70564414,"uuid":"267026900","full_name":"gitjeff05/nih-grant-awards","owner":"gitjeff05","description":"Analysis of NIH grant awards from ExPORTER data files","archived":false,"fork":false,"pushed_at":"2020-06-22T21:46:50.000Z","size":53576,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-12T18:34:18.543Z","etag":null,"topics":["data-visualization","jupyter-notebook","jupyterlab","matplotlib","pandas","python3","seaborn"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gitjeff05.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-26T11:40:48.000Z","updated_at":"2020-06-22T21:46:53.000Z","dependencies_parsed_at":"2023-03-11T08:46:11.916Z","dependency_job_id":null,"html_url":"https://github.com/gitjeff05/nih-grant-awards","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitjeff05%2Fnih-grant-awards","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitjeff05%2Fnih-grant-awards/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitjeff05%2Fnih-grant-awards/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitjeff05%2Fnih-grant-awa
rds/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gitjeff05","download_url":"https://codeload.github.com/gitjeff05/nih-grant-awards/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241465113,"owners_count":19967243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","jupyter-notebook","jupyterlab","matplotlib","pandas","python3","seaborn"],"created_at":"2024-10-11T15:50:11.560Z","updated_at":"2026-05-06T19:40:33.558Z","avatar_url":"https://github.com/gitjeff05.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NIH Awards Analysis\n\nThis project aims to analyze and provide some summary statistics for [NIH grant award data from ExPORTER data files](https://exporter.nih.gov/about.aspx).\n\n### For fiscal years 2019 and 2020, the NIH has funded:\n\n\u003cimg src=\"https://user-images.githubusercontent.com/548922/85337721-f5d66880-b4ae-11ea-80ed-c204947de808.png\" alt=\"Summary statistics for NIH grant award data\" style=\"text-align:center\"/\u003e\n\n\u003cimg src=\"https://user-images.githubusercontent.com/548922/85338207-d429b100-b4af-11ea-8f0e-08c227867227.png\" alt=\"Rolling median for select NIH administering agencies\" style=\"text-align:center\"/\u003e\n\n# Preprocessing the data with Preprocessing.ipynb\n\n### The notebook and relevant directories\n\nThe data are cleaned and preprocessed in the `Preprocessing.ipynb` notebook. 
It is meant to be run against the raw data in `./data/exporter`, and the output is saved to `./out/csv` with the format: `out/csv/post_processed_{ISO_DATE}.csv.gzip`.\n\n### Fetching the data\n\nThis project does not house the ExPORTER data files (~200MB). As a convenience, we include a bash script, `./utils/get_csvs`, which programmatically downloads all of the files listed in `./utils/reporter_files.txt`. Once you have downloaded the exporter files to the `./data/exporter` directory, the preprocessing notebook will be able to find them. The file list in `reporter_files.txt` is up to date as of this writing. However, because new files are added weekly, you should add the new files to `reporter_files.txt` and re-run `./utils/get_csvs`. A more efficient way might be to download just the new file directly to `./data/exporter` with `curl` like so:\n\n```bash\n\u003e curl https://exporter.nih.gov/CSVs/final/RePORTER_PRJ_C_FY2020_031.zip -o \"./data/exporter/RePORTER_PRJ_C_FY2020_031.zip\"\n```\n\n### Converting column dtypes\n\nAny column meant to represent a date is converted to `datetime64[ns]` using `pandas.to_datetime`. Some string columns are explicitly converted to `StringDtype`, as it is [the recommended way to store strings in pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html).\n\n### Dropping rows:\n\nSome rows are dropped during preprocessing as they would be unhelpful for analysis. This is primarily because they:\n\n- have no associated costs\n- are duplicates of another award\n\nFiltering out the above cases dropped about 3.433% of rows.\n\nAlthough ~98.3% of ExPORTER data reference awards that are administered by an NIH agency, the data also include awards from other government agencies (CDC, FDA, VA). Since this analysis is only relevant to NIH, rows that are not NIH awards are dropped. 
Also, we drop all rows where `ORG_COUNTRY` (\"country in which the business office of the grantee organization or contractor is located\") is *not* the United States.\n\n### Geospatial data\n\nSince all US award data have related zip codes for the organization, we added additional columns for lat/long coordinates. The \"[US Zip Code Latitude and Longitude](https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/information/)\" by [CivicSpace Labs]() is licensed under [Creative Commons Attribution-ShareAlike](https://creativecommons.org/licenses/by-sa/2.0/). Copyright 2004 CivicSpace Labs.\n\n## Project Structure\n\n```bash\n.\n├── Analysis.ipynb                                               # this notebook\n├── Preprocess.ipynb                                             # preprocessing notebook\n├── CHANGELOG.md                                                 # When data or models are generated, a corresponding entry goes here\n├── README.md                                                    # Project details\n├── data\n│   ├── exporter                                                 # exporter files (not included in repo) \n│   ├── geo                                                      # shapefiles for maps\n│   │   └── cb_2019_us_state_500k.{cpg,dbf,prj,shp,shx,xml}\n│   └── zips                                                     # Zip to lat/long\n│       └── us-zip-code-latitude-and-longitude.csv \n├── out\n│   ├── csv                                                      # output of preprocessing\n│   │   └── post_processed_2020-05-29T12:36:20+00:00.csv.gzip    # sample (Note: this is not the entire dataset, see utils/get_csvs for script to get all files for yourself)\n│   ├── json                                                     # any json output goes here\n│   └── models                                                   # any saved models go here\n├── styles                                                       \n│   └── 
styles-nih.css                                           # external styles\n└── utils\n    ├── get_csvs                                                 # function to get data\n    ├── reporter_files.txt                                       # list of reporter files to retrieve\n    └── style_dataframes.py                                      # Helper methods to style dataframes  \n\n9 directories, 19 files\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitjeff05%2Fnih-grant-awards","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgitjeff05%2Fnih-grant-awards","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitjeff05%2Fnih-grant-awards/lists"}