https://github.com/titaniumbones/nsf-awards-downloader
- Host: GitHub
- URL: https://github.com/titaniumbones/nsf-awards-downloader
- Owner: titaniumbones
- License: other
- Created: 2025-02-07T22:57:21.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-07T22:58:57.000Z (over 1 year ago)
- Last Synced: 2025-04-04T22:11:35.949Z (about 1 year ago)
- Language: Python
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NSF Awards Downloader
A Python script to download and process NSF (National Science Foundation) award data.
## Overview
This script downloads award data from the NSF website, processes it, and creates both individual year files and a combined dataset. It handles historical NSF award data from 1959 to present.
## Features
- Downloads award data ZIP files from NSF's website
- Resumes interrupted runs by skipping ZIP files that were already downloaded
- Processes multiple JSON files per ZIP
- Creates per-year JSON files
- Combines all awards into a single dataset
- Provides detailed logging and progress updates
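The skip-existing download behaviour can be sketched roughly as below. This is a minimal illustration, not the code in `download_awards.py`: the function name `download_zip` is hypothetical, the NSF download URL is passed in rather than hard-coded (the real URL pattern is not given in this README), and the completeness check assumes the server sends a `Content-Length` header.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def download_zip(url: str, dest: Path) -> bool:
    """Download url to dest unless dest already exists (hypothetical helper).

    Returns True if a download happened, False if the file was skipped.
    """
    if dest.exists():
        # A finished file from an earlier run is left alone.
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary ".part" file first so an interrupted download
    # is never mistaken for a complete YEAR.zip on the next run.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=60) as resp, open(tmp, "wb") as out:
        expected = resp.headers.get("Content-Length")
        shutil.copyfileobj(resp, out)
    written = tmp.stat().st_size
    if expected is not None and written != int(expected):
        tmp.unlink()
        raise IOError(f"incomplete download: {written} of {expected} bytes")
    # Only rename once the byte count matches.
    os.replace(tmp, dest)
    return True
```

Writing to a `.part` file and renaming at the end is what makes "skip if it exists" safe: only complete files ever get the final `YEAR.zip` name.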
## Directory Structure
```
nsf-awards-downloader/
├── README.md
├── requirements.txt
├── download_awards.py
├── zips/ # Downloaded ZIP files
│   └── .gitkeep
└── json/ # Extracted JSON files
    └── .gitkeep
```
## Installation
1. Clone this repository:
```bash
git clone https://github.com/titaniumbones/nsf-awards-downloader.git
cd nsf-awards-downloader
```
2. Create a virtual environment (optional but recommended):
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
## Usage
Run the script:
```bash
python download_awards.py
```
The script will:
1. Create `zips/` and `json/` directories if they don't exist
2. Download ZIP files from NSF (skipping any that already exist)
3. Extract all JSON files from each ZIP
4. Create yearly JSON files in the `json/` directory
5. Create a `combined.json` file with all awards
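Steps 3 and 4 might look something like the following sketch. It assumes each ZIP is named after its year (as in `zips/YEAR.zip`), that every `.json` member holds one award record, and that the yearly file is written as a JSON array; `zip_to_year_json` is a hypothetical name, not necessarily how `download_awards.py` is organised.

```python
import json
import zipfile
from pathlib import Path


def zip_to_year_json(zip_path: Path, json_dir: Path) -> int:
    """Write every JSON member of zip_path into json_dir/<YEAR>.json.

    Assumes one award record per member and a ZIP named after its year
    (e.g. zips/2020.zip). Returns the number of records written.
    """
    awards = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if not name.lower().endswith(".json"):
                continue  # ignore non-JSON members
            try:
                awards.append(json.loads(zf.read(name)))
            except (json.JSONDecodeError, UnicodeDecodeError):
                # Skip members that are not valid JSON instead of aborting.
                continue
    json_dir.mkdir(parents=True, exist_ok=True)
    out = json_dir / f"{zip_path.stem}.json"  # "2020.zip" -> "2020.json"
    out.write_text(json.dumps(awards), encoding="utf-8")
    return len(awards)
```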
## Output Files
- `zips/YEAR.zip`: Original downloaded ZIP files from NSF
- `json/YEAR.json`: Extracted and combined JSON for each year
- `combined.json`: All awards combined into a single file
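Building `combined.json` from the yearly files could be done along these lines. The sketch assumes each `json/YEAR.json` is a JSON array of award records; `combine_years` is a hypothetical helper, and the output path is a parameter because this README does not pin down where `combined.json` is written.

```python
import json
from pathlib import Path


def combine_years(json_dir: Path, out_path: Path) -> int:
    """Concatenate every json_dir/YEAR.json array into one array at out_path."""
    # Only files whose stem is a year are included, sorted numerically so
    # the output is chronological; combined.json itself is never re-read.
    year_files = sorted(
        (p for p in json_dir.glob("*.json") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    combined = []
    for path in year_files:
        combined.extend(json.loads(path.read_text(encoding="utf-8")))
    out_path.write_text(json.dumps(combined), encoding="utf-8")
    return len(combined)
```

Loading everything into one list keeps the code simple, but for the full 1959-to-present range it holds every award in memory at once; a streaming writer would be the alternative if that becomes a problem.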
## Error Handling
The script handles the following failure cases:
- Verifies download completeness
- Handles corrupted ZIP files
- Skips invalid JSON files
- Provides detailed error logging
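One way to handle corrupted ZIP files is to check each archive's CRCs with `zipfile.ZipFile.testzip()` and delete any archive that fails, so the next run downloads it again instead of skipping it as an existing file. This is a hedged sketch: `zip_is_usable` and the logger name are hypothetical, and the actual script may recover differently.

```python
import logging
import zipfile
from pathlib import Path

log = logging.getLogger("nsf_awards")  # hypothetical logger name


def zip_is_usable(zip_path: Path, delete_if_bad: bool = True) -> bool:
    """Return True if zip_path opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            bad_member = zf.testzip()  # name of first bad member, or None
        if bad_member is None:
            return True
        log.error("corrupted member %s in %s", bad_member, zip_path)
    except (zipfile.BadZipFile, OSError) as exc:
        log.error("cannot open %s: %s", zip_path, exc)
    if delete_if_bad and zip_path.exists():
        # Removing the file means a later run will download it again.
        zip_path.unlink()
    return False
```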
## Requirements
- Python 3.7+
- See `requirements.txt` for package dependencies
## License
This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.