https://github.com/titaniumbones/nsf-awards-downloader


# NSF Awards Downloader

A Python script to download and process NSF (National Science Foundation) award data.

## Overview

This script downloads award data from the NSF website, processes it, and creates both individual year files and a combined dataset. It handles historical NSF award data from 1959 to present.

## Features

- Downloads award data ZIP files from NSF's website
- Resumes interrupted downloads (skips existing files)
- Processes multiple JSON files per ZIP
- Creates per-year JSON files
- Combines all awards into a single dataset
- Provides detailed logging and progress updates
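The resume behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `download_awards.py`: the function name `download_if_missing`, the `opener` parameter, and the `.part` temporary-file convention are all assumptions made for the example, and the URL in the usage note is a placeholder rather than the real NSF endpoint.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_if_missing(url: str, dest: Path, opener=urlopen) -> bool:
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if the file was skipped.
    The name and signature are hypothetical, chosen for this sketch.
    """
    # A non-empty file from an earlier run is treated as already downloaded.
    if dest.exists() and dest.stat().st_size > 0:
        return False

    dest.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary name first so an interrupted download never
    # leaves a truncated file that would later be mistaken for a finished one.
    partial = dest.with_name(dest.name + ".part")
    with opener(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)

    partial.replace(dest)
    return True
```

A call might look like `download_if_missing("https://example.invalid/2020.zip", Path("zips/2020.zip"))`. Writing to a `.part` file and renaming it at the end is one way to make "skip existing files" safe after an interruption; the actual script may take a different approach.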

## Directory Structure

```
nsf-awards-downloader/
├── README.md
├── requirements.txt
├── download_awards.py
├── zips/                 # Downloaded ZIP files
│   └── .gitkeep
└── json/                 # Extracted JSON files
    └── .gitkeep
```

## Installation

1. Clone this repository:
```bash
git clone https://github.com/titaniumbones/nsf-awards-downloader.git
cd nsf-awards-downloader
```

2. Create a virtual environment (optional but recommended):
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

Run the script:
```bash
python download_awards.py
```

The script will:
1. Create `zips/` and `json/` directories if they don't exist
2. Download ZIP files from NSF (skipping any that already exist)
3. Extract all JSON files from each ZIP
4. Create yearly JSON files in the `json/` directory
5. Create a `combined.json` file with all awards
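Steps 3 to 5 above could be sketched as below. This is an illustration under stated assumptions, not the script's actual implementation: the function names `awards_from_zip` and `build_datasets` are hypothetical, and the sketch assumes each JSON member of a ZIP holds either a single award object or a list of them.

```python
import json
import zipfile
from pathlib import Path


def awards_from_zip(zip_path: Path) -> list:
    """Return every award record found in the JSON members of one ZIP."""
    awards = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue
            record = json.loads(archive.read(name))
            # Assumption: a member is one award object or a list of awards.
            if isinstance(record, list):
                awards.extend(record)
            else:
                awards.append(record)
    return awards


def build_datasets(zip_dir: Path, json_dir: Path, combined_path: Path) -> int:
    """Write json/YEAR.json for each zips/YEAR.zip, then one combined file.

    Returns the total number of awards written to the combined file.
    """
    json_dir.mkdir(parents=True, exist_ok=True)
    combined = []
    for zip_path in sorted(zip_dir.glob("*.zip")):
        awards = awards_from_zip(zip_path)
        year_file = json_dir / f"{zip_path.stem}.json"
        year_file.write_text(json.dumps(awards), encoding="utf-8")
        combined.extend(awards)
    combined_path.write_text(json.dumps(combined), encoding="utf-8")
    return len(combined)
```

With the directory layout shown earlier, a call such as `build_datasets(Path("zips"), Path("json"), Path("combined.json"))` would produce the yearly files and the combined file in one pass. Holding every award in memory is the simplest option; a streaming writer may be preferable if the full dataset is large.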

## Output Files

- `zips/YEAR.zip`: Original downloaded ZIP files from NSF
- `json/YEAR.json`: Extracted and combined JSON for each year
- `combined.json`: All awards combined into a single file
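Once the script has run, the output can be loaded with the standard library. The sketch below assumes `combined.json` is a single JSON array of award objects, which this README does not confirm; `load_awards` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_awards(path: Path) -> list:
    """Load a list of award records from a JSON file.

    Assumption: the file's top-level value is a JSON array.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return data
```

For example, `len(load_awards(Path("combined.json")))` would give the total number of awards, and the same helper works on any `json/YEAR.json` file if those share the array layout.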

## Error Handling

The script handles the following failure cases:
- Verifies download completeness
- Handles corrupted ZIP files
- Skips invalid JSON files
- Provides detailed error logging
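The checks listed above might look something like the following. This is a sketch using only standard-library calls (`zipfile.ZipFile.testzip`, `zipfile.BadZipFile`, `json.JSONDecodeError`); the function name `read_awards_safely` and the logger name are assumptions, and the real script may structure its error handling differently.

```python
import json
import logging
import zipfile
from pathlib import Path

logger = logging.getLogger("nsf_awards")  # hypothetical logger name


def read_awards_safely(zip_path: Path) -> list:
    """Read JSON members from a ZIP, logging and skipping anything broken."""
    awards = []
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the first member with a bad CRC, or None.
            bad_member = archive.testzip()
            if bad_member is not None:
                logger.error("Corrupted member %s in %s", bad_member, zip_path.name)
                return awards
            for name in archive.namelist():
                if not name.lower().endswith(".json"):
                    continue
                try:
                    awards.append(json.loads(archive.read(name)))
                except (json.JSONDecodeError, UnicodeDecodeError) as exc:
                    # One invalid file should not abort the whole year.
                    logger.warning("Skipping invalid JSON %s in %s: %s",
                                   name, zip_path.name, exc)
    except zipfile.BadZipFile as exc:
        # Raised when the file is not a ZIP at all, e.g. a truncated download.
        logger.error("Cannot open %s: %s", zip_path.name, exc)
    return awards
```

Returning an empty list for an unreadable archive lets the caller continue with the remaining years; deleting the bad ZIP so that the next run downloads it again would be a natural extension.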

## Requirements

- Python 3.7+
- See `requirements.txt` for package dependencies

## License

This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.