https://github.com/cldellow/iem2parquet
Export Iowa Environmental Mesonet data to Parquet files.
- Host: GitHub
- URL: https://github.com/cldellow/iem2parquet
- Owner: cldellow
- License: apache-2.0
- Created: 2019-01-13T15:50:58.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-05T04:23:21.000Z (over 7 years ago)
- Last Synced: 2025-03-29T12:13:20.955Z (over 1 year ago)
- Language: Python
- Size: 10.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# iem2parquet
The [Iowa Environmental Mesonet](http://mesonet.agron.iastate.edu/) archives automated weather
sensor data from stations around the world. They take raw data, published in the standard
[METAR](https://en.wikipedia.org/wiki/METAR) format, do minimal processing and expose it
via a web service.
The web service is a bit pokey and exposes the data as a TSV. These scripts automate
the retrieval of data and conversion to a Parquet file, suitable for further processing
with a big data tool of your choice (or, y'know, [SQLite](https://github.com/cldellow/sqlite-parquet-vtable)).
If you use this, please be considerate of the remote server's capacity.
## Prerequisites
This tool depends on the [`csv2parquet`](https://github.com/cldellow/csv2parquet) package. Install it via:
```
pip install pyarrow csv2parquet
```
## Usage
```
# Download TSV for a period of time.
./fetch CYKF 2018-1-1 2018-1-31 > ykf.tsv
# Convert to Parquet file.
./pq ykf.tsv
# Convert to Parquet file, retain the raw METAR field.
INCLUDE_RAW_METAR=1 ./pq ykf.tsv
```
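For longer periods, one way to stay considerate of the remote server is to request one month at a time and pause between requests. The following is only a sketch of that idea, not part of this repository: `month_ranges`, `fmt`, `fetch_by_month`, the output file naming, and the 5-second pause are all hypothetical choices. It assumes `./fetch` is run from the repository directory and treats the end date as inclusive, which the README does not confirm.

```python
import subprocess
import time
from datetime import date, timedelta


def month_ranges(start: date, end: date):
    """Yield (first_day, last_day) pairs covering start..end, one per calendar month."""
    cursor = start
    while cursor <= end:
        # Find the first day of the following month, then step back one day.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        last = min(next_month - timedelta(days=1), end)
        yield cursor, last
        cursor = next_month


def fmt(d: date) -> str:
    """Format a date without zero padding, mirroring the 2018-1-1 style used above."""
    return f"{d.year}-{d.month}-{d.day}"


def fetch_by_month(station: str, start: date, end: date, pause_seconds: float = 5.0):
    """Run ./fetch once per month (hypothetical wrapper), pausing between requests."""
    for first, last in month_ranges(start, end):
        # Hypothetical naming scheme: one TSV per month, e.g. cykf-2018-01.tsv
        out_path = f"{station.lower()}-{first.year}-{first.month:02d}.tsv"
        with open(out_path, "wb") as out:
            subprocess.run(
                ["./fetch", station, fmt(first), fmt(last)],
                stdout=out,
                check=True,
            )
        # Assumed pause length; pick something the server can comfortably absorb.
        time.sleep(pause_seconds)
```

Calling `fetch_by_month("CYKF", date(2018, 1, 1), date(2018, 12, 31))` would write twelve TSV files, each of which can then be passed to `./pq` separately. Keeping one file per month avoids having to reason about repeated header rows when concatenating.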
## Space savings
This table compares the uncompressed TSV size with the Parquet size (both in bytes) for my hometown's
data.
| Duration | TSV size | Parquet size (incl METAR / excl METAR) | Size decrease |
|----------|------------|----------------------------------------|---------------|
| 1 day | 15,821 | 9,641 / 7,714 | 39.1% / 51.2% |
| 1 month | 331,242 | 54,080 / 25,067 | 83.7% / 92.4% |
| 1 year | 3,686,065 | 496,247 / 171,390 | 86.5% / 95.4% |
| 1 decade | 37,528,193 | 4,768,956 / 1,580,540 | 87.3% / 95.8% |
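The "Size decrease" column can be recomputed from the other two columns as `1 - parquet / tsv`. The short sketch below does that for the figures in the table; the function name `size_decrease` and the `ROWS` structure are illustrative only, and the sizes are assumed to be byte counts (the ratio is the same whatever the unit).

```python
def size_decrease(tsv_size: int, parquet_size: int) -> float:
    """Return the percentage reduction from tsv_size to parquet_size, to one decimal."""
    return round(100 * (1 - parquet_size / tsv_size), 1)


# (duration, TSV size, Parquet size incl. METAR, Parquet size excl. METAR)
ROWS = [
    ("1 day", 15_821, 9_641, 7_714),
    ("1 month", 331_242, 54_080, 25_067),
    ("1 year", 3_686_065, 496_247, 171_390),
    ("1 decade", 37_528_193, 4_768_956, 1_580_540),
]

for duration, tsv, with_metar, without_metar in ROWS:
    incl = size_decrease(tsv, with_metar)
    excl = size_decrease(tsv, without_metar)
    print(f"{duration}: {incl}% / {excl}%")
```

The savings grow with the duration and then level off: most of the gain is already present at one month, and dropping the raw METAR field accounts for the remaining gap between the two figures in each row.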
## Useful Links
- [CSV of stations](https://mesonet.agron.iastate.edu/sites/networks.php?special=allasos&format=csv&nohtml)
- [Field descriptions](http://mesonet.agron.iastate.edu/request/download.phtml) (scroll to bottom)