https://github.com/scotthosking/get-station-data
Easily grab weather station data from around the globe (e.g. GHCN)
- Host: GitHub
- URL: https://github.com/scotthosking/get-station-data
- Owner: scotthosking
- License: mit
- Created: 2017-02-28T20:56:10.000Z (almost 8 years ago)
- Default Branch: main
- Last Pushed: 2023-10-31T17:29:27.000Z (about 1 year ago)
- Last Synced: 2024-10-29T22:30:36.162Z (2 months ago)
- Topics: python
- Language: Jupyter Notebook
- Homepage:
- Size: 1.03 MB
- Stars: 26
- Watchers: 1
- Forks: 11
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- open-sustainable-technology - get-station-data - A set of Python tools to make it easier to extract weather station data (e.g., temperature, precipitation) from the Global Historical Climatology Network Daily. (Climate Change / Climate Data Access and Visualization)
README
# Get daily weather station data (Global)
A set of Python tools to make it easier to extract weather station data (e.g., temperature, precipitation) from the [Global Historical Climatology Network - Daily (GHCND)](https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
> *"The Global Historical Climatology Network daily (GHCNd) is an integrated database of daily climate summaries from land surface stations across the globe. GHCNd is made up of daily climate records from numerous sources that have been integrated and subjected to a common suite of quality assurance reviews. GHCNd contains records from more than 100,000 stations in 180 countries and territories. NCEI provides numerous daily variables, including maximum and minimum temperature, total daily precipitation, snowfall, and snow depth. About half the stations only report precipitation. Both record length and period of record vary by station and cover intervals ranging from less than a year to more than 175 years."* [source](https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
More information on the data can be found in the [GHCN-Daily readme](https://www.ncei.noaa.gov/pub/data/ghcn/daily/readme.txt).
## Installation
1. **Install from the source code**:
* Clone the repository source code:
```bash
git clone https://github.com/scotthosking/get-station-data.git
```
* Install along with its dependencies:
```bash
cd /path/to/my/get-station-data
pip install -v -e .
```

## Worked through example
```python
from get_station_data import ghcnd
from get_station_data.util import nearest_stn

# IPython/Jupyter magic; omit when running as a plain script
%matplotlib inline
```

### Read station metadata
```python
stn_md = ghcnd.get_stn_metadata()
```

### Choose a location (lon/lat) and number of nearest neighbours
```python
london_lon_lat = -0.1278, 51.5074
my_stns = nearest_stn(stn_md,
                      london_lon_lat[0], london_lon_lat[1],
                      n_neighbours=5)
my_stns
```
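
|       | station     | lat     | lon     | elev  | name         |
|-------|-------------|---------|---------|-------|--------------|
| 52113 | UKE00105915 | 51.5608 | 0.1789  | 137.0 | HAMPSTEAD    |
| 52165 | UKM00003772 | 51.4780 | -0.4610 | 25.3  | HEATHROW     |
| 52098 | UKE00105900 | 51.8067 | 0.3581  | 128.0 | ROTHAMSTED   |
| 52191 | UKW00035054 | 51.2833 | 0.4000  | 91.1  | WEST MALLING |
| 52131 | UKE00107650 | 51.4789 | 0.4489  | 25.0  | HEATHROW     |
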
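How `nearest_stn` ranks stations internally is not shown in this README. As a rough illustration of the idea, the sketch below orders stations by great-circle (haversine) distance from a target point. The helper names `haversine_km` and `nearest_by_haversine` are hypothetical and not part of `get_station_data`; the coordinates are copied from the table above.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    earth_radius_km = 6371.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_by_haversine(stations, lon, lat, n=5):
    """Return the n stations closest to (lon, lat), nearest first."""
    ranked = sorted(
        stations,
        key=lambda s: haversine_km(lon, lat, s["lon"], s["lat"]),
    )
    return ranked[:n]


# Station coordinates as printed in the table above
stations = [
    {"station": "UKE00105900", "lon": 0.3581, "lat": 51.8067},
    {"station": "UKM00003772", "lon": -0.4610, "lat": 51.4780},
    {"station": "UKE00105915", "lon": 0.1789, "lat": 51.5608},
]

london_lon_lat = -0.1278, 51.5074
closest = nearest_by_haversine(stations, *london_lon_lat, n=2)
for stn in closest:
    dist = haversine_km(*london_lon_lat, stn["lon"], stn["lat"])
    print(stn["station"], round(dist, 1), "km")
```

A brute-force sort like this is fine for a handful of candidates; for the full station list a spatial index would likely be faster.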
### Download and extract data into a pandas DataFrame
```python
df = ghcnd.get_data(my_stns)

df.head()
```
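
|   | station     | year | month | day | element | value | mflag | qflag | sflag | date       | lon    | lat     | elev  | name      |
|---|-------------|------|-------|-----|---------|-------|-------|-------|-------|------------|--------|---------|-------|-----------|
| 0 | UKE00105915 | 1959 | 12    | 1   | TMAX    | NaN   |       |       |       | 1959-12-01 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 1 | UKE00105915 | 1959 | 12    | 2   | TMAX    | NaN   |       |       |       | 1959-12-02 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 2 | UKE00105915 | 1959 | 12    | 3   | TMAX    | NaN   |       |       |       | 1959-12-03 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 3 | UKE00105915 | 1959 | 12    | 4   | TMAX    | NaN   |       |       |       | 1959-12-04 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 4 | UKE00105915 | 1959 | 12    | 5   | TMAX    | NaN   |       |       |       | 1959-12-05 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
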
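The `value`, `mflag`, `qflag` and `sflag` columns correspond to the per-day fields (value, measurement flag, quality flag, source flag) of the raw GHCN-D `.dly` files described in the NOAA readme linked above. How `ghcnd.get_data` reads those files is not shown here. The sketch below is an independent illustration of the fixed-width layout, using a hypothetical `parse_dly_line` helper and a synthetic record: each line holds one station, month and element, followed by 31 eight-character day slots, with `-9999` marking a missing value.

```python
def parse_dly_line(line):
    """Split one fixed-width GHCN-D .dly record into per-day dictionaries."""
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]

    records = []
    for day in range(1, 32):
        start = 21 + 8 * (day - 1)       # each day slot is 8 characters wide
        slot = line[start:start + 8]
        if len(slot) < 8:                # tolerate a truncated line
            break
        raw = int(slot[0:5])             # 5-character integer value
        records.append({
            "station": station,
            "year": year,
            "month": month,
            "day": day,
            "element": element,
            "value": None if raw == -9999 else raw,
            "mflag": slot[5].strip(),    # measurement flag
            "qflag": slot[6].strip(),    # quality flag
            "sflag": slot[7].strip(),    # source flag
        })
    return records


# Synthetic record: two reported days, the remaining 29 marked missing
sample_line = (
    "UKE00105915" + "1960" + "01" + "PRCP"
    + "   25  E" + "   15  E" + "-9999   " * 29
)

rows = parse_dly_line(sample_line)
print(rows[0])
```

Raw `PRCP` values are stored in tenths of a millimetre, so the raw `25` in this synthetic record stands for 2.5 mm. Note that every month gets 31 slots, so days that do not exist in the calendar (such as 31 April) appear as missing values.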
### Filter data for, e.g., a single variable
```python
var = 'PRCP' # precipitation
df = df[ df['element'] == var ]

### Tidy up columns
df = df.rename(index=str, columns={"value": var})
df = df.drop(['element'], axis=1)

df.head()
```
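
|    | station     | year | month | day | PRCP | mflag | qflag | sflag | date       | lon    | lat     | elev  | name      |
|----|-------------|------|-------|-----|------|-------|-------|-------|------------|--------|---------|-------|-----------|
| 93 | UKE00105915 | 1960 | 1     | 1   | 2.5  |       |       | E     | 1960-01-01 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 94 | UKE00105915 | 1960 | 1     | 2   | 1.5  |       |       | E     | 1960-01-02 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 95 | UKE00105915 | 1960 | 1     | 3   | 1.0  |       |       | E     | 1960-01-03 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 96 | UKE00105915 | 1960 | 1     | 4   | 0.8  |       |       | E     | 1960-01-04 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 97 | UKE00105915 | 1960 | 1     | 5   | 0.0  |       |       | E     | 1960-01-05 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
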
```python
df.drop(columns=['mflag','qflag','sflag']).tail(n=10)
```

|       | station     | year | month | day | PRCP | date       | lon    | lat     | elev | name     |
|-------|-------------|------|-------|-----|------|------------|--------|---------|------|----------|
| 83938 | UKE00107650 | 2016 | 12    | 22  | 0.0  | 2016-12-22 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83939 | UKE00107650 | 2016 | 12    | 23  | 1.4  | 2016-12-23 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83940 | UKE00107650 | 2016 | 12    | 24  | 0.0  | 2016-12-24 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83941 | UKE00107650 | 2016 | 12    | 25  | 1.0  | 2016-12-25 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83942 | UKE00107650 | 2016 | 12    | 26  | 0.0  | 2016-12-26 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83943 | UKE00107650 | 2016 | 12    | 27  | 0.0  | 2016-12-27 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83944 | UKE00107650 | 2016 | 12    | 28  | 0.2  | 2016-12-28 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83945 | UKE00107650 | 2016 | 12    | 29  | 0.4  | 2016-12-29 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83946 | UKE00107650 | 2016 | 12    | 30  | 0.0  | 2016-12-30 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83947 | UKE00107650 | 2016 | 12    | 31  | 0.4  | 2016-12-31 | 0.4489 | 51.4789 | 25.0 | HEATHROW |

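With one row per station and day, the table is in a long format that suits pandas grouping. As a minimal sketch (not part of `get_station_data`), the example below sums daily precipitation into monthly totals per station. It rebuilds the ten rows shown above as a stand-in for `df`, so it runs on its own; the name `monthly` is just illustrative.

```python
import pandas as pd

# Stand-in for df: the ten PRCP rows printed above
sample = pd.DataFrame({
    "station": ["UKE00107650"] * 10,
    "date": pd.date_range("2016-12-22", "2016-12-31", freq="D"),
    "PRCP": [0.0, 1.4, 0.0, 1.0, 0.0, 0.0, 0.2, 0.4, 0.0, 0.4],
})

# Group by station and calendar month, then sum the daily values.
# min_count=1 keeps a month as NaN when every day in it is missing,
# rather than reporting a misleading total of 0.
monthly = (
    sample
    .groupby(["station", sample["date"].dt.to_period("M")])["PRCP"]
    .sum(min_count=1)
)

print(monthly)
```

Totals computed this way cover only the days present in the data, so months with gaps will be underestimates unless they are filtered out first.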
### Save to file
```python
df.to_csv('London_5stns_GHCN-D.csv', index=False)
```

### Plot histogram of all data
```python
df['PRCP'].plot.hist(bins=40)
```
![png](http://scotthosking.com/images/notebooks/ghcn_daily_data/output_14_1.png)
### Plot time series for one station
```python
heathrow = df[ df['name'] == 'HEATHROW' ]
heathrow['PRCP'].plot()
```
![png](http://scotthosking.com/images/notebooks/ghcn_daily_data/output_16_1.png)
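Two of the five stations selected earlier share the name `HEATHROW` (`UKM00003772` and `UKE00107650`), so filtering on `name` returns rows from both. A possible variation, sketched below under that assumption, filters on the station ID instead and uses `date` as the index so that the x-axis shows dates rather than row numbers. The `sample` frame is a small stand-in for `df`; the `UKM00003772` values in it are made up for illustration.

```python
import pandas as pd

# Stand-in for df; the UKM00003772 values are invented for this example
sample = pd.DataFrame({
    "station": ["UKM00003772", "UKM00003772", "UKE00107650", "UKE00107650"],
    "name": ["HEATHROW"] * 4,
    "date": pd.to_datetime(
        ["2016-12-30", "2016-12-31", "2016-12-30", "2016-12-31"]
    ),
    "PRCP": [0.2, 0.6, 0.0, 0.4],
})

# Filtering by name mixes two different stations
by_name = sample[sample["name"] == "HEATHROW"]
print(by_name["station"].nunique(), "stations share this name")

# Filtering by station ID keeps exactly one record per day
one_station = sample[sample["station"] == "UKE00107650"]
series = one_station.set_index("date")["PRCP"].sort_index()
print(series)

# In a notebook with matplotlib installed, this draws the time series:
# series.plot()
```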