# fmiopendata
Python interface for FMI open data

## Resources
* [FMI open data](https://en.ilmatieteenlaitos.fi/open-data)
* [FMI WFS guide](https://en.ilmatieteenlaitos.fi/open-data-manual-wfs-examples-and-guidelines)
## Installation
```bash
pip install fmiopendata
```

For `grid` datasets, also install `eccodes`. Both the library and the
Python bindings are needed. The former is easiest to install with
`conda` and the latter via `pip`:

```bash
conda install eccodes -c conda-forge
pip install eccodes
```

For `radar` datasets, `rasterio` is needed. It can be installed with
`pip`, although using `miniconda` is strongly encouraged.
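To check that the optional dependencies were installed correctly, a quick import check such as the following can be used (this snippet is illustrative and not part of `fmiopendata`):

```python
# Minimal check that the optional dependencies can be imported
for module in ("eccodes", "rasterio"):
    try:
        __import__(module)
        print(module, "is available")
    except ImportError as err:
        print(module, "is missing:", err)
```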
## Available data

This library provides two very simple scripts that list all the data available on the
FMI open data WMS and WFS services.

Create `wms.html`, which lists all the available WMS layers:
```bash
wms_html.py
```

The `Layer ID` strings are used to identify WMS (image) datasets. Examples to be added.
Create `wfs.html`, which lists all the available WFS layers:
```bash
wfs_html.py
```

The `Query ID` is the handle that can be used to request data from the WFS stored queries. See the examples below.
## Examples
* [Download and parse latest soundings](#download-and-parse-latest-soundings)
* [Download and calibrate latest radar reflectivity (dBZ) composite](#download-and-calibrate-latest-radar-reflectivity-dbz-composite)
* [Download and parse latest lightning data](#download-and-parse-latest-lightning-data)
* [Download and parse grid data](#download-and-parse-grid-data)
* [Download and parse observation data](#download-and-parse-observation-data)

### Download and parse latest soundings
```python
from fmiopendata.wfs import download_stored_query

snd = download_stored_query("fmi::observations::weather::sounding::multipointcoverage")
```

Now `snd.soundings` is a list of the soundings, and the data within each can be reached through these attributes:
```python
sounding = snd.soundings[0]
sounding.name # Name of the sounding station
sounding.id # Station ID of the sounding station
sounding.nominal_time # Nominal time of the sounding
sounding.start_time # Actual start time of the sounding
sounding.end_time # Actual end time of the sounding
sounding.lats # Numpy array of the measurement location latitudes [degrees]
sounding.lons # Numpy array of the measurement location longitudes [degrees]
sounding.altitudes # Numpy array of the measurement location altitudes [m]
sounding.times # Numpy array of the measurement times [datetime]
sounding.pressures # Numpy array of measured pressures [hPa]
sounding.temperatures # Numpy array of measured temperatures [°C]
sounding.dew_points # Numpy array of measured dew points [°C]
sounding.wind_speeds # Numpy array of measured wind speeds [m/s]
sounding.wind_directions # Numpy array of measured wind directions [°]
```

More refined search results can be retrieved by passing a list of arguments that will be added to the WFS request URL:
```python
snd = download_stored_query("fmi::observations::weather::sounding::multipointcoverage",
["place=Jokioinen"])
```
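As a usage sketch (not part of the library), the attributes listed above can be fed straight to `matplotlib`, for example to draw a simple temperature profile:

```python
# Illustrative only: plot the temperature profile of the first sounding
import matplotlib.pyplot as plt

sounding = snd.soundings[0]
plt.plot(sounding.temperatures, sounding.altitudes)
plt.xlabel("Temperature [°C]")
plt.ylabel("Altitude [m]")
plt.title("%s %s" % (sounding.name, sounding.nominal_time))
plt.show()
```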
### Download and calibrate latest radar reflectivity (dBZ) composite

This parser requires the `rasterio` library to be installed. All radar queries will work.
```python
from fmiopendata.wfs import download_stored_query
composites = download_stored_query("fmi::radar::composite::dbz")
```

Now `composites.data` is a list of the individual reflectivity composites and `composites.times` is a list of the nominal measurement times.
```python
import numpy as np
composite = composites.data[0]
# Download the image data
composite.download()
# Calibrate the data from image values to dBZ
# Calls `composite.download()` if the data are not already downloaded
composite.calibrate()
# Get mask for area outside the radar reach
# Calls `composite.download()` if the data are not already downloaded
mask1 = composite.get_area_mask()
# Get mask for invalid data
# Calls `composite.download()` if the data are not already downloaded
mask2 = composite.get_data_mask()
# Mask all the invalid areas using the above masks
composite.data[mask1 | mask2] = np.nan

# Plot the data for preview
import matplotlib.pyplot as plt

plt.imshow(composite.data[0, :, :])
plt.show()
```

The following attributes are available in the underlying `Radar` class for all radar data:
```python
Radar.data # The downloaded data, if .download() has been called
Radar.elevation # Measurement angle for single radar datasets
Radar.etop_threshold # Reflectivity limit for `etop` datasets
Radar.label # Clear-text name for the data
Radar.max_velocity # Maximum wind speed for `vrad` datasets
Radar.name # Name of the dataset
Radar.projection # WKT projection string for the dataset
Radar.time # Nominal measurement time of the dataset
Radar.unit # Unit of the calibrated data
Radar.url # Direct download URL for the data
```
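As an illustrative sketch building on these attributes, one could collect the maximum reflectivity of every composite in the query result, masking invalid areas as in the example above:

```python
import numpy as np

# Illustrative: maximum dBZ per composite, with invalid areas masked out
max_dbz = []
for comp in composites.data:
    comp.calibrate()  # downloads the data if not already downloaded
    comp.data[comp.get_area_mask() | comp.get_data_mask()] = np.nan
    max_dbz.append(np.nanmax(comp.data))
# Pair each maximum with its nominal measurement time
series = list(zip(composites.times, max_dbz))
```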
### Download and parse latest lightning data

```python
from fmiopendata.wfs import download_stored_query

# These both give identical results; the first one is much faster
lightning1 = download_stored_query("fmi::observations::lightning::multipointcoverage")
lightning2 = download_stored_query("fmi::observations::lightning::simple")
```

The following attributes hold the lightning data:
```python
lightning1.latitudes # Latitude of the lightning event [° North]
lightning1.longitudes # Longitude of the lightning event [° East]
lightning1.times # Time of the lightning event [datetime]
lightning1.cloud_indicator # Indicator for cloud flashes (1 == cloud lightning)
lightning1.multiplicity # Multiplicity of the lightning event
lightning1.peak_current # Maximum current of the lightning event [kA]
lightning1.ellipse_major # Location accuracy of the lightning event [km]
```
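A minimal sketch, assuming the attributes behave as Numpy-compatible arrays: select ground strokes with a large peak current (the 20 kA threshold is arbitrary and for illustration only):

```python
import numpy as np

# Illustrative: ground strokes (cloud_indicator != 1) with |peak current| > 20 kA
cloud = np.asarray(lightning1.cloud_indicator)
current = np.asarray(lightning1.peak_current)
strong = (cloud != 1) & (np.abs(current) > 20)
strike_lats = np.asarray(lightning1.latitudes)[strong]
strike_lons = np.asarray(lightning1.longitudes)[strong]
```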
### Download and parse grid data

The example below uses `fmi::forecast::harmonie::surface::grid` for demonstration. Other
`grid` stored queries should also work, as long as the data are available in GRIB format.
This parser requires the `eccodes` library to be installed.

```python
import datetime as dt
from fmiopendata.wfs import download_stored_query

# Limit the time frame
now = dt.datetime.utcnow()
# Depending on the current time and availability of the model data, adjusting
# the hours below might be necessary to get any data
start_time = now.strftime('%Y-%m-%dT00:00:00Z')
end_time = now.strftime('%Y-%m-%dT18:00:00Z')
model_data = download_stored_query("fmi::forecast::harmonie::surface::grid",
args=["starttime=" + start_time,
"endtime=" + end_time,
"bbox=18,55,35,75"])
```

Here we limit both the time frame and the area. By default the WFS interface would offer all data
fields, the full globe and all time steps, which would result in a huge amount of data. It is better
to split the retrieval into smaller time intervals, download the chunks in parallel and start
processing each chunk as soon as its download completes (a sketch of this is shown below).

The above call doesn't download any actual data; it just collects some metadata:
```python
print(model_data.data)
# -> {datetime.datetime(2023, 1, 6, 6, 0): <fmiopendata.grid.Grid object at 0x...>,
#     datetime.datetime(2023, 1, 6, 9, 0): <fmiopendata.grid.Grid object at 0x...>}
```

Here the dictionary keys are the model initialization times.
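The chunked, parallel retrieval suggested above could look like this minimal sketch (the chunk boundaries and worker count are illustrative):

```python
import datetime as dt
from concurrent.futures import ThreadPoolExecutor

from fmiopendata.wfs import download_stored_query


def fetch(chunk):
    start, end = chunk
    return download_stored_query(
        "fmi::forecast::harmonie::surface::grid",
        args=["starttime=" + start.strftime("%Y-%m-%dT%H:%M:%SZ"),
              "endtime=" + end.strftime("%Y-%m-%dT%H:%M:%SZ"),
              "bbox=18,55,35,75"])


day = dt.datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
# Three 6-hour chunks instead of one 18-hour request
chunks = [(day + dt.timedelta(hours=h), day + dt.timedelta(hours=h + 6))
          for h in (0, 6, 12)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, chunks))
```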
The structure of the underlying `Grid` class is:
```python
Grid.data # Parsed data, more info below
Grid.delete_file() # Delete the downloaded GRIB2 file
Grid.download() # Download the data
Grid.end_time # Last valid time of the data [datetime]
Grid.init_time # Initialization time of the data [datetime]
Grid.latitudes # Latitudes of the parsed data [Numpy array]
Grid.longitudes # Longitudes of the parsed data [Numpy array]
Grid.parse() # Download and parse the data
Grid.start_time # First valid time of the data [datetime]
Grid.url # The URL where the data are downloaded from
```

Now, download and parse the data from the latest run:
```python
latest_run = max(model_data.data.keys())  # e.g. datetime.datetime(2023, 1, 6, 9, 0)
data = model_data.data[latest_run]
# This will download the data to a temporary file, parse the data and delete the file
data.parse(delete=True)
```

The `data.data` dictionary now has the following structure:
```python
# The first level has the valid times
valid_times = data.data.keys()
print(list(valid_times))
# -> [datetime.datetime(2023, 1, 6, 9, 0),
# datetime.datetime(2023, 1, 6, 10, 0),
# datetime.datetime(2023, 1, 6, 11, 0),
# datetime.datetime(2023, 1, 6, 12, 0),
# datetime.datetime(2023, 1, 6, 13, 0),
# datetime.datetime(2023, 1, 6, 14, 0),
# datetime.datetime(2023, 1, 6, 15, 0),
# datetime.datetime(2023, 1, 6, 16, 0),
# datetime.datetime(2023, 1, 6, 17, 0),
# datetime.datetime(2023, 1, 6, 18, 0)]
```

The second level of the dictionary holds the vertical levels of the data; here the earliest time step is shown:
```python
earliest_step = min(valid_times)
data_levels = data.data[earliest_step].keys()
print(list(data_levels))
# -> [0, 2, 10]
```

Let's loop over these levels and print the dataset names and units:
```python
for level in data_levels:
datasets = data.data[earliest_step][level]
for dset in datasets:
unit = datasets[dset]["units"]
data_array = datasets[dset]["data"] # Numpy array of the actual data
print("Level: %d, dataset name: %s, data unit: %s" % (level, dset, unit))
```

This will print:
```
Level: 0, dataset name: Mean sea level pressure, data unit: Pa
Level: 0, dataset name: Orography, data unit: m
Level: 0, dataset name: Surface net thermal radiation, data unit: J m**-2
Level: 0, dataset name: Surface net solar radiation, data unit: J m**-2
Level: 0, dataset name: Time-integrated surface direct short wave radiation flux, data unit: J m**-2
Level: 2, dataset name: 2 metre temperature, data unit: K
Level: 2, dataset name: 2 metre dewpoint temperature, data unit: K
Level: 2, dataset name: 2 metre relative humidity, data unit: %
Level: 10, dataset name: Mean wind direction, data unit: degrees
Level: 10, dataset name: 10 metre wind speed, data unit: m s**-1
Level: 10, dataset name: 10 metre U wind component, data unit: m s**-1
Level: 10, dataset name: 10 metre V wind component, data unit: m s**-1
Level: 10, dataset name: surface precipitation amount, rain, convective, data unit: kg m**-2
Level: 10, dataset name: Convective available potential energy, data unit: J kg**-1
Level: 10, dataset name: Total Cloud Cover, data unit: %
Level: 10, dataset name: Low cloud cover, data unit: (0 - 1)
Level: 10, dataset name: Medium cloud cover, data unit: (0 - 1)
Level: 10, dataset name: High cloud cover, data unit: (0 - 1)
Level: 10, dataset name: Global radiation flux, data unit: W m**-2
Level: 10, dataset name: Visibility, data unit: m
Level: 10, dataset name: 10 metre wind gust since previous post-processing, data unit: m s**-1
Level: 10, dataset name: Surface solar radiation downwards, data unit: J m**-2
```

The whole structure `Grid.data[valid_time][level][dataset_name]` with `'data'` and `'units'`
members is common to all `grid` type WFS data. The data arrays will have invalid values replaced with `np.nan`.
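For example, a sketch based on the structure above could pick the 2 metre temperature field at the earliest valid time and convert it from K to °C:

```python
import numpy as np

# Illustrative: extract the 2 m temperature at the earliest valid time
t2m = data.data[earliest_step][2]["2 metre temperature"]
t2m_celsius = t2m["data"] - 273.15  # the unit listed above is K
print(np.nanmin(t2m_celsius), np.nanmax(t2m_celsius))
```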
### Download and parse observation data
```python
import datetime as dt

from fmiopendata.wfs import download_stored_query

# Retrieve the latest hour of data from a bounding box
end_time = dt.datetime.utcnow()
start_time = end_time - dt.timedelta(hours=1)
# Convert times to properly formatted strings
start_time = start_time.isoformat(timespec="seconds") + "Z"
# -> 2020-07-07T12:00:00Z
end_time = end_time.isoformat(timespec="seconds") + "Z"
# -> 2020-07-07T13:00:00Z

obs = download_stored_query("fmi::observations::weather::multipointcoverage",
args=["bbox=18,55,35,75",
"starttime=" + start_time,
"endtime=" + end_time])
```

The structure of the returned `MultiPoint` class is:
```python
MultiPoint.data # The observation data
MultiPoint.location_metadata # Location information for the observation locations
```

The `data` dictionary has the following structure, which is designed for
collecting data for distinct time steps:

```python
# The observation times are the primary key
print(sorted(obs.data.keys()))
# -> [datetime.datetime(2023, 1, 6, 14, 5),
# datetime.datetime(2023, 1, 6, 14, 6),
# datetime.datetime(2023, 1, 6, 14, 7),
# ...
# datetime.datetime(2023, 1, 6, 15, 4)]

# The next level has the names of the observation stations as keys
latest_tstep = max(obs.data.keys())
print(sorted(obs.data[latest_tstep].keys()))
# Will print a list of weather station names, e.g. "Kustavi Isokari", which we'll use
# as an example below

# On the third level we find the names of the observed parameters
print(sorted(obs.data[latest_tstep]["Kustavi Isokari"].keys()))
# -> ['Air temperature',
# 'Cloud amount',
# 'Dew-point temperature',
# 'Gust speed',
# 'Horizontal visibility',
# 'Precipitation amount',
# 'Precipitation intensity',
# 'Present weather (auto)',
# 'Pressure (msl)',
# 'Relative humidity',
# 'Snow depth',
# 'Wind direction',
# 'Wind speed']

# And on the last level we find the value and unit of the observation
print(obs.data[latest_tstep]["Kustavi Isokari"]["Air temperature"])
# -> {'value': -6.7, 'units': 'degC'}
```

To get the locations of the stations, one can use the `location_metadata` dictionary:
```python
print(obs.location_metadata["Kustavi Isokari"])
# -> {'fmisid': 101059, 'latitude': 60.7222, 'longitude': 21.02681}
```
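As an illustrative sketch combining `data` and `location_metadata`, one could list the latest air temperature and coordinates of every station:

```python
# Illustrative: latest air temperature and coordinates for each station
latest = max(obs.data.keys())
for station, params in obs.data[latest].items():
    meta = obs.location_metadata[station]
    temp = params.get("Air temperature", {}).get("value")
    print(station, temp, (meta["latitude"], meta["longitude"]))
```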
It is also possible to collect the data into a structure more usable for timeseries
analysis by adding `"timeseries=True"` to the arguments:

```python
from fmiopendata.wfs import download_stored_query
obs = download_stored_query("fmi::observations::weather::multipointcoverage",
args=["bbox=25,60,25.5,60.5",
"timeseries=True"])
```

Now the data will be organized by location:
```python
print(sorted(obs.data.keys()))
# -> ['Helsinki Malmi lentokenttä',
# 'Helsinki Vuosaari Käärmeniementie',
# 'Helsinki Vuosaari satama',
# 'Järvenpää Sorto',
# 'Sipoo Itätoukki']

# The times are a list of datetime objects
times = obs.data['Helsinki Malmi lentokenttä']['times']
# Other data fields have another extra level, one for values and one for the unit
print(len(obs.data['Helsinki Malmi lentokenttä']['Air temperature']['values']))
# -> 71
print(obs.data['Helsinki Malmi lentokenttä']['Air temperature']['unit'])
# -> 'degC'
```
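A minimal sketch (not part of the library) of plotting one station's timeseries with `matplotlib`:

```python
# Illustrative: plot one station's air temperature timeseries
import matplotlib.pyplot as plt

station = "Helsinki Malmi lentokenttä"
tseries = obs.data[station]
plt.plot(tseries["times"], tseries["Air temperature"]["values"])
plt.ylabel("Air temperature [%s]" % tseries["Air temperature"]["unit"])
plt.title(station)
plt.show()
```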
This parser supports at least the following stored queries:

* `fmi::forecast::hirlam::surface::obsstations::multipointcoverage`
* `fmi::forecast::oaas::sealevel::point::multipointcoverage`
* `fmi::observations::airquality::hourly::multipointcoverage`
* `fmi::observations::mareograph::daily::multipointcoverage`
* `fmi::observations::mareograph::instant::multipointcoverage`
* `fmi::observations::mareograph::monthly::multipointcoverage`
* `fmi::observations::mareograph::multipointcoverage`
* `fmi::observations::radiation::multipointcoverage`
* `fmi::observations::wave::multipointcoverage`
* `fmi::observations::weather::cities::multipointcoverage`
* `fmi::observations::weather::multipointcoverage`
* `stuk::observations::air::radionuclide-activity-concentration::multipointcoverage`
* `stuk::observations::external-radiation::latest::multipointcoverage`
* `stuk::observations::external-radiation::multipointcoverage`
* `urban::observations::airquality::hourly::multipointcoverage`