https://github.com/rxavier/econuy

Wrangling Uruguayan economic data so you don't have to.

# Overview

This project simplifies gathering and processing Uruguayan economic statistics. Data is retrieved from (mostly) government sources, processed into a familiar tabular format and tagged with useful metadata. It can then be transformed in several ways (converting to dollars, calculating rolling averages, resampling to other frequencies, etc.).

If [this screenshot](https://i.imgur.com/Ku5OR0y.jpg) gives you anxiety, this package should be of interest.

A webapp with a limited but interactive version of econuy is available at [econ.uy](https://econ.uy). Check out the [repo](https://github.com/rxavier/econuy-web) as well.

The most basic econuy workflow goes like this:

```python
from econuy import load_dataset

data1 = load_dataset("cpi")
```

# Installation

* PyPI:

```bash
pip install econuy
```

* Git:

```bash
git clone https://github.com/rxavier/econuy.git
cd econuy
pip install .
```

# Usage

**[Full API documentation available at RTD](https://econuy.readthedocs.io/en/latest/api.html)**

### Cache directory

econuy reads and writes data in a cache directory, which defaults to `.cache/econuy` inside the user's home directory. You can change it for all data loading by setting `ECONUY_DATA_DIR`, or directly with `load_dataset(data_dir=...)`.
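
The lookup described above can be sketched roughly as follows. This is only an illustration and not econuy's actual code: `resolve_data_dir` is a hypothetical helper, and it assumes that `ECONUY_DATA_DIR` is an environment variable and that an explicit `data_dir` argument takes precedence over it.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_data_dir(data_dir: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper: choose the cache directory to use."""
    if data_dir is not None:
        # Assumption: an explicit argument overrides everything else.
        return Path(data_dir).expanduser()
    # Assumption: ECONUY_DATA_DIR is read from the process environment.
    env_value = os.environ.get("ECONUY_DATA_DIR")
    if env_value:
        return Path(env_value).expanduser()
    # Default location described above.
    return Path.home() / ".cache" / "econuy"


print(resolve_data_dir())
```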

### Dataset load branching

1. Check that the dataset exists in the `REGISTRY`.
2. Cache check:
   - If `skip_cache=True`, **download dataset**.
   - If `skip_cache=False` (default):
     - Check whether the dataset exists in the cache.
     - If it exists:
       - Recency check:
         - If it was created in the last day, **return existing dataset**.
         - If it was created prior to the last day and `skip_update=False`, **download dataset**.
         - If it was created prior to the last day and `skip_update=True`, **return existing dataset**.
     - If it does not exist, **download dataset**.
3. If the dataset was downloaded, try to update the cache:
   - Validation:
     - If `force_overwrite=True`, **overwrite dataset**.
     - If `force_overwrite=False` (default):
       - If the new dataset is similar to the cached dataset, **overwrite dataset**.
       - If the new dataset is not similar to the cached dataset, **do not overwrite dataset**.
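
The branching above can be written as two small pure functions. This is a minimal sketch for illustration, not econuy's implementation: `load_action` and `should_overwrite` are hypothetical names, the cache timestamp and similarity check are passed in as plain values, and the case where nothing is cached yet in step 3 is left out because the list does not spell it out.

```python
from datetime import datetime, timedelta
from typing import Optional


def load_action(
    in_registry: bool,
    cached_at: Optional[datetime],
    now: datetime,
    skip_cache: bool = False,
    skip_update: bool = False,
) -> str:
    """Steps 1 and 2: return "download" or "use_cache"."""
    if not in_registry:
        raise KeyError("dataset is not in the registry")
    if skip_cache:
        return "download"
    if cached_at is None:
        # Nothing in the cache yet.
        return "download"
    if now - cached_at <= timedelta(days=1):
        # Created in the last day.
        return "use_cache"
    # Older than a day: only skip_update keeps the cached copy.
    return "use_cache" if skip_update else "download"


def should_overwrite(is_similar: bool, force_overwrite: bool = False) -> bool:
    """Step 3: decide whether a downloaded dataset replaces the cached one."""
    return force_overwrite or is_similar


now = datetime(2024, 1, 10)
print(load_action(True, now - timedelta(days=5), now))  # download
print(load_action(True, now - timedelta(days=5), now, skip_update=True))  # use_cache
```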

### Loading and transforming data

```python
from econuy import load_dataset, load_datasets_parallel

# load a single dataset
data1 = load_dataset("cpi")

# load a single dataset and chain transformations
data2 = (
    load_dataset("fiscal_balance_nonfinancial_public_sector")
    .select(names="Ingresos: SPNF")
    .resample("QE-DEC", "sum")
    .decompose(method="x13", component="t-c")
    .filter(start_date="2014-01-01")
)
```
This returns a `Dataset` object, which contains a `Metadata` object.

You can also load several datasets in parallel:
```python
# load multiple datasets using threads or processes
data3 = load_datasets_parallel(["nxr_monthly", "ppi"])
```

### Finding datasets

```python
from econuy.utils.operations import REGISTRY

REGISTRY.list_available()
REGISTRY.list_by_area("activity")
```

### Dataset metadata

Datasets include the following metadata per indicator:

1. Indicator name
2. Area
3. Frequency
4. Currency
5. Inflation adjustment
6. Unit
7. Seasonal adjustment
8. Type (stock or flow)
9. Cumulative periods
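
To make the shape of this metadata concrete, here is a sketch of one indicator's record as a dataclass. The class name, field names and example values are all hypothetical placeholders chosen to mirror the list above; the real `Metadata` object may be organized differently.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IndicatorMetadata:
    """Hypothetical container with one field per item in the list above."""

    name: str
    area: str
    frequency: str
    currency: str
    inflation_adjustment: str
    unit: str
    seasonal_adjustment: str
    type: str  # "stock" or "flow"
    cumulative_periods: int


# Placeholder values for illustration only, not taken from a real dataset.
example = IndicatorMetadata(
    name="Example indicator",
    area="Example area",
    frequency="monthly",
    currency="example currency",
    inflation_adjustment="none",
    unit="example unit",
    seasonal_adjustment="none",
    type="flow",
    cumulative_periods=1,
)

print(asdict(example))
```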

### Transformation methods

`Dataset` objects have multiple methods to transform their underlying data and update their metadata.

* `resample()` - resample data to a different frequency, taking into account whether data is of stock or flow type.
* `chg_diff()` - calculate percent changes or differences for same period last year, last period or at annual rate.
* `decompose()` - seasonally decompose series into trend or seasonally adjusted components.
* `convert()` - convert to US dollars, constant prices or percent of GDP.
* `rebase()` - set a period or window as 100 and scale the rest accordingly.
* `rolling()` - calculate rolling windows, either average or sum.
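
As an example of the arithmetic behind one of these transformations, the sketch below rebases a small series in plain Python. It is not the `rebase()` method itself: `rebase_sketch` and its parameters are hypothetical, and it assumes that a base window is summarized by its mean.

```python
from statistics import mean
from typing import Dict, Optional


def rebase_sketch(
    series: Dict[str, float],
    start: str,
    end: Optional[str] = None,
    base: float = 100.0,
) -> Dict[str, float]:
    """Scale `series` so that `start` (or the mean of start..end) equals `base`."""
    end = end or start
    # Period labels such as "2020-01" sort chronologically as strings.
    window = [value for period, value in series.items() if start <= period <= end]
    if not window:
        raise ValueError("no observations in the base window")
    reference = mean(window)  # assumption: a window is summarized by its mean
    return {period: value / reference * base for period, value in series.items()}


monthly = {"2020-01": 50.0, "2020-02": 75.0, "2020-03": 100.0}

# Single base period: 2020-01 becomes 100.
print(rebase_sketch(monthly, "2020-01"))
# {'2020-01': 100.0, '2020-02': 150.0, '2020-03': 200.0}

# Base window: the mean of 2020-01..2020-03 becomes 100.
windowed = rebase_sketch(monthly, "2020-01", "2020-03")
print(windowed["2020-02"])  # 100.0
```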

## External binaries and libraries

### unrar libraries

The [patool](https://github.com/wummel/patool) package is used to access data provided in `.rar` format. It requires the `unrar` binaries to be available on your system, which in most cases they already are. If they are not, you can get them from the [RARLAB download page](https://www.rarlab.com/rar_add.htm).

----

# Caveats

This project relies heavily on online sources that could change without notice, causing the methods that download data to fail. While I try to stay on my toes and fix these quickly, it helps if you create an issue when you find one of these (or even submit a fix!).