https://github.com/hydrostat/alea

R package for hydrological frequency analysis with distribution fitting, return levels, diagnostics, AI-assisted selection, batch analysis, and publication-ready plots.

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# ALEA-R

[![R-CMD-check](https://github.com/hydrostat/ALEA/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/hydrostat/ALEA/actions/workflows/R-CMD-check.yaml)

ALEA-R is an R package for hydrological frequency analysis.

It provides tools for exploratory summaries, probability distribution fitting,
return-level estimation, bootstrap confidence intervals, goodness-of-fit
assessment, sample diagnostics, AI-assisted distribution-selection support,
batch analysis, publication-ready plots, and export helpers.

## Project status

ALEA-R is available as a public GitHub release for applied hydrological
frequency-analysis workflows, teaching, validation, and reproducible examples.

The package currently passes the GitHub Actions R CMD check workflow and local
CRAN-oriented checks with 0 errors and 0 warnings. The latest local
CRAN-oriented check produced 3 explainable notes: new submission, installed
package size (due to the bundled FADS_AI light model), and the local inability
to verify the current time.

CRAN submission is intentionally deferred while the package is used more
broadly through GitHub.

The public API is intentionally limited to the initial supported distribution
families: GEV, GPA, PE3, LN2, LN3, and GUM. LP3 is not included in the current
implementation.

FADS_AI output should be interpreted as model-based decision-support evidence,
not as proof of the true generating distribution.

## Installation

You can install the public GitHub release with:

```{r installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github(
  "hydrostat/ALEA",
  dependencies = TRUE,
  upgrade = "never"
)
```

If you also want the installed vignettes to be available through
`utils::browseVignettes("ALEA")`, install with `build_vignettes = TRUE`:

```{r installation-vignettes, eval = FALSE}
# install.packages("remotes")
remotes::install_github(
  "hydrostat/ALEA",
  dependencies = TRUE,
  upgrade = "never",
  build_vignettes = TRUE
)
```

After installation, load the package with:

```{r load-package, eval = FALSE}
library(ALEA)
```

## Main features

ALEA-R supports:

- exploratory summaries for hydrological samples;
- fitting of probability distributions used in the FADS_AI selection study;
- return-level estimation;
- percentile bootstrap confidence intervals for return levels;
- goodness-of-fit statistics and information criteria;
- sample diagnostics for data-quality and frequency-analysis assumptions;
- AI-assisted distribution-selection support;
- batch analysis for multiple stations or sites;
- ggplot-based plotting methods;
- plot and table export helpers.

## Supported distributions

The initial implementation supports six candidate distributions:

| Code | Distribution |
|---|---|
| `gev` | Generalized Extreme Value |
| `gpa` | Generalized Pareto |
| `pe3` | Pearson type III |
| `ln2` | Two-parameter lognormal |
| `ln3` | Three-parameter lognormal |
| `gum` | Gumbel |

LP3 is not included in the initial implementation.

## Basic workflow

```{r basic-workflow, eval = FALSE}
library(ALEA)

x <- c(
  42.1, 38.5, 51.3, 47.0, 62.4,
  55.2, 49.8, 58.1, 60.3, 45.7
)

fit <- alea_fit(
  x,
  distribution = "gev",
  method = "lmom"
)

fit
coef(fit)
```
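
To make the `method = "lmom"` step more concrete, the following base-R sketch
estimates GEV parameters from sample L-moments using Hosking's published
approximation for the shape parameter. It is an illustration only, not the
ALEA-R implementation: the helper names `sample_lmoments()` and
`gev_from_lmoments()` are hypothetical, and the shape sign convention used by
ALEA-R may differ from the Hosking convention assumed here.

```{r lmom-gev-sketch, eval = FALSE}
# Sample L-moments from unbiased probability-weighted moments (b0, b1, b2).
sample_lmoments <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  i <- seq_len(n)
  b0 <- mean(xs)
  b1 <- sum((i - 1) / (n - 1) * xs) / n
  b2 <- sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * xs) / n
  l2 <- 2 * b1 - b0
  l3 <- 6 * b2 - 6 * b1 + b0
  c(l1 = b0, l2 = l2, t3 = l3 / l2)
}

# GEV parameters from L-moments (Hosking's approximation for the shape k).
# Quantile function in this convention: xi + alpha * (1 - (-log(p))^k) / k.
# The formulas divide by k, so the Gumbel limit (k = 0) is not handled here.
gev_from_lmoments <- function(lmom) {
  cc <- 2 / (3 + lmom[["t3"]]) - log(2) / log(3)
  k <- 7.8590 * cc + 2.9554 * cc^2
  alpha <- lmom[["l2"]] * k / ((1 - 2^(-k)) * gamma(1 + k))
  xi <- lmom[["l1"]] - alpha * (1 - gamma(1 + k)) / k
  c(location = xi, scale = alpha, shape = k)
}

x <- c(
  42.1, 38.5, 51.3, 47.0, 62.4,
  55.2, 49.8, 58.1, 60.3, 45.7
)

lmom_x <- sample_lmoments(x)
gev_par <- gev_from_lmoments(lmom_x)
gev_par
```

By construction, the fitted GEV reproduces the first two sample L-moments, so
its mean equals `mean(x)`.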

## Return levels

```{r return-levels, eval = FALSE}
rl <- alea_return_level(
  fit,
  return_period = c(10, 25, 50, 100)
)

rl
plot(rl)
```
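
For annual maxima, a return period `T` corresponds to the non-exceedance
probability `1 - 1 / T`, and the return level is the fitted quantile at that
probability. The sketch below shows this relation for a Gumbel distribution in
base R. The parameter values and the helper `gumbel_quantile()` are
hypothetical and are not taken from ALEA-R.

```{r return-level-sketch, eval = FALSE}
return_period <- c(10, 25, 50, 100)

# Annual non-exceedance probability associated with each return period.
p_nonexceed <- 1 - 1 / return_period

# Gumbel quantile function (inverse CDF).
gumbel_quantile <- function(p, location, scale) {
  location - scale * log(-log(p))
}

# Hypothetical parameters, chosen only for illustration.
return_level <- gumbel_quantile(p_nonexceed, location = 46, scale = 7)

data.frame(return_period, p_nonexceed, return_level)
```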

## Bootstrap confidence intervals

```{r bootstrap-ci, eval = FALSE}
ci <- confint(
  fit,
  parm = "return_level",
  return_period = c(10, 25, 50, 100),
  level = 0.95,
  method = "bootstrap",
  n_boot = 500,
  seed = 123
)

ci
plot(ci)
```

The initial implementation provides percentile bootstrap confidence intervals
for return levels. Parameter confidence intervals, asymptotic return-level
intervals, and generic delta-method intervals are not implemented in the
initial release.
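
The idea behind a percentile bootstrap interval can be sketched in base R as
follows. This is a simplified illustration under stated assumptions, not the
ALEA-R algorithm: it resamples the observations with replacement (ALEA-R may
instead simulate from the fitted model), it uses a Gumbel fit by L-moments for
brevity, and the helper names are hypothetical.

```{r percentile-bootstrap-sketch, eval = FALSE}
set.seed(123)

x <- c(
  42.1, 38.5, 51.3, 47.0, 62.4,
  55.2, 49.8, 58.1, 60.3, 45.7
)
periods <- c(10, 25, 50, 100)

# Gumbel fit by L-moments: scale = l2 / log(2), location = l1 - gamma * scale,
# where gamma is Euler's constant, i.e. -digamma(1).
fit_gumbel_lmom <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  b0 <- mean(xs)
  b1 <- sum((seq_len(n) - 1) / (n - 1) * xs) / n
  scale <- (2 * b1 - b0) / log(2)
  c(location = b0 + digamma(1) * scale, scale = scale)
}

# Return levels for a Gumbel parameter vector.
gumbel_return_level <- function(par, periods) {
  par[["location"]] - par[["scale"]] * log(-log(1 - 1 / periods))
}

# Refit on each resample; rows are return periods, columns are replicates.
boot <- replicate(
  500,
  gumbel_return_level(fit_gumbel_lmom(sample(x, replace = TRUE)), periods)
)

# Percentile interval: empirical 2.5% and 97.5% quantiles per return period.
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))
colnames(ci) <- paste0("T", periods)
ci
```

With only 10 observations, such intervals are wide and depend heavily on the
sample, which is one reason to report them alongside the point estimates.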

## Goodness-of-fit and diagnostics

```{r gof-diagnostics, eval = FALSE}
gof <- alea_gof(fit)
gof
plot(gof, type = "statistic")

diagnostics <- alea_diagnostics(fit)
diagnostics
plot(diagnostics, type = "status")
```

Goodness-of-fit results report empirical distribution function statistics and
information criteria. Calibrated goodness-of-fit p-values and chi-square
goodness-of-fit tests are deferred.
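
As a rough illustration of what empirical distribution function (EDF)
statistics and an information criterion measure, the base-R sketch below
computes the Kolmogorov-Smirnov and Cramér-von Mises statistics and an
AIC-style value for a Gumbel distribution. The parameter values and helper
names are hypothetical, and the exact set of statistics reported by
`alea_gof()` is not assumed here.

```{r edf-statistics-sketch, eval = FALSE}
x <- c(
  42.1, 38.5, 51.3, 47.0, 62.4,
  55.2, 49.8, 58.1, 60.3, 45.7
)

# Gumbel CDF and log-density.
pgumbel <- function(q, location, scale) {
  exp(-exp(-(q - location) / scale))
}
dgumbel_log <- function(q, location, scale) {
  z <- (q - location) / scale
  -log(scale) - z - exp(-z)
}

# Hypothetical parameters, chosen only for illustration.
location <- 47
scale <- 6.9

n <- length(x)
i <- seq_len(n)
u <- pgumbel(sort(x), location, scale)  # fitted CDF at the ordered sample

# Kolmogorov-Smirnov: largest gap between empirical and fitted CDF.
ks <- max(pmax(i / n - u, u - (i - 1) / n))

# Cramer-von Mises: integrated squared gap.
cvm <- 1 / (12 * n) + sum((u - (2 * i - 1) / (2 * n))^2)

# AIC-style value with 2 parameters, evaluated at the parameters above
# (a strict AIC would use maximum-likelihood estimates).
loglik <- sum(dgumbel_log(x, location, scale))
aic <- 2 * 2 - 2 * loglik

c(ks = ks, cvm = cvm, aic = aic)
```

Smaller values indicate closer agreement with the sample. Without calibrated
p-values, these statistics are best used to compare candidate distributions
fitted to the same data.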

Diagnostics are sample-level checks intended to flag possible data-quality
issues or assumption concerns. Diagnostic warnings do not automatically
invalidate a fitted model.
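
A minimal base-R sketch of sample-level checks of this kind is shown below. The
specific checks, the 0.05 significance level, and the minimum sample size of 20
are assumptions made for illustration. They are not the rules used by
`alea_diagnostics()`.

```{r diagnostics-sketch, eval = FALSE}
x <- c(
  42.1, 38.5, 51.3, 47.0, 62.4,
  55.2, 49.8, 58.1, 60.3, 45.7
)

# Trend: Kendall rank correlation between the values and their time order.
trend <- cor.test(x, seq_along(x), method = "kendall")

# Serial dependence: Ljung-Box test on the lag-1 autocorrelation.
serial <- Box.test(x, lag = 1, type = "Ljung-Box")

# Each check yields a status flag rather than a hard failure.
checks <- data.frame(
  check = c("sample_size", "missing_values", "trend", "serial_dependence"),
  status = c(
    if (length(x) >= 20) "ok" else "warning",
    if (anyNA(x)) "warning" else "ok",
    if (trend$p.value < 0.05) "warning" else "ok",
    if (serial$p.value < 0.05) "warning" else "ok"
  )
)

checks
```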

## AI-assisted distribution selection

```{r ai-selection, eval = FALSE}
selection <- alea_select(x)

selection
as.data.frame(selection)
plot(selection)
```

ALEA-R includes AI-assisted distribution-selection support through the bundled
FADS_AI lightweight operational application model.

FADS_AI output should be interpreted as model-based decision-support evidence
for candidate distribution families. It is not proof of the true generating
distribution and should not replace goodness-of-fit assessment, diagnostics,
return-level uncertainty evaluation, or hydrological judgement.

## Batch analysis

```{r batch-analysis, eval = FALSE}
batch <- alea_batch_fit(
  data = annual_maxima,
  station = "station",
  time = "year",
  value = "value",
  distributions = c("gev", "gpa", "pe3", "ln2", "ln3", "gum"),
  methods = c("lmom"),
  return_period = c(10, 25, 50, 100),
  gof = TRUE,
  diagnostics = TRUE,
  select = "ai"
)

alea_results(batch, "stations")
alea_results(batch, "fits")
alea_results(batch, "selected_models")
alea_results(batch, "return_levels")
alea_results(batch, "gof")
alea_results(batch, "diagnostics")
alea_results(batch, "errors")
```
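
The call above refers to an `annual_maxima` object that is not defined in this
README. Judging from the `station`, `time`, and `value` arguments, it is
presumably a long-format data frame with one row per station and year. The
sketch below builds a synthetic data frame with that assumed layout. The
values are random numbers for illustration, not observed data.

```{r batch-input-sketch, eval = FALSE}
set.seed(1)

# Synthetic annual maxima for two hypothetical stations, 15 years each.
annual_maxima <- data.frame(
  station = rep(c("ST01", "ST02"), each = 15),
  year = rep(2001:2015, times = 2),
  value = round(c(rnorm(15, mean = 50, sd = 8), rnorm(15, mean = 120, sd = 20)), 1)
)

str(annual_maxima)
```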

Batch workflows use structured error capture. A failure for one station,
distribution, method, return-level calculation, goodness-of-fit calculation,
diagnostic calculation, or selection step does not stop the full workflow.
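
The general pattern of structured error capture can be sketched in base R with
`tryCatch()`. This illustrates the idea only and does not reproduce the
internals of `alea_batch_fit()`. The `fit_one()` helper, the station names, and
the values are hypothetical.

```{r error-capture-sketch, eval = FALSE}
# Hypothetical samples; station B contains a missing value.
stations <- list(
  A = c(42.1, 38.5, 51.3, 47.0, 62.4),
  B = c(55.2, NA, 58.1, 60.3, 45.7),
  C = c(30.4, 28.9, 35.1, 33.0, 31.7)
)

# Stand-in for a fitting step that can fail.
fit_one <- function(x) {
  if (anyNA(x)) stop("missing values in sample")
  c(mean = mean(x), sd = sd(x))
}

# Each station is processed independently; errors become records.
results <- lapply(names(stations), function(id) {
  tryCatch(
    list(station = id, ok = TRUE, value = fit_one(stations[[id]]),
         error = NA_character_),
    error = function(e) {
      list(station = id, ok = FALSE, value = NULL, error = conditionMessage(e))
    }
  )
})

# Flat error table, analogous in spirit to alea_results(batch, "errors").
failed <- Filter(function(r) !r$ok, results)
errors <- do.call(rbind, lapply(failed, function(r) {
  data.frame(station = r$station, message = r$error)
}))

errors
```

Stations A and C still produce results even though station B fails.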

## Plotting and export

All plot methods return `ggplot` objects.

```{r plots, eval = FALSE}
p <- plot(fit, type = "return_level")

alea_save_plot(
  p,
  filename = "return_level_plot.png",
  width = 7,
  height = 5,
  dpi = 300
)
```

ALEA-R can also export data frames and flat batch result tables:

```{r export, eval = FALSE}
alea_export(rl, path = "return_levels.csv")
alea_export(batch, path = "batch_results", type = "all")
```

## Learning examples

Teaching-oriented ALEA-R workflow scripts are available in the `examples/`
folder.

These examples use public Paraopeba hydrological data and demonstrate common
frequency-analysis workflows:

- single-site frequency analysis;
- comparison of candidate distributions;
- return levels and bootstrap confidence intervals;
- goodness-of-fit, diagnostics, and AI-assisted selection;
- small batch analysis;
- plots and exports.

Run the examples from the package root directory. For example:

```{r learning-examples-source, eval = FALSE}
source("examples/01_single_site_basic_workflow.R")
```

The example data files are stored in:

```text
examples/data/
```

Generated teaching outputs, such as exported plots and CSV files, are written to:

```text
examples/output/
```

The examples are designed for learning and classroom use. They use only the
public ALEA-R API and the distributions supported in the current release:
`gev`, `gpa`, `pe3`, `ln2`, `ln3`, and `gum`.

Installed vignettes can be listed with:

```{r browse-vignettes, eval = FALSE}
utils::browseVignettes("ALEA")
```

When installing from GitHub, use `build_vignettes = TRUE` if you want these
vignettes to be installed locally.

## Current limitations

The initial implementation does not include:

- LP3;
- Portuguese API aliases;
- calibrated goodness-of-fit p-values;
- chi-square goodness-of-fit tests;
- parameter confidence intervals;
- asymptotic or delta-method return-level confidence intervals;
- HidroWeb data access in the core package.

## Citation

Please cite ALEA-R in reports and publications where it supports the analysis.

```{r citation, eval = FALSE}
citation("ALEA")
```