---
output:
  # pdf_document:
  md_document:
    variant: markdown_github
bibliography: readmebib.bib
---
```{r Setup, include=FALSE}
library(gasper)
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.align = "center",
  comment = "#>"
)
```

# Data-driven Thresholding in Denoising with Spectral Graph Wavelet Transform

The code in this repository implements an efficient generalization of Stein's Unbiased Risk Estimate (SURE) for signal denoising/regression on graphs using the Spectral Graph Wavelet Transform (SGWT) @hammond2011wavelets. In particular, it makes it possible to reproduce the simulations presented in @de2019data.
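For intuition, the classical orthonormal-basis SURE for soft thresholding is sketched below; the repository generalizes this estimator to the non-orthogonal SGWT frame (where extra weight terms are needed), so this base-R snippet is purely illustrative and not the repository's implementation:
```{r, eval=FALSE}
# Classical SURE for soft thresholding at level t, orthonormal basis,
# i.i.d. Gaussian noise with known standard deviation sigma
sure_soft <- function(x, t, sigma) {
  length(x) * sigma^2 - 2 * sigma^2 * sum(abs(x) <= t) + sum(pmin(x^2, t^2))
}

# Toy example: sparse signal plus noise; minimizing SURE over the sorted
# coefficient magnitudes selects the threshold without manual calibration
set.seed(1)
x <- c(rep(5, 10), rep(0, 90)) + rnorm(100)
grid <- sort(abs(x))
t_opt <- grid[which.min(sapply(grid, function(t) sure_soft(x, t, sigma = 1)))]
```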

## Installation

The code is based on the [gasper](https://github.com/fabnavarro/gasper) package @de2020gasper. To install the package, download the latest [version](https://github.com/fabnavarro/gasper/releases) and run the following command in a terminal:
```{bash, eval=FALSE}
R CMD INSTALL --build gasper_1.1.1.tar.gz
```

Another possibility is to install the development version:
```{r, eval=FALSE}
devtools::install_github("fabnavarro/gasper")
```

or the CRAN version:
```{r, eval=FALSE}
install.packages("gasper")
```

> **Note:** Avoid gasper version 1.1.0, which contains an error in the `laplacian_mat` function (for full matrices).
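Before sourcing the scripts, the installed version can be checked, e.g.:
```{r, eval=FALSE}
# Guard against the faulty 1.1.0 release (see the note above)
stopifnot(packageVersion("gasper") != "1.1.0")
```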

The package and code execution require the installation of the following external libraries:
```{r, eval=FALSE}
package_list <- c("scatterplot3d",
                  "Rcpp",
                  "RcppArmadillo",
                  "igraph",
                  "genlasso",
                  "ggplot2",
                  "foreach",
                  "doMC",
                  "gridExtra",
                  "RColorBrewer",
                  "xtable",
                  "rwavelet",
                  "dplyr",
                  "tidyverse",
                  "sf",
                  "R.matlab")
```

Install missing packages:
```{r, eval=FALSE}
# Flag the packages that are already installed, then install the missing ones
is_installed <- sapply(package_list,
                       function(x) x %in% rownames(installed.packages()))
sapply(package_list[!is_installed], install.packages)
```

For `figure_pitt.R`, the `sf` package is required. It depends on some development libraries from your distribution (`libgdal-dev` and `libudunits2-dev` on Ubuntu 18.04).
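On Ubuntu, they can be installed as follows (a sketch; package names may differ on other distributions or releases):
```{bash, eval=FALSE}
sudo apt-get install libgdal-dev libudunits2-dev
```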

The MATLAB scripts are those provided by the authors of @WanShaSmoTib:16 and have the same dependencies.

All the data files associated with the noisy realizations of the signals considered for the various graphs, as well as the simulation results, are contained in folders named after the corresponding graphs.

## Reproduction of figures and tables

All the information needed to reproduce the numerical results presented in the paper is gathered below.

### Note

The comparisons with graph trend filtering (GTF) involve the MATLAB and Python code available on the webpage of one of the authors (i.e. via this [url](https://sites.cs.ucsb.edu/~yuxiangw/resources.html), [gtf_code.zip](https://sites.cs.ucsb.edu/~yuxiangw/codes/gtf_code.zip) and [code-to-run-wavelets.zip](https://sites.cs.ucsb.edu/~yuxiangw/codes/code-to-run-wavelets.zip)).

For $k=0$, the R `genlasso` package was also used, in particular for the simulations corresponding to the Pittsburgh graph using the data of @WanShaSmoTib:16, as well as for Figure 4. The major difference between the R package and the MATLAB code lies in the algorithm for solving the underlying optimization problem: the MATLAB code is based on the methodology and C++ code developed in @chambolle2009total, and is therefore much faster than the R package.
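For reference, a toy `genlasso` call for the graph fused lasso (GTF with $k=0$) looks as follows; the ring graph and signal are made up for illustration and are not part of the simulations:
```{r, eval=FALSE}
library(genlasso)
library(igraph)

# Toy graph fused lasso (GTF with k = 0) on a small ring graph
g <- make_ring(20)
y <- c(rep(0, 10), rep(3, 10)) + rnorm(20, sd = 0.5)
fit <- fusedlasso(y, graph = g)         # exact path algorithm, default maxsteps = 2000
beta_hat <- coef(fit, lambda = 1)$beta  # fitted values at a chosen lambda
```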

Since the computations can take a long time (e.g. several days for the numerical experiments performed on the Facebook graph for $k>0$), the results have been stored in `.mat` files, and the scripts to regenerate them are also provided.
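They can be inspected from R with the `R.matlab` package listed above (the file name below is a placeholder for any of the stored files):
```{r, eval=FALSE}
library(R.matlab)
# Read one of the stored result files (placeholder name) and list its contents
res <- readMat("table_pitt_sig1.mat")
str(res)
```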

### The Minnesota Roads Graph

Run `figure_minnesota.R` to reproduce Figure 1.
```{r, eval=TRUE, fig.width=3, fig.height=3}
source("figure_minnesota.R")
```

The results (associated with our methodology) presented in Table 1 have been stored in the files `res001A2MC10.Rdata` and `res0001A4MC10.Rdata`. The table can be regenerated by running the script `res2LaTeXtable.R`:
```{r, eval=TRUE, warning=FALSE, message=FALSE}
source("res2LaTeXtable.R")
```
The realizations are stored in the three `table_minesota_f*_sig*.mat` files (one file per noise level/signal). The results corresponding to trend filtering ($k=0,1,2$) appearing in Table 1 are obtained by running the script `table_minesota_trend.m`.

The set of values that the regularization parameter can take has the same length as the one used for our methodology (i.e. the number of nodes in the graph). This avoids the need to manually calibrate the bounds and the grid step for the GTF.
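In practice (the "Donoho trick" mentioned in the Facebook section below), the candidate thresholds are simply the sorted magnitudes of the noisy coefficients, since the SURE only changes at these values; a one-line sketch, assuming `wcn` holds the noisy SGWT coefficients:
```{r, eval=FALSE}
# One candidate threshold per noisy coefficient: evaluating the SURE on this
# grid is exhaustive, with no bounds or step size to calibrate by hand
thresh_grid <- sort(abs(wcn))
```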

In addition, the computations for our methodology can be run from the file `table_minnesota_nonp.R`.
```{r, eval=FALSE}
source("table_minnesota_nonp.R")
```

A parallel version is also available in `table_minnesota.R` (with `doMC` to be tuned according to your hardware). Note, however, that this version cannot be run in GUI mode.
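The backend is registered before the `foreach` loops run; a sketch of the tuning step (the core count is hardware dependent):
```{r, eval=FALSE}
library(doMC)
# Register a fork-based parallel backend for the foreach loops;
# forking is also why this cannot run inside a GUI R session
registerDoMC(cores = parallel::detectCores() - 1)
```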

### The Facebook Graph

Run `facebook_randWalkPoisson.m` to reproduce Figure 2. The results associated with the level-dependent SGWT can be regenerated by executing the script `facebookR.R`, and those for the trend filtering part by executing `facebook_randWalk.m`, `facebook_poisson_sparse.m` and `facebook_poisson_dense.m` (these are extracted from the sources and require the MATLAB toolbox mentioned in the Note above). The execution of each script takes between 3 and 4 days on a standard laptop. Note also that, for the calibration of the trend filtering regularization parameter, an exhaustive search over a set of values similar in length to the one used with the Donoho trick for our methodology (i.e. 4039) is not feasible here. The boundaries and the number of grid points correspond to those in the original MATLAB code provided by the authors of @WanShaSmoTib:16.

Figure 3 can be regenerated by running the script `facebookb2.m`.

### The Pittsburgh Census Tract Graph

Run `figure_pitt.R` to reproduce the figure of the Pittsburgh graph.
```{r, eval=TRUE, message=FALSE}
source("figure_pitt.R")
```

Comparison results using the data from the "Graph Trend Filtering" paper example are reproducible by running the `pitt_f_y_formGTF.R` file. The `pittsburgh.mat` file (containing $f$ and $y$) comes from the code associated with the GTF paper and was downloaded from https://sites.cs.ucsb.edu/~yuxiangw/resources.html (i.e. the `gtf_code` folder in https://sites.cs.ucsb.edu/~yuxiangw/codes/gtf_code.zip), then stored in the `pittsburgh.rda` file. The adjacency matrix associated with the Pittsburgh graph was obtained from the `Exp_10copies_several_wavelets.m` script accessible via the same url.
```{r, eval=FALSE}
source("pitt_f_y_formGTF.R")
```

For this example and the figure of the Pittsburgh graph, we have added a comparison of computation times between the two methods (although this was not mentioned in the article). For one run, our approach requires less than 1 second (on a standard laptop with LP-DDR3 RAM at 2133 MHz) and the fused lasso (with a default maximum iteration number of 2000) about 20 seconds. For ten runs, our approach requires less than 2 seconds and the fused lasso about 4 minutes. For our approach, most of the computation time is spent on the diagonalization and the construction of the frame, but these only need to be computed once; the evaluation of the SURE afterwards is very fast. For the fused lasso, when the number of iterations has to be increased (to ensure convergence), computation times may increase considerably. Note that the use of the C++ code provided by @chambolle2009total makes it possible to compensate for this computational cost for trend filtering with $k=0$. Therefore, in order to ease the execution of `table_minnesota.R` and `table_pitt.R`, we have parallelized the code.
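Timings of this kind can be reproduced with base R's `system.time()`, e.g.:
```{r, eval=FALSE}
# Illustrative: wall-clock timing of one reproduction script
system.time(source("pitt_f_y_formGTF.R"))
```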

The results (associated with our methodology) in Table 2 have been stored in the file `resPitt.Rdata`. The table can be regenerated by running the script `res2LaTeXtable.R` (uncomment the line corresponding to the file path).
```{r, eval=FALSE}
source("res2LateXtable.R")
```
The corresponding realizations are stored in the three `table_pitt_sig*.mat` files (one file per noise level). The results for trend filtering ($k=0,1,2$), as well as for the two other wavelet estimators of @sharpnack2013detecting, appearing in Table 2 are obtained by running the script `table_pitt_trend.m`. For comparison, this script also provides the two other wavelet approaches considered in @WanShaSmoTib:16.

In addition, the computations (associated with our methodology) can be re-run from the file `table_pitt.R`.
```{r, eval=FALSE}
source("table_pitt.R")
```

### Real Dataset: New York City Taxis
The results presented in the section "Real Dataset: New York City Taxis" are reproducible by executing the script `nyc.R`.
```{r, eval=FALSE}
source("nyc.R")
```

### Correlated Noise
The results presented in the section "Correlated Noise" are reproducible by running the script `colored.R`.
```{r, eval=FALSE}
source("colored.R")
```

### Further Experiments with Block Thresholding
The tests performed for the block thresholding method are in the script `blockPitt.R`.
```{r, eval=FALSE}
source("blockPitt.R")
```

# References