Report on Internship at EFTAS
=============================

This is a summary of what I did during my internship at EFTAS in October
2020. The overall goal was to relate SAR backscatter to soil moisture,
also taking precipitation data into account. The work can be split into
two main approaches: the time series approach and the classification
approach.

Data
====

All functions are adapted to the years 2017-2018, for which the `.tif`s
contain 29 and 31 images, respectively. Thus, 60 images are available
overall, a fact that is hardcoded in various places. If you wish to
adapt this script to another dataset, change this value to whatever
number of bands is available.
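
A possible alternative to hardcoding the band count is to derive it from the data itself; a minimal sketch, using the `proxy1718` object built further below:

``` r
# number of bands in the combined 2017/2018 proxy (60 here)
n_bands <- unname(dim(proxy1718)["band"])
```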

### Packages

``` r
setwd("/home/petra/Praktikum")
library(stars)
library(dplyr)
library(rgdal)
library(gdalUtils)
library(raster)
library(lubridate)
library(sp)
library(ggplot2)
library(zoo)
library(e1071)
```

### Intensity and Shapefiles

The .tif files each contain data for one year. They are dB-scaled
intensity measurements from Sentinel-1A. All scenes are 12 days apart,
except for a data gap in January 2017.

``` r
# load files as proxy
int_VV_2017 <- read_stars("VV_2017_clip.tif", proxy=TRUE)
int_VH_2017 <- read_stars("VH_2017_clip.tif", proxy=TRUE)
int_VV_2018 <- read_stars("VV_clip.tif", proxy=TRUE)
int_VH_2018 <- read_stars("VH_clip.tif", proxy=TRUE)
# assign names
names(int_VV_2017) <- "VV"
names(int_VH_2017) <- "VH"
names(int_VV_2018) <- "VV"
names(int_VH_2018) <- "VH"
# water training shapefiles
water_shape_2017 <- read_sf("water (copy).shp")
water_shape_2018 <- read_sf("water.shp")
water_shape_2017_july07 <- read_sf("water_shape_intersec.shp")
# study area
study_area <- read_sf("study_area.shp")
# shape
shape <- read_sf("antragsfl_16_17_18.shp")
shape <- st_transform(shape, crs=st_crs(int_VV_2018))
# moni
moni <- read_sf("moni_shap.shp")
# double bounce
db <- read_sf("double_bounce.shp")
```

Dates of scenes are read via `gdalinfo()`.

``` r
# make dates
info = gdalinfo("VV_2017_clip.tif")
descr = info[grepl(info, pattern = "Description = ")]
descr = gsub(descr, pattern = "Description = ", replacement = "")
dates2017 <- as_date(descr)

info = gdalinfo("VV_clip.tif")
descr = info[grepl(info, pattern = "Description = ")]
descr = gsub(descr, pattern = "Description = ", replacement = "")
dates2018 <- as_date(descr)
```
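
The same three lines are repeated for every raster (and again for the rain data below). A small helper could wrap them; `readBandDates` is a hypothetical name, not part of the original script:

``` r
# read per-band acquisition dates from a GeoTIFF's band descriptions
readBandDates <- function(path) {
  info <- gdalinfo(path)
  descr <- info[grepl(info, pattern = "Description = ")]
  as_date(gsub(descr, pattern = "Description = ", replacement = ""))
}
# e.g.: dates2017 <- readBandDates("VV_2017_clip.tif")
```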

The data is then made available as a `stars-proxy` pointer plus a date
vector. The two are only combined when specific areas or time steps are
read into memory via the `loadPolygon` function defined later on, which
saves time and memory.

``` r
# make proxy of both years
proxy1718 <- c(int_VV_2017, int_VV_2018, along = "band")
# c() dates
dates <- c(dates2017, dates2018)
```

### Precipitation Data

The precipitation data is loaded as a separate object, which allows
more flexibility in how the rain data is aggregated. `stars` objects
must have the exact same dimensions (spatial and temporal), and by
handling the rain data separately we avoid much of the time otherwise
spent warping the rain geometry. The precipitation dataset is averaged
over the wider bog area, since the data is not precise enough to be
meaningfully cut to even smaller areas.

``` r
# load stars, read and assign dates as dimension
rain_all_2017 <- read_stars("rain_2017_clip_ordered.tif")
info = gdalinfo("rain_2017_clip_ordered.tif")
descr = info[grepl(info, pattern = "Description = ")]
descr = gsub(descr, pattern = "Description = ", replacement = "")
rainDates2017 <- as_date(descr)
rain_all_2017 <- st_set_dimensions(rain_all_2017, 3, values = rainDates2017, names = "time")

# clip to wider bog area, convert to dataframe
study_area_rain <- st_transform(study_area, crs=st_crs(rain_all_2017))
rain_all_2017 <- rain_all_2017[study_area_rain[1,]]
```

```
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
```

``` r
rain.all.2017.df <- as.data.frame(st_apply(rain_all_2017, "time", mean, na.rm = TRUE))

# another method: make vector of all days of a year, assign as dimension
rain_all_2018 <- read_stars("rain_clip.tif")
all_days <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by="days")
rain_all_2018 <- st_set_dimensions(rain_all_2018, 3, values = all_days, names = "time")

rain_all_2018 <- rain_all_2018[study_area_rain[1,]]
```

```
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
```

``` r
rain.all.2018.df <- as.data.frame(st_apply(rain_all_2018, "time", mean, na.rm = TRUE))

# create combined rain dataframe: one row per day, columns = time and mean rain
rain_all <- rbind(rain.all.2017.df, rain.all.2018.df)
```

### loadPolygon Function

A function that changed with the different requirements but is now in
its final stage. It takes a polygon (or shape) as its spatial extent
and an interval as its temporal extent, and loads the corresponding
`stars` object into memory (since all backscatter `*.tif`s are huge and
not held in memory).

``` r
# function takes a shape as an AOI and an interval of band indices (1-60 across both years)
loadPolygon <- function(shape, interval) {
  dates <- dates[interval[1]:interval[2]]
  int_vv <- st_as_stars(proxy1718[shape], along = "band")
  int_vv <- int_vv[,,,interval[1]:interval[2]]
  int_vv <- st_set_dimensions(int_vv, 3, val = dates, names = "time")
  names(int_vv) <- c("VV")
  return(int_vv)
}
```

Time Series Approach
====================

The idea of this approach is to extract information about different
kinds of scatterers or different soil moisture contents by looking at a
time series of values.

### Compare Two Different Patches

An earlier approach: compare two patches to find out more about their
behaviour throughout the year, in relation to each other.

``` r
# for 2017, load two polygons
water_patch <- loadPolygon(moni[8,], c(1,29)) %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()
field_patch <- loadPolygon(moni[5,], c(1,29)) %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()

names(water_patch) <- c("time", "VV_water")
names(field_patch) <- c("time2", "VV_field")
patches <- cbind(field_patch, water_patch) # create one df

ggplot(patches, aes(x = time)) +
geom_line(aes(y = VV_water, color = "Water")) +
geom_line(aes(y = VV_field, color = "Field")) +
scale_color_manual(name = "Patch", values = c("Water"="blue", "Field"="brown")) +
theme(legend.position = "bottom") +
ggtitle("Backscatter Mean of Selected Polygons") + ylab("backscatter")

another_water_patch <- loadPolygon(moni[9,], c(1,29)) %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()
names(another_water_patch) <- c("time3", "VV_water2")
water_patches <- cbind(water_patch, another_water_patch) # create another df

ggplot(water_patches, aes(x = time)) +
geom_line(aes(y = VV_water, color = "Water")) +
geom_line(aes(y = VV_water2, color = "Water2")) +
scale_color_manual(name = "Patch", values = c("Water"="blue", "Water2"="lightblue")) +
theme(legend.position = "bottom") +
ggtitle("Backscatter Mean of Selected Polygons") + ylab("backscatter")
```

### Aggregate Rain

The next step was to plot backscatter and rain data together. An
important question here is how the rain should be aggregated (since
depicting single days would clutter the plot). Different solutions were
tried. At first, a three-day sum was calculated. Then, computations
based on what happens between two acquisitions were considered. The
following code simply calculates the rain sum between each pair of
scenes.

``` r
# cut the bog polygons to our study area
inter_shape <- shape[lengths(st_intersects(shape, study_area[1,])) != 0,]
# cut geometries depending on sponsorship status
spons <- inter_shape[inter_shape$Förderung == "in Förderung",]
nospons <- inter_shape[is.na(inter_shape$Förderung),]
# load polygon, aggregate as mean and convert to data frame
spons.df <- loadPolygon(spons, c(1,60)) %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()
nospons.df <- loadPolygon(nospons, c(1,60)) %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()
# bind, select, rename
df <- cbind(spons.df, nospons.df) %>% .[,c(1,2,4)]
names(df) <- c("time", "sponsVV", "nosponsVV")

# rain aggregation over both years
agg_rain <- c(rain_all[1,2] + rain_all[2,2] + rain_all[3,2] + rain_all[4,2] + rain_all[5,2] + rain_all[6,2] + rain_all[7,2] + rain_all[8,2]) # to handle the first sum that only has 8 values
for (i in 2:nrow(df)) { # i corresponds to row in S1 acquisitions
  offset <- 8 # first S1 acquisition is 08/01, rain data starts 01/01
  inter <- 12 # 12 days between acquisitions
  start <- (i - 2) * inter + offset + 1 # exclude last day from previous sum
  end <- (i - 1) * inter + offset
  sum <- 0
  # sum the daily rain over the calculated rows of the rain data frame
  for (j in start:end) {
    sum <- sum + rain_all[j,2]
  }
  agg_rain <- c(agg_rain, sum) # append to rain vector
}

df <- cbind(df, agg_rain) # bind to intensity data
names(df) <- c("time", "sponsVV", "nosponsVV", "rain")

ggplot(df, aes(x = time)) +
  geom_bar(aes(x = time - 6, y = rain), stat = 'identity', fill = "lightblue", alpha = 0.8) + # time - 6 so that rain bars are displayed in between S1 acquisition dates
  geom_line(aes(y = sponsVV * 4 + 70, color = "in Förderung")) + # scale VV to enhance effects
  geom_point(aes(y = sponsVV * 4 + 70, color = "in Förderung")) +
  geom_line(aes(y = nosponsVV * 4 + 70, color = "nicht in Förderung")) +
  coord_cartesian(ylim = c(0,60)) +
  ggtitle("Mean of VV in Bog Polygons and Precipitation") + xlab("Time") +
  ylab("Precipitation in mm/m²") +
  scale_y_continuous(sec.axis = sec_axis(~. *0.25 -17.5, name = "Intensity in dB")) + # inverse of the * 4 + 70 scaling
  scale_color_manual(name = "Förderung", values = c("in Förderung"="blue", "nicht in Förderung"="black")) +
  theme(legend.position = "bottom")
```

Or: build weighted sums of the precipitation between scenes. The
following depicts a weight of 1/(\# of days before the acquisition +
1), so that the day of the acquisition has weight 1, the day before has
weight 1/2, then 1/3 (…) and so on \[code hidden\].
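
Since that code is hidden, here is a minimal sketch of how such a weighting could look, reusing the `offset`/`inter` window logic from the unweighted sum above; `agg_rain_weighted` is an illustrative name, and the first 8-day window is skipped for brevity (the hidden code may handle it differently):

``` r
# illustrative sketch: weighted rain sum between acquisitions,
# weight 1 for the acquisition day, 1/2 for the day before, 1/3, ...
agg_rain_weighted <- numeric(nrow(df))
for (i in 2:nrow(df)) {
  start <- (i - 2) * 12 + 8 + 1        # day after the previous acquisition
  end   <- (i - 1) * 12 + 8            # acquisition day
  rain  <- rain_all[start:end, 2]      # oldest day first
  weights <- 1 / rev(seq_along(rain))  # ..., 1/3, 1/2, 1
  agg_rain_weighted[i] <- sum(rain * weights)
}
```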

Classification Count Approach
=============================

The goal of the classification approach was to find thresholds that
characterize different scatterers and thereby distinguish land cover
types. In vegetated areas a C-band radar signal exhibits different
degrees of volume scattering. Water, however, reflects the signal
almost perfectly away from the sensor, leading to very low backscatter.
Where vegetation is standing in water, total reflection and also double
bounce effects have to be considered. Whether a signal is volume- or
surface-scattered also depends on its polarization. As a start, the
detection of water and of double bounce was carried out by digitizing
training polygons and finding thresholds with an SVM.

### Water Thresholds

Because it is the simplest approach, training data were generated from
only one point in time (03/01/2018). The threshold found was ~-15 dB.
This threshold was then applied to a heavy rain event in the summer of
2017. To give an idea of the magnitude and circumstances of this
example:

``` r
# plot closeup of prec data and S1 dates
ggplot(rain_all, aes(x=time)) +
  geom_bar(aes(y=mean), fill = "lightblue", stat='identity') +
  xlim(as.Date(c("2017-06-15", "2017-07-15"))) +
  geom_vline(xintercept = as.Date("2017-06-25")) +
  geom_vline(xintercept = as.Date("2017-07-07")) +
  ggtitle("Closeup of Precipitation Data + S1 Acquisition Dates") +
  ylab("precipitation in mm/m²")

heavyrain <- loadPolygon(study_area[1,], c(14,15)) # load data from 2017 for the area
plot(heavyrain) # plot the time steps in question
```

We observe an overall increase in backscatter, most likely due to the
increase in soil moisture. We then plot the two time steps again,
applying what was found as a threshold for open water on fields and on
the bog area:

``` r
threshold <- -15.34187 # set found threshold
image(heavyrain[,,,1], col = c("white", "black"), breaks = c(-35, threshold, 35), main="25/06, White: < -15.3; Black: > -15.3")
image(heavyrain[,,,2], col = c("white", "black"), breaks = c(-35, threshold, 35), main="07/07, White: < -15.3; Black: > -15.3")
```


The images show a decrease in the area below the threshold found for
water. With a rain event of this magnitude we would instead expect
several puddles and depressions to flood, leading to more surface below
the water threshold. Experimentally, a much lower threshold was
applied, which produces the anticipated effect:

``` r
threshold <- -20.5 # apply a much lower threshold
image(heavyrain[,,,1], col = c("white", "black"), breaks = c(-35, threshold, 35), main="25.06., White: < -20.5; Black: > -20.5")
image(heavyrain[,,,2], col = c("white", "black"), breaks = c(-35, threshold, 35), main="07.07., White: < -20.5; Black: > -20.5")
```


Therefore, the validity of the training data used was questioned and a
new SVM training was run. While before only training data for January
of different years was computed, this approach used two scenes from
winter and summer of 2017 in order to capture possible seasonal
influences. 2017 was chosen as the year with more precipitation and
therefore a higher chance of generating correct training data (since
training polygons could only be guessed, with no ground truth
available). A function is written to generate training data from a
shapefile that is time step specific and contains a `$type` attribute
for training and an `$area` attribute that describes the surrounding
area.

``` r
# function to generate training data from shapefiles
trainPolygons <- function(water_shape, timestep) {
  val.svm <- NULL # rbind(NULL, df) returns df, so no special case for the first polygon
  end <- length(water_shape$type) # must contain type attr
  for (k in 1:end) {
    type <- water_shape[k,]$type
    area <- water_shape[k,]$area
    # select which classes should be included
    if(type != "street" && !is.null(area) && (area == "field" || area == "lake")) {
      poly <- loadPolygon(water_shape[k,], c(1,60))
      # select VV, timestep
      poly <- poly[1,,,timestep]
      poly.df <- as.data.frame(poly)
      poly.df$type <- water_shape[k,]$type
      poly.df$sitch <- paste0(water_shape[k,]$type, " in ", water_shape[k,]$area)
      poly.df <- poly.df[complete.cases(poly.df),]
      # append this polygon's pixels (columns 4:6: VV, type, sitch)
      val.svm <- rbind(val.svm, poly.df[,4:6])
    }
  }
  return(val.svm)
}
```

``` r
# make training vector with function
val1 <- trainPolygons(water_shape_2017, 1)
val2 <- trainPolygons(water_shape_2017_july07, 15)
val.svm <- rbind(val1, val2)
# prepare data
val.svm$type <- as.factor(val.svm$type)
attach(val.svm)
x <- subset(val.svm, select=c(-type, -sitch))
y <- type
svmmod <- svm(x,y) # make model
table(predict(svmmod,x), y) # make prediction, confusion matrix
```

```
##        y
##         field water
##   field   967     0
##   water     0  1252
```

``` r
svmmod$x.scale$`scaled:center` # threshold is
```

```
## [1] -18.82414
```

``` r
ggplot(val.svm, aes(x=sitch, y=VV)) + ylab("VV Backscatter") +
  geom_boxplot() + xlab("Type and Surrounding Area") +
  geom_hline(yintercept = svmmod$x.scale$`scaled:center`, color = "darkblue") +
  ggtitle("Threshold for Open Water in Field/Bog Areas")
```

![](report_files/figure-markdown_github/water-svm-1.png)

### Double Bounce Threshold

Double bounce training data is constructed similarly to the example
above, except that it only uses a single scene from January 2018 \[code
hidden\].
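
A minimal sketch of what that could have looked like, assuming the `db` shapefile loaded above can be fed through `trainPolygons` and that time step index 30 points at a January 2018 scene (both assumptions, not taken from the original):

``` r
# illustrative sketch only, not the original (hidden) code
val.db <- trainPolygons(db, 30)       # 30 = placeholder index for a January 2018 scene
val.db$sitch <- NULL                  # keep only VV and type ("urban" vs. "other")
val.db$type <- as.factor(val.db$type)
attach(val.db)                        # masks type and VV from val.svm, see below
x <- subset(val.db, select = -type)
y <- type
svm.db <- svm(x, y)
```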

```
## The following objects are masked from val.svm:
##
##     type, VV
```

``` r
table(predict(svm.db,x), y) # make prediction, confusion matrix
```

```
##        y
##         other urban
##   other  1214    29
##   urban     1   221
```

``` r
svm.db$x.scale$`scaled:center` # threshold is
```

```
## [1] -8.348805
```

``` r
# boxplot
ggplot(val.db, aes(x=type, y=VV)) +
  geom_boxplot() + ylab("VV Backscatter") + xlab("Type") +
  geom_hline(yintercept = svm.db$x.scale$`scaled:center`, color = "yellow")
```

### Count Rasters - Single Threshold

The idea behind count rasters is to find out how often certain areas
are classified as a certain class, i.e. how often their pixel values
lie in a specified range that is suspected of representing e.g. water.
At first such sums were calculated for 2017 and 2018 separately in
order to compare the years. A function was written that cuts given
areas up into tiles to keep memory usage at acceptable levels during
this calculation. The code is hidden because a similar function is
shown later on. The mechanism of how 0/1 counts are assigned is shown
anyway:

``` r
scene[scene >= threshold] <- 1 # order matters: the assigned 1s are never < a negative dB threshold
scene[scene < threshold] <- 0
```

For now, these rasters are read from file:

``` r
ras2017 <- raster("./report/water_thresh/water_thresh_count_2017-01-08_2017-12-22.tif")
ras2018 <- raster("./report/water_thresh/water_thresh_count_2018-01-03_2018-12-29.tif")
colo = viridisLite::inferno(29)
breaks = seq(0, 29, 1)
image(ras2017, col = colo, breaks = breaks, main="2017")
colo = viridisLite::inferno(31)
breaks = seq(0, 31, 1)
image(ras2018, col = colo, breaks = breaks, main="2018")
```


The fact that the much drier year 2018 exhibits more counts of values
below the threshold might point to more heavy rain events (more
flooding), or to the possibility that values below our threshold do not
point exclusively to flooded terrain but also to other causes of low
backscatter (e.g. dry soil). Water surfaces were detected in both
years.

### Count Rasters - Classification

This classification can also be done for multiple classes, but in order
to do so we need more thresholds. As calculated before, we can use
-18.8 for water and the double bounce threshold of -8.3. Another
threshold for open water in field areas was calculated with a different
set of training data: -17.6. We can read a further threshold from the
figure titled “Threshold for Open Water in Field/Bog Areas”: -22.5 as a
threshold for open water (as in lakes). Missing threshold values
between -17 and -8 were filled by simply making up the values of -10,
-13 and -14. The function to calculate these rasters is the same tiling
function as used before, with a slight change in how 0/1 values are
assigned:

``` r
# interval <- c(lower, upper) threshold
scene[scene > interval[2]] <- 0
scene[scene < interval[1]] <- 0
scene[scene != 0] <- 1
```

By assigning these values as count thresholds, different count rasters
can be computed. The most interesting RGB combinations are shown. The
first plot shows all lower classes. We see that areas rarely fall below
-22.5. Water tends to stay within our dedicated water class and reveals
areas that tend to flood. We also see that the bog area has overall
more counts of low backscatter (blue).

``` r
# results of these classifications are saved in 'completesum_ALL.tif'
# band order is down 1-3, middle 1-3, up 1-3, mean
stack_all <- raster::stack("./report/class_count_ras/completesum_ALL.tif")
plotRGB(stack_all, r=1, g=2, b=3, scale = 60, axes=TRUE, main="R: x < -22.5 | G: -22.5 < x < -17.6 | B: -17.6 < x < -14")
```

The following plot shows a classification over almost the full range of
the backscatter spectrum. Red depicts areas with high counts of high
backscatter (while double bounce pixels above -8.3 are masked from the
image and appear black, as do pixels < -22.5). Green shows the counts
in the mid-range, and blue is assigned to the water class.

``` r
plotRGB(stack_all, r=8, g=5, b=2, stretch="lin", axes = TRUE, main = "R: -13 < x < -8.3 | G: -17.6 < x < -13 | B: -22.5 < x < -17.6")
```

### Count Rasters - Imaging Change

This approach tries to avoid setting too many thresholds manually and
instead lets the data speak. Different measures of change were
quantified and can be displayed in different RGB combinations. The idea
for this was taken from this
[paper](https://www.mdpi.com/2072-4292/8/10/795/htm#). The question was
how we could distinguish surfaces with alternating moisture content
from each other. The calculated bands are:

- change count
    - how often does the pixel change? -> count of changes
    - ranges from 0 to # of time steps
    - higher values mean more change
    - parameter `change detection threshold [cdt]`: sensitivity (what
      counts as a change?)
- change frequency
    - how stable is the pixel? -> mean length of no-change streaks
      (the duration of periods with no change is calculated, then
      averaged and rounded)
    - ranges from 0 to # of time steps
    - higher values correspond to longer periods of no change
    - it might also be useful to `ceiling()` these values, or take the
      `max()`
- change range
    - what magnitude does the change have? -> difference of `max` and
      `min` in the pixel time series
    - higher values indicate a wider range
- mean
    - higher values for a higher mean backscatter
- threshold count
    - count of one specific range of values over the time period ->
      how often was the pixel between a and b (see chapter “Count
      Rasters - Single Threshold”)
    - ranges from 0 to # of time steps
    - high values depict high counts in this class

They are calculated as follows:

``` r
#### calculate length of intervals of no change #
calcChangeLength <- function(freq) {
  # count streaks of change / no change
  counts <- rle(freq)
  # convert to matrix: row 1 = streak lengths, row 2 = values (0 = no change)
  counts <- matrix(c(counts[[1]], counts[[2]]), ncol = length(counts[[1]]), byrow = TRUE)
  # rounded mean streak length of 'no change'
  round(mean(counts[1, counts[2,] == 0]))
}

#### stitch rasters from folders ################
stitchRasters <- function(dirName, folderName, fileName, dates) {
  dir_list <- list.files(dirName)
  for (a in 1:length(dir_list)) {
    str <- paste0(dirName, "/", dir_list[a])
    ras <- raster(str)
    if(a < 2) {
      allRas <- ras
    } else {
      allRas <- raster::merge(allRas, ras)
    }
  }
  name <- paste0("/", fileName, "_", dates[1], "_", dates[2], "flag.tif")
  writeRaster(allRas, paste0(folderName, name), overwrite = TRUE)
}

#### characterize function #######################
characterize <- function(ext, interval, thresholds, cdt, folderName) {

  # make the folders
  dir.create(folderName)

  change_dir <- paste0(folderName, "/change_count_tiles")
  dir.create(change_dir)

  range_dir <- paste0(folderName, "/change_range_tiles")
  dir.create(range_dir)

  mean_dir <- paste0(folderName, "/mean_tiles")
  dir.create(mean_dir)

  freq_dir <- paste0(folderName, "/change_freq_tiles")
  dir.create(freq_dir)

  thresh_dir <- paste0(folderName, "/threshold_count_tiles")
  dir.create(thresh_dir)

  #### TILING ####################################
  # bbox can be used for stars subsetting []
  ext <- st_bbox(ext) # ext is xmin ymin xmax ymax
  # study area one spans 11860, so 10000 seems an OK tile size
  # calculate span
  x.span <- ext[3] - ext[1] # X
  y.span <- ext[4] - ext[2] # Y
  # calculate a good number of cuts
  x.cuts <- ceiling(x.span / 10000)
  y.cuts <- ceiling(y.span / 10000)
  # calculate cut length
  x.cut.by <- x.span / x.cuts
  y.cut.by <- y.span / y.cuts

  # tile counter
  count <- 0
  # go through all cuts in X direction
  for (i in 1:x.cuts) {
    # go through all cuts in Y direction
    for (j in 1:y.cuts) {
      count <- count + 1
      # make extent object
      xmin <- ext[1] + (i - 1) * x.cut.by
      xmax <- ext[1] + i * x.cut.by
      ymin <- ext[2] + (j - 1) * y.cut.by
      ymax <- ext[2] + j * y.cut.by

      cutbox <- ext
      cutbox[1] <- xmin
      cutbox[2] <- ymin
      cutbox[3] <- xmax
      cutbox[4] <- ymax

      #### DO FOR EACH TILE #####################

      # make tile name (zero-padded)
      if(count < 10) {
        name <- paste0("/tile_0", count, ".tif")
      } else {
        name <- paste0("/tile_", count, ".tif")
      }

      # load stars
      tile <- loadPolygon(cutbox, interval)

      # make an all-zero change_count object (dB values are always < 1000)
      change_count <- tile[,,,1]
      change_count[change_count < 1000] <- 0

      # assign full time span stars to build mean frequency
      stars_freq <- tile

      lengt <- length(interval[1]:interval[2])
      leng <- lengt - 1
      # go through time steps of tile
      for (k in 1:leng) {
        mag <- abs(abs(tile[,,,k]) - abs(tile[,,,k+1]))
        # direction of change can be extracted with one less "abs" and sum
        mag[mag < cdt] <- 0
        mag[mag >= cdt] <- 1
        change_count <- change_count + mag

        # frequency of change
        if(k < 2) {
          # create freq as first change image
          freq <- mag
        } else {
          # bind new mag to freq to get time series of change
          freq <- c(freq, mag)
        }
      }

      write_stars(change_count, paste0(change_dir, name))

      freq <- st_apply(freq, c("x", "y"), FUN = calcChangeLength)
      write_stars(freq, paste0(freq_dir, name))

      # change range
      # mask changes from < cdt ?
      range_max <- st_apply(tile, c("x", "y"), FUN = "max")
      range_min <- st_apply(tile, c("x", "y"), FUN = "min")
      range <- range_max - range_min

      # rescale
      # max_value <- interval[2] - interval[1] + 1
      # range <- rescale(range, to = c(0, max_value))

      write_stars(range, paste0(range_dir, name))

      # mean
      tile_mean <- st_apply(tile, c("x", "y"), mean)
      write_stars(tile_mean, paste0(mean_dir, name))

      # create thresholds sum
      for (l in 1:lengt) {
        scene <- tile[,,,l]
        scene[scene > thresholds[2]] <- 0
        scene[scene < thresholds[1]] <- 0
        scene[scene != 0] <- 1

        if(l == 1) {
          thresh_sum <- scene
        } else {
          thresh_sum <- thresh_sum + scene
        }
      }
      # write
      write_stars(thresh_sum, paste0(thresh_dir, name))
    }
  }

  # stitch rasters
  dateName <- c(dates[interval[1]], dates[interval[2]])

  stitchRasters(change_dir, folderName, "change_count", dateName)
  stitchRasters(freq_dir, folderName, "change_freq", dateName)
  stitchRasters(range_dir, folderName, "change_range", dateName)
  stitchRasters(mean_dir, folderName, "mean", dateName)
  stitchRasters(thresh_dir, folderName, "thresh_count", dateName)

  # the "flag" suffix keeps preexisting rasters from being included
  raster_list <- list.files(folderName, rec=FALSE, pattern = "*flag.tif")
  raster_list <- paste0(folderName, "/", raster_list)

  tifName <- paste0(folderName, "/all_", dateName[1], "_", dateName[2], ".tif")

  writeRaster(raster::stack(raster(raster_list[1]), raster(raster_list[2]), raster(raster_list[3]), raster(raster_list[4]), raster_list[5]), tifName, overwrite = TRUE)

  print("bands are in order:")
  print(raster_list)
}
```

It can be called as follows:

``` r
## enter parameters
## folder name of this specific run with ./
folderName <- "./charac_test_run"
## extent
ext <- study_area[2,]
## time span interval 1,29 is 2017, 30,60 is 2018
interval <- c(1,29)
# which interval should be classified and counted?
thresholds <- c(-22.5, -17.6) # water threshold is given
## change detection threshold, in dB (sensitivity of change detection)
cdt <- 2

characterize(ext, interval, thresholds, cdt, folderName)
```

First, we’re looking at the bands one by one:

``` r
sta <- read_stars("./report/charac_five_attr_both_years/all_2017-01-08_2018-12-29.tif")
study_area <- st_transform(study_area, crs=st_crs(sta))
sta_small <- sta[study_area[1,]]
plot(sta_small[,,,1], main = "Change Count")
plot(sta_small[,,,2], main = "No-Change Frequency")
plot(sta_small[,,,3], main = "Change Range")
plot(sta_small[,,,4], main = "Mean")
plot(sta_small[,,,5], main = "Threshold Count")
```

An example use case is using RGB composites to identify water and
estimate the wetness of the bog area. The band over which such areas
can be identified is always assigned to the blue channel; explanations
can be found in the comments.

``` r
ras <- raster::stack("./report/charac_five_attr_both_years/all_2017-01-08_2018-12-29.tif")
ras <- crop(ras, study_area[1,])
ras <- mask(ras, study_area[1,])
# identify water via the water threshold, display change measures
plotRGB(ras, r=1, g=3, b=5, stretch = "lin", axes = TRUE, main = "R: change count, G: change range, B: water threshold count")
```

``` r
# identify water / wet areas via water threshold and a low mean (less green)
plotRGB(ras, r=2, g=4, b=5, stretch = "hist", axes = TRUE, main = "R: pixel stability, G: mean, B: water threshold count")
```

``` r
# identify water via a large change range
plotRGB(ras, r=1, g=2, b=3, stretch = "lin", axes = TRUE, main = "R: change count, G: no-change frequency, B: change range")
```

Connecting Time Series and Classification Count
===============================================

The approaches above show some interesting results. However, they do so
by collapsing either the temporal or the spatial dimension. Two ideas
have been developed in which collapsing dimensions doesn’t play such a
big role.

### Raster Time Series

In this case, the raster does not contain summarized data for a given
interval but is a way of looking at *all* time steps of the series at
once, as efficiently as possible.

``` r
# time series
plotTS <- function(area, threshold, start, end) {
  # load obj
  area <- st_transform(area, crs = st_crs(proxy1718))
  obj <- loadPolygon(area, c(1,60))
  # make colors for the main part of the VV dB range of values
  colo = viridisLite::inferno(85)
  breaks = seq(-25, -8, 0.2)
  # build rain statistics
  jahresniederschlag = sum(rain_all$mean) # total precipitation over the series
  durchschnitt = jahresniederschlag / 365 # daily average
  # plot for start:end
  for (i in start:end) {
    # rain interval
    start.rain <- 8 + (i - 1) * 12 + 1
    if(i == 1) {start.rain <- 0}
    end.rain <- i * 12 + 8
    df <- rain_all[start.rain:end.rain,]
    gesamt <- floor((sum(df$mean)) * 100) / 100 # rain sum of this interval
    percent <- (floor((gesamt * 100 / jahresniederschlag) * 100)) / 100
    titl = paste0(df[1,1], " until ", df[nrow(df),1], " // ", gesamt, "mm or ", percent, "% of all rain")
    print(ggplot(df, aes(x = time, y = mean)) +
      geom_bar(stat='identity') + ylab("Precipitation") + xlab("Time") +
      ggtitle(titl) + theme(plot.title = element_text(size = 20)))
    image(obj[1,,,i], col = colo, breaks = breaks, main = dates[i])
    image(obj[1,,,i], col = c("white", "black"), breaks = c(-35, threshold, 35), main = dates[i])
  }
}
```

The previously defined rain data frame is used for rain plotting and
statistics. Some example time steps are shown instead of a whole year.

``` r
plotTS(study_area[1,], -18.82414, 14, 16)
```

### Connecting Count Rasters and Time Series Approach

In the time series above we have observed the phenomenon of tipping
points. The mean backscatter increases with increasing rain until the
curve tips and decreases, presumably because the soil is saturated with
water and the area starts to flood, leading to total reflection of the
radar signal. This approach aims at excluding these surfaces from the
time series analysis in order to explore how all not-flooded areas
respond to heavy rain. Functions have been written that do this task
for larger raster areas with a tiling approach and `lapply` over the
time steps, but here we only look at the observed bog polygons, which
we can process on the fly.

``` r
# load stars with time steps
values_sta <- loadPolygon(inter_shape, c(1,60))
thresh <- -18.82414
# quickly rebuild the threshold count
for (i in 1:60) {
  tile <- values_sta[,,,i]
  tile[tile > thresh] <- 0
  tile[tile < thresh] <- 1
  if(i == 1) {
    count <- tile
  } else {
    count <- count + tile # count is then the threshold count raster
  }
}
# assign NA to every pixel that ever falls below the water threshold
values_sta[count > 0] <- NA
plot(values_sta[,,,1], main = "Water-Masked Bog Polygons")
```

![](report_files/figure-markdown_github/ts-and-counts-1.png)

We now want to build a time series plot just as before, but this time
with our masked `values_sta` object. We’re using objects defined under
“Time Series Approach” and then “Aggregate Rain”.

``` r
# load polygon, aggregate as mean and convert to data frame
spons.thresh.df <- values_sta[spons] %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()
nospons.thresh.df <- values_sta[nospons] %>% st_apply(., "time", mean, na.rm = TRUE) %>% as.data.frame()
# bind, select, rename
dff <- cbind(spons.thresh.df, nospons.thresh.df) %>% .[,c(1,2,4)]
names(dff) <- c("time", "sponsVV", "nosponsVV")

dff <- cbind(dff, agg_rain) # bind to intensity data
names(dff) <- c("time", "sponsVV", "nosponsVV", "rain")

ggplot(dff, aes(x = time)) +
  geom_bar(aes(x = time - 6, y = rain), stat = 'identity', fill = "lightblue", alpha = 0.8) + # time - 6 so that rain bars are displayed in between S1 acquisition dates
  geom_line(aes(y = sponsVV * 4 + 70, color = "in Förderung")) + # scale VV to enhance effects
  geom_point(aes(y = sponsVV * 4 + 70, color = "in Förderung")) +
  geom_line(aes(y = nosponsVV * 4 + 70, color = "nicht in Förderung")) +
  coord_cartesian(ylim = c(0,60)) +
  ggtitle("Mean of VV in Bog Polygons (that are never assumed to flood) and Precipitation") + xlab("Time") +
  ylab("Precipitation in mm/m²") +
  scale_y_continuous(sec.axis = sec_axis(~. *0.25 -17.5, name = "Intensity in dB")) + # inverse of the * 4 + 70 scaling
  scale_color_manual(name = "Förderung", values = c("in Förderung"="blue", "nicht in Förderung"="black")) +
  theme(legend.position = "bottom")
```

![](report_files/figure-markdown_github/unnamed-chunk-13-1.png)

``` r
# compare the two methods mask/nomask
all.df <- df[,1:3] %>% cbind(., dff$sponsVV, dff$nosponsVV, dff$rain)
all.df$nomaskMean <- rowMeans(all.df[,2:3])
all.df$maskMean <- rowMeans(all.df[,4:5])

ggplot(all.df, aes(x = time)) +
  geom_line(aes(y = maskMean, color = "masked")) +
  geom_line(aes(y = nomaskMean, color = "unmasked")) +
  xlab("Time") + ylab("Intensity VV in dB") + ggtitle("Difference Between Bog Polygons and Bog Polygons with Masked Water")
```

![](report_files/figure-markdown_github/unnamed-chunk-13-2.png)