{"id":18019201,"url":"https://github.com/jonathom/sar_and_soil","last_synced_at":"2026-01-19T16:32:18.809Z","repository":{"id":132551295,"uuid":"308667936","full_name":"jonathom/sar_and_soil","owner":"jonathom","description":"Experimental time series analysis in R, working with S1 and precipitation data to derive information about the underlying soil and vegetation","archived":false,"fork":false,"pushed_at":"2021-03-28T19:12:20.000Z","size":76901,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-04T16:38:48.507Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jonathom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-30T15:18:38.000Z","updated_at":"2021-03-28T19:12:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"ad29768c-87cc-471c-baa9-9cfe48b8d36f","html_url":"https://github.com/jonathom/sar_and_soil","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jonathom/sar_and_soil","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathom%2Fsar_and_soil","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathom%2Fsar_and_soil/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathom%2Fsar_and_soil/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathom%2Fsar_and_soil/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jonathom","download_url":"https://codeload.github.com/jonathom/sar_and_soil/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathom%2Fsar_and_soil/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28574328,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T16:29:19.148Z","status":"ssl_error","status_checked_at":"2026-01-19T16:29:17.772Z","response_time":67,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-30T05:09:28.810Z","updated_at":"2026-01-19T16:32:18.790Z","avatar_url":"https://github.com/jonathom.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"Report on Internship at EFTAS\n=============================\n\nThis is a summary on what I did during my internship at EFTAS in October\n2020. The overall topic was to make a connection between SAR backscatter\nand soil moisture, also including precipitation data. The work can be\nsplit into two main approaches: the time series and the classification\napproach.\n\nData\n====\n\nAll functions are adapted to the years 2017-2018, for which the `.tif`s\ncontain 29 and 31 images, respectively. Thus, 60 images are overall\navailable, a fact that is hardcoded at various places. If you wish to\nadapt this script to another dataset, change this value to whatever\nnumber of bands is available.\n\n### Packages\n\n``` r\nsetwd(\"/home/petra/Praktikum\")\nlibrary(stars)\nlibrary(dplyr)\nlibrary(rgdal)\nlibrary(gdalUtils)\nlibrary(raster)\nlibrary(lubridate)\nlibrary(sp)\nlibrary(ggplot2)\nlibrary(zoo)\nlibrary(e1071)\n```\n\n### Intensity and Shapefiles\n\nThe .tif files each contain data for a year. They are dB - scaled\nintensity measurements from Sentinel 1 A. All scenes have a offset of 12\ndays, except for a data gap in January 2017.\n\n``` r\n# load files as proxy\nint_VV_2017 \u003c- read_stars(\"VV_2017_clip.tif\", proxy=TRUE)\nint_VH_2017 \u003c- read_stars(\"VH_2017_clip.tif\", proxy=TRUE)\nint_VV_2018 \u003c- read_stars(\"VV_clip.tif\", proxy=TRUE)\nint_VH_2018 \u003c- read_stars(\"VH_clip.tif\", proxy=TRUE)\n# assign names\nnames(int_VV_2017) \u003c- \"VV\"\nnames(int_VH_2017) \u003c- \"VH\"\nnames(int_VV_2018) \u003c- \"VV\"\nnames(int_VH_2018) \u003c- \"VH\"\n# water training shapefiles\nwater_shape_2017 \u003c- read_sf(\"water (copy).shp\")\nwater_shape_2018 \u003c- read_sf(\"water.shp\")\nwater_shape_2017_july07 \u003c- read_sf(\"water_shape_intersec.shp\")\n# study area\nstudy_area \u003c- read_sf(\"study_area.shp\")\n# shape\nshape \u003c- read_sf(\"antragsfl_16_17_18.shp\")\nshape \u003c- st_transform(shape, crs=st_crs(int_VV_2018))\n# moni\nmoni \u003c- read_sf(\"moni_shap.shp\")\n# double bounce\ndb \u003c- read_sf(\"double_bounce.shp\")\n```\n\nDates of scenes are read via `gdalinfo()`.\n\n``` r\n# make dates\ninfo = gdalinfo(\"VV_2017_clip.tif\")\ndescr = info[grepl(info, pattern = \"Description = \")]\ndescr = gsub(descr, pattern = \"Description = \", replacement = \"\")\ndates2017 \u003c- as_date(descr)\n\ninfo = gdalinfo(\"VV_clip.tif\")\ndescr = info[grepl(info, pattern = \"Description = \")]\ndescr = gsub(descr, pattern = \"Description = \", replacement = \"\")\ndates2018 \u003c- as_date(descr)\n```\n\nData is then made available as a `stars-proxy` pointer and a date\nvector. The two are only combined when specific areas or time steps are\nread into memory via the `loadPolygon` function declared later on, to\nsave time and memory usage.\n\n``` r\n# make proxy of both years\nproxy1718 \u003c- c(int_VV_2017, int_VV_2018, along = \"band\")\n# c() dates\ndates \u003c- c(dates2017, dates2018)\n```\n\n### Precipitation Data\n\nIs loaded as a separate object, as it allows for more flexibility in how\nto aggregate the rain data. `stars` objects must have the exact same\ndimensions (spatial and temporal) and by handling rain data separately\nwe save much time otherwise used on warping the rain geometry. The\nprecipitation dataset is averaged over the wider bog area since the data\nis not precise enough to be meaningfully cut to even smaller areas.\n\n``` r\n# load stars, read and assign dates as dimension\nrain_all_2017 \u003c- read_stars(\"rain_2017_clip_ordered.tif\")\ninfo = gdalinfo(\"rain_2017_clip_ordered.tif\")\ndescr = info[grepl(info, pattern = \"Description = \")]\ndescr = gsub(descr, pattern = \"Description = \", replacement = \"\")\nrainDates2017 \u003c- as_date(descr)\nrain_all_2017 \u003c- st_set_dimensions(rain_all_2017, 3, values = rainDates2017, names = \"time\")\n\n# clip to wider bog area, convert to dataframe\nstudy_area_rain \u003c- st_transform(study_area, crs=st_crs(rain_all_2017))\nrain_all_2017 \u003c- rain_all_2017[study_area_rain[1,]]\n```\n\n    ## although coordinates are longitude/latitude, st_intersects assumes that they are planar\n\n``` r\nrain.all.2017.df \u003c- as.data.frame(st_apply(rain_all_2017, \"time\", mean, na.rm = TRUE))\n\n# another method: make vector of all days of a year, assign as dimension\nrain_all_2018 \u003c- read_stars(\"rain_clip.tif\")\nall_days \u003c- seq(as.Date(\"2018-01-01\"), as.Date(\"2018-12-31\"), by=\"days\")\nrain_all_2018 \u003c- st_set_dimensions(rain_all_2018, 3, values = all_days, names = \"time\")\n\nrain_all_2018 \u003c- rain_all_2018[study_area_rain[1,]]\n```\n\n    ## although coordinates are longitude/latitude, st_intersects assumes that they are planar\n\n``` r\nrain.all.2018.df \u003c- as.data.frame(st_apply(rain_all_2018, \"time\", mean, na.rm = TRUE))\n\n# create rain dataframe, rows = days, columns = time, rain measurement\nrain_all \u003c- rbind(rain.all.2017.df, rain.all.2018.df)\n```\n\n### loadPolygon Function\n\nA function that changed with the different requirements but is now in\nits final stage. It takes a polygon (or shape) as its spatial extent and\nan interval as temproal extent and loads the corresponding `stars`\nobject into memory (since all backscatter `*.tif`s are huge and not held\nin memory).\n\n``` r\n# function takes a shape as an AOI and a year (2017 or 2018)\nloadPolygon \u003c- function(shape, interval) {\n  dates \u003c- dates[interval[1]:interval[2]]\n  int_vv \u003c- st_as_stars(proxy1718[shape], along = \"band\")\n  int_vv \u003c- int_vv[,,,interval[1]:interval[2]]\n  int_vv \u003c- st_set_dimensions(int_vv, 3, val = dates, names = \"time\")\n  names(int_vv) \u003c- c(\"VV\")\n  return(int_vv)\n}\n```\n\nTime Series Approach\n====================\n\nThe idea of this approach is to extract information about different\nkinds of scatterers or different soil moisture contents by looking at a\ntime series of values.\n\n### Compare Two Different Patches\n\nAn earlier approach: compare two patches to find out more about their\nbehaviour throughout the year, in context to each other.\n\n``` r\n# for 2017, load two polygons\nwater_patch \u003c- loadPolygon(moni[8,], c(1,29)) %\u003e% st_apply(., \"time\", mean,  na.rm = TRUE) %\u003e% as.data.frame()\nfield_patch \u003c- loadPolygon(moni[5,], c(1,29)) %\u003e% st_apply(., \"time\", mean,  na.rm = TRUE) %\u003e% as.data.frame()\n\nnames(water_patch) \u003c- c(\"time\", \"VV_water\")\nnames(field_patch) \u003c- c(\"time2\", \"VV_field\")\npatches \u003c- cbind(field_patch, water_patch) # create one df\n\nggplot(patches, aes(x = time)) +\n  geom_line(aes(y = VV_water, color = \"Water\")) +\n  geom_line(aes(y = VV_field, color = \"Field\")) +\n  scale_color_manual(name = \"Patch\", values = c(\"Water\"=\"blue\", \"Field\"=\"brown\")) +\n  theme(legend.position = \"bottom\") +\n  ggtitle(\"Backscatter Mean of Selected Polygons\") + ylab(\"backscatter\")\n\nanother_water_patch \u003c- loadPolygon(moni[9,], c(1,29)) %\u003e% st_apply(., \"time\", mean,  na.rm = TRUE) %\u003e% as.data.frame()\nnames(another_water_patch) \u003c- c(\"time3\", \"VV_water2\")\nwater_patches \u003c- cbind(water_patch, another_water_patch) # create another df\n\nggplot(water_patches, aes(x = time)) +\n  geom_line(aes(y = VV_water, color = \"Water\")) +\n  geom_line(aes(y = VV_water2, color = \"Water2\")) +\n  scale_color_manual(name = \"Patch\", values = c(\"Water\"=\"blue\", \"Water2\"=\"lightblue\")) +\n  theme(legend.position = \"bottom\") +\n  ggtitle(\"Backscatter Mean of Selected Polygons\") + ylab(\"backscatter\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/compare-patches-1.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/compare-patches-2.png\" width=\"50%\" /\u003e\n\n### Aggregate Rain\n\nThe next step was to plot backscatter and rain data. An important\nquestion here is how the rain should be aggregated (since depicting\nsingle days might confuse the plot). Different solutions were found. At\nfirst, a three day sum was calculated. Then, computations based on what\nhappens between two acquisitions were considered. The following code\nsimply calculates the rain sum between each scene.\n\n``` r\n# cut the bog polygons to our study area\ninter_shape \u003c- shape[lengths(st_intersects(shape, study_area[1,])) != 0,]\n# cut geometries depending on sponsorship status\nspons \u003c- inter_shape[inter_shape$Förderung == \"in Förderung\",]\nnospons \u003c- inter_shape[is.na(inter_shape$Förderung),]\n# load polygon, aggregate as mean and convert to data frame\nspons.df \u003c- loadPolygon(spons, c(1,60)) %\u003e% st_apply(., \"time\", mean, na.rm = TRUE) %\u003e% as.data.frame()\nnospons.df \u003c- loadPolygon(nospons, c(1,60)) %\u003e% st_apply(., \"time\", mean, na.rm = TRUE) %\u003e% as.data.frame()\n# bind, select, rename\ndf \u003c- cbind(spons.df, nospons.df) %\u003e% .[,c(1,2,4)]\nnames(df) \u003c- c(\"time\", \"sponsVV\", \"nosponsVV\")\n\n# rain aggregation over both years\nagg_rain \u003c- c(rain_all[1,2] + rain_all[2,2] + rain_all[3,2] + rain_all[4,2] + rain_all[5,2] + rain_all[6,2] + rain_all[7,2] + rain_all[8,2]) # to handle the first sum that only has 8 values\nfor (i in 2:nrow(df)) { # i corresponds to row in S1 acquisitions\n  offset \u003c- 8 # first S1 acquisition is 08/01, rain data starts 01/01\n  inter \u003c- 12 # 12 days between acquisitions\n  start \u003c- (i - 2) * inter + offset + 1 # exclude last from prev sum\n  end \u003c- (i - 1) * inter + offset\n  sum \u003c- 0 # create sum as 0\n  # for the calculated rows of rain data frame\n  for (j in start:end) {\n    sum \u003c- sum + rain_all[j,2] # add day to day rain\n  }\n  agg_rain \u003c- c(agg_rain, sum) # append to rain vector\n}\n\ndf \u003c- cbind(df, agg_rain) # bind to intensity data\nnames(df) \u003c- c(\"time\", \"sponsVV\", \"nosponsVV\", \"rain\")\n\nggplot(df, aes(x = time)) + \n  geom_bar(aes(x = time - 6, y = rain), stat = 'identity', fill = \"lightblue\", alpha = 0.8) + # time - 6 so that rain bars are displayed in between S1 acquisition dates\n  geom_line(aes(y = sponsVV * 4 + 70, color = \"in Förderung\")) + # scale VV as enhancement of effects\n  geom_point(aes(y = sponsVV * 4 + 70, color = \"in Förderung\")) +\n  geom_line(aes(y = nosponsVV * 4 + 70, color = \"nicht in Förderung\")) +\n  coord_cartesian(ylim = c(0,60)) + \n  ggtitle(\"Mean of VV in Bog Polygons and Precipitation\") + xlab(\"Time\") +\n  ylab(\"Precipitation in mm/m²\") + \n  scale_y_continuous(sec.axis = sec_axis(~. *0.25 -17.5, name = \"Intensity in dB\")) + \n  scale_color_manual(name = \"Förderung\", values = c(\"in Förderung\"=\"blue\", \"nicht in Förderung\"=\"black\")) +\n  theme(legend.position = \"bottom\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/aggregate-rain-1.png\" width=\"100%\" /\u003e\n\nOr: build weighted sums of the precipitation between scenes. The\nfollowing depicts a weight of 1/(\\# of day before acquisition), so that\nthe day of the acquisition has weight 1, the day before has weight 1/2,\n1/3 (…) and so on \\[code hidden\\].\n\u003cimg src=\"report_files/figure-markdown_github/rain-weighted-1.png\" width=\"100%\" /\u003e\n\nClassification Count Approach\n=============================\n\nGoal of the classification approach was to find thresholds that\ncharacterize different scatterers and thereby distinguish land cover\ntypes. In a vegetated areas a C-band radar signal inhibits different\ngrades of volume scattering. Water however reflects the signal almost\nperfectly, leading to very low backscatter measures. In a case where\nvegetation is standing in water, total reflection and also double bounce\neffects have to be considered. Whether a signal is volume- or\nsurface-scattered also depends on its polarization. As a start, the\ndetection of water and the detection of double bounce was carried out by\ndigitizing training polygons and finding thresholds with a SVM.\n\n### Water Thresholds\n\nBecause it is the most simple approach, only training data from one\npoint in time (03/01/2018) were generated. The found Threshold was \\~\n-15dB. This threshold was then applied on a heavy rain event in the\nsummer of 2017. To give in idea on the magnitude and circumstances on\nthis example:\n\n``` r\n# plot closeup of prec data and S1 dates\nggplot(rain_all, aes(x=time)) +\n  geom_bar(aes(y=mean), fill = \"lightblue\", stat='identity') +\n  xlim(as.Date(c(\"2017-06-15\", \"2017-07-15\"))) +\n  geom_vline(xintercept = as.Date(\"2017-06-25\")) +\n  geom_vline(xintercept = as.Date(\"2017-07-07\")) +\n  ggtitle(\"Closeup of Precipitation Data + S1 Acquisition Dates\") +\n  ylab(\"precipitation in mm/m²\")\n\nheavyrain \u003c- loadPolygon(study_area[1,], c(14,15)) # load data from 2017 for the area\nplot(heavyrain) # plot the time steps in question\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/rain-in-2017-1.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/rain-in-2017-2.png\" width=\"50%\" /\u003e\n\nWe observe an overall increase in backscatter, most likely due to the\nincrease in soil moisture. Then we plot the time steps again, with what\nwas found as a threshold for open water on fields / on the bog area:\n\n``` r\nthreshold \u003c- -15.34187 # set found threshold\nimage(heavyrain[,,,1], col = c(\"white\", \"black\"), breaks = c(-35, threshold, 35), main=\"25/06, White: \u003c -15.3; Black: \u003e -15.3\")\nimage(heavyrain[,,,2], col = c(\"white\", \"black\"), breaks = c(-35, threshold, 35), main=\"07/07, White: \u003c -15.3; Black: \u003e -15.3\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-3-1.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-3-2.png\" width=\"50%\" /\u003e\nThe images show a decrease in surfaces below the found threshold for\nwater. With a rain event of this magnitude we would expect several\npuddles and dents to flood, leading to more surface below the water\nthreshold. Experimentally, a much lower threshold was applied, leading\nto the anticipated effect:\n\n``` r\nthreshold \u003c- -20.5 # apply a much lower threshold\nimage(heavyrain[,,,1], col = c(\"white\", \"black\"), breaks = c(-35, threshold, 35), main=\"25.06., White: \u003c -20.5; Black: \u003e -20.5\")\nimage(heavyrain[,,,2], col = c(\"white\", \"black\"), breaks = c(-35, threshold, 35), main=\"07.07., White: \u003c -20.5; Black: \u003e -20.5\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-4-1.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-4-2.png\" width=\"50%\" /\u003e\nTherefore, the validity of the used training data was questioned. A new\nSVM training was applied. While before, only training data for january\nof different years was computed, this approach used two scenes from\nwinter and summer of 2017 in order to possible seasonal influences. 2017\nwas chosen as the yeare with more precipitation and therefore a higher\nchance to generate correct training data (since training polygons could\nonly be guessed with no ground truth available). A function is written\nto generate training data from a shapefile that is time step specific\nand contains a `$type` attribute for training and an `$area` attribute\nthat describes the surrounding area.\n\n``` r\n# function to generate training data from shapefiles\ntrainPolygons \u003c- function(water_shape, timestep)    {   \n  end \u003c- length(water_shape$type) # must contain type attr\n  for (k in 1:end) {\n    type \u003c- water_shape[k,]$type\n    area \u003c- water_shape[k,]$area\n    # select which classes should be included\n    if(type != \"street\" \u0026\u0026 !is.null(area) \u0026\u0026 (area == \"field\" || area == \"lake\")) {\n      poly \u003c- loadPolygon(water_shape[k,], c(1,60))\n      # set VV, timestep\n      poly \u003c- poly[1,,,timestep]\n      poly.df \u003c- as.data.frame(poly)\n      poly.df$type \u003c- water_shape[k,]$type\n      poly.df$sitch \u003c- paste0(water_shape[k,]$type, \" in \", water_shape[k,]$area)\n      poly.df \u003c- poly.df[complete.cases(poly.df),]\n      if(k == 1) {\n        val.svm \u003c- poly.df[,4:6]\n      } else {\n        val.svm \u003c- rbind(val.svm, poly.df[,4:6])\n      }\n    }\n  }\n  return(val.svm)\n}\n```\n\n``` r\n# make training vector with function\nval1 \u003c- trainPolygons(water_shape_2017, 1)\nval2 \u003c- trainPolygons(water_shape_2017_july07, 15)\nval.svm \u003c- rbind(val1, val2)\n# prepare data\nval.svm$type \u003c- as.factor(val.svm$type)\nattach(val.svm)\nx \u003c- subset(val.svm, select=c(-type, -sitch))\ny \u003c- type\nsvmmod \u003c- svm(x,y) # make model\ntable(predict(svmmod,x), y) # make prediction, confusion matrix\n```\n\n    ##        y\n    ##         field water\n    ##   field   967     0\n    ##   water     0  1252\n\n``` r\nsvmmod$x.scale$`scaled:center` # threshold is\n```\n\n    ## [1] -18.82414\n\n``` r\nggplot(val.svm, aes(x=sitch, y=VV)) + ylab(\"VV Backscatter\") +\n  geom_boxplot() + xlab(\"Type and Surrounding Area\") + \n  geom_hline(yintercept = svmmod$x.scale$`scaled:center`, color = \"darkblue\") +\n  ggtitle(\"Threshold for Open Water in Field/Bog Areas\")\n```\n\n![](report_files/figure-markdown_github/water-svm-1.png)\n\n### Double Bounce Threshold\n\nDouble bounce training data is constructed similar to the example above,\nexcept that it only uses a scene in january 2018 \\[code hidden\\].\n\n    ## The following objects are masked from val.svm:\n    ## \n    ##     type, VV\n\n``` r\ntable(predict(svm.db,x), y) # make prediction, confusion matrix\n```\n\n    ##        y\n    ##         other urban\n    ##   other  1214    29\n    ##   urban     1   221\n\n``` r\nsvm.db$x.scale$`scaled:center` # threshold is\n```\n\n    ## [1] -8.348805\n\n``` r\n# boxplot\nggplot(val.db, aes(x=type, y=VV)) +\n  geom_boxplot() + ylab(\"VV Backscatter\") + xlab(\"Type\") +\n  geom_hline(yintercept = svm.db$x.scale$`scaled:center`, color = \"yellow\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-5-1.png\" width=\"50%\" /\u003e\n\n### Count Rasters - Single Threshold\n\nThe idea behind count rasters is to find out how often certain areas are\nclassified as a certain class, i.e. their pixel values lie in a\nspecified range that is suspected of representing e.g. water. At first\nsuch sums were calculated for 2017 and 2018 separately in order to\ncompare the years. A function was written that cuts given areas up into\ntiles to keep memory usage at acceptable levels during this calculation.\nThe code is hidden because a similar function is shown later on. The\nmachanism of how 0/1 counts are assigned is shown anyway:\n\n``` r\nscene[scene \u003e= threshold] \u003c- 1\nscene[scene \u003c threshold] \u003c- 0\n```\n\nFor now, these rasters are read from file\n\n``` r\nras2017 \u003c- raster(\"./report/water_thresh/water_thresh_count_2017-01-08_2017-12-22.tif\")\nras2018 \u003c- raster(\"./report/water_thresh/water_thresh_count_2018-01-03_2018-12-29.tif\")\ncolo = viridisLite::inferno(29)\nbreaks = seq(0, 29, 1)\nimage(ras2017, col = colo, breaks = breaks, main=\"2017\")\ncolo = viridisLite::inferno(31)\nbreaks = seq(0, 31, 1)\nimage(ras2018, col = colo, breaks = breaks, main=\"2018\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/compare-years-1.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/compare-years-2.png\" width=\"50%\" /\u003e\nThe fact that the much more dry year of 2018 exhibits more counts of\nvalues below the threshold might point to more heavy rain events (more\nflooding) or to the possibility that values below our threshold do not\npoint exclusively towards flooded terrain but also to other reasons for\nlow backscatter (e.g. dry soil). Water surfaces were detected in both\nyears.\n\n### Count Rasters - Classification\n\nThis classification can also be done for multiple classes. But in order\nto do so, we also need more thresholds. As calculated before we can use\n-18.8 for water and the double bounce threshold of -8.3. Another\nthreshold for open water in field areas was calculated with a different\nset of training data: -17.6. We can read another threshold from the\nfigure with the title “Threshold for Open Water in Field/Bog Areas”:\n-22.5 as a threshold for open water (as in lakes). Missing threshold\nvalues between -17 and -8 were filled by simply making up the values of\n-10, -13 and -14. The function to calculate these rasters is the same\ntiling-function as used before, with a slight change in how 0/1 values\nare assigned:\n\n``` r\n# interval \u003c- c(lower, upper) threshold\nscene[scene \u003e interval[2]] \u003c- 0\nscene[scene \u003c interval[1]] \u003c- 0\nscene[scene != 0] \u003c- 1\n```\n\nBy assigning these values as count thresholds, different count rasters\ncan be computed. The most interesting RGB combinations are shown. In the\nfirst plot, all lower classes are shown. We see that areas rarely fall\nlower than -22.5. Water tends to range in our dedicated water class and\nreveals domains that tend to flood. We see that the bog area has overall\nmore counts of low backscatter (blue).\n\n``` r\n# results of these classifications are saved in 'completesum_ALL.tif'\n# band order is down 1-3, middle 1-3, up 1-3, mean\nstack_all \u003c- raster::stack(\"./report/class_count_ras/completesum_ALL.tif\")\nplotRGB(stack_all, r=1, g=2, b=3, scale = 60, axes=TRUE, main=\"R: x \u003c -22.5 | G: -22.5 \u003c x \u003c -17.6 | B: -17.6 \u003c x \u003c -14\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/count-ras-class-1.png\" width=\"100%\" /\u003e\n\nIn the following plot we see a classification over almost the full range\nof the backscatter spectrum. Red depict areas with large high\nbackscatter counts (while double bounce pixels above -8.3 are masked\nfrom the image and appear black, as do pixels \\\u003c -22.5). Grenn shows the\ncounts in the mid-range, and blue is assigned to the water class.\n\n``` r\nplotRGB(stack_all, r=8, g=5, b=2, stretch=\"lin\", axes = TRUE, main = \"R: -13 \u003c x \u003c -8.3 | G: -17.6 \u003c x \u003c -13 | B: -22.5 \u003c x \u003c -17.6\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-8-1.png\" width=\"100%\" /\u003e\n\n### Count Rasters - Imaging Change\n\nIn this approach it was tried to avoid setting too many thresholds\nmanually and let the data speak instead. Different Measures of change\nwere quantified and can be displayed in different RGB combinations. The\nidea for this was given in this\n[paper](https://www.mdpi.com/2072-4292/8/10/795/htm#). The question was\nhow we could distinguish surfaces with alternating moisture content from\neach other. The calculated bands are:\n\n-   change count\n    -   how often does the pixel change? -\\\u003e count of change\n    -   ranges from 0 to \\# of time steps\n    -   higher values with much change\n    -   parameter `change detection threshold [cdt]`: sensitivity (what\n        counts as a change?)\n-   change frequency\n    -   how stable is the pixel? -\\\u003e mean of no-change-streaks (duration\n        of periods with no change is calculated, then averaged and\n        rounded)\n    -   ranges from 0 to \\# of time steps\n    -   higher values correspond to longer periods of no change\n    -   it might also be useful to `ceiling()` these values, or take the\n        `max()`\n-   change range\n    -   what magnitude does the change have? -\\\u003e difference of max and\n        min\n    -   difference of `max` and `min` in pixel time series is computed\n    -   higher values indicate a wider range\n-   mean\n    -   higher values for a higher mean backscatter\n-   threshold count\n    -   count over the time period of one specific range of values -\\\u003e\n        how often were the pixels between a and b (see chapter “Count\n        Rasters - Single Threshold”)\n    -   ranges from 0 to \\# of time steps\n    -   high values depict high counts in this class\n\nThey are calculated as follows:\n\n``` r\n#### calculate length of intervals of no change #\ncalcChangeLength \u003c- function(freq) {\n  # count streaks of change / no change\n  counts \u003c- rle(freq)\n  # convert to matrix\n  counts \u003c- matrix(c(counts[[1]], counts[[2]]), ncol = length(counts[[1]]), byrow = TRUE)\n  # rounded mean streak length of 'no change'\n  round(mean(counts[1, counts[2,] == 0]))\n}\n\n#### stitch rasters from folders ################\nstitchRasters \u003c- function(dirName, folderName, fileName, dates) {\n  dir_list \u003c- list.files(dirName)\n  for (a in 1:length(dir_list)) {\n    str \u003c- paste0(dirName, \"/\", dir_list[a])\n    ras \u003c- raster(str)\n    if(a \u003c 2) {\n      allRas \u003c- ras\n    } else {\n      allRas \u003c- raster::merge(allRas, ras)\n    }\n  }\n  name \u003c- paste0(\"/\", fileName, \"_\", dates[1], \"_\", dates[2], \"flag.tif\")\n  writeRaster(allRas, paste0(folderName, name), overwrite = TRUE)\n}\n\n#### characterize function #######################\ncharacterize \u003c- function(ext, interval, thresholds, cdt, folderName) {\n  \n  # make the folders\n  dir.create(folderName)\n  \n  change_dir \u003c- paste0(folderName, \"/change_count_tiles\")\n  dir.create(change_dir)\n  \n  range_dir \u003c- paste0(folderName, \"/change_range_tiles\")\n  dir.create(range_dir)\n  \n  mean_dir \u003c- paste0(folderName, \"/mean_tiles\")\n  dir.create(mean_dir)\n  \n  freq_dir \u003c- paste0(folderName, \"/change_freq_tiles\")\n  dir.create(freq_dir)\n  \n  thresh_dir \u003c- paste0(folderName, \"/threshold_count_tiles\")\n  dir.create(thresh_dir)\n  \n  #### TILING ####################################\n  # bbox can be used for stars subsetting []\n  ext \u003c- st_bbox(ext) # ext is xmin ymin xmax ymax\n  # study area one has 11860, 10.000 seems OK tile size\n  # calculate span\n  x.span \u003c- ext[3] - ext[1] # X\n  y.span \u003c- ext[4] - ext[2] # Y\n  # calc good number of cuts\n  x.cuts \u003c- ceiling(x.span / 10000)\n  y.cuts \u003c- ceiling(y.span / 10000)\n  # calc cut length\n  x.cut.by \u003c- x.span / x.cuts\n  y.cut.by \u003c- y.span / y.cuts\n  \n  # tile counter\n  count \u003c- 0\n  # go through all cuts in X direction\n  for (i in 1:x.cuts) {\n    # go through all cuts in Y direction\n    for (j in 1:y.cuts) {\n      count \u003c- count + 1\n      # make extent object\n      xmin \u003c- ext[1] + (i - 1) * x.cut.by\n      xmax \u003c- ext[1] + i * x.cut.by\n      ymin \u003c- ext[2] + (j - 1) * y.cut.by\n      ymax \u003c- ext[2] + j * y.cut.by\n      \n      cutbox \u003c- ext\n      cutbox[1] \u003c- xmin\n      cutbox[2] \u003c- ymin\n      cutbox[3] \u003c- xmax\n      cutbox[4] \u003c- ymax\n      \n      #### DO FOR EACH TILE #####################\n      \n      # make tile name\n      if(count \u003c 10) {\n        name \u003c- paste0(\"/tile_0\", count, \".tif\")\n      } else {\n        name \u003c- paste0(\"/tile_\", count, \".tif\")\n      }\n      \n      # load stars  \n      tile \u003c- loadPolygon(cutbox, interval)\n      \n      # make 0 change_count obj\n      change_count \u003c- tile[,,,1]\n      change_count[change_count \u003c 1000] \u003c- 0\n      \n      # assign full time span stars to build mean frequency\n      stars_freq \u003c- tile\n      \n      lengt \u003c- length(interval[1]:interval[2])\n      leng \u003c- lengt - 1\n      # go through time steps of tile\n      for (k in 1:leng) {\n        mag \u003c- abs(abs(tile[,,,k]) - abs(tile[,,,k+1]))\n        # direction of change can be extracted with one less \"abs\" and sum\n        mag[mag \u003c cdt] \u003c- 0\n        mag[mag \u003e= cdt] \u003c- 1\n        change_count \u003c- change_count + mag\n        \n        # frequency of change\n        if(k \u003c 2) {\n          # create freq as first change image\n          freq \u003c- mag\n        } else {\n          # bind new mag to freq to get time series of change\n          freq \u003c- c(freq, mag)\n        }\n      }\n      \n      write_stars(change_count, paste0(change_dir, name))\n      \n      freq \u003c- st_apply(freq, c(\"x\", \"y\"), FUN = calcChangeLength)\n      write_stars(freq, paste0(freq_dir, name))\n      \n      # change range \n      # mask changes from \u003c cdt ?\n      range_max \u003c- st_apply(tile, c(\"x\", \"y\"), FUN = \"max\")\n      range_min \u003c- st_apply(tile, c(\"x\", \"y\"), FUN = \"min\")\n      range \u003c- range_max - range_min\n      \n      # rescale\n      # max_value \u003c- interval[2] - interval[1] + 1\n      # range \u003c- rescale(range, to = c(0, max_value))\n      \n      write_stars(range, paste0(range_dir, name))\n      \n      # mean\n      tile_mean \u003c- st_apply(tile, c(\"x\", \"y\"), mean)\n      write_stars(tile_mean, paste0(mean_dir, name))\n      \n      # create thresholds sum\n      for (l in 1:lengt) {\n        scene \u003c- tile[,,,l]\n        scene[scene \u003e thresholds[2]] \u003c- 0\n        scene[scene \u003c thresholds[1]] \u003c- 0\n        scene[scene != 0] \u003c- 1\n        \n        if(l == 1) {\n          thresh_sum \u003c- scene\n        }\n        else {\n          thresh_sum \u003c- thresh_sum + scene\n        }\n      }\n      # write\n      write_stars(thresh_sum, paste0(thresh_dir, name))\n    }\n  }\n  \n  # stitch rasters\n  dateName \u003c- c(dates[interval[1]], dates[interval[2]])\n  \n  stitchRasters(change_dir, folderName, \"change_count\", dateName)\n  stitchRasters(freq_dir, folderName, \"change_freq\", dateName)\n  stitchRasters(range_dir, folderName, \"change_range\", dateName)\n  stitchRasters(mean_dir, folderName, \"mean\", dateName)\n  stitchRasters(thresh_dir, folderName, \"thresh_count\", dateName)\n  \n  # flag keeps preexisting raster from being included\n  raster_list \u003c- list.files(folderName, rec=FALSE, pattern = \"*flag.tif\")\n  raster_list \u003c- paste0(folderName, \"/\", raster_list)\n  \n  tifName \u003c- paste0(folderName, \"/all_\", dateName[1], \"_\", dateName[2], \".tif\")\n  \n  writeRaster(raster::stack(raster(raster_list[1]), raster(raster_list[2]), raster(raster_list[3]), raster(raster_list[4]), raster_list[5]), tifName, overwrite = TRUE)\n  \n  print(\"bands are in order:\")\n  print(raster_list)\n}\n```\n\nAnd can be called as:\n\n``` r\n## enter parameters\n## folder name of this specific run with ./\nfolderName \u003c- \"./charac_test_run\"\n## extent\next \u003c- study_area[2,]\n## time span interval 1,29 is 2017, 30,60 is 2018\ninterval \u003c- c(1,29)\n# which interval should be classified and counted?\nthresholds \u003c- c(-22.5, -17.6) # water threshold is given\n## change detection threshold, in dB (sensitivity of change detection)\ncdt \u003c- 2\n\ncharacterize(ext, interval, thresholds, cdt, folderName)\n```\n\nFirst, we’re looking at the bands one by one:\n\n``` r\nsta \u003c- read_stars(\"./report/charac_five_attr_both_years/all_2017-01-08_2018-12-29.tif\")\nstudy_area \u003c- st_transform(study_area, crs=st_crs(sta))\nsta_small \u003c- sta[study_area[1,]]\nplot(sta_small[,,,1], main = \"Change Count\")\nplot(sta_small[,,,2], main = \"No-Change Frequency\")\nplot(sta_small[,,,3], main = \"Change Range\")\nplot(sta_small[,,,4], main = \"Mean\")\nplot(sta_small[,,,5], main = \"Threshold Count\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-10-1.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-10-2.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-10-3.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-10-4.png\" width=\"50%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-10-5.png\" width=\"50%\" /\u003e\n\nExample use case RGB to identify water / estimate wetness of the bog\narea. The band over which such areas can be identified is always\nassigned to the blue band, explanations can be found in the comments.\n\n``` r\nras \u003c- raster::stack(\"./report/charac_five_attr_both_years/all_2017-01-08_2018-12-29.tif\")\nras \u003c- crop(ras, study_area[1,])\nras \u003c- mask(ras, study_area[1,])\n# identify water via the water threshold, display change measures\nplotRGB(ras, r=1, g=3, b=5, stretch = \"lin\", axes = TRUE, main = \"R: change count, G: change range, B: water threshold count\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-11-1.png\" width=\"100%\" /\u003e\n\n``` r\n# identify water / wet areas via water threshold and a low mean (less green)\nplotRGB(ras, r=2, g=4, b=5, stretch = \"hist\", axes = TRUE, main = \"R: pixel stability, G: mean, B: water threshold count\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-11-2.png\" width=\"100%\" /\u003e\n\n``` r\n# identify water via a large change range\nplotRGB(ras, r=1, g=2, b=3, stretch = \"lin\", axes = TRUE, main = \"R: change count, G: no-change frequency, B: change range\")\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/unnamed-chunk-11-3.png\" width=\"100%\" /\u003e\n\nConnecting Time Series and Classification Count\n===============================================\n\nThe approaches above show some interesting results. However they do so\nby collapsing either the temporal or the spatial dimension. Two ideas\nhave been developed in which collapsing dimensions doesn’t play such a\nbig role.\n\n### Raster Time Series\n\nIn this case, the raster does not contain summarized data for a given\ninterval but is a form of looking at *all* time steps of the series at\nonce, as efficiently as possible.\n\n``` r\n# time series\nplotTS \u003c- function(area, threshold, start, end) {\n  # load obj\n  area \u003c- st_transform(area, crs = st_crs(proxy1718))\n  obj \u003c- loadPolygon(area, c(1,60))\n  # make colors for main part of VV dB range of values\n  colo = viridisLite::inferno(85)\n  breaks = seq(-25, -8, 0.2)\n  # build rain statistics\n  jahresniederschlag = sum(rain_all$mean)\n  durchschnitt = jahresniederschlag / 365\n  # plot for start:end\n  for (i in start:end) {\n    # rain interval\n    start.rain \u003c- 8 + (i - 1) * 12 + 1\n    if(i == 1) {start.rain \u003c- 0}\n    end.rain \u003c- i * 12 + 8\n    df \u003c- rain_all[start.rain:end.rain,]\n    gesamt \u003c- floor((sum(df$mean)) * 100) / 100\n    percent \u003c- (floor((gesamt * 100 / jahresniederschlag) * 100)) / 100\n    titl = paste0(df[1,1], \" until \", df[nrow(df),1], \" // \", gesamt, \"mm or \", percent, \"% of all rain\")\n    print(ggplot(df, aes(x = time, y = mean)) +\n            geom_bar(stat='identity') + ylab(\"Precipitation\") + xlab(\"Time\") +\n            ggtitle(titl) + theme(plot.title = element_text(size = 20)))\n    image(obj[1,,,i], col = colo, breaks = breaks, main = dates[i])\n    image(obj[1,,,i], col = c(\"white\", \"black\"), breaks = c(-35, threshold, 35), main = dates[i])\n  }\n}\n```\n\nThe previously defined rain data frame is used for rain plotting and\nstatistics. Some example time steps are shown instead of a whole year.\n\n``` r\nplotTS(study_area[1,], -18.82414, 14, 16)\n```\n\n\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-1.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-2.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-3.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-4.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-5.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-6.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-7.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-8.png\" width=\"33%\" /\u003e\u003cimg src=\"report_files/figure-markdown_github/plot-raster-ts-9.png\" width=\"33%\" /\u003e\n\n### Connecting Count Rasters and Time Series Approach\n\nIn the time series above we have observed the phenomenon of the tipping\npoints. The mean of backscatter increases with increasing rain, until\nthe curve tips and decreases, presumably because the soil is saturated\nwith water and the area starts to flood, leading to a total reflection\nof the radar signal. This approach aims at excluding these surfaces from\nthe time series analysis in order to explore how all not-flooded areas\nare responding to heavy rain. There have been functions written that do\nthis task for larger raster areas with a tiling approach and using\n`lapply` for the time steps, but here we only look at the observed bog\npolygons which we can process on-the-fly.\n\n``` r\n# load stars with time steps\nvalues_sta \u003c- loadPolygon(inter_shape, c(1,60))\nthresh \u003c- -18.82414\n# quickly rebuild sum\nfor (i in 1:60) {\n  tile \u003c- values_sta[,,,i]\n  tile[tile \u003e thresh] \u003c- 0\n  tile[tile \u003c thresh] \u003c- 1\n  if(i == 1) {\n    count \u003c- tile\n  } else {\n    count \u003c- count + tile # count is then the threshold count raster\n  }\n}\n# assign NA\nvalues_sta[count \u003e 0] \u003c- NA # now we have assigned NA to every pixel that ever falls below the water threshold\nplot(values_sta[,,,1], main = \"Water-Masked Bog Polygons\")\n```\n\n![](report_files/figure-markdown_github/ts-and-counts-1.png)\n\nWe now want to build a time series pot just as before, but this time\nwith our masked `values_sta` object. We’re using objects defined under\n“Time Series Approach” and then “Aggregate Rain”.\n\n``` r\n# load polygon, aggregate as mean and convert to data frame\nspons.thresh.df \u003c- values_sta[spons] %\u003e% st_apply(., \"time\", mean, na.rm = TRUE) %\u003e% as.data.frame()\nnospons.thresh.df \u003c- values_sta[nospons] %\u003e% st_apply(., \"time\", mean, na.rm = TRUE) %\u003e% as.data.frame()\n# bind, select, rename\ndff \u003c- cbind(spons.thresh.df, nospons.thresh.df) %\u003e% .[,c(1,2,4)]\nnames(dff) \u003c- c(\"time\", \"sponsVV\", \"nosponsVV\")\n\ndff \u003c- cbind(dff, agg_rain) # bind to intensity data\nnames(dff) \u003c- c(\"time\", \"sponsVV\", \"nosponsVV\", \"rain\")\n\nggplot(dff, aes(x = time)) + \n  geom_bar(aes(x = time - 6, y = rain), stat = 'identity', fill = \"lightblue\", alpha = 0.8) + # time - 6 so that rain bars are displayed in between S1 acquisition dates\n  geom_line(aes(y = sponsVV * 4 + 70, color = \"in Förderung\")) + # scale VV as enhancement of effects\n  geom_point(aes(y = sponsVV * 4 + 70, color = \"in Förderung\")) +\n  geom_line(aes(y = nosponsVV * 4 + 70, color = \"nicht in Förderung\")) +\n  coord_cartesian(ylim = c(0,60)) + \n  ggtitle(\"Mean of VV in Bog Polygons (that are never assumed to flood) and Precipitation\") + xlab(\"Time\") +\n  ylab(\"Precipitation in mm/m²\") + \n  scale_y_continuous(sec.axis = sec_axis(~. *0.25 -17.5, name = \"Intensity in dB\")) + \n  scale_color_manual(name = \"Förderung\", values = c(\"in Förderung\"=\"blue\", \"nicht in Förderung\"=\"black\")) +\n  theme(legend.position = \"bottom\")\n```\n\n![](report_files/figure-markdown_github/unnamed-chunk-13-1.png)\n\n``` r\n# compare the two methods mask/nomask\nall.df \u003c- df[,1:3] %\u003e% cbind(., dff$sponsVV, dff$nosponsVV, dff$rain)\nall.df$nomaskMean \u003c- rowMeans(all.df[,2:3])\nall.df$maskMean \u003c- rowMeans(all.df[,4:5])\n\nggplot(all.df, aes(x = time)) +\n  geom_line(aes(y = maskMean, color = \"masked\")) +\n  geom_line(aes(y = nomaskMean, color = \"unmasked\")) +\n  xlab(\"Time\") + ylab(\"Intensity VV in dB\") + ggtitle(\"Difference Between Bog Polygons and Bog Polygons with Masked Water\")\n```\n\n![](report_files/figure-markdown_github/unnamed-chunk-13-2.png)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathom%2Fsar_and_soil","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonathom%2Fsar_and_soil","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathom%2Fsar_and_soil/lists"}