https://github.com/japhir/stdstats

Simulation and plotting code for "Optimizing the use of carbonate standards to minimize uncertainties in clumped isotope data"
https://github.com/japhir/stdstats
Last synced: 4 months ago
JSON representation
Simulation and plotting code for "Optimizing the use of carbonate standards to minimize uncertainties in clumped isotope data"
Host: GitHub
URL: https://github.com/japhir/stdstats
Owner: japhir
License: gpl-3.0
Created: 2019-08-23T14:04:14.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2023-04-26T22:27:55.000Z (about 2 years ago)
Last Synced: 2025-01-23T07:44:47.964Z (6 months ago)
Size: 20.2 MB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.html
- License: LICENSE.md
- Citation: CITATION.cff
Awesome Lists containing this project

awesome-utrecht-university - stdstats - Simulation and plotting code for "Optimizing the use of carbonate standards to minimize uncertainties in clumped isotope data" (Projects / Research code)
awesome-utrecht-university - stdstats - Simulation and plotting code for "Optimizing the use of carbonate standards to minimize uncertainties in clumped isotope data" (Projects / Research code)
README

        

Simulation and plotting code for "Optimizing the use of carbonate standards to minimize uncertainties in clumped isotope data"

  #content { max-width: 60em; margin: auto; }

  .title  { text-align: center;

             margin-bottom: .2em; }

  .subtitle { text-align: center;

              font-size: medium;

              font-weight: bold;

              margin-top:0; }

  .todo   { font-family: monospace; color: red; }

  .done   { font-family: monospace; color: green; }

  .priority { font-family: monospace; color: orange; }

  .tag    { background-color: #eee; font-family: monospace;

            padding: 2px; font-size: 80%; font-weight: normal; }

  .timestamp { color: #bebebe; }

  .timestamp-kwd { color: #5f9ea0; }

  .org-right  { margin-left: auto; margin-right: 0px;  text-align: right; }

  .org-left   { margin-left: 0px;  margin-right: auto; text-align: left; }

  .org-center { margin-left: auto; margin-right: auto; text-align: center; }

  .underline { text-decoration: underline; }

  #postamble p, #preamble p { font-size: 90%; margin: .2em; }

  p.verse { margin-left: 3%; }

  pre {

    border: 1px solid #e6e6e6;

    border-radius: 3px;

    background-color: #f2f2f2;

    padding: 8pt;

    font-family: monospace;

    overflow: auto;

    margin: 1.2em;

  }

  pre.src {

    position: relative;

    overflow: auto;

  }

  pre.src:before {

    display: none;

    position: absolute;

    top: -8px;

    right: 12px;

    padding: 3px;

    color: #555;

    background-color: #f2f2f299;

  }

  pre.src:hover:before { display: inline; margin-top: 14px;}

  /* Languages per Org manual */

  pre.src-asymptote:before { content: 'Asymptote'; }

  pre.src-awk:before { content: 'Awk'; }

  pre.src-authinfo::before { content: 'Authinfo'; }

  pre.src-C:before { content: 'C'; }

  /* pre.src-C++ doesn't work in CSS */

  pre.src-clojure:before { content: 'Clojure'; }

  pre.src-css:before { content: 'CSS'; }

  pre.src-D:before { content: 'D'; }

  pre.src-ditaa:before { content: 'ditaa'; }

  pre.src-dot:before { content: 'Graphviz'; }

  pre.src-calc:before { content: 'Emacs Calc'; }

  pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }

  pre.src-fortran:before { content: 'Fortran'; }

  pre.src-gnuplot:before { content: 'gnuplot'; }

  pre.src-haskell:before { content: 'Haskell'; }

  pre.src-hledger:before { content: 'hledger'; }

  pre.src-java:before { content: 'Java'; }

  pre.src-js:before { content: 'Javascript'; }

  pre.src-latex:before { content: 'LaTeX'; }

  pre.src-ledger:before { content: 'Ledger'; }

  pre.src-lisp:before { content: 'Lisp'; }

  pre.src-lilypond:before { content: 'Lilypond'; }

  pre.src-lua:before { content: 'Lua'; }

  pre.src-matlab:before { content: 'MATLAB'; }

  pre.src-mscgen:before { content: 'Mscgen'; }

  pre.src-ocaml:before { content: 'Objective Caml'; }

  pre.src-octave:before { content: 'Octave'; }

  pre.src-org:before { content: 'Org mode'; }

  pre.src-oz:before { content: 'OZ'; }

  pre.src-plantuml:before { content: 'Plantuml'; }

  pre.src-processing:before { content: 'Processing.js'; }

  pre.src-python:before { content: 'Python'; }

  pre.src-R:before { content: 'R'; }

  pre.src-ruby:before { content: 'Ruby'; }

  pre.src-sass:before { content: 'Sass'; }

  pre.src-scheme:before { content: 'Scheme'; }

  pre.src-screen:before { content: 'Gnu Screen'; }

  pre.src-sed:before { content: 'Sed'; }

  pre.src-sh:before { content: 'shell'; }

  pre.src-sql:before { content: 'SQL'; }

  pre.src-sqlite:before { content: 'SQLite'; }

  /* additional languages in org.el's org-babel-load-languages alist */

  pre.src-forth:before { content: 'Forth'; }

  pre.src-io:before { content: 'IO'; }

  pre.src-J:before { content: 'J'; }

  pre.src-makefile:before { content: 'Makefile'; }

  pre.src-maxima:before { content: 'Maxima'; }

  pre.src-perl:before { content: 'Perl'; }

  pre.src-picolisp:before { content: 'Pico Lisp'; }

  pre.src-scala:before { content: 'Scala'; }

  pre.src-shell:before { content: 'Shell Script'; }

  pre.src-ebnf2ps:before { content: 'ebfn2ps'; }

  /* additional language identifiers per "defun org-babel-execute"

       in ob-*.el */

  pre.src-cpp:before  { content: 'C++'; }

  pre.src-abc:before  { content: 'ABC'; }

  pre.src-coq:before  { content: 'Coq'; }

  pre.src-groovy:before  { content: 'Groovy'; }

  /* additional language identifiers from org-babel-shell-names in

     ob-shell.el: ob-shell is the only babel language using a lambda to put

     the execution function name together. */

  pre.src-bash:before  { content: 'bash'; }

  pre.src-csh:before  { content: 'csh'; }

  pre.src-ash:before  { content: 'ash'; }

  pre.src-dash:before  { content: 'dash'; }

  pre.src-ksh:before  { content: 'ksh'; }

  pre.src-mksh:before  { content: 'mksh'; }

  pre.src-posh:before  { content: 'posh'; }

  /* Additional Emacs modes also supported by the LaTeX listings package */

  pre.src-ada:before { content: 'Ada'; }

  pre.src-asm:before { content: 'Assembler'; }

  pre.src-caml:before { content: 'Caml'; }

  pre.src-delphi:before { content: 'Delphi'; }

  pre.src-html:before { content: 'HTML'; }

  pre.src-idl:before { content: 'IDL'; }

  pre.src-mercury:before { content: 'Mercury'; }

  pre.src-metapost:before { content: 'MetaPost'; }

  pre.src-modula-2:before { content: 'Modula-2'; }

  pre.src-pascal:before { content: 'Pascal'; }

  pre.src-ps:before { content: 'PostScript'; }

  pre.src-prolog:before { content: 'Prolog'; }

  pre.src-simula:before { content: 'Simula'; }

  pre.src-tcl:before { content: 'tcl'; }

  pre.src-tex:before { content: 'TeX'; }

  pre.src-plain-tex:before { content: 'Plain TeX'; }

  pre.src-verilog:before { content: 'Verilog'; }

  pre.src-vhdl:before { content: 'VHDL'; }

  pre.src-xml:before { content: 'XML'; }

  pre.src-nxml:before { content: 'XML'; }

  /* add a generic configuration mode; LaTeX export needs an additional

     (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */

  pre.src-conf:before { content: 'Configuration File'; }

  table { border-collapse:collapse; }

  caption.t-above { caption-side: top; }

  caption.t-bottom { caption-side: bottom; }

  td, th { vertical-align:top;  }

  th.org-right  { text-align: center;  }

  th.org-left   { text-align: center;   }

  th.org-center { text-align: center; }

  td.org-right  { text-align: right;  }

  td.org-left   { text-align: left;   }

  td.org-center { text-align: center; }

  dt { font-weight: bold; }

  .footpara { display: inline; }

  .footdef  { margin-bottom: 1em; }

  .figure { padding: 1em; }

  .figure p { text-align: center; }

  .equation-container {

    display: table;

    text-align: center;

    width: 100%;

  }

  .equation {

    vertical-align: middle;

  }

  .equation-label {

    display: table-cell;

    text-align: right;

    vertical-align: middle;

  }

  .inlinetask {

    padding: 10px;

    border: 2px solid gray;

    margin: 10px;

    background: #ffffcc;

  }

  #org-div-home-and-up

   { text-align: right; font-size: 70%; white-space: nowrap; }

  textarea { overflow-x: auto; }

  .linenr { font-size: smaller }

  .code-highlighted { background-color: #ffff00; }

  .org-info-js_info-navigation { border-style: none; }

  #org-info-js_console-label

    { font-size: 10px; font-weight: bold; white-space: nowrap; }

  .org-info-js_search-highlight

    { background-color: #ffff00; color: #000000; font-weight: bold; }

  .org-svg { width: 90%; }

    MathJax.Hub.Config({

        displayAlign: "center",

        displayIndent: "0em",

        "HTML-CSS": { scale: 100,

                        linebreaks: { automatic: "false" },

                        webFont: "TeX"

                       },

        SVG: {scale: 100,

              linebreaks: { automatic: "false" },

              font: "TeX"},

        NativeMML: {scale: 100},

        TeX: { equationNumbers: {autoNumber: "AMS"},

               MultLineWidth: "85%",

               TagSide: "right",

               TagIndent: ".8em"

             }

});



Simulation and plotting code for "Optimizing the use of carbonate standards to minimize uncertainties in clumped isotope data"



Table of Contents





1. install required packages

2. optional packages

3. load libraries

4. multiple cores

5. set up plotting theme



6. set up stdinfo etc.



6.1. setup axes







7. calculate temperature sensitivity as a function of temperature



7.1. temp_sens_pot





8. create standard intro plot

9. micro benchmark

10. example sims



11. stddis



11.1. create a list of all possible stddis combinations

11.2. expand the whole grid

11.3. stddis runall

11.4. arrange results for plot

11.5. create barcharts for proportion axes

11.6. stddis_pl

11.7. prep text







12. stdvssmp



12.1. setup

12.2. run sims

12.3. save results

12.4. tidy it up

12.5. stdvssmp_pl

12.6. supplementary figure plot with all input standard deviations







13. prop-eth3



13.1. prop-eth3 for continuous sample range



13.2. prop-eth3 expand experimental matrices and run simulations



13.2.1. run many sims

13.2.2. save results





13.3. smp_out_uu

13.4. best_dat

13.5. smp_out_comb



13.6. prop_eth3_pl



13.6.1. plot_best

13.6.2. plot_prop

13.6.3. combine and print

13.6.4. 3d rayshader plot





13.7. best_prop_diff_pl

13.8. calculate some summary statistics for use in-text







14. prop-eth3 with a very very very cold and hot standard?



14.1. prop-eth3 for continuous sample range

14.2. prop-eth3 expand experimental matrices and run simulations



14.3. run many sims



14.3.1. save results





14.4. brrr_best_dat

14.5. smp_out_comb



14.6. prop_eth3_pl



14.6.1. plot_best

14.6.2. plot_prop

14.6.3. combine and print





14.7. calculate some summary statistics for use in-text

14.8. quick check on how much it matters if we add two hypothetical very large-range standards













The supplementary code to the manuscript "Optimizing the use of carbonate

standards to minimize uncertainties in clumped isotope data".





Copyright (C) 2019 Ilja J. Kocken





This program is free software: you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation, either version 3 of the License, or

(at your option) any later version.





This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

GNU General Public License for more details.





You should have received a copy of the GNU General Public License

along with this program.  If not, see https://www.gnu.org/licenses/.





See https://github.com/japhir/stdstats for potentially updated versions and Emacs org source files.





Please note that it takes quite some time to run all of the code, so if you

want to evaluate it, make sure to do so line-by-line!





We set the seed here in an attempt to make these exact simulations

reproducible, but this only works for the next random sampling call. Thus, you

will not get identical results. Please load the included simulation output .rds

files to generate identical plots (see below).





set.seed(1563435)









1. install required packages





Install all below packages. Some of them (e.g. furrr) rely on the dev version

available on GitHub. Install them with





devtools::install_github("DavisVaughan/furrr")







Install the new package stdsim similarly, from





devtools::install_github("japhir/stdsim")







or point to the directory in which you downloaded the stdsim folder with





devtools::install_local("/path/to/stdsim")













2. optional packages





I use this for my theme.





devtools::install_github("baptiste/egg")













3. load libraries





We use the following packages for this project:





library(dplyr)     # for piping (%>%), mutate, and nice data manipulation

library(tidyr)     # for making tibbles tidy, (gather, spread etc.)

library(tibble)    # a better thing than a dataframe

library(ggplot2)   # plotting

library(tictoc)    # to keep track of how long the functions took

library(purrr)     # functional programming to map variables from lists of dataframes

library(magrittr)  # only used once for extract, but original package of the pipe

library(patchwork) # combine plots

library(furrr)     # furrr allows parallell purr functions w/ progress bars!

library(stdsim)    # the new R package created just for this paper!













4. multiple cores





run simulations on as many cores as possible. Note that I read somewhere that

the random number generator gets less random when using multiple cores, but it

is much faster and since I perform all simulations often this shouldn't effect

the results.





plan(multiprocess)













5. set up plotting theme





We can use a different theme, but I like the one in the dev package egg.

Install it or ignore this section of code.





theme_set(egg::theme_article(base_size = 11, base_family="Helvetica"))













6. set up stdinfo etc.





This uses the default functions in stdsim to generate a tibble with standard

and sample information. It doesn't add δ₄₇ values by default since they differ

between labs and are not important for these simulations.





eth.info <- make_std_table()

smpinfo <- make_smp_info(c(0, 40))

stdev <- 14

# append d47 values based on actual measurement results for our MOTU

eth.info$d47 <- c(15.6, -13.2, 16.2, -13.1, NA_real_)











6.1. setup axes









7. calculate temperature sensitivity as a function of temperature





Now we are interested in calculating the rate of change as a function of D47,

so that we can calculate the change in temperature. So we take the derivative

of the original T(D47) function.





tempcal_simplified  <- function(Tc, slp=0.0449, int=0.167, kkelvin=273.15) {

    (slp * 1e6) / (Tc + kkelvin)^2 + int

}

tempcal_derivative  <- function(Tc, slp=0.0449, int=0.167, kkelvin=273.15) {

  -((2 * slp * 1e6) / ((kkelvin + Tc) ^ 3))

}

revcal_simplified <- function(D47, slp=0.0449, int=0.167, kkelvin=273.15) {

    sqrt((slp * 1e6) / (D47 - int)) - kkelvin

}

revcal_derivative <- function(D47, slp=0.0449, int=0.167) {

  (sqrt(-(slp * 1e6) / (int - D47))) / (2 * int - 2 * D47)

}







See





?revcal

?tempcal







for the actual function documentation.









7.1. temp_sens_pot





Calculate the sensitivity of the temperature calibration at the relevant

temperature range, so that we can add an estimate of uncertainty in the

temperature domain to plots.





The citeA:Kele2015 temperature calibration is only valid between 6 and 95 °C, so





# guo 2009 eqn. 18

# takes temperature in degrees celsius, converts to D47

guo_cal <- function(temp) {

  # convert degrees celsius to kelvin

  x <- temp + kkelvin

  # apply polynomial fit

  -3.33040e9 / x^4 + 2.32415e7 / x^3 - 2.91282e3 / x^2 - 5.54042 / x + 0.23252

}

# takes D47, converts to temperature in degrees celsius

guo_deriv <- function(temp) {

  x <- temp + kkelvin

  (5.54042 * x^3 + 5825.64 * x^2 - 69724500 * x + 13321600000)/x^5 * 1000

}







Update standards to use Guo if ETH-1 or ETH-2. We hack it together by

numerically solving it.





guo_temp <- tibble(Tc = rng, D47 = guo_cal(Tc))

eth1_new_temp <- guo_temp$Tc[[which(near(guo_temp$D47, eth.info$D47[[1]], tol = .0000005))]]

eth2_new_temp <- guo_temp$Tc[[which(near(guo_temp$D47, eth.info$D47[[2]], tol = .0000005))]]

guo_std_temp <- bind_rows(eth.info, smpinfo) %>%

  mutate(temp = case_when(id == "ETH-1" ~ eth1_new_temp, #802.812 - kkelvin,

                          id == "ETH-2" ~ eth2_new_temp, #822.2 - kkelvin,

                          TRUE ~ temp))







plot_temp <- sensdf %>%

  filter(Tc >= 6, Tc <= 95) %>%

  ggplot(aes(y = D47, x = Tc)) +

  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "skyblue", alpha = .4) +

  geom_line(colour = "blue", linetype = 2, alpha = .5, data = sensdf) +

  geom_line(colour = "blue") +

  ## geom_line(colour = "black", data = guo_temp) +

  stat_function(fun = guo_cal, colour = "black") +

  geom_segment(aes(x = -Inf, xend = temp, y = D47, yend = D47, col = id),

               alpha=.5,

               inherit.aes=FALSE,

               data = guo_std_temp) +

  geom_segment(aes(x = temp, xend = temp, y = -Inf, yend = D47, col = id),

               alpha=.5,

               inherit.aes=FALSE,

               data = guo_std_temp) +

  annotate("text", x = 60, y = .65, label = "Kele et al., 2015, \nrecalculated by Bernasconi et al., 2018", colour = "darkblue", hjust = 0) +

  annotate("text", x = 450, y = .3, label = "Guo et al., 2009") +

  scale_colour_manual(values = c(eth.info$col[-5], smpinfo$col[[1]], eth.info$col[[5]])) +

  labs(x = "Temperature (°C)", y = Delta[47] ~ "CDES (\u2030)") +

  coord_cartesian(ylim = c(.2, .8), xlim = c(0, 550)) +

  theme(axis.title.x=element_blank(),

        axis.text.x=element_blank(),

        legend.pos=c(.75, .6), legend.title = element_blank())

plot_sens <- sensdf %>%

  filter(Tc >= 6, Tc <= 95) %>%

  ggplot(aes(y = sens * 1e3, x = Tc)) +

  geom_line(colour = "blue", linetype = 2, alpha = .5, data = sensdf) +

  geom_line(colour = "blue") +

  stat_function(fun = guo_deriv, xlim = c(-10, 1000)) +

  coord_cartesian(ylim = c(-5, 0), xlim = c(0, 550)) +

  ## geom_vline(xintercept=c(0, 40), col="#ededed") +

  labs(x = "Temperature (°C)", y = "Sensitivity" ~ "(" * Delta[47] / "°C, ppm)")

temp_sens_pl <- plot_temp + plot_sens + plot_layout(nrow=2, heights=c(3, 1))

temp_sens_pl























8. create standard intro plot





The standards as a function of composition





lims <- c(.15, .71)

standards_plot <- ggplot(eth.info, aes(x = d47, y = D47.noacid, col = id, label = id)) +

  geom_point(size = 2, show.legend = F) +

  ## geom_label(show.legend = F) +

  ggrepel::geom_label_repel(size = 2.5, show.legend = FALSE) +

  # add UU1 standard

  geom_hline(yintercept = eth.info$D47.noacid[[5]], linetype = 2, col = eth.info$col[[5]]) +

  annotate("label", x = 1.225, y = eth.info$D47.noacid[[5]], label = eth.info$id[[5]], col = eth.info$col[[5]], size = 2.5) +

  scale_colour_manual(values = eth.info$col) +

  labs(colour = "", x = delta^{47}~"(SG vs WG PBL \u2030)",

    y = Delta[47] ~ "CDES" - "AFF (\u2030)") +

  scale_y_continuous(sec.axis = sec_axis(~ sqrt((0.0449 * 1e+6)/(. + kaff - 0.167)) - 273.15,

                                        "Sample temperature (°C)", temp_breaks,

                                        temp_labs)) +

  coord_flip(ylim=lims) +

  theme(axis.text.x.bottom=element_blank(), axis.title.x.bottom=element_blank())

  ## coord_cartesian(clip = "off") +

  ## theme(legend.pos = c(.15, .85))







And the simulation input conditions illustrating the ETF.





stdevs <- c(14, 25, 50) / 1e3

xs <- .54 + c(0, .04, .08)

ys <- rep(-.6, 3)

standard_sample_data <- make_smp_info(c(0, 40)) %>%

  mutate(id=ifelse(temp == 40, "sample 1", "sample 2")) %>%

  ## mutate(id=paste(id, temp)) %>%

  bind_rows(eth.info)

base_plot <-  standard_sample_data %>%

  ggplot(aes(x = D47.noacid, y = rawcat, col = id, label = id)) +

  # add etf

  geom_abline(intercept = kintercept, slope = kslope,

    linetype = 1, size = 1, col = "gray") +

  # 50 ppm uncertainty pointrange

  geom_linerange(aes(ymin = rawcat - 50 * kslope / 1e3,

    ymax = rawcat + 50 * kslope / 1e3),

    size = 1, linetype = 1, alpha = .1) +

  # 25 ppm uncertainty pointrange

  geom_linerange(aes(ymin = rawcat - 25 * kslope / 1e3,

    ymax = rawcat + 25 * kslope / 1e3),

    size = 1, linetype = 1, alpha = .4) +

  # 14 ppm uncertainty pointrange

  geom_linerange(aes(ymin = rawcat - 14 * kslope / 1e3,

    ymax = rawcat + 14 * kslope / 1e3),

    size = 1, linetype = 1) +

  geom_point(size=2) +

  # create a manual legend with the different input uncertainties

  annotate("segment",

           x = xs, xend = xs,

           y = ys, yend = ys + stdevs * kslope,

    alpha = c(1, .4, .1)) +

  annotate("segment",

           x=xs, xend=xs + stdevs,

           y=ys, yend=ys,

           alpha=c(1, .4, .1)) +

  annotate("text",

           x = xs,

           y = ys - .02,

    label = c("14", "25", "50"), size = 2) +

  # add the input sample measurements

  ggrepel::geom_text_repel(force = 3, hjust = 1, nudge_y = .05, nudge_x = -.01, size=2.5, segment.color = NA) +

  # make it pretty, manual colour scale, samples are black

  scale_colour_manual(

      limits = standard_sample_data$id,

      values = standard_sample_data$col) +

  # nice axis labels

  labs(

    colour = "",

    x = Delta[47] ~ "CDES" - "AFF (\u2030)",

    y = Delta[47] ~ "raw (\u2030)"

  ) +

  scale_x_continuous(limits=lims) +

  theme(legend.pos = "none")







In the text we combine them using patchwork, to create figure 1.





standards_pl <- standards_plot + base_plot + plot_layout(nrow=2, heights = c(.4, .6))

standards_pl





















9. micro benchmark





Calculate how long it takes for one simulation.





options(genplot = FALSE, verbose = FALSE)

tpersim <- microbenchmark::microbenchmark(sim_stds(out = "cis",

  stdtable = eth.info)) %>%

  summary() %>%

  pull(mean) / 100

tpersim













10. example sims





We create some example simulations for fig. 2.





options(genplot=F, verbose=F)

## set up small inputs dataframe

example_sims <- tibble(

  name=rep(c("Equal\nproportions", "Optimal\ndistribution", "Optimal\ndistribution + UU1"), 2),

  stdfreqs=rep(list(c(1, 1, 1, 1, 0), c(1, 1, 9, 0, 0), c(1, 1, 0, 0, 9)), 2),

  smpt=c(rep(0, 3), rep(40, 3))) %>%

  ## run sims with inputs dataframe

  mutate(res = purrr::pmap(select(., -name), sim_stds, stdev=25, out="all", stdn=50, smpn=50),

         # extract the default plots

         pl=purrr::map(res, plot_sim, graylines=F, point_alpha=.2, pointrange=T, labs=F, fixed=F),

         # add a row number for the next step

         exprow=1:n() %>% as.character())

# combine the smp and std outputs of each experiment, based on the row number

six_example_sims <- example_sims$res %>%

  map_dfr("smp", .id="exprow") %>%

  bind_rows(example_sims$res %>% map_dfr("std", .id="exprow")) %>%

  left_join(example_sims, by="exprow")







We create the example plot





exmp_plot <- ggplot(six_example_sims, aes(x=D47.noacid, y=raw, col=id, fill=id)) +

  geom_smooth(aes(group=paste0(name, smpt)), method="lm", size=.1,

              fullrange=TRUE, data=filter(six_example_sims, id != "sample")) +

  geom_violin(alpha=.3, colour=NA, scale="count", width=.5, position=position_identity()) +

  geom_point(shape=1, alpha=.2, size=.3) +

  facet_grid(rows=vars(name), cols=vars(paste(smpt, "°C"))) +

  ## coord_fixed(xlim=c(.1, .8)) +

  coord_cartesian(xlim=c(.14, .75)) +

  scale_colour_manual("ID",

                      ## limits = c(out$cond$stdtable$id, out$cond$smpinfo$id),

                      limits = c(example_sims$res[[1]]$cond$stdtable$id, example_sims$res[[1]]$cond$smpinfo$id),

                      ## values = c(out$cond$stdtable$col, out$cond$smpinfo$col)) +

                      values = c(example_sims$res[[1]]$cond$stdtable$col, example_sims$res[[1]]$cond$smpinfo$col)) +

  scale_fill_manual("ID",

                    limits = c(example_sims$res[[1]]$cond$stdtable$id, example_sims$res[[1]]$cond$smpinfo$id),

                    values = c(example_sims$res[[1]]$cond$stdtable$col, example_sims$res[[1]]$cond$smpinfo$col)) +

  labs(x = Delta[47] ~ "CDES" - "AFF (\u2030)",

       y = Delta[47] ~ raw ~ "(\u2030)") +

  theme(legend.pos="top", legend.key.size=unit(3, "mm"), legend.text = element_text(size = 6),

        strip.text.y = element_text(size = 8, angle = 90))

exmp_plot







and the table for in the text:





tbl_exmp <-

  forplot_0 %>%

  bind_rows(forplot_40) %>%

  filter(expname %in% c("1:1:1:1:0", "1:1:9:0:0", "1:1:1:0:9")) %>%

    select(-stdfreqs, -exprow, -meanerr, -hascoldstandard) %>%

  group_by(expname, smpt, stdev) %>%

  ## nest() %>%

  summarize(err_mean = mean(smp, na.rm = TRUE) * 1e3,

            err_ci = qt((1 - .05), length(smp) - 1) * sd(smp, na.rm = TRUE) / sqrt(length(smp) - 1) * 1e3) %>%

  mutate(err_temp = err_mean / 1e3 / tempcal_derivative(smpt) %>% abs,

         err_temp_ci = err_ci / 1e3 / tempcal_derivative(smpt) %>% abs,

   ) %>%

  arrange(stdev, -err_mean) %>%

  ## bind_cols(map_dfr(.$data, ~ mean(smp))) %>%

  pivot_wider(id_cols = c(expname, stdev), names_from = smpt, values_from = c(err_mean, err_ci, err_temp, err_temp_ci)) %>%

  # change order of things

  mutate(Name = case_when(expname == "1:1:1:1:0" ~ "Equal proportions",

                          expname == "1:1:9:0:0" ~ "Optimal distribution",

                          expname == "1:1:1:0:9" ~ "Optimal distribution + UU1",

                          TRUE ~ "wth")) %>%

  select(Name, expname, stdev, ends_with("_0"), ends_with("_40")) %>%

  as_tibble() %>%

  mutate(

    out_0_ppm = glue::glue("{round(err_mean_0, 2)} \\pm {round(err_ci_0, 2)}"),

    ## improv_0 = round(err_mean_0 / lead(err_mean_0), 2),

    ## improv_40 = round(err_mean_0 / lead(err_mean_0), 2),

    out_0_deg = glue::glue("{round(err_temp_0, 2)} \\pm {round(err_temp_ci_0, 2)}"),

    out_40_ppm=glue::glue("{round(err_mean_40, 2)} \\pm {round(err_ci_40, 2)}"),

    out_40_deg = glue::glue("{round(err_temp_40, 2)} \\pm {round(err_temp_ci_40, 2)}"),

    ) %>%

  ## rename for latex output

  select(Name,

         `Standard distribution`=expname,

         `\\sigma` = stdev,

         `0\\us\\celsius (ppm)` = out_0_ppm,

         ## `\\times` = improv_0,

         `40\\us\\celsius (ppm)`=out_40_ppm,

         ## `\\times` = improv_40,

         ## `0\\u

         `0\\us\\celsius (\\celsius)`=out_0_deg,

         `40\\us\\celsius (\\celsius)`=out_40_deg)





Name

Standard distribution

σ

0 °C (ppm)

40 °C (ppm)

0 °C (°C)

40 °C (°C)

Equal proportions

1:1:1:1:0

14

9.37 ± 0.12

7.21 ± 0.09

2.13 ± 0.03

2.46 ± 0.03

Optimal distribution

1:1:9:0:0

14

6.8 ± 0.09

5.58 ± 0.07

1.54 ± 0.02

1.91 ± 0.03

Optimal distribution + UU1

1:1:1:0:9

14

6 ± 0.07

5.63 ± 0.07

1.36 ± 0.02

1.93 ± 0.02

Equal proportions

1:1:1:1:0

25

17.03 ± 0.24

12.89 ± 0.16

3.86 ± 0.05

4.41 ± 0.06

Optimal distribution

1:1:9:0:0

25

12.45 ± 0.18

10.24 ± 0.12

2.82 ± 0.04

3.5 ± 0.04

Optimal distribution + UU1

1:1:1:0:9

25

10.86 ± 0.12

10.11 ± 0.1

2.46 ± 0.03

3.46 ± 0.04

Equal proportions

1:1:1:1:0

50

35.15 ± 0.57

26.35 ± 0.46

7.98 ± 0.13

9.01 ± 0.16

Optimal distribution

1:1:9:0:0

50

25.08 ± 0.44

20.1 ± 0.27

5.69 ± 0.1

6.87 ± 0.09

Optimal distribution + UU1

1:1:1:0:9

50

22.09 ± 0.33

20.03 ± 0.28

5.01 ± 0.08

6.85 ± 0.1









11. stddis





To simulate the distribution of the standards we create a tibble of inputs for

sim_stds().









11.1. create a list of all possible stddis combinations





# the proportions we want each standard to get

pr <- c(0, 1, 3, 9)

# each standard gets these proportions in all possible combinations

props <- expand.grid(

  `ETH-1` = pr,

  `ETH-2` = pr,

  `ETH-3` = pr,

  `ETH-4` = pr,

  UU1 = pr) %>%

  # we need at least 1|2 & 3|4|UU1 to be able to calculate an ETF

  # we need at least 1|2|4 & 3|4|UU1 to be able to calculate an ETF

  ## filter(`ETH-1` + `ETH-2` + `ETH-4` > 0 & `ETH-3` + `ETH-4` + `UU1` > 0) %>%

  # we need at least 1 standard

  filter(`ETH-1` + `ETH-2` + `ETH-4` + `ETH-3` + `ETH-4` + `UU1` > 0) %>%

  # calculate relative abundances

  mutate(sums = rowSums(.),

    f1 = `ETH-1` / sums,

    f2 = `ETH-2` / sums,

    f3 = `ETH-3` / sums,

    f4 = `ETH-4` / sums,

    fu = UU1 / sums) %>%

  # filter out the redundant ones

  distinct(f1, f2, f3, f4, fu, .keep_all = TRUE) %>%

  arrange(`ETH-1`, `ETH-2`, `ETH-3`, `ETH-4`, UU1)

# convert the proportions to a list we can put in our experimental matrix

props_list <- props %>%

  select(-c(sums, starts_with("f"))) %>%

  as.matrix() %>%

  split(seq(nrow(.)))













11.2. expand the whole grid





stddis <- expand.grid(

    smpt = c(0, 40),

    stdfreqs = props_list,  # this uses the list created previously

    stdev = c(50, 25, 14)) %>%

  # we add an experiment name for plot labels and easy overview

  mutate(expname = map_chr(stdfreqs, paste, collapse = ":")) %>%

  # add cold standard logical for later filtering

  mutate(hascoldstandard = grepl("[[139]]", expname))

# repeat each experiment a hundred times

megastddis <- stddis[rep(stddis %>% nrow() %>% seq_len(), 100), ] %>%

  # and add a row character for later merging of results

  mutate(exprow = as.character(seq_len(n())))













11.3. stddis runall





Note that we use the package future so that we can show a progress bar and use

multiple cores. One could also use purrr::pmap_dfr() which it's based on, here.





## I turn off plotting and info messages

options(genplot = FALSE, verbose = FALSE)

## keep track of how long it takes

message(nrow(megastddis), " simulations started at ", Sys.time())

## very rough expected finish time (it's usually faster)

message("expected to take until ", Sys.time() + tpersim * nrow(megastddis) / 4)

## track actual duration with tictoc

tic("stddis total time")

## run sim_stds with parameters from mgstddis and global parameters after it

stddis_cnf <- furrr::future_pmap_dfr(

                       select(megastddis, smpt, stdfreqs, stdev),

                       sim_stds, stdtable = eth.info,

                       out = "cis", stdn = 50, smpn = 50,

                       .id = "exprow", # append a row name id

                       .progress=TRUE   # show a progress bar

                     ) %>%

  filter(id == "smp") %>%                        # filter output

  select(exprow, id, cv) %>%                     # select output

  spread(id, cv) %>%                             # make it wide format

  right_join(megastddis, by="exprow")            # join it with experimental df

toc()

message("simulations ran until ", Sys.time())







Save the results so that we don't have to run the simulations every time.





saveRDS(stddis_cnf, "stddis_cnf_2019-06-17.rds")







Restore the results from previous simulations.





stddis_cnf <- readRDS("stddis_cnf_2019-06-17.rds")













11.4. arrange results for plot





We re-organize the dataframes and make a selection of the best results for 0

and 40 degrees.





# the mean error of equal proportions at 0 degrees

normerr_0  <- stddis_cnf %>%

  filter(smpt == 0, expname == "1:1:1:1:0") %>%

  summarize(meanerr=mean(smp)) %>%

  pull(meanerr)

idealerr_0 <- stddis_cnf %>%

  filter(smpt == 0, expname == "1:1:9:0:0") %>%

  summarise(meanerr = mean(smp)) %>%

  pull(meanerr)

normerr_40  <- stddis_cnf %>%

  filter(smpt == 40, expname == "1:1:1:1:0") %>%

  summarize(meanerr=mean(smp)) %>%

  pull(meanerr)

idealerr_40 <- stddis_cnf %>%

  filter(smpt == 40, expname == "1:1:9:0:0") %>%

  summarise(meanerr = mean(smp)) %>%

  pull(meanerr)

forplot_0 <-

  stddis_cnf %>%

  filter(smpt == 0) %>%

  # append mean error per treatment to original dataframe

  group_by(expname) %>%

  summarise(meanerr = mean(smp, na.rm=TRUE)) %>%

  left_join(filter(stddis_cnf, smpt == 0), by = "expname") %>%

  # if there is no cold standard, it should be at least as good as normerr

  filter((!hascoldstandard & (meanerr <= normerr_0)) |

  # if there is a cold standard, it should be at least as good as idealerr

           (hascoldstandard & (meanerr <= idealerr_0))) %>%

  arrange(meanerr, expname) %>%

  ungroup() %>%

  # this is a hack to order the labels of a factor in a plot

  mutate(expname = factor(expname, unique(expname)))

forplot_40 <-

  stddis_cnf %>%

  filter(smpt == 40) %>%

  # append mean error per treatment to original dataframe

  group_by(expname) %>%  # note that we do not take into account sample temp..

  summarise(meanerr = mean(smp, na.rm=TRUE)) %>%

  left_join(filter(stddis_cnf, smpt == 40), by = "expname") %>%

  # if there is no cold standard, it should be at least as good as normerr

  filter((!hascoldstandard & (meanerr <= normerr_40)) |

  # if there is a cold standard, it should be at least as good as idealerr

           (hascoldstandard & (meanerr <= idealerr_40))) %>%

  arrange(meanerr, expname) %>%

  mutate(expname = factor(expname, unique(expname)))













11.5. create barcharts for proportion axes





forplot_0_x <- forplot_0 %>%

  distinct(expname, .keep_all = TRUE) %>%

  unnest(cols=stdfreqs) %>%

  mutate(std = rep(c(paste0("ETH-", 1:4), "UU"), n()/5))

x_0 <- forplot_0_x %>%

  ggplot(aes(x = expname, y = stdfreqs, fill = std)) +

  geom_col(position="fill") +

  ## geom_text(aes(label = lab), y = .5, size = 2,

  ##           data = mutate(forplot_0_x,

  ##                         lab = ifelse(expname %in% c("1:1:1:1:0", "1:1:9:0:0", "1:1:0:0:9"), expname, ""))) +

  scale_fill_manual(values=eth.info$col,

                    guide = guide_legend(label.position = "top",

                                         direction = "horizontal",

                                         label.theme = element_text(size = 8, angle = 90, hjust = 0))) +

  scale_y_reverse(position = "right", expand = c(0, 0)) +

  labs(x = "Standard distribution", y = "ETH-1:2:3:4:UU1", fill = "") +

  ## labs(x = "Standard distribution", y = "ETH-<span style='color:orange'>1</span>:<span style='color:purple'>2</span>:<span style='color:#00B600'>3</span>:<span style='color:blue'>4</span>:<span style='color:#FFCD00'>UU1</span>") +

  coord_flip() +

  theme(axis.text = element_blank(),

        axis.ticks = element_blank(),

        plot.margin = margin(r = 0),

        axis.title.x = element_blank(),

        legend.pos = "top",

        legend.justification = 12,

        legend.spacing.x = unit(1, "mm"),

        ## legend.margin = margin(2, 0, 2, 0),

        legend.key.size = unit(2, "mm"))







forplot_40_x <- forplot_40 %>%

  distinct(expname, .keep_all = TRUE) %>%

  unnest(cols=stdfreqs) %>%

  mutate(std = rep(c(paste0("ETH-", 1:4), "UU"), n()/5))

x_40 <- x_0 %+% forplot_40_x +

  theme(axis.title.x = element_blank(),

        axis.title.y = element_blank(),

        legend.pos = "none")













11.6. stddis_pl





sub_vjust <- -7

sds <- tibble(expname = "1:0:9:1:0", sd = c(14, 25, 50), smp = c(.006, .012, .027) + .003, hascoldstandard = FALSE, smpt = 0)

# create a plot

stddis_pl_0 <- forplot_0 %>%

  ggplot(aes(x = expname, y = smp * 1e3,  # error in ppm

    colour = factor(smpt), fill = factor(smpt))) +

  # theming

  labs(x = "Standard distribution",

    y = "Combined error of sample and ETF (95% CI, ppm)",

    fill = ktit_smpid,

    colour = ktit_smpid,

    shape = "Total standards") +

  scale_fill_manual(values = kcols[[1]], labels = "0 °C") +

  scale_colour_manual(values = kcols[[1]], labels = "0 °C") +

  scale_y_continuous(breaks = seq(0, 65, 10), lim = c(5, 40),

    sec.axis = sec_axis("Combined error of sample and ETF (95% CI, °C)",

      trans = ~. / abs(tempcal_derivative(0) * 1e3), breaks = seq(0, 20, 2))) +

  theme(

    plot.title = element_text(hjust = 0.5, vjust = -10),

    plot.margin = margin(l = 0),

    ## plot.subtitle = element_text(size = 8, hjust = -.1, vjust = -8),

    axis.title.x.top = element_text(hjust = 7),

    axis.title.x.bottom = element_text(hjust = 7),

    axis.text.y = element_blank(),

    axis.ticks.y = element_blank(),

    axis.title.y = element_blank(),

    strip.text = element_blank(),

    legend.position = "none") +

  # manual legend for input stdevs (sigma)

  # the actual data

  geom_point(alpha = .03) +

  ## annotate("text", x = -Inf, y = Inf, label = paste0("\u3c3 = ", c(14, 25, 50))) +

  geom_text(aes(label = sd), data = sds, colour = "black") +

  coord_flip(clip = "off") +

  stat_summary(aes(group=stdev), geom="ribbon", fun.data=mean_cl_normal, fun.args=list(conf.int=0.95), alpha=.4, colour=NA) +

  stat_summary(aes(group=stdev), geom="line", fun.data=mean_cl_normal) +

  labs(title="0 °C sample") #, subtitle = "ETH-1:2:3:4:UU1")

# add the average lines +- 95% CIs for all the sample temperatures

stddis_pl_40 <- forplot_40 %>%

  filter(smpt == 40) %>%

  ggplot(aes(x = expname, y = smp * 1e3,  # error in ppm

    colour = factor(smpt), fill = factor(smpt))) +

  # theming

  labs(x = "Standard distribution",

    y = "Combined error of sample and ETF (95% CI, ppm)",

    fill = ktit_smpid,

    colour = ktit_smpid,

    shape = "Total standards") +

  scale_fill_manual(values = kcols[[2]], labels = "40 °C") +

  scale_colour_manual(values = kcols[[2]], labels = "40 °C") +

  scale_y_continuous(breaks = seq(0, 65, 10), lim = c(5, 30),

    sec.axis = sec_axis("Combined error of sample and ETF (95% CI, °C)",

      trans = ~. / abs(tempcal_derivative(40) * 1e3), breaks = seq(0, 20, 2))) +

  theme(

    plot.title = element_text(hjust = 0.5, vjust = -10),

    plot.margin = margin(l = 0),

    plot.subtitle = element_blank(),

    axis.title = element_blank(),

    axis.text.y = element_blank(),

    axis.ticks.y = element_blank(),

    axis.title.y = element_blank(),

    strip.text = element_blank(),

    legend.position = "none") +

  labs(title="40 °C sample") +

  coord_flip(clip = "off") +

  # the actual data

  geom_point(alpha = .03) +

  stat_summary(aes(group=stdev), geom="ribbon", fun.data=mean_cl_normal,

               fun.args=list(conf.int=0.95), alpha=.4, colour=NA) +

  stat_summary(aes(group=stdev), geom="line", fun.data=mean_cl_normal)

## stddis_pl  <- stddis_pl_0 + stddis_pl_40 + plot_layout(widths=c(35 / 25 , 1))

stddis_pl <- x_0 + stddis_pl_0 + x_40 + stddis_pl_40 + plot_layout(widths = c(.15, 35/25, .15, 1), ncol = 4)







stddis_pl





















11.7. prep text





This generates some dataframes so we can easily extract relevant averages for

use in-text.





stddis_exmp_0 <- forplot_0 %>%

  filter(smpt == 0) %>%

  group_by(expname) %>%

  summarize(err=mean(smp, na.rm=TRUE), cnf = confidence(smp, n(), alpha=0.05)) %>%

  arrange(err)

stddis_exmp_40 <- forplot_40 %>%

  filter(smpt == 40, stdev == 25) %>%

  group_by(expname) %>%

  summarize(err=mean(smp,na.rm=TRUE), cnf=confidence(smp, n(), alpha= 0.05)) %>%

  arrange(err)















12. stdvssmp





Here we simulate the standards versus samples based on three input distributions,









12.1. setup





stdvssmp <- expand.grid(

  smpt = c(0, 40), stdn = as.integer(seq(12, 88, 4)),

  dist = c("equal proportions", "optimal proportions", "optimal proportions including UU1"),

  stdev = c(50, 25, 14)) %>%

  as_tibble() %>%

  mutate(smpn = as.integer(100 - stdn),

         n = as.integer(smpn + stdn),

         smpf = smpn / n,

         stdf = stdn / n,

         stdfreqs = case_when(dist == "equal proportions" ~ list(c(1, 1, 1, 1, 0)),

                              dist == "optimal proportions" ~ list(c(1, 1, 9, 0, 0)),

                              dist == "optimal proportions including UU1" ~ list(c(1, 1, 1, 0, 9))))

# repeat each experiment a hundred times

megastdvssmp <- stdvssmp[rep(stdvssmp %>% nrow() %>% seq_len(), 100), ] %>%

  mutate(exprow = as.character(seq_len(n())))  # row number













12.2. run sims





# again, I turn off plotting and info messages

options(genplot = FALSE, verbose = FALSE)

# keep track of time

message(nrow(megastdvssmp), " simulations started at ", Sys.time())

message("expected to take until ", Sys.time() + tpersim * nrow(megastdvssmp) / 4)

# run duration

tic("overview")

# run sim_stds with parameters from megastdvssmp and global parameters after it

stdvssmp_cnf <- furrr::future_pmap_dfr(

                         select(megastdvssmp,

                                smpt, stdn, stdev, smpn, stdfreqs),

                         sim_stds,

                         stdtable = eth.info, out = "cis",

                         .id = "exprow",          # append a row name id

                         .progress=TRUE

                         ) %>%

  filter(id %in% c("etf","sample", "smp")) %>%    # filter output

  select(exprow, id, cv) %>%                      # select output

  spread(id, cv) %>%                              # make it wide format

  right_join(megastdvssmp)                        # join it with experimental df

toc()

message("simulations ran until ", Sys.time())













12.3. save results





saveRDS(stdvssmp_cnf, "stdvssmp_cnf_2019-06-13.rds")







stdvssmp_cnf <- readRDS("stdvssmp_cnf_2019-06-13.rds")













12.4. tidy it up





tidy_stdvssmp_results <- stdvssmp_cnf %>%

  gather(errortype, error, sample, etf, smp)













12.5. stdvssmp_pl





## create label annotation because the legend with opacity was unclear

smpn=c(76, 44, 30)

error=c(8, 4, 25)

labels = c("ETF", "Sample", "Combined")

leg <- tibble(

  ## x=c(58, 67, 28,

  ##     53, 62, 63),

  ## y=c(14.5, 5.5, 16,

  ##     10, 5.5, 12.5),

  x=rep(smpn, 2), y=rep(error, 2),

  xend=c(58, 67, 35,

         53, 67, 26),

  yend=c(15, 5.5, 16,

         10, 5.5, 13),

  smpt=c(rep(0, 3), rep(40, 3)),

  stdev = 25,

  ## lab=paste(labels, "uncertainty at", smpt, "°C"),

  lab=rep(labels, 2),

  errortype = rep(c("etf", "sample", "smp"), 2),

  dist=rep("equal proportions", 6))

              ## smpt=0)

stdvssmp_pl <- ggplot(tidy_stdvssmp_results %>% filter(stdev == 25),

  aes(x = smpn, y = error * 1e3, fill = as.factor(smpt),

    colour = as.factor(smpt), alpha = as.factor(errortype))) +

  # draw the points for all simulations, but make them very vague

  geom_point(alpha = .05, size = .3) +

  # draw an error range through the different experiments

  stat_summary(geom = "ribbon", colour = NA,  # no border

    fun.data = mean_cl_normal,

    fun.args = list(conf.int = .95, na.rm = TRUE)) +

  # draw an average through the different experiments

  stat_summary(geom = "line", fun.data = mean_cl_normal) +

  # three standard distributions on the x-facets, 3 standard deviations on y

  ## we add some ugly labels here because they are more clear than a legend in this case

  geom_segment(aes(x=x, xend=xend, y=y, yend=yend), data=leg, size=2, show.legend=F) +

  geom_label(aes(x=x, y=y, label=lab), data=leg, colour="black", fill="white", alpha=1, show.legend=F) +

  facet_grid(#rows = vars(stdev),

    ## rows = vars(stdev),

    cols = vars(dist),

    as.table = FALSE,

    ## shrink = TRUE,

    ## scales = "free_y",

    ## space = "free_y"

    ) +

  # x-axes

  scale_x_continuous("Number of sample replicates", lim = c(10, 90),

    breaks = seq(12, 88, 8),

    sec.axis = sec_axis(~ 100 - ., name = "Number of standard replicates",

      breaks = seq(88, 12, -8))) +

  scale_y_continuous("95% CI (ppm)",

                     ## trans="log10",

                     limits = c(3, 35),

                     breaks = c(seq(0, 50, 5), seq(60, 100, 10)),

                     ## sec.axis = sec_axis(~. / abs(tempcal_derivative(0) * 1e3), name = "Approximate error (°C)",

                                         ## breaks=c(seq(0, 20, 1), seq(25, 40, 5)))

                     ) +

  ## coord_trans(y = "log10") +

  # colours

  scale_colour_manual(ktit_smpid, labels = klab_smpid, values = kcols) +

  scale_fill_manual(ktit_smpid, labels = klab_smpid, values = kcols) +

  scale_alpha_manual("Source of Error",

    labels = c("ETF", "Sample", "Combined"),

    values = c(.5, .2, .9), guide=FALSE) +

  # theming

  ## annotation_logticks(sides="l") +

  theme(legend.position = c(.85, .8), strip.text=element_text(size=8), strip.placement="outside")

 stdvssmp_pl





















12.6. supplementary figure plot with all input standard deviations





we use a new package that allows different scales on different facets





devtools::install_github("zeehio/facetscales")







stdvssmp_pl_all <- ggplot(tidy_stdvssmp_results,

  aes(x = smpn, y = error * 1e3, fill = as.factor(smpt),

    colour = as.factor(smpt), alpha = as.factor(errortype))) +

  # draw the points for all simulations, but make them very vague

  geom_point(alpha = .05, size = .3) +

  # draw an error range through the different experiments

  stat_summary(geom = "ribbon", colour = NA,  # no border

    fun.data = mean_cl_normal,

    fun.args = list(conf.int = .95, na.rm = TRUE)) +

  # draw an average through the different experiments

  stat_summary(geom = "line", fun.data = mean_cl_normal) +

  # three standard distributions on the x-facets, 3 standard deviations on y

  ## we add some ugly labels here because they are more clear than a legend in this case

  geom_segment(aes(x=x, xend=xend, y=y, yend=yend), data=leg, size=2, show.legend=F) +

  geom_label(aes(x=x, y=y, label=lab), data=leg, colour="black", fill="white", alpha=1, show.legend=F) +

  facetscales::facet_grid_sc(

    rows = vars(stdev),

    cols = vars(dist),

    as.table = FALSE,

    shrink = TRUE,

    scales = list(

      y = list(`14` = scale_y_continuous("95% CI (ppm)", lim = c(2, 15)),

               `25` = scale_y_continuous("95% CI (ppm)", lim = c(3, 30)),

               `50` = scale_y_continuous("95% CI (ppm)", lim = c(9, 55)))

    ),

    ## space = "free_y"

    ) +

  # x-axes

  scale_x_continuous("Number of sample replicates", lim = c(10, 90),

    breaks = seq(12, 88, 8),

    sec.axis = sec_axis(~ 100 - ., name = "Number of standard replicates",

      breaks = seq(88, 12, -8))) +

  ## scale_y_continuous(,

  ##                    ## trans="log10",

  ##                    ## limits = c(NA, 50),

  ##                    breaks = c(seq(0, 50, 5), seq(60, 100, 10)),

  ##                    ## sec.axis = sec_axis(~. / abs(tempcal_derivative(0) * 1e3), name = "Approximate error (°C)",

  ##                                        ## breaks=c(seq(0, 20, 1), seq(25, 40, 5)))

  ##                    ) +

  ## coord_trans(y = "log10") +

  # colours

  scale_colour_manual(ktit_smpid, labels = klab_smpid, values = kcols) +

  scale_fill_manual(ktit_smpid, labels = klab_smpid, values = kcols) +

  scale_alpha_manual("Source of Error",

    labels = c("ETF", "Sample", "Combined"),

    values = c(.5, .2, .9), guide=FALSE) +

  # theming

  ## annotation_logticks(sides="l") +

  theme(legend.position = c(.85, .85), strip.text=element_text(size=8), strip.placement="outside")

 stdvssmp_pl_all























13. prop-eth3





In the discussion we create a new set of simulations.









13.1. prop-eth3 for continuous sample range





new_smp_info2 <- tibble(smpid = "smp", smp_D47 = seq(0.18, 0.9, 0.0025)) %>%

  mutate(smp_D47.noacid = smp_D47 - kaff,

         rawcat = smp_D47.noacid * kslope + kintercept,

         smpt = revcal(smp_D47, ignorecnf = TRUE))

prop_eth3 <- seq(.02, .98, length.out = 500)













13.2. prop-eth3 expand experimental matrices and run simulations





mat <- expand.grid(smp_D47 = new_smp_info2$smp_D47, prop_eth3 = prop_eth3) %>%

  left_join(new_smp_info2, by = "smp_D47") %>%

  mutate(prop_left = 1 - prop_eth3,

         exprow = as.character(seq_along(1:n()))) %>%

  mutate(stdfreqs = select(., prop_eth3, prop_left) %>% as.matrix() %>% split(seq(nrow(.)))) %>%

  select(-prop_left)











13.2.1. run many sims





smp_out <- furrr::future_pmap_dfr(

                    select(mat, stdfreqs, smpt),

                    sim_stds,

                    ## here we subset the standards to ETH-3 and ETH-1, in that order

                    stdtable = make_std_table()[c(3, 1), ],

                    stdev = 25, out = "cis", .id = "exprow", .progress = TRUE) %>%

  filter(id %in% c("etf","sample", "smp")) %>%   # filter output

  select(exprow, id, cv) %>%                     # select output

  spread(id, cv) %>%                             # make it wide format

  right_join(mat, by = "exprow") %>% # join it with experimental df

  mutate(exp="ETH-1 and ETH-3")













13.2.2. save results





saveRDS(smp_out, "smp_out_new_2019-06-12.rds")







smp_out <- readRDS("smp_out_new_2019-06-12.rds")















13.3. smp_out_uu





smp_out_uu <-

  furrr::future_pmap_dfr(

           select(mat, stdfreqs, smpt),

           sim_stds,

           # here we subset the standards to ETH-3 and ETH-1, in that order

           stdtable = make_std_table()[c(5, 1), ],

           stdev = 25, out = "cis", .id = "exprow", .progress = TRUE) %>%

  filter(id %in% c("etf","sample", "smp")) %>%   # filter output

  select(exprow, id, cv) %>%                     # select output

  spread(id, cv) %>%                             # make it wide format

  right_join(mat, by = "exprow") %>% # join it with experimental df

  mutate(exp="ETH-1 and UU1")







saveRDS(smp_out_uu, "smp_out_uu_2019-06-12.rds")







smp_out_uu <- readRDS("smp_out_uu_2019-06-12.rds")













13.4. best_dat





best_range <- 1:10

best_dat <- bind_rows(smp_out, smp_out_uu) %>%

  group_by(exp, smp_D47) %>%

  arrange(smp) %>%

  slice(best_range)













13.5. smp_out_comb





combine the sims for one plot with faceting





smp_out_comb  <- smp_out %>%

  bind_rows(smp_out_uu) %>%

  select(exprow, smp_D47, prop_eth3, exp, smp) %>%

  spread("exp", "smp") %>%

  mutate(diff=`ETH-1 and UU1` - `ETH-1 and ETH-3`)













13.6. prop_eth3_pl









13.6.1. plot_best







13.6.2. plot_prop







13.6.3. combine and print





prop_eth3_pl <- plot_best + plot_prop + plot_layout(nrow=2, heights=c(.2, .8))

prop_eth3_pl





















13.6.4. 3d rayshader plot





This one is not included in the manuscript or the supplementary information

pdf, but I highly recommend creating one to play around with it!





# new rayshader 3d option

# remotes::install_github("tylermorganwall/rayshader")

library(rayshader)

ray <- smp_out %>%

  ggplot(aes(x = smp_D47, y = prop_eth3, fill = smp * 1e3)) +

  geom_raster() +

  ## geom_smooth(aes(col = exp), se = F, method = "loess", span = .3, size = 1,

  ##           data = best_dat) +

  ## geom_point(aes(col = exp), shape = 0, size = 1, alpha = .5,

  ##            data = filter(best_dat, exp == "ETH-1 and ETH-3")) +

  labs(x = Sample~Delt[[47]]"CDES (\u2030)",

       y = "Proportion of standards",

       fill = "Combined 95% CI\n(ppm)") +

  viridis::scale_fill_viridis(

             ## the rescaler works nicely, but messes up the legend a bit

             ## rescaler = function(x, to = c(0, 1), from = NULL, newmax=30) {

             ##   ifelse(x < newmax,

             ##          scales::rescale(x, to = to, from = c(min(x, na.rm = TRUE), newmax)), 1)},

             ## I'll go back to simple clipping again

             limits = c(NA, 30),

             expand=c(0, 0),

             ## oob = function(x) {x},

             option = "magma",

             breaks = err_breaks, #labels = err_ticks

           ) +

  scale_y_continuous(expand = c(0, 0), breaks = c(.05, .1, .25, .5, .75, .9, .95)) +

  scale_x_continuous(expand = c(0, 0), lim=c(.18, .9),

          sec.axis = sec_axis(~ sqrt((0.0449 * 1e+6)/(. - 0.167)) - 273.15,

                          "Sample temperature (°C)", temp_breaks, temp_labs)) +

  theme(legend.key.width = unit(.3, "cm"),

        legend.key.height = unit(2, "cm"),

        strip.placement="outside")

plot_gg(ray, multicore=TRUE, width=5, height=7, scale=350, raytrace=TRUE,

        sunangle=40)

## render_movie("rayshader_movie_plot.mp4")

render_snapshot("imgs/rayshader_snapshot.png")







See the screenshot for a preview















13.7. best_prop_diff_pl





The difference plot for the supplementary information.





best_dat_100 <- bind_rows(smp_out, smp_out_uu) %>%

  group_by(exp, smp_D47) %>%

  arrange(smp) %>%

  slice(1:100)

best_dat_comb <- best_dat_100 %>%

  ungroup() %>%

  select(smp_D47, prop_eth3, exp, smp) %>%

  spread("exp", "smp") %>%

  mutate(diff=`ETH-1 and UU1` - `ETH-1 and ETH-3`)







plot_best_comb <- best_dat_comb %>%

  ggplot(aes(x = smp_D47, y = diff * 1e3)) +

  geom_hline(yintercept=0) +

  annotate("text", x = c(.5, .5), y = c(2, -4), label = c("ETH-3 does better", "UU1 does better"), size = 3) +

  geom_vline(xintercept=smpinfo$D47, col=kcols[1:2]) +

  ## show ALL the points?

  ## geom_point(shape = 16, alpha = .1, data=bind_rows(smp_out,smp_out_uu)) +

  geom_point(shape = 1, alpha = .1) +

  geom_smooth(method="loess", span = .3) +

  labs(col="Selection of standards",

       x = Sample ~ Delta[47] ~ "(\u2030)",

       y = "Difference in combined 95% CI (ppm)") +

  scale_colour_manual(values=eth.info$col[c(3, 5)]) +

  scale_x_continuous(expand=c(0, 0), lim=c(.18, .9),

                     sec.axis = sec_axis(~ sqrt((0.0449 * 1e+6)/(. - 0.167)) - 273.15,

                                         "Sample temperature (°C)", temp_breaks, temp_labs)) +

  scale_y_continuous(expand = c(0, 0)) +

  ## add arrows to ETH-1 and ETH-3 positions

  annotate("segment",

           arrow = arrow(angle = 20, length = unit(.4, "cm"), type = "closed"),

           x = eth.info$D47[c(1, 3, 5)], y = c(-Inf, Inf, -Inf),

           xend = eth.info$D47[c(1, 3, 5)], yend = c(-4, 2, -5),

           colour = eth.info$col[c(1, 3, 5)],

           alpha = 1,

           size = .4) +

  annotate("text", x = eth.info$D47[c(1, 3, 5)], # + c(0.03, -.02),

           y = c(-4, 2, -5), label = eth.info$id[c(1, 3, 5)],

           vjust= c(-.5, .5, -.5), size = 2.5) +

  theme(legend.pos=c(.2, .7),

        ## shared axis with bottom panel

        axis.title.x.bottom=element_blank(), axis.text.x.bottom=element_blank())







plot_prop_diff <- smp_out_comb %>%

  ggplot(aes(x = smp_D47, y = prop_eth3, fill = diff * 1e3)) +

  geom_raster() +

  geom_smooth(aes(col = exp, fill=smp*1e3), se = F, method = "loess", span = .3, size = 1,

              data = best_dat_100) +

  ## geom_point(aes(col=exp), data=best_dat_100, shape = 1) +

  scale_colour_manual(values=eth.info$col[c(3, 5)]) +

  scale_fill_gradient2(low=eth.info$col[[5]], high=eth.info$col[[3]], breaks = err_breaks,

                       limits=c(-5, 5)) +

  labs(x = Sample ~ Delta[47] ~ "(\u2030)",

       y = "Proportion of standards",

       col = "Selection of standards",

       fill = "Difference in\ncombined 95% CI\n(ppm)") +

  scale_y_continuous(expand = c(0, 0), breaks = c(.05, .1, .25, .5, .75, .9, .95)) +

  scale_x_continuous(expand = c(0, 0), lim=c(.18, .9)) +

  theme(legend.key.width = unit(.1, "cm"),

        legend.key.height = unit(1.5, "cm"),

        strip.placement="outside")







best_prop_diff_pl <- plot_best_comb + plot_prop_diff + plot_layout(nrow=2, heights=c(.2, .8))

best_prop_diff_pl





















13.8. calculate some summary statistics for use in-text





# these are some helper functions to calculate the values we put in the text

# get the confidence value at alpha value alpha from a t-distribution

confidence <- function(x, n, alpha=.05) {

  qt(1 - alpha / 2, df=n - 1) * sd(x) / sqrt(n)

}

# convert x from permil to ppm and round down to dig digits.

ppmround <- function(x, dig=2) {

  round(x * 1e3, digits=dig)

}

interp_eth3 <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= eth.info$D47[[1]], smp_D47 <= eth.info$D47[[5]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

extrap_eth3 <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= eth.info$D47[[5]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

interp_uu <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and UU1", smp_D47 >= eth.info$D47[[1]], smp_D47 <= eth.info$D47[[5]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

extrap_uu <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and UU1", smp_D47 >= eth.info$D47[[5]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

eth3_to_uu_eth3 <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= eth.info$D47[[3]], smp_D47 <= eth.info$D47[[5]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

eth3_to_uu_uu <- best_dat %>%

  ungroup() %>%

  filter(exp == "ETH-1 and UU1", smp_D47 >= eth.info$D47[[3]], smp_D47 <= eth.info$D47[[5]]) %>%

  summarize(mean=mean(smp, na.rm = TRUE),

            ci=confidence(smp, n=n()))

eth3_to_uu_tempsens <- seq(eth.info$D47[[3]], eth.info$D47[[5]], .01) %>%

  tempcal_derivative() %>%

  mean()

uu_to_0_eth3 <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= eth.info$D47[[5]], smp_D47 <= smpinfo$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

uu_to_0_uu <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and UU1", smp_D47 >= eth.info$D47[[5]], smp_D47 <= smpinfo$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

uu_to_0_tempsens <- seq(eth.info$D47[[5]], smpinfo$D47[[1]], .01) %>%

  tempcal_derivative() %>%

  mean()















14. prop-eth3 with a very very very cold and hot standard?





The reviewers requested another set of simulations.









14.1. prop-eth3 for continuous sample range





same as before





new_smp_info2 <- tibble(smpid = "smp", smp_D47 = seq(0.18, 0.9, 0.0025)) %>%

  mutate(smp_D47.noacid = smp_D47 - kaff,

         rawcat = smp_D47.noacid * kslope + kintercept,

         smpt = revcal(smp_D47, ignorecnf = TRUE))

prop_eth3 <- seq(.02, .98, length.out = 500)













14.2. prop-eth3 expand experimental matrices and run simulations





same as before





mat <- expand.grid(smp_D47 = new_smp_info2$smp_D47, prop_eth3 = prop_eth3) %>%

  left_join(new_smp_info2, by = "smp_D47") %>%

  mutate(prop_left = 1 - prop_eth3,

         exprow = as.character(seq_along(1:n()))) %>%

  mutate(stdfreqs = select(., prop_eth3, prop_left) %>% as.matrix() %>% split(seq(nrow(.)))) %>%

  select(-prop_left)













14.3. run many sims





brrr <- make_std_table(id = c("UU2", "UU3"), col = c("darkblue", "red"),

               D47_std = c(0.9252, 0.0266)) # based on Wang 2004

brrr_smp_out <- furrr::future_pmap_dfr(

                    select(mat, stdfreqs, smpt),

                    sim_stds,

                    ## here we subset the standards to ETH-3 and ETH-1, in that order

                    stdtable = brrr,

                    stdev = 25, out = "cis", .id = "exprow", .progress = TRUE) %>%

  filter(id %in% c("etf","sample", "smp")) %>%   # filter output

  select(exprow, id, cv) %>%                     # select output

  spread(id, cv) %>%                             # make it wide format

  right_join(mat, by = "exprow") %>% # join it with experimental df

  mutate(exp="UU3 and UU2")











14.3.1. save results





saveRDS(brrr_smp_out, "brrr_smp_out_2019-08-20.rds")







the 0.8 U2 standard + ETH-1





brrr_smp_out <- readRDS("brrr_smp_out_2019-08-19.rds")







The heated/eq. gas equivalents





brrr_smp_out <- readRDS("brrr_smp_out_2019-08-20.rds")















14.4. brrr_best_dat





best_range <- 1:10

brrr_best_dat <- bind_rows(smp_out, smp_out_uu, brrr_smp_out) %>%

  group_by(exp, smp_D47) %>%

  arrange(smp) %>%

  slice(best_range)













14.5. smp_out_comb





combine the sims for one plot with faceting





brrr_smp_out_comb  <- bind_rows(smp_out, brrr_smp_out, smp_out_uu) %>%

  select(exprow, smp_D47, prop_eth3, exp, smp) %>%

  spread("exp", "smp") %>%

  mutate(diff=`UU3 and UU2` - `ETH-1 and ETH-3`)













14.6. prop_eth3_pl









14.6.1. plot_best







14.6.2. plot_prop







14.6.3. combine and print





brrr_prop_eth3_pl <- brrr_plot_best + brrr_plot_prop + plot_layout(nrow=2, heights=c(.2, .8))

brrr_prop_eth3_pl























14.7. calculate some summary statistics for use in-text





eth3_to_u2_eth3 <- brrr_best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= eth.info$D47[[3]], smp_D47 <= brrr$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

eth3_to_u2_u2 <- brrr_best_dat %>%

  ungroup() %>%

  filter(exp=="UU3 and UU2", smp_D47 >= eth.info$D47[[3]], smp_D47 <= brrr$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

eth3_to_u2_tempsens <- seq(eth.info$D47[[3]], brrr$D47[[1]], .001) %>%

  tempcal_derivative() %>%

  mean(na.rm=FALSE)

u2_to_0_eth3 <- best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= smpinfo$D47[[1]], smp_D47 <= brrr$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

u2_to_0_u2 <- brrr_best_dat %>%

  ungroup() %>%

  filter(exp=="UU3 and UU2", smp_D47 >= smpinfo$D47[[1]], smp_D47 <= brrr$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

u2_to_0_tempsens <- seq(smpinfo$D47[[1]], brrr$D47[[1]], .01) %>%

  tempcal_derivative() %>%

  mean()

eth3_to_0_eth3 <- brrr_best_dat %>%

  ungroup() %>%

  filter(exp=="ETH-1 and ETH-3", smp_D47 >= eth.info$D47[[3]], smp_D47 <= smpinfo$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

eth3_to_0_u2 <- brrr_best_dat %>%

  ungroup() %>%

  filter(exp=="UU3 and UU2", smp_D47 >= eth.info$D47[[3]], smp_D47 <= smpinfo$D47[[1]]) %>%

  summarize(mean=mean(smp),

            ci=confidence(smp, n=n()))

eth3_to_0_tempsens <- seq(eth.info$D47[[3]], smpinfo$D47[[1]], .01) %>%

  tempcal_derivative() %>%

  mean()













14.8. quick check on how much it matters if we add two hypothetical very large-range standards





(eth3_to_0_eth3$mean - eth3_to_0_u2$mean) * 1000







ppm difference, which equates to





 # in permil             # in permil          # in permil / degreeC

(eth3_to_0_eth3$mean - eth3_to_0_u2$mean) / eth3_to_0_tempsens







so… .3 °C improvement for samples between \(-29.8000392923437\)





maybe it's even more at the more extreme end?





(eth3_to_0_eth3$mean - eth3_to_0_u2$mean) / eth3_to_0_tempsens







degrees improvement for samples between ETH-3 (~20) and 0 °C.





or in terms of improvement:





((eth3_to_0_eth3$mean / eth3_to_0_u2$mean) - 1) * 100















Date: 2019-08-23

Author: Ilja J. Kocken

Created: 2021-11-16 Tue 13:39

Validate