An open API service indexing awesome lists of open source software.

https://github.com/CorradoLanera/devubesp

Automate Analyses Setup at UBESP
https://github.com/CorradoLanera/devubesp

Last synced: 5 months ago
JSON representation

Automate Analyses Setup at UBESP

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# devubesp

[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/CorradoLanera/devubesp?branch=master&svg=true)](https://ci.appveyor.com/project/CorradoLanera/devubesp)
[![Travis build status](https://travis-ci.org/CorradoLanera/devubesp.svg?branch=master)](https://travis-ci.org/CorradoLanera/devubesp)
[![Codecov test coverage](https://codecov.io/gh/CorradoLanera/devubesp/branch/master/graph/badge.svg)](https://codecov.io/gh/CorradoLanera/devubesp?branch=master)
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![CRAN status](https://www.r-pkg.org/badges/version/devubesp)](https://cran.r-project.org/package=devubesp)

The goal of devubesp is to automate analyses setup tasks that are
otherwise performed manually. This includes setting up directories,
supporting packages and projects.

## Installation

You can install the development version from
[GitHub](https://github.com/) with the following procedure:

```{r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("CorradoLanera/devubesp")
```

## Preamble

The package, _de facto_, provide a single function. The aim of the
function ``create_ubesp_analysis()` is to setup all the folders'
structure, with all the needed and sample files to conduct analyses at
UBESP. Does not matter if they are for a thesis, a project, a simple
analyses, the initial structure for the project should be the same.

The aim is to drastically reduce (up to the phisical limit of write a
single line of code) the time needed to set up a well-organized and
standardized structure for all the project, from the simplest to the
most complex.

My first hope is to have provaided a simple and effective instrument to
reduce the direct time in the preparation of any project. The second,
and more relevant, hope is that all of this will be shared between the
colleagues in a way to reduce also the future time to understand each
other projects structure and work-flow in conducting analyses with R
within the same unit.

## Characteristics

The main characteristics of the resulting structure of files and folders
created are:

- A main folder name which is not ambiguous in time, this is achieved
pre-appending the year to the name of the main project name.^[In the
`minutes/` folder there where also an empty `.txt` file named with
the current date (`yymmdd`) and the project name, to permit to setup
initial official notes (time-trakked).]

- presence of a first-level folder to collect and manage *data-raw*,
which can be huge, and sensitive, hence often they cannot be shared
or uploaded on the Net (e.g., GitHub). For this reason this folder
is outside the one promoted to be VCS trakked by git.GitHub.
On the other hand, the folder itself can be copy-pasted to a
costumer folder, a supervisor folder, or to an external folder
(e.g., SharePoint), if usefull and admitted, because it contains
only the raw data and the script to import (merge and) import them
in the analyses folder (if necessary).

The folder structure is very simple and it contains only a
preconfigured script to import and save the data in the right
position.


- presence of a folder (`analyses/`) that can be safely copy-pasted to
a costumer folder, to supervisor folder, or to an external folder
(e.g., SharePoint). This has to be happen without any worry to
brake some VCS controller (like git) nor to share unwanted
information, comments, notes, data that should remain private to the
analyst. It will contain all the reports, the outputs, and the
scripts useful to find the results and also to reproduce them.

The main folder structure is:
```
analyses/
|- R/ # R script for analyses and reports (.R)
|- reports/ # Final version of the reports (.docx)
|- outputs/ # Output to share for the analyses (.png)
|- docx-template/ # Template to fine knit the report (.docx)
|- bibtex-files/ # Reference cited in the report (.bib)
```
This folder, because it is an analyses one, is already preconfigured
as an RStudio `.Rproj`.

- presence of a package-folder infrastructure which can be use to
implement and include any custom or additional function usefull
for the analyses. Making able to test, document, trakking their
changes, and share with everyone else easily.

This can be VCS with git and GitHub safely, without the risk to
overfull the space at the disposal with messy and uncompressed raw
data.

The folder `analyses/` is stored in the first level inside of the
package-folder `prjname/`, so it is easy to find, and use them for the
analyses. On the other hande, `analyses/` is `.Rbuildignore`d from
the package, this way it that can be git-tracked (by default it is)
but it is not provided in the package boundle.^[If you want to not
git trakking the `analyses/` folder too, add it to `.gitignore`.]
Hence, the package can be safely shared both through GitHub or CRAN
without risk to share data or analyses, and everyone (future-you
too) can easily use all the function you provide with the package
as simply as `devtools::install_gitub("UBESP-DCTV/")`.

- presence of (standardized) additional common project and utility
folders and files. These are all subfolders of the main project
folder:

```
yyyy-prjname/
|- prjname/ # the R-package folder
|- analyses/ # the folder for the analyses
|- essential-bibliography/ # used literature (.pdf)
|- papers/ # maintaining ordered in writings
|- draft/ # from drafts (`.docx`)
|- submitted/ # to submissions (`.docx`, `.png`, `.pdf`)
|- published/ # up to the publishing (`.pdf`, only)
|- minutes/ # for the minutes (example `.txt` inside)
|- proposal.docx # requests (replace example w/ current!)
|- personal-notes.md # personal notes, not to be shared
```

## Example of usage

The simplest example is all you need to view to learn how simple it is
to be used:^[the tilda ("`~`"), here, expands to the home directory from
the "r point-of-view", i.e. the classical home/ on Unix systems, but to
the "user/documents" on windows (instead of the simple "user/").]

```{r simple-example, eval=FALSE}
devubesp::create_ubesp_analysis("~/test")
# next, read and follow the instructions appering on the screen :-)
```

You can also include longer path to aggregate similar kynd of projects:

```{r structured-example, eval=FALSE}
devubesp::create_ubesp_analysis("~/theses/cl")
devubesp::create_ubesp_analysis("~/department-analyses/cardio/xxx")
devubesp::create_ubesp_analysis("~/whatever/whatelse")
```

## Best Practice for synchronized folders
Including a git project into a synchronized folder (e.g., OneDrive, G
Drive or DropBox) can lead to issues because too many agents compete to
have the control of what you can see and what is changed (e.g., git and
OneDrive).

On the other side, if you develop your code using git, you do not need
to have a second agent to track it! Anyway, all the part of your
project, which is not under git control should be monitored and stored
in the synchronized folder.

From here, the choice seems to be using two distinct folders, e.g., one
under `OneDrive/` for all the project's files you do not want into git
(and the data, if they are protected or too big for GitHub) and one
under git version control on `~/Documents/`.

On the other hand, that choice can be annoying; you have to switch
continuously from one folder to another during the development of your
project. More important, you cannot access smoothly to folders in
OneDrive you do not want to git (e.g., `data/`) from your R project
directory directly. Anyway, copying everything twice is incredibly
inefficient, at best!

A possible solution is to use symbolic links. They are simple pointers
to a file or folder. To the user, they appear like shortcut links, but
they are not: the former are only pointers (i.e., they are neither files
nor folders), while the latter are files containing the address of their
corresponding target files or folders.

Now you can:

- put all your project-tree under, e.g., `OneDrive/yyyy-prjname/`

- create an empty folder `~/Documets/yyyy-prjname/`

- move the `projname/` folder containing all the development staff you
are going to develop in R into the `~/Documets/yyyy-prjname/` folder

- track it under git

- create a symbolic link from all the other files and folders from
`OneDrive/yyyy-prjname` to `~/Documets/yyyy-prjname/`.

This way, `~/Documents/yyyy-prjname`/ folder will (apparently) contain
all the objects into `OneDrive/yyyy-prjname/`, plus the R project
folder. Moreover, those symlinks do not use any space in your disk, and
they are safely stored and tracked by `OneDrive` only. On the other
hand, thank symlinks, you can access them, from the R project folder
smoothly using standard relative paths as they were __really__ there,
e.g., by calling `here::here(../data-raw)` from the project working
directory!

To create symlinks you can have different ways depending of your needs
and OS. Following what I find usefull to learn how to create them:

- Windows 10: [command line](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/mklink),
[powershell (see Example 7)](https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/new-item?view=powershell-7),
[GUI (mouse right click)](https://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html)

- Linux: [here](https://www.howtogeek.com/287014/how-to-create-and-use-symbolic-links-aka-symlinks-on-linux/)

## Feature request
If you need some more features, please file an issue on
[github](https://github.com/CorradoLanera/devubesp/issues).

## Bug reports
If you encounter a bug, please file a
[reprex](https://github.com/tidyverse/reprex)
(minimal reproducible example) on
[github](https://github.com/CorradoLanera/devubesp/issues).

## Code of Conduct
Please note that the "devubesp" project is released with a
[Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.

## Attribution
Most of the underlying function used to create this package comes
directly or with slightly modifications from the
[usethis](https://github.com/r-lib/usethis) package, by Hadley Wickham
and Jennifer Bryan.

## Footnotes