# The GECO R project template

This is a GitHub template for a GECO R project repository. The template can be selected when creating a new repository on GitHub, which pre-populates the project with a standard set of directories. This ensures a project-oriented and consistent workflow across all lab members, and removes some of the mental overhead of deciding on a project structure.

Some of the functionality provided by this setup is also included in the [`usethis`](https://usethis.r-lib.org/) R package, but the aim is slightly different. First, `usethis` is aimed at populating a package structure. Second, it offers no complete preset template, so many individual decisions still need to be made and code executed. This setup bypasses those steps.

## Use

### A GitHub project from the template

To use the template, create a new repository on GitHub as you otherwise would, using the big green button. If you are viewing this template project on GitHub, you can instead click the green "Use this template" button at the top right.

![](https://github.com/bluegreen-labs/environmental_data_science_101/raw/main/images/green_button.png)

Otherwise, select `geco-bern/R_proj_template` from the template dropdown menu when creating a new repository.

![](https://github.com/bluegreen-labs/environmental_data_science_101/raw/main/images/new_repo_1.png)

Proceed as usual by naming your repository. However, be careful to select the correct owner of the project if you have multiple identities.

![](https://github.com/bluegreen-labs/environmental_data_science_101/raw/main/images/new_repo_2.png)

### Clone to your local computer

The new repository will be created and populated with the files of the template. You can then clone the project to your local computer to add files.

```bash
git clone git@github.com:USER/YOUR_PROJECT.git
```

Although optional, it is wise to rename the `*.Rproj` file, as its name will not automatically change to match your repository name. However, retaining the original name should not affect any of the functionality.

## Structure

The template follows the structure of an R package without actually being one. There are several reasons for this:

- It familiarizes you with the R package structure
  - allowing for an optional switch to an R package later
- It avoids top-level accumulation of data, code and reporting files
- It splits dynamic reporting from academic writing (`vignettes` vs. `manuscript`)
- It splits pre-processing of data from working / included data (`data-raw` vs. `data`)
- It splits R code from other scripts (bash / Python in `src`)
- It splits R functions from R analysis scripts (`R` vs. `analysis`)

Below you will find a comprehensive list of what goes where and why, as well as some best practices for structuring data within these folders.

### The R folder

The `R` folder contains R functions, not scripts. This means code wrapped in a structure such as:

```R
# A demo function
#
# This function demonstrates the general layout
# of a function: a single input parameter and
# an explicit return value

my_function <- function(parameter) {
  # replace this with the actions the function performs
  result <- parameter
  return(result)
}
```

Functions capture actions that you need more than once, that cannot easily be covered by external packages, and that are tailored to your project.

These functions should stand on their own, with limited dependencies on other custom functions. Ideally, precede each function with a brief title and a description of its purpose.

Writing functions may initially seem a waste of time, as you could just copy and paste code into your analysis scripts. However, if you later decide to change certain aspects of this workflow, you would have to hunt down and update every copy in all analysis scripts. Missing one will result in a corrupted analysis. In addition, functions make it easy to reuse code in a new project, and code that proves generally useful beyond a single research project can be integrated into a formal package.

### The src folder

The `src` folder contains scripts and code that are not written in R. In packages, this folder often contains Fortran or C code that needs to be compiled. Here, it is common to store bash or Python scripts that assist in data cleaning or data gathering tasks that can't be done in R alone.

### The data-raw folder

The `data-raw` folder contains, as the name suggests, raw data and the scripts to download and pre-process it. This data requires significant pre-processing before it can be used in an analysis. In other words, it is not analysis ready (within the context of the project).

To be fully transparent about the source of the raw data, it is best to include (numbered) scripts to download and pre-process it. Document the source of the data (a reference) either in these scripts or in a separate README. Ultimately, the output of the `data-raw` workflow is analysis-ready data.

It is best practice to store each raw data product in its own sub-folder, with the data downloading and processing scripts in the main `data-raw` folder.

```
data-raw/
├─ raw_data_product/
├─ 00_download_raw_data.R
├─ 01_process_raw_data.R
```

Where possible, store output data (in `data`) either as human-readable CSV files or as serialized R files (generated with the `saveRDS()` function).

Raw data are often large, which can make storing them in a git repository impractical. In that case, exclude the data from the repository by explicitly adding the directories to `.gitignore`, so they are not committed by accident.

When working across heterogeneous systems, dynamic paths can be set, or (soft) links created, to point to raw data outside the project directory.

### The data folder

The `data` folder contains analysis-ready data, i.e. data you can use as is. This is often the output of a `data-raw` pre-processing workflow, but it can also include data that doesn't require any intervention, e.g. a land cover map that is used as-is. Output from `data-raw` has often undergone a dramatic dimensionality reduction and will often fit within GitHub file size limits. However, some data products will still be too large. For these, use the practices described for `data-raw` to ensure transparency about the sources of this data (and reproducible acquisition).

Store data in transparently named sub-folders according to product type, again including references to the source of the data where possible. Download scripts can help ensure this transparency as well.

```
data/
├─ data_product/
├─ 00_download_data.R
```

### The analysis folder

The `analysis` folder contains, *surprise*, R scripts that analyze your analysis-ready data (in the `data` folder). The output of these scripts is limited to numbers, tables and figures. This folder should not include R Markdown code!

It is often helpful to create additional sub-folders for statistics and figures, especially if figures are large and complex (e.g. visualizations such as maps, rather than graphical representations of statistical properties).

Scripts can have a numbered prefix to indicate an order of execution, but this is generally less important because you work on analysis-ready data. If one analysis carries over into another, either merge the two files or use numbered prefixes.

```
analysis/
├─ statistics/
│  ├─ 00_random_forest_model.R
│  ├─ 01_random_forest_tuning.R
├─ figures/
│  ├─ global_model_results_map.R
│  ├─ complex_process_visualization.R
```

Output of the analysis routines can be written to file (in the `manuscript` folder) or shown in the console or the plot viewer panel.

### The vignettes folder

The `vignettes` folder contains dynamic notebooks, i.e. R Markdown files. These might serve a dual purpose, bridging analysis and manuscript. In practice, however, their use case should be considered very narrow. As they are commonly used, R Markdown files are rarely portable. The argument that rendered HTML files are easy to share does not hold if you adhere to an open workflow with GitHub-based snapshots, which ensure that all code and data are visible and that the project is truly reproducible. Furthermore, R Markdown documents mix two cognitive tasks, writing text and writing code, and switching between these two modes comes with undue overhead. If you code, you should not be writing prose, and vice versa.

Unless used for small, educational examples, an R Markdown file has little place in a code-heavy environment. In short, if your R Markdown file contains more code than text, it should be an R script or function (with comments or documentation). Conversely, if it contains more text than code, it is probably easier to collaborate in a true word processing file (or a Google Docs file). Where notebooks can be useful is in reporting general statistics.

Finally, R Markdown also encourages poor project management practices. Most commonly, this stems from the fact that a document is rendered relative to its own location. If no path management tools, such as the [`here`](https://here.r-lib.org/) package, are used, files tend to pile up in the top-most level of the project, undoing most efforts to structure data and code. This is compounded by a tendency to stay within the working environment (the document), so code blocks that should be functions are never rewritten as such.

In short, R Markdown files have their place in reporting results once these have been generated (through functions or analysis scripts), but they should not be used to develop code or ideas (see the remark on cognitive switching above)!

### Capturing your session state

To ensure full reproducibility, you need to capture the state of the system and the libraries with which you ran the original analysis. Note that all code needs to have been run, with all required libraries installed, for `renv` to correctly capture every library used.

When setting up your project, run the following to initialize your static R environment:

``` r
# Initialize a {renv} environment
renv::init()
```

Whenever you want to save the state of your project (and its packages), call:

``` r
# Save the current state of the environment / project
renv::snapshot()
```

This records any changes made to your environment in a project description file called a lock file (`renv.lock`). It is advisable to update the state of your project regularly, and in particular before closing a project.

When you move your project to a new system, or share it on GitHub with collaborators, you can restore the original state of the analysis by calling:

``` r
# On a new system, or when inheriting a project
# from a collaborator, you can use the lock file
# to restore the session/project state
renv::restore()
```

> NOTE: As mentioned in the {renv} documentation: "For development and collaboration, the `.Rprofile`, `renv.lock` and `renv/activate.R` files should be committed to your version control system. But the `renv/library` directory should normally be ignored. Note that `renv::init()` will attempt to write the requisite ignore statements to the project `.gitignore`." We refer to \@ref(learning-objectives-6) for details on GitHub and its use.
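
As an illustration of the conventions for the `R` folder, the sketch below shows a small, self-contained function and a helper that an analysis script could use to load every function file in `R/`. Both `convert_f_to_c()` and `load_project_functions()` are hypothetical examples, not files shipped with the template.

```r
# Hypothetical contents of R/convert_f_to_c.R
#
# Convert temperatures from degrees Fahrenheit to degrees Celsius.
# Works element-wise on numeric vectors.
convert_f_to_c <- function(temp_f) {
  stopifnot(is.numeric(temp_f))
  (temp_f - 32) * 5 / 9
}

# Hypothetical helper for the top of an analysis script:
# source every .R file in the R/ folder so its functions are available.
load_project_functions <- function(path = "R") {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    # local = FALSE evaluates each file in the global environment
    source(f, local = FALSE)
  }
  invisible(files)
}
```

An analysis script would then start with `load_project_functions()` and call the project functions directly. Sourcing plain files keeps the project a simple folder structure, while the same files could stay in `R/` if the project is later turned into a package.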
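
Following the guidance for the `data-raw` and `data` folders, a processing script such as `01_process_raw_data.R` could end by writing its analysis-ready output both as a human-readable CSV file and as a serialized R file. The sketch below assumes a placeholder cleaning step (dropping incomplete rows) and hypothetical names (`process_raw_data()`, `clean_data`); real pre-processing will differ per data product.

```r
# Hypothetical core of data-raw/01_process_raw_data.R
#
# Takes a raw data frame, applies a placeholder cleaning step
# and writes the analysis-ready result to the data/ folder as
# both CSV (human readable) and RDS (serialized R object).
process_raw_data <- function(raw, out_dir = "data", name = "clean_data") {
  # placeholder cleaning step: keep only complete rows
  clean <- raw[stats::complete.cases(raw), , drop = FALSE]
  rownames(clean) <- NULL

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  csv_file <- file.path(out_dir, paste0(name, ".csv"))
  rds_file <- file.path(out_dir, paste0(name, ".rds"))

  utils::write.csv(clean, csv_file, row.names = FALSE)
  saveRDS(clean, rds_file)

  # return the written file paths without printing them
  invisible(c(csv = csv_file, rds = rds_file))
}
```

Analysis scripts can then read the result with `readRDS()`, which restores the R object including its column classes, while the CSV copy remains readable outside R.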
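
For raw data that live outside the project directory on some systems, one way to keep paths dynamic is to read the data location from an environment variable and fall back to the in-project `data-raw` folder otherwise. The variable name `PROJECT_RAW_DATA` and the helper `raw_data_path()` are hypothetical; the variable could be set per machine, for example in a user-level `~/.Renviron` file, which R reads at startup.

```r
# Hypothetical helper: build a path to raw data that may live
# outside the project. If the environment variable is set
# (e.g. in a machine-specific ~/.Renviron), use it as the root;
# otherwise fall back to the project's own data-raw/ folder.
raw_data_path <- function(..., env_var = "PROJECT_RAW_DATA") {
  root <- Sys.getenv(env_var, unset = "")
  if (!nzchar(root)) {
    root <- "data-raw"
  }
  file.path(root, ...)
}
```

Scripts then call, for example, `raw_data_path("raw_data_product", "file.csv")` instead of hard-coding a machine-specific path. Alternatively, a (soft) link can be created once per machine with base R's `file.symlink()`, so that the project always sees the data under `data-raw/`.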