https://github.com/fhdsl/sota2024_reportout
Materials associated with the Report Out for the State of the AnVIL 2024
- Host: GitHub
- URL: https://github.com/fhdsl/sota2024_reportout
- Owner: fhdsl
- Created: 2025-10-09T22:47:37.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-14T16:35:31.000Z (7 months ago)
- Last Synced: 2025-11-14T18:24:51.625Z (7 months ago)
- Language: HTML
- Size: 14.5 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# SOTA2024_ReportOut
[DOI: 10.5281/zenodo.17611423](https://doi.org/10.5281/zenodo.17611423)
This repository contains all of the code to reproduce the analysis done for the State of the AnVIL 2024 Poll.
## Directory Structure:
### data
Raw data for this project is stored in a password-protected, controlled-access shared Google Drive because it contains some identifying information. The data is processed, de-identified, and made available in the `wrangled_data` subdirectory.
#### annotations
These are codebook files created by the analysts explaining the columns in the raw data as well as possible values and dictionaries to categorize certain columns (e.g., institution).
* `codebook.txt`: codebook relating to raw data
* `controlledAccessData_codebook.txt`: controlled access data mentioned in the poll, and whether AnVIL hosts it
* `institution_codebook.txt`: institutions and simplified categorization
#### wrangled_data
* `resultsTidy.rds`: wrangled data saved from `1_TidyData.Rmd` (with identifying information of email and raw institutional affiliation removed)
* `resultsTidy_personas.rds`: wrangled data saved from `2_PersonaStats.Rmd`
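A minimal sketch of how a de-identified file such as `resultsTidy.rds` could be written and read back, using base R only. The toy data frame and its column names (`email`, `institution_raw`, `institution_category`) are hypothetical stand-ins, not the poll's actual columns, and a temporary file stands in for the real path inside this repository.

```r
# Toy stand-in for the raw poll responses; all column names are hypothetical.
raw <- data.frame(
  email = c("a@example.org", "b@example.org"),
  institution_raw = c("Example University", "Example Institute"),
  institution_category = c("university", "institute"),
  stringsAsFactors = FALSE
)

# Drop the identifying columns before saving, mirroring the description above
# (email and raw institutional affiliation removed).
identifying <- c("email", "institution_raw")
tidy <- raw[, setdiff(names(raw), identifying), drop = FALSE]

# Save and reload as .rds; a temp file is used here instead of a repo path.
rds_path <- tempfile(fileext = ".rds")
saveRDS(tidy, rds_path)
reloaded <- readRDS(rds_path)

print(names(reloaded))
```

With the real files, the same `readRDS()` call would simply point at the `.rds` file in `wrangled_data`.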
### analyses
* `1_TidyData.Rmd`: Fetches the raw data and wrangles it into a de-identified tidy data file for later analysis.
* `2_PersonaStats.Rmd`: Identification of personas and joining of persona categorization with tidy data.
* `3_MainAnalysis.Rmd`: Main analysis and plotting driver
* `4_Stats.Rmd`: Code to support all stated stats/general observations in the report out that aren't directly observed from plots/figures. The file is organized as follows:
  * Statements and sections appear in chronological order, aligning with the layout of the preprint.
  * Within a section, if a table supports multiple statements, it is constructed in an expandable details section before any direct statements from the preprint.
  * Each statement has a section separator and the specific statement, followed by an expandable details section with code showing the support for the statement.
* `5_PCA.Rmd`: Performs PCA on all respondents after subsetting and wrangling the data
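As a rough illustration of the kind of step `5_PCA.Rmd` describes, the sketch below runs PCA with base R's `prcomp` on a small synthetic numeric table. The column names and values are made up for the example; the actual subsetting, wrangling, and PCA settings used in the analysis are not described in this README.

```r
set.seed(1)

# Synthetic numeric responses; names and values are hypothetical.
responses <- data.frame(
  comfort_score = rnorm(20, mean = 3),
  tool_use_count = rnorm(20, mean = 5),
  years_experience = rnorm(20, mean = 8)
)

# Keep numeric columns and complete rows only, then center and scale for PCA.
numeric_cols <- responses[, sapply(responses, is.numeric), drop = FALSE]
complete <- numeric_cols[complete.cases(numeric_cols), , drop = FALSE]
pca <- prcomp(complete, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

Scaling is switched on here because the toy columns sit on different scales; whether that choice suits the poll data is a judgment the analysis itself would have to make.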
### reports
This directory contains corresponding knit HTML files for each of the R Markdown files in the `analyses` directory and the figure creation R Markdown in the `figures` directory.
### resources
* `scripts/shared_functions.R`: some functions used repeatedly in analysis or for plotting
* `plots/`: plots from the main analysis saved as png files
* `supplemental_material/`: Includes the complete poll, supplementary Table 1 (relation of study aims and poll questions), and supplementary Table 2 (raw responses translated to awareness and use)
### figures
* `figureCreation.Rmd`: Uses `patchwork` to combine plots from `3_MainAnalysis.Rmd` into figure panels and adjusts aesthetics as necessary.
* The figure panels themselves are saved as png files within this directory as well
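A minimal sketch of combining plots into a figure panel with `patchwork`, assuming `ggplot2` and `patchwork` are installed. The two plots are placeholders built from the built-in `mtcars` dataset, and the layout, tags, theme, and output file are illustrative choices rather than what `figureCreation.Rmd` actually does.

```r
library(ggplot2)
library(patchwork)

# Two placeholder plots; in the repository the inputs would instead be
# plots produced by the main analysis.
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# Combine side by side, tag the panels A and B, and apply one theme to all
# plots with patchwork's `&` operator.
panel <- (p1 + p2) +
  plot_layout(ncol = 2) +
  plot_annotation(tag_levels = "A") &
  theme_minimal()

# Write the combined panel to a png (a temp file is used for illustration).
out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = panel, width = 8, height = 4, dpi = 150)
```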
## Other notes:
* Preprint information
* A poster presented at the AnVIL Community Conference 2025
* Companion website information
* AnVIL Collection and other outreach information