https://github.com/openwashdata/glossarywho
Tidy data from the WHO Glossary (https://www.who.int/publications/i/item/9789240105485)
https://github.com/openwashdata/glossarywho
Last synced: 4 months ago
JSON representation
Tidy data from the WHO Glossary (https://www.who.int/publications/i/item/9789240105485)
- Host: GitHub
- URL: https://github.com/openwashdata/glossarywho
- Owner: openwashdata
- License: cc-by-4.0
- Created: 2025-01-28T10:52:34.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-03T13:54:44.000Z (over 1 year ago)
- Last Synced: 2025-09-04T22:16:36.027Z (9 months ago)
- Language: Jupyter Notebook
- Homepage: https://openwashdata.github.io/glossarywho/
- Size: 2.71 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
- Citation: CITATION.cff
Awesome Lists containing this project
README
---
output: github_document
always_allow_html: true
editor_options:
markdown:
wrap: 72
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
message = FALSE,
warning = FALSE,
fig.retina = 2,
fig.align = 'center'
)
```
# glossarywho
[](https://creativecommons.org/licenses/by/4.0/)
The goal of glossarywho is to provide data from the [WHO Glossary](https://www.who.int/publications/i/item/9789240105485) in a tidy format. Access definitions by themes [here](https://openwashdata.github.io/glossarywho/articles/Themes.html) or use the search bar at the top of the page.
## Installation
You can install the development version of glossarywho from
[GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("openwashdata/glossarywho")
```
```{r}
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
```
Alternatively, you can download the individual datasets as a CSV or XLSX
file from the table below.
1. Click Download CSV. A window opens that displays the CSV in
your browser.
2. Right-click anywhere inside the window and select "Save Page As...".
3. Save the file in a folder of your choice.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
extdata_path <- "https://github.com/openwashdata/glossarywho/raw/main/inst/extdata/"
read_csv("data-raw/dictionary.csv") |>
distinct(file_name) |>
dplyr::mutate(file_name = str_remove(file_name, ".rda")) |>
dplyr::rename(dataset = file_name) |>
mutate(
CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),
XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")
) |>
knitr::kable()
```
## Data
The package provides access to glossary terms, definitions and thematic areas from the WHO Glossary. The datasets are: themes and definitions.
```{r}
library(glossarywho)
```
### definitions
The dataset `definitions` contains data about definitions from the WHO glossary It has
`r nrow(definitions)` observations and `r ncol(definitions)`
variables
```{r}
definitions |>
head(3) |>
gt::gt() |>
gt::as_raw_html()
```
### themes
The dataset `themes` contains data about thematic areas from the WHO glossary. It has
`r nrow(themes)` observations and `r ncol(themes)`
variables
```{r}
themes |>
head(3) |>
gt::gt() |>
gt::as_raw_html()
```
For an overview of the variable names, see the following table.
```{r echo=FALSE, message=FALSE, warning=FALSE}
readr::read_csv("data-raw/dictionary.csv") |>
dplyr::filter(file_name == "definitions.rda") |>
dplyr::select(variable_name:description) |>
knitr::kable() |>
kableExtra::kable_styling("striped") |>
kableExtra::scroll_box(height = "200px")
```
## Example
```{r}
library(glossarywho)
library(ggplot2)
library(tidyverse)
# Plot a bar chart of count of definitions by thematic areas
themes |>
count(`Thematic Area`) |>
ggplot2::ggplot(aes(x = fct_reorder(`Thematic Area`, n), y = n)) +
geom_col(fill = "skyblue") +
coord_flip() +
labs(title = "Count of definitions by thematic areas",
x = "Count",
y = "Thematic area") +
theme_minimal() +
theme(axis.text.y = element_text(size = 8)) +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank())
```
```{r}
# Wordcloud of most common words from definitions
library(wordcloud)
library(tm)
# Create a corpus
corpus <- Corpus(VectorSource(definitions$Description))
# Clean the corpus
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)
# Create a document term matrix
dtm <- DocumentTermMatrix(corpus)
# Create a wordcloud
wordcloud(words = names(sort(colSums(as.matrix(dtm)), decreasing = TRUE)),
freq = colSums(as.matrix(dtm)),
min.freq = 1,
max.words = 100,
random.order = FALSE,
colors = brewer.pal(8, "Dark2"))
```
## License
Data are available as
[CC-BY](https://github.com/openwashdata/%7B%7B%7Bpackagename%7D%7D%7D/blob/main/LICENSE.md).
## Citation
Please cite this package using:
```{r}
citation("glossarywho")
```