https://github.com/openwashdata/glossarywho

Tidy data from the WHO Glossary (https://www.who.int/publications/i/item/9789240105485)
https://github.com/openwashdata/glossarywho

Last synced: 6 months ago
JSON representation

Tidy data from the WHO Glossary (https://www.who.int/publications/i/item/9789240105485)

Host: GitHub
URL: https://github.com/openwashdata/glossarywho
Owner: openwashdata
License: cc-by-4.0
Created: 2025-01-28T10:52:34.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-03T13:54:44.000Z (over 1 year ago)
Last Synced: 2025-09-04T22:16:36.027Z (10 months ago)
Language: Jupyter Notebook
Homepage: https://openwashdata.github.io/glossarywho/
Size: 2.71 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
- Citation: CITATION.cff

Awesome Lists containing this project

README

          ---

output: github_document

always_allow_html: true

editor_options: 

  markdown: 

    wrap: 72

  chunk_output_type: console

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%",

  message = FALSE,

  warning = FALSE,

  fig.retina = 2,

  fig.align = 'center'

)

```

# glossarywho

[![License: CC BY

4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)

The goal of glossarywho is to provide data from the [WHO Glossary](https://www.who.int/publications/i/item/9789240105485) in a tidy format. Access definitions by themes [here](https://openwashdata.github.io/glossarywho/articles/Themes.html) or use the search bar at the top of the page.

## Installation

You can install the development version of glossarywho from

[GitHub](https://github.com/) with:

``` r

# install.packages("devtools")

devtools::install_github("openwashdata/glossarywho")

```

```{r}

## Run the following code in console if you don't have the packages

## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))

library(dplyr)

library(knitr)

library(readr)

library(stringr)

library(gt)

library(kableExtra)

```

Alternatively, you can download the individual datasets as a CSV or XLSX

file from the table below.

1.  Click Download CSV. A window opens that displays the CSV in

    your browser.

2.  Right-click anywhere inside the window and select "Save Page As...".

3.  Save the file in a folder of your choice.

```{r, echo=FALSE, message=FALSE, warning=FALSE}

extdata_path <- "https://github.com/openwashdata/glossarywho/raw/main/inst/extdata/"

read_csv("data-raw/dictionary.csv") |> 

  distinct(file_name) |> 

  dplyr::mutate(file_name = str_remove(file_name, ".rda")) |> 

  dplyr::rename(dataset = file_name) |> 

  mutate(

    CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),

    XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")

  ) |> 

  knitr::kable()

```

## Data

The package provides access to glossary terms, definitions and thematic areas from the WHO Glossary. The datasets are: themes and definitions.

```{r}

library(glossarywho)

```

### definitions

The dataset `definitions` contains data about definitions from the WHO glossary It has

`r nrow(definitions)` observations and `r ncol(definitions)`

variables

```{r}

definitions |> 

  head(3) |> 

  gt::gt() |>

  gt::as_raw_html()

```

### themes

The dataset `themes` contains data about thematic areas from the WHO glossary. It has

`r nrow(themes)` observations and `r ncol(themes)`

variables

```{r}

themes |> 

  head(3) |> 

  gt::gt() |>

  gt::as_raw_html()

```

For an overview of the variable names, see the following table.

```{r echo=FALSE, message=FALSE, warning=FALSE}

readr::read_csv("data-raw/dictionary.csv") |>

  dplyr::filter(file_name == "definitions.rda") |>

  dplyr::select(variable_name:description) |> 

  knitr::kable() |> 

  kableExtra::kable_styling("striped") |> 

  kableExtra::scroll_box(height = "200px")

```

## Example

```{r}

library(glossarywho)

library(ggplot2)

library(tidyverse)

# Plot a bar chart of count of definitions by thematic areas 

themes |> 

  count(`Thematic Area`) |> 

  ggplot2::ggplot(aes(x = fct_reorder(`Thematic Area`, n), y = n)) +

  geom_col(fill = "skyblue") +

  coord_flip() +

  labs(title = "Count of definitions by thematic areas",

       x = "Count",

       y = "Thematic area") +

  theme_minimal() +

  theme(axis.text.y = element_text(size = 8)) +

  theme(panel.grid.major.y = element_blank(),

         panel.grid.minor.y = element_blank())

```

```{r}

# Wordcloud of most common words from definitions

library(wordcloud)

library(tm)

# Create a corpus

corpus <- Corpus(VectorSource(definitions$Description))

# Clean the corpus

corpus <- tm_map(corpus, content_transformer(tolower))

corpus <- tm_map(corpus, removePunctuation)

corpus <- tm_map(corpus, removeNumbers)

corpus <- tm_map(corpus, removeWords, stopwords("en"))

corpus <- tm_map(corpus, stripWhitespace)

# Create a document term matrix

dtm <- DocumentTermMatrix(corpus)

# Create a wordcloud

wordcloud(words = names(sort(colSums(as.matrix(dtm)), decreasing = TRUE)),

          freq = colSums(as.matrix(dtm)),

          min.freq = 1,

          max.words = 100,

          random.order = FALSE,

          colors = brewer.pal(8, "Dark2"))

```

## License

Data are available as

[CC-BY](https://github.com/openwashdata/%7B%7B%7Bpackagename%7D%7D%7D/blob/main/LICENSE.md).

## Citation

Please cite this package using:

```{r}

citation("glossarywho")

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/openwashdata/glossarywho

Awesome Lists containing this project

README