https://github.com/alistaire47/passport

Travel smoothly between country name and code formats
country-codes country-data country-names package r

Host: GitHub
URL: https://github.com/alistaire47/passport
Owner: alistaire47
License: other
Created: 2017-05-25T19:04:25.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-12-03T04:33:45.000Z (about 4 years ago)
Last Synced: 2024-08-03T06:03:41.364Z (6 months ago)
Topics: country-codes, country-data, country-names, package, r
Language: R
Homepage: https://alistaire47.github.io/passport/
Size: 3.28 MB
Stars: 34
Watchers: 4
Forks: 0
Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

README

---
tags: [r]
output: github_document
---

```{r setup, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# passport

[![Travis-CI Build Status](https://travis-ci.org/alistaire47/passport.svg?branch=master)](https://travis-ci.org/alistaire47/passport)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/alistaire47/passport?branch=master&svg=true)](https://ci.appveyor.com/project/alistaire47/passport)
[![Coverage Status](https://codecov.io/gh/alistaire47/passport/branch/master/graph/badge.svg)](https://codecov.io/gh/alistaire47/passport)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/passport)](https://cran.r-project.org/package=passport)

`passport` smooths the process of working with country names and codes via
powerful parsing, standardization, and conversion utilities arranged in a
simple, consistent API. Country name formats come from multiple sources,
including the Unicode CLDR's common-sense standardizations in hundreds of
languages.

## Installation

Install from CRAN with

```{r install-cran, eval=FALSE}
install.packages("passport")
```

or the development version from GitHub with

```{r install-github, eval=FALSE}
# install.packages("remotes")
remotes::install_github("alistaire47/passport")
```

---

## Travel smoothly between country name and code formats

Working with country data can be frustrating. Even with well-curated data like
[`gapminder`](https://github.com/jennybc/gapminder), there are some oddities:

```{r intro, message=FALSE}
library(passport)
library(gapminder)
library(dplyr)    # Works equally well in any grammar.
library(tidyr)
set.seed(47)

grep("Korea", unique(gapminder$country), value = TRUE)
grep("Yemen", unique(gapminder$country), value = TRUE)
```

`passport` offers a framework for working with country names and codes without
manually editing data or scraping codes from Wikipedia.

### I. Standardize

If data has non-standardized names, standardize them to an ISO 3166-1 code
or other standardized code or name with `parse_country`:

```{r standardize-1}
gap <- gapminder %>%
    # standardize to ISO 3166 Alpha-2 code
    mutate(country_code = parse_country(country))

gap %>%
    select(country, country_code, year, lifeExp) %>%
    sample_n(10)
```

If country names are particularly irregular, in unsupported languages, or are
even just unique location names, `parse_country` can use Google Maps or Data
Science Toolkit geocoding APIs to parse instead of regex:

```{r standardize-2, eval=FALSE}
parse_country(c("somewhere in Japan", "日本", "Japon", "जापान"), how = "google")
#> [1] "JP" "JP" "JP" "JP"

parse_country(c("1600 Pennsylvania Ave, DC", "Eiffel Tower"), how = "google")
#> [1] "US" "FR"
```
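To give a feel for what regex-based parsing involves, the following is a toy sketch in base R. The pattern table and the `toy_parse_country()` helper are hypothetical and far cruder than the `countrycode` regexes that `parse_country()` relies on; they only illustrate the idea of mapping messy names to codes by pattern.

```r
# Hypothetical pattern table: names are ISO 3166-1 alpha-2 codes,
# values are (Perl-style) regular expressions. Not passport's real data.
toy_patterns <- c(
    KP = "^(?=.*korea)(?=.*(dem|north|people))",
    KR = "^(?=.*korea)(?!.*(dem|north|people))",
    YE = "yemen",
    JP = "japan|japon"
)

toy_parse_country <- function(x) {
    vapply(x, function(name) {
        # Test every pattern against this one name, ignoring case
        matched <- vapply(toy_patterns, grepl, logical(1),
                          x = name, ignore.case = TRUE, perl = TRUE)
        hits <- names(toy_patterns)[matched]
        # Only accept an unambiguous match; otherwise return NA
        if (length(hits) == 1) hits else NA_character_
    }, character(1), USE.NAMES = FALSE)
}

toy_parse_country(c("Korea, Rep.", "Korea, Dem. Rep.", "Yemen, Rep.", "Atlantis"))
#> [1] "KR" "KP" "YE" NA
```

Returning `NA` for names that match no pattern (or more than one) keeps failures visible instead of silently guessing.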

### II. Convert

If data comes with countries already coded,

- convert them to ISO or other codes with `as_country_code()`
- convert them to country names with `as_country_name()`
- convert them to country names in other languages with `as_country_name()` and its `to` argument

```{r convert-1, message = FALSE}
# NATO member defense expenditure data; see `?nato`
data("nato", package = "passport")

nato %>%
    select(country_stanag) %>%
    distinct() %>%
    mutate(
        country_iso = as_country_code(country_stanag, from = "stanag"),
        country_name = as_country_name(country_stanag, from = "stanag", short = FALSE),
        # "ta-my" is the language tag for Tamil as used in Malaysia
        country_name_tamil = as_country_name(country_stanag, from = "stanag", to = "ta-my")
    )
```

Language formats largely follow the [IETF language tag (BCP
47)](https://en.wikipedia.org/wiki/IETF_language_tag) syntax. For all available
formats, run `DT::datatable(codes)` for an interactive widget of format names
and further information.
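As a smaller illustration, the same conversions can be tried on a plain character vector. This sketch assumes that `"iso3c"` and `"fr"` are valid format names listed in `codes` (worth checking there before relying on them) and that the input is in `"iso2c"` format.

```r
library(passport)

iso2 <- c("US", "JP", "DE")

# ISO 3166-1 alpha-2 -> alpha-3 ("iso3c" is assumed to be listed in `codes`)
iso3 <- as_country_code(iso2, from = "iso2c", to = "iso3c")

# alpha-2 -> French country names ("fr" is assumed to be a CLDR language format)
french <- as_country_name(iso2, from = "iso2c", to = "fr")

data.frame(iso2, iso3, french)
```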

### III. Format

A particularly common hangup with country data is presentation. While
"Yemen, Rep." may be fine for exploratory work, to create a plot to share,
such names need to be changed to something more palatable either by editing
the data or manually overriding the labels directly on the plot.

If the existing format is already standardized, `passport` offers another
option: use a formatter function created with `country_format`, just like for
thousands separators or currency formatting. Reorder simply with
`order_countries`:

```{r format, dpi=300}
library(ggplot2)

living_longer <- gap %>%
    group_by(country_code) %>%
    summarise(start_life_exp = lifeExp[which.min(year)],
              stop_life_exp = lifeExp[which.max(year)],
              diff_life_exp = stop_life_exp - start_life_exp) %>%
    top_n(10, diff_life_exp)

# Plot country codes...
ggplot(living_longer, aes(x = country_code, y = stop_life_exp - 3.3,
                          ymin = start_life_exp,
                          ymax = stop_life_exp - 3.3,
                          colour = factor(diff_life_exp))) +
    geom_point(pch = 17, size = 15) +
    geom_linerange(size = 10) +
                     # ...just pass `labels` a formatter function!
    scale_x_discrete(labels = country_format(),
                     # Easily change order
                     limits = order_countries(living_longer$country_code,
                                              living_longer$diff_life_exp)) +
    scale_y_continuous(limits = c(30, 80)) +
    labs(title = "Life gets better",
         subtitle = "Largest increase in life expectancy",
         x = NULL, y = "Life expectancy") +
    theme(axis.text.x = element_text(angle = 30, hjust = 1),
          legend.position = "none")
```

By default `country_format` will use Unicode CLDR (see below) English names,
which are intelligible and suitable for most purposes. If desired, other
languages or formats can be specified just like in `as_country_name`.
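Because `country_format()` returns a function, it can also be called directly outside of a plot, which is a convenient way to preview labels. The sketch below assumes the defaults (input in `"iso2c"`, output in CLDR English) and uses a made-up `score` vector purely to show ordering.

```r
library(passport)

iso2 <- c("US", "JP", "DE")
score <- c(3, 1, 2)    # made-up values, only used for ordering

# country_format() builds a labeller; calling it returns one label per code
to_label <- country_format()
labels <- to_label(iso2)

# order_countries() returns the codes reordered by `by`,
# ready to pass to a scale's `limits`
ordered <- order_countries(iso2, by = score)

labels
ordered
```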

---

## Data

The data underlying `passport` comes from a number of sources, including

- [The Unicode Common Locale Data Repository (CLDR)
Project](http://cldr.unicode.org/) supplies country names in many, many
languages, from Afrikaans to Zulu. Even better, [CLDR aspires to use the most
customary name](http://cldr.unicode.org/translation/displaynames/country-names) instead of
a formal or official one, e.g. "Switzerland" instead of "Swiss Confederation".
- [The United Nations Statistics
Division](https://unstats.un.org/unsd/methodology/m49/overview/) maintains and
publishes the M.49 region codes and the UN geoscheme region codes and names.
- [The CIA World
Factbook](https://www.cia.gov/library/publications/the-world-factbook/index.html)
supplies a standardized set of names and codes.
- [The National Geospatial-Intelligence Agency
(NGA)](http://geonames.nga.mil/gns/html/countrycodes.html) is the organization
responsible for standardizing US government use of country codes. It inherited
the now-deprecated FIPS 10-4 from NIST, which it turned into the GEC, which is
now also deprecated in favor of GENC, a US government profile of ISO 3166.
- [Wikipedia](https://en.wikipedia.org/wiki/Category:Lists_of_country_codes)
offers a rich set of country codes, some of which are aggregated here.
- Open Knowledge International's Frictionless Data supplies [a set of codes
collated from a number of sources](https://www.datahub.io/core/country-codes) on
datahub.io.
- The regular expressions powering `parse_country()` are from
[`countrycode`](https://github.com/vincentarelbundock/countrycode). If you
would like to improve both packages, please contribute regexes there!

## Licensing

`passport` is licensed as open-source software under
[GPL-3](https://www.gnu.org/licenses/gpl.html). Unicode CLDR data is licensed
according to [its own
license](https://github.com/unicode-cldr/cldr-json/blob/master/LICENSE), a copy
of which is included. `countrycode` regexes are used in modified form under
GPL-3; see the included aggregation script for the modifying code and the
modification date.