https://github.com/gesistsa/minty

🌿MINimal TYpe guesser

Host: GitHub
URL: https://github.com/gesistsa/minty
Owner: gesistsa
License: other
Created: 2024-03-12T10:03:28.000Z (over 2 years ago)
Default Branch: v0.0
Last Pushed: 2025-04-04T08:56:16.000Z (about 1 year ago)
Last Synced: 2025-04-26T09:39:45.256Z (about 1 year ago)
Language: R
Homepage: https://gesistsa.github.io/minty/
Size: 2.14 MB
Stars: 7
Watchers: 2
Forks: 0
Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# minty 

[![R-CMD-check](https://github.com/gesistsa/minty/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/minty/actions/workflows/R-CMD-check.yaml)

[![CRAN status](https://www.r-pkg.org/badges/version/minty)](https://CRAN.R-project.org/package=minty)

`minty` (**Min**imal **ty**pe guesser) is a package containing the type inference and parsing tools (the so-called 1e parsing engine) extracted from `readr` (with permission, see [tidyverse/readr#1517](https://github.com/tidyverse/readr/issues/1517)). Since July 2021, `readr` has no longer used these tools internally for parsing text files by default. Instead, `vroom` is used unless the first edition parsing engine is called explicitly (see the explanation on [editions](https://github.com/tidyverse/readr?tab=readme-ov-file#editions)).

`readr`'s 1e type inference and parsing tools are used by various R packages, e.g. `readODS` and `surveytoolbox`, for parsing in-memory objects, but those packages do not use the main functions of `readr` (e.g. `readr::read_delim()`). As explained in the README of `readr`, the 1e code will eventually be removed from `readr`.

`minty` aims to provide a set of minimal, long-term, and compatible type inference and parsing tools for those packages. You might consider `minty` to be a 1.5e parsing engine.

## Installation

You can install the development version of minty like so:

``` r
if (!require("remotes")) {
    install.packages("remotes")
}
remotes::install_github("gesistsa/minty")
```

## Example

A character-only data.frame

```{r}
text_only <- data.frame(maybe_age = c("17", "18", "019"),
                        maybe_male = c("true", "false", "true"),
                        maybe_name = c("AA", "BB", "CC"),
                        some_na = c("NA", "Not good", "Bad"),
                        dob = c("2019/07/21", "2019/08/31", "2019/10/01"))
str(text_only)
```

```{r}
## built-in function type.convert:
## only basic types (e.g. numeric, logical) are inferred; dates stay character
str(type.convert(text_only, as.is = TRUE))
```

Inferring the column types

```{r}

library(minty)

data <- type_convert(text_only)

data

```

```{r}

str(data)

```

### Type-based parsing tools

```{r}

parse_datetime("1979-10-14T10:11:12.12345")

```

```{r}

fr <- locale("fr")

parse_date("1 janv. 2010", "%d %b %Y", locale = fr)

```

```{r}
de <- locale("de", decimal_mark = ",")
## spell out the full argument name instead of relying on partial matching
parse_number("1.697,31", locale = de)
```

```{r}

parse_number("$1,123,456.00")

```

```{r}
## Python-style capitalization ("True" / "False")
parse_logical(c("True", "False"))
```

### Type guesser

```{r}

parse_guess(c("True", "TRUE", "false", "F"))

```

```{r}

parse_guess(c("123.45", "1990", "7619.0"))

```

```{r}

res <- parse_guess(c("2019-07-21", "2019-08-31", "2019-10-01", "IDK"), na = "IDK")

res

```

```{r}

str(res)

```

## Differences: `readr` vs `minty`

Please note that, unlike `readr` and `vroom`, `minty` is mainly intended for **non-interactive usage**. Therefore, `minty` emits fewer messages and warnings than `readr` and `vroom`.

```{r}

data <- minty::type_convert(text_only)

data

```

```{r}

data <- readr::type_convert(text_only)

data

```

A `verbose` option is available if you want those messages; it defaults to `FALSE`. To keep this package as minimal as possible, these optional messages are printed with base R (not `cli`).

```{r}

data <- minty::type_convert(text_only, verbose = TRUE)

```
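
The same opt-in pattern can be reproduced with nothing but base R. The sketch below is an illustration only, not code from `minty`: `convert_with_report()` is a hypothetical wrapper, and base `type.convert()` stands in for the converter so that the snippet runs without any extra package.

```r
## Hypothetical wrapper (not part of minty): silent by default; the
## optional report is emitted with base message() rather than cli.
convert_with_report <- function(df, verbose = FALSE) {
  out <- utils::type.convert(df, as.is = TRUE)
  if (verbose) {
    types <- vapply(out, function(col) class(col)[1], character(1))
    message("Column types: ",
            paste(names(types), types, sep = " = ", collapse = ", "))
  }
  out
}

df <- data.frame(a = c("1", "2"), b = c("x", "y"))
res <- convert_with_report(df)                  # emits no message
res <- convert_with_report(df, verbose = TRUE)  # reports the types via message()
```

Because `message()` writes to the standard condition system, callers that still want silence can wrap the call in `suppressMessages()`.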

At the moment, `minty` does not use [the `problems` mechanism](https://vroom.r-lib.org/reference/problems.html) by default.

```{r}

minty::parse_logical(c("true", "fake", "IDK"), na = "IDK")

```

```{r}

readr::parse_logical(c("true", "fake", "IDK"), na = "IDK")

```
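
Without a `problems` record, a caller that needs to know which values failed has to check for itself. A minimal sketch, assuming that failed parses come back as `NA`: `find_failed()` is a hypothetical helper (not part of `minty`), and base `as.logical()` stands in for the parser so that the snippet runs on its own.

```r
## Hypothetical helper (not part of minty): indices of values that were
## neither missing nor declared as NA strings, but still came back as NA.
find_failed <- function(input, parsed, na = c("", "NA")) {
  which(is.na(parsed) & !is.na(input) & !(input %in% na))
}

input <- c("true", "fake", "IDK")
## Stand-in parser; with minty one would use
## minty::parse_logical(input, na = "IDK") here instead.
parsed <- as.logical(input)
failed <- find_failed(input, parsed, na = "IDK")
input[failed]
```

Here only `"fake"` is flagged, because `"IDK"` was declared as a missing-value string.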

Some features from `vroom` have been ported to `minty`, but not to `readr`.

```{r}

## tidyverse/readr#1526

minty::type_convert(data.frame(a = c("NaN", "Inf", "-INF"))) |> str()

```

```{r}

readr::type_convert(data.frame(a = c("NaN", "Inf", "-INF"))) |> str()

```

`guess_max` is available for `parse_guess()` and `type_convert()`; it defaults to `NA` (same as `readr`).

```{r}

minty::parse_guess(c("1", "2", "drei"))

```

```{r}

minty::parse_guess(c("1", "2", "drei"), guess_max = 2)

```

```{r}

readr::parse_guess(c("1", "2", "drei"))

```
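
Conceptually, `guess_max` limits how many elements are inspected before a type is chosen. The base R sketch below illustrates that idea only; it is not `minty`'s implementation, and `toy_guess()` is a hypothetical helper that merely distinguishes plain numbers from character strings.

```r
## Toy guesser (hypothetical, not minty's algorithm): inspect at most
## `guess_max` elements and decide between "numeric" and "character".
toy_guess <- function(x, guess_max = NA) {
  if (!is.na(guess_max)) {
    x <- head(x, guess_max)
  }
  if (all(grepl("^-?[0-9]+(\\.[0-9]+)?$", x))) "numeric" else "character"
}

x <- c("1", "2", "drei")
toy_guess(x)                 # all three elements are inspected
toy_guess(x, guess_max = 2)  # "drei" is never inspected
```

A small `guess_max` is cheaper but can pick a type that later elements do not fit, which is the trade-off the examples above demonstrate.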

For `parse_guess()` and `type_convert()`, `trim_ws` is considered before type guessing (the expected behavior of `vroom::vroom()` / `readr::read_delim()`).

```{r}

minty::parse_guess(c("   1", " 2 ", " 3  "), trim_ws = TRUE)

```

```{r}

readr::parse_guess(c("   1", " 2 ", " 3  "), trim_ws = TRUE)

```

```{r}

## tidyverse/readr#1536

minty::type_convert(data.frame(a = "1 ", b = " 2"), trim_ws = TRUE) |> str()

```

```{r}

readr::type_convert(data.frame(a = "1 ", b = " 2"), trim_ws = TRUE) |> str()

```
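
Why the order of trimming and guessing matters can be shown with a toy check in base R. This illustrates the principle under simplified assumptions and is not the code used by `minty` or `readr`; `looks_numeric()` is a hypothetical helper.

```r
## Hypothetical helper: TRUE if every element looks like a plain number.
looks_numeric <- function(x) {
  all(grepl("^-?[0-9]+(\\.[0-9]+)?$", x))
}

x <- c("   1", " 2 ", " 3  ")
looks_numeric(x)          # guessing on the raw strings: whitespace gets in the way
looks_numeric(trimws(x))  # trimming first lets the numeric type be detected
as.numeric(trimws(x))     # the values can then be parsed as numbers
```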

## Similar packages

For parsing ambiguous date(time)s

* [timeless](https://github.com/schochastics/timeless)

* [anytime](https://github.com/eddelbuettel/anytime)

Guess column types of a text file

* [hdd](https://CRAN.R-project.org/package=hdd)

## Acknowledgements

Thanks to:

* The [Tidyverse Team](https://github.com/tidyverse) for allowing us to spin off the code from `readr`
