https://github.com/gesistsa/minty
🌿MINimal TYpe guesser
https://github.com/gesistsa/minty
Last synced: about 1 year ago
JSON representation
🌿MINimal TYpe guesser
- Host: GitHub
- URL: https://github.com/gesistsa/minty
- Owner: gesistsa
- License: other
- Created: 2024-03-12T10:03:28.000Z (over 2 years ago)
- Default Branch: v0.0
- Last Pushed: 2025-04-04T08:56:16.000Z (about 1 year ago)
- Last Synced: 2025-04-26T09:39:45.256Z (about 1 year ago)
- Language: R
- Homepage: https://gesistsa.github.io/minty/
- Size: 2.14 MB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# minty 
[](https://github.com/gesistsa/minty/actions/workflows/R-CMD-check.yaml)
[](https://CRAN.R-project.org/package=minty)
`minty` (**Min**imal **ty**pe guesser) is a package with the type inferencing and parsing tools (the so-called 1e parsing engine) extracted from `readr` (with permission, see this issue [tidyverse/readr#1517](https://github.com/tidyverse/readr/issues/1517)). Since July 2021, these tools are not used internally by `readr` for parsing text files. Now `vroom` is used by default, unless explicitly call the first edition parsing engine (see the explanation on [editions](https://github.com/tidyverse/readr?tab=readme-ov-file#editions)).
`readr`'s 1e type inferencing and parsing tools are used by various R packages, e.g. `readODS` and `surveytoolbox` for parsing in-memory objects, but those packages do not use the main functions (e.g. `readr::read_delim()`) of `readr`. As explained in the README of `readr`, those 1e code will be eventually removed from `readr`.
`minty` aims at providing a set of minimal, long-term, and compatible type inferencing and parsing tools for those packages. You might consider `minty` to be 1.5e parsing engine.
## Installation
You can install the development version of minty like so:
``` r
if (!require("remotes")){
install.packages("remotes")
}
remotes::install_github("gesistsa/minty")
```
## Example
A character-only data.frame
```{r}
text_only <- data.frame(maybe_age = c("17", "18", "019"),
maybe_male = c("true", "false", "true"),
maybe_name = c("AA", "BB", "CC"),
some_na = c("NA", "Not good", "Bad"),
dob = c("2019/07/21", "2019/08/31", "2019/10/01"))
str(text_only)
```
```{r}
## built-in function type.convert:
## except numeric, no type inferencing
str(type.convert(text_only, as.is = TRUE))
```
Inferencing the column types
```{r}
library(minty)
data <- type_convert(text_only)
data
```
```{r}
str(data)
```
### Type-based parsing tools
```{r}
parse_datetime("1979-10-14T10:11:12.12345")
```
```{r}
fr <- locale("fr")
parse_date("1 janv. 2010", "%d %b %Y", locale = fr)
```
```{r}
de <- locale("de", decimal_mark = ",")
parse_number("1.697,31", local = de)
```
```{r}
parse_number("$1,123,456.00")
```
```{r}
## This is perhaps Python
parse_logical(c("True", "False"))
```
### Type guesser
```{r}
parse_guess(c("True", "TRUE", "false", "F"))
```
```{r}
parse_guess(c("123.45", "1990", "7619.0"))
```
```{r}
res <- parse_guess(c("2019-07-21", "2019-08-31", "2019-10-01", "IDK"), na = "IDK")
res
```
```{r}
str(res)
```
## Differences: `readr` vs `minty`
Unlike `readr` and `vroom`, please note that `minty` is mainly for **non-interactive usage**. Therefore, `minty` emits fewer messages and warnings than `readr` and `vroom`.
```{r}
data <- minty::type_convert(text_only)
data
```
```{r}
data <- readr::type_convert(text_only)
data
```
`verbose` option is added if you like those messages, default to `FALSE`. To keep this package as minimal as possible, these optional messages are printed with base R (not `cli`).
```{r}
data <- minty::type_convert(text_only, verbose = TRUE)
```
At the moment, `minty` does not use [the `problems` mechanism](https://vroom.r-lib.org/reference/problems.html) by default.
```{r}
minty::parse_logical(c("true", "fake", "IDK"), na = "IDK")
```
```{r}
readr::parse_logical(c("true", "fake", "IDK"), na = "IDK")
```
Some features from `vroom` have been ported to `minty`, but not `readr`.
```{r}
## tidyverse/readr#1526
minty::type_convert(data.frame(a = c("NaN", "Inf", "-INF"))) |> str()
```
```{r}
readr::type_convert(data.frame(a = c("NaN", "Inf", "-INF"))) |> str()
```
`guess_max` is available for `parse_guess()` and `type_convert()`, default to `NA` (same as `readr`).
```{r}
minty::parse_guess(c("1", "2", "drei"))
```
```{r}
minty::parse_guess(c("1", "2", "drei"), guess_max = 2)
```
```{r}
readr::parse_guess(c("1", "2", "drei"))
```
For `parse_guess()` and `type_convert()`, `trim_ws` is considered before type guessing (the expected behavior of `vroom::vroom()` / `readr::read_delim()`).
```{r}
minty::parse_guess(c(" 1", " 2 ", " 3 "), trim_ws = TRUE)
```
```{r}
readr::parse_guess(c(" 1", " 2 ", " 3 "), trim_ws = TRUE)
```
```{r}
##tidyverse/readr#1536
minty::type_convert(data.frame(a = "1 ", b = " 2"), trim_ws = TRUE) |> str()
```
```{r}
readr::type_convert(data.frame(a = "1 ", b = " 2"), trim_ws = TRUE) |> str()
```
## Similar packages
For parsing ambiguous date(time)
* [timeless](https://github.com/schochastics/timeless)
* [anytime](https://github.com/eddelbuettel/anytime)
Guess column types of a text file
* [hdd](https://CRAN.R-project.org/package=hdd)
## Acknowledgements
Thanks to:
* The [Tidyverse Team](https://github.com/tidyverse) for allowing us to spin off the code from `readr`