Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mkearney/chr
🔤 Lightweight R package for manipulating [string] characters
- Host: GitHub
- URL: https://github.com/mkearney/chr
- Owner: mkearney
- License: other
- Created: 2017-12-22T00:30:20.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-18T15:50:59.000Z (about 6 years ago)
- Last Synced: 2024-08-13T07:14:35.194Z (4 months ago)
- Topics: character, chr, extract, mkearney-r-package, ngram, r, r-package, regex, rstats, string-manipulation, strings, text-processing
- Language: R
- Homepage: https://chr.mikewk.com
- Size: 715 KB
- Stars: 18
- Watchers: 5
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- jimsghstars - mkearney/chr - 🔤 Lightweight R package for manipulating [string] characters (R)
README
---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")
library(chr)
```

# chr
[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
R package for simple string manipulation
## Description
Clean, wrangle, and parse character [string] vectors using exclusively base
R functions.

## Install

```{r install, eval=FALSE}
## install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

## install chr from GitHub
devtools::install_github("mkearney/chr")

## load chr
library(chr)
```

## Usage
### Detect
**Detect** text patterns (an easy-to-use wrapper for `base::grep()` and `base::grepl()`).
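
For orientation, here is a minimal sketch of the base R calls that the description above refers to. It uses only `grepl()` and `grep()`; how `chr_detect()` maps its `invert`, `which`, and `value` arguments onto them is an assumption based on the description, not taken from the package source.

```r
## base R only; variable names are illustrative
pat <- "a|b|c|x|y|z"

## logical vector, one element per input string
hits <- grepl(pat, letters)

## inverted logical values
misses <- !hits

## matching positions and matching values
pos <- grep(pat, letters)
vals <- grep(pat, letters, value = TRUE)

## inverted matching values
other_vals <- grep(pat, letters, value = TRUE, invert = TRUE)
```

Keeping the logical, positional, and value forms behind one function saves switching between `grepl()` and `grep()` for what is conceptually the same query.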
```{r grep}
## return logical vector
chr_detect(letters, "a|b|c|x|y|z")

## return inverted logical values
chr_detect(letters, "a|b|c|x|y|z", invert = TRUE)

## return matching positions
chr_detect(letters, "a|b|c|x|y|z", which = TRUE)

## return inverted matching positions
chr_detect(letters, "a|b|c|x|y|z", which = TRUE, invert = TRUE)

## return matching values
chr_detect(letters, "a|b|c|x|y|z", value = TRUE)

## return inverted matching values
chr_detect(letters, "a|b|c|x|y|z", value = TRUE, invert = TRUE)
```

### Extract
**Extract** text patterns.
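
As a rough illustration of what pattern extraction looks like in base R, the sketch below pulls hashtags out of each string with `gregexpr()` and `regmatches()`. The hashtag pattern and the variable names are assumptions made for this example; the pattern that `chr_extract_hashtags()` actually uses may differ.

```r
## illustrative input; not the README's `x`
txt <- c("this one #HasHashtags #afew #ofthem", "no tags here")

## assumed hashtag pattern: "#" followed by letters, digits, or underscores
tag_pattern <- "#[[:alnum:]_]+"

## gregexpr() locates every match; regmatches() returns the matched text
tags <- regmatches(txt, gregexpr(tag_pattern, txt))
```

The result is a list with one character vector per input string, and an empty vector where nothing matched.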
```{r extract}
## some text strings
x <- c("this one is @there
has #MultipleLines https://github.com and
http://twitter.com @twitter",
"this @one #istotally their and
some non-ascii symbols: \u00BF \u037E",
"this one is they're https://github.com",
"this one #HasHashtags #afew #ofthem",
"and more @kearneymw at https://mikew.com")

## extract all URLs
chr_extract_links(x)

## extract all hashtags
chr_extract_hashtags(x)

## extract mentions
chr_extract_mentions(x)
```

### Count
**Count** number of matches.
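
One way to count matches with base R alone is to locate every match with `gregexpr()` and tally the positive positions. The helper `count_matches()` below is hypothetical and written for this sketch; it is not part of chr and may not mirror how `chr_count()` is implemented.

```r
## hypothetical helper, not part of chr
count_matches <- function(x, pattern, ignore.case = FALSE) {
  m <- gregexpr(pattern, x, ignore.case = ignore.case)
  ## gregexpr() returns -1 when nothing matches, so count positions > 0
  vapply(m, function(i) sum(i > 0L), integer(1))
}

txt <- c("There and their", "they're here", "none")
counts <- count_matches(txt, "there|their|they\\S?re", ignore.case = TRUE)
```

Here `counts` holds one integer per input string, with zero for strings that contain no match.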
```{r count}
## count all there/their/they're
chr_count(x, "there|their|they\\S?re", ignore.case = TRUE)
```

### Remove
**Remove** text patterns.
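
Removal can be sketched in base R as a chain of `gsub()` calls that replace each pattern with nothing or with a single space. The patterns and variable names here are assumptions chosen for illustration and may not match what the `chr_remove_*()` functions use internally.

```r
## illustrative input containing hashtags, a line break, and repeated spaces
txt <- "this one  #HasHashtags\n#afew   words"

## drop hashtags (assumed pattern)
no_tags <- gsub("#[[:alnum:]_]+", "", txt)

## turn line breaks into spaces
no_breaks <- gsub("[\r\n]+", " ", no_tags)

## collapse runs of white space and trim the ends
clean <- trimws(gsub("[[:space:]]+", " ", no_breaks))
```

Each step takes and returns a character vector, which is why steps like these compose naturally in a pipe.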
```{r remove}
## remove URLs
chr_remove_links(x)

## string together functions with magrittr pipe
library(magrittr)

## remove mentions and extra [white] spaces
chr_remove_mentions(x) %>%
  chr_remove_ws()

## remove hashtags
chr_remove_hashtags(x)

## remove hashtags, line breaks, and extra spaces
x %>%
  chr_remove_hashtags() %>%
  chr_remove_linebreaks() %>%
  chr_remove_ws()

## remove links and mentions, then extract words
x %>%
  chr_remove_links() %>%
  chr_remove_mentions() %>%
  chr_extract_words()
```

### Replace
**Replace** text with string.
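
For comparison, a case-insensitive replacement in base R is a single `gsub()` call. This is only a sketch with made-up input; whether `chr_replace()` forwards its arguments to `gsub()` in exactly this way is an assumption.

```r
## illustrative input; not the README's `x`
txt <- c("Their house", "over there")

## replace every "their", regardless of case, with "they're"
replaced <- gsub("their", "they're", txt, ignore.case = TRUE)
```

Note that `gsub()` replaces every match in a string, while `sub()` would replace only the first.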
```{r replace}
## replace their with they're
chr_replace(x, "their", "they're", ignore.case = TRUE)
```

ASCII functions are currently a *work in progress*. For example, `chr_replace_nonascii()`
replaces non-ASCII symbols with similar ASCII characters.

```{r ascii}
## ascii version
chr_replace_nonascii(x)
```

### n-grams
Create **n-grams** at the character level.
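
To make the idea concrete, a character-level n-gram is every run of `n` consecutive characters in a string. The helper `char_ngrams()` below is a hypothetical base R sketch, not the package's implementation. The cleaning step is likewise an assumption about what options such as `lower`, `punct`, and `space` do.

```r
## hypothetical helper, not part of chr
char_ngrams <- function(x, n = 2L) {
  lapply(x, function(s) {
    len <- nchar(s)
    ## strings shorter than n have no n-grams
    if (len < n) return(character(0))
    starts <- seq_len(len - n + 1L)
    substring(s, starts, starts + n - 1L)
  })
}

## bigrams of a short string
bigrams <- char_ngrams("Acme", n = 2L)

## assumed cleaning step: lower-case, then strip punctuation and white space
cleaned <- tolower(gsub("[[:punct:][:space:]]+", "", "Tom's Gym"))
trigrams <- char_ngrams(cleaned, n = 3L)
```

A string of length `len` yields `len - n + 1` n-grams, so the result is a list with one character vector per input string.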
```{r ngrams}
## character vector
x <- c("Acme Pizza, Inc.", "Tom's Sports Equipment, LLC")

## 2 char level ngram
chr_ngram_char(x, n = 2L)

## 3 char level ngram in lower case and stripped of punctuation and white space
chr_ngram_char(x, n = 3L, lower = TRUE, punct = TRUE, space = TRUE)
```

### Contributions
Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.