https://github.com/mkearney/chr

🔤 Lightweight R package for manipulating [string] characters
https://github.com/mkearney/chr

character chr extract mkearney-r-package ngram r r-package regex rstats string-manipulation strings text-processing

Last synced: 2 months ago
JSON representation

🔤 Lightweight R package for manipulating [string] characters

Host: GitHub
URL: https://github.com/mkearney/chr
Owner: mkearney
License: other
Created: 2017-12-22T00:30:20.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-10-18T15:50:59.000Z (over 6 years ago)
Last Synced: 2025-03-26T05:41:53.492Z (3 months ago)
Topics: character, chr, extract, mkearney-r-package, ngram, r, r-package, regex, rstats, string-manipulation, strings, text-processing
Language: R
Homepage: https://chr.mikewk.com
Size: 715 KB
Stars: 17
Watchers: 4
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

jimsghstars - mkearney/chr - 🔤 Lightweight R package for manipulating [string] characters (R)

README

        ---

output: github_document

---

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")

library(chr)

```

# chr  

[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)

R package for simple string manipulation

## Description

Clean, wrangle, and parse character [string] vectors using base exclusively base

R functions.

## Install

```{r install, eval=FALSE}

## install devtools is not alreasy installed

if (!requireNamespace("devtools", quietly = TRUE)) {

  install.packages("devtools")

}

## install chr from github

devtools::install_github("mkearney/chr")

## load chr

library(chr)

```

## Usage

### Detect

**Detect** text patterns (an easy-to-use wrapper for `base::grep()` and `base::grepl()`).

```{r grep}

## return logical vector

chr_detect(letters, "a|b|c|x|y|z")

## return inverted logical values

chr_detect(letters, "a|b|c|x|y|z", invert = TRUE)

## return matching positions

chr_detect(letters, "a|b|c|x|y|z", which = TRUE)

## return inverted matching positions

chr_detect(letters, "a|b|c|x|y|z", which = TRUE, invert = TRUE)

## return matching values

chr_detect(letters, "a|b|c|x|y|z", value = TRUE)

## return inverted matching values

chr_detect(letters, "a|b|c|x|y|z", value = TRUE, invert = TRUE)

```

### Extract

**Extract** text patterns.

```{r extract}

## some text strings

x <- c("this one is @there

  has #MultipleLines https://github.com and 

  http://twitter.com @twitter",

  "this @one #istotally their and 

  some non-ascii symbols: \u00BF \u037E", 

  "this one is they're https://github.com", 

  "this one #HasHashtags #afew #ofthem", 

  "and more @kearneymw at https://mikew.com")

## extract all URLS

chr_extract_links(x)

## extract all hashtags

chr_extract_hashtags(x)

## extract mentions

chr_extract_mentions(x)

```

### Count

**Count** number of matches.

```{r count}

## extract all there/their/they're

chr_count(x, "there|their|they\\S?re", ignore.case = TRUE)

```

### Remove

**Remove** text patterns.

```{r remove}

## remove URLS

chr_remove_links(x)

## string together functions with magrittr pipe

library(magrittr)

## remove mentions and extra [white] spaces

chr_remove_mentions(x) %>%

  chr_remove_ws()

## remove hashtags

chr_remove_hashtags(x)

## remove hashtags, line breaks, and extra spaces

x %>%

  chr_remove_hashtags() %>%

  chr_remove_linebreaks() %>%

  chr_remove_ws()

## remove links and extract words

x %>%

  chr_remove_links() %>%

  chr_remove_mentions() %>%

  chr_extract_words()

```

### Replace

**Replace** text with string.

```{r replace}

## replace their with they're

chr_replace(x, "their", "they're", ignore.case = TRUE)

```

ASCII functions currently *in progress*. For example, replace non-ASCII symbols

with similar ASCII characters (*work in progress*).

```{r ascii}

## ascii version

chr_replace_nonascii(x)

```

### n-grams

Create **ngram**s at the character-level.

```{r ngrams}

## character vector

x <- c("Acme Pizza, Inc.", "Tom's Sports Equipment, LLC")

## 2 char level ngram

chr_ngram_char(x, n = 2L)

## 3 char level ngram in lower case and stripped of punctation and white space

chr_ngram_char(x, n = 3L, lower = TRUE, punct = TRUE, space = TRUE)

```

### Contributions

Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mkearney/chr

Awesome Lists containing this project

README