---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->
<!-- Keep this file in sync with the vignette -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
library(re2)
```

# re2: R interface to Google RE2

## Overview

The re2 package provides pattern matching, extraction, replacement and other string processing operations using Google's [RE2](https://github.com/google/re2) (C++) regular-expression library. Its interface is consistent and similar to that of [stringr](https://github.com/tidyverse/stringr).

Why re2?

Regular expression matching can be done in two ways: using recursive
backtracking or using finite automata-based techniques.

Perl, PCRE, Python, Ruby, Java, and many other languages rely on
recursive backtracking for their regular expression implementations.
The problem with this approach is that performance can degrade very
quickly: for some patterns, matching time grows exponentially with the
length of the input. In contrast, re2 uses finite automata-based
techniques for regular expression matching, guaranteeing execution time
linear in the size of the input and a fixed stack footprint. See the
articles by Russ Cox listed under References below.


## Installation

```r
# Install the released version from CRAN:
install.packages("re2")

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("girishji/re2")
```

## Usage

re2 provides three types of regular-expression functions:

- Detect the presence of a pattern in a string
- Extract substrings that match a pattern
- Replace matched substrings

All functions take a vector of strings as an argument. Regular-expression patterns can be compiled once and reused for better performance.

Here are the primary verbs of re2:

* `re2_detect(x, pattern)` detects whether a pattern is present in each string

```{r}
re2_detect(c("barbazbla", "foobar", "foxy brown"), "(foo)|(bar)baz")
```

* `re2_count(x, pattern)` counts the number of matches in each string

```{r}
re2_count(c("yellowgreen", "steelblue", "maroon"), "e")
```

* `re2_subset(x, pattern)` selects the strings that match

```{r}
re2_subset(c("yellowgreen", "steelblue", "goldenrod"), "ee")
```

* `re2_match(x, pattern, simplify = FALSE)` extracts the first matched substring

```{r}
re2_match("ruby:1234 68 red:92 blue:", "(\\w+):(\\d+)")
```

```{r}
# Groups can be named:

re2_match(c("barbazbla", "foobar"), "(foo)|(?P<TestGroup>bar)baz")
```

```{r}
# Use a pre-compiled regular expression:

re <- re2_regexp("(foo)|(bar)baz", case_sensitive = FALSE)
re2_match(c("BaRbazbla", "Foobar"), re)
```

* `re2_match_all(x, pattern)` extracts all matched substrings

```{r}
re2_match_all("ruby:1234 68 red:92 blue:", "(\\w+):(\\d+)")
```

* `re2_replace(x, pattern, rewrite)` replaces the first match in each string

```{r}
re2_replace("yabba dabba doo", "b+", "d")
```

```{r}
# Use groups in rewrite:

re2_replace("bunny@wunnies.pl", "(.*)@([^.]*)", "\\2!\\1")
```

* `re2_replace_all(x, pattern, rewrite)` replaces all matches in each string, or performs multiple replacements on each element of `x` when given a named vector of `pattern = rewrite` pairs

```{r}
re2_replace_all("yabba dabba doo", "b+", "d")
# Multiple replacements
re2_replace_all(c("one", "two"), c("one" = "1", "1" = "2", "two" = "2"))
```

* `re2_extract_replace(x, pattern, rewrite)` extracts the match and substitutes it into `rewrite`, ignoring the non-matching portions of `x`

```{r}
re2_extract_replace("bunny@wunnies.pl", "(.*)@([^.]*)", "\\2!\\1")
```

* `re2_split(x, pattern, simplify = FALSE, n = Inf)` splits each string at matches of `pattern`

```{r}
re2_split("How vexingly quick daft zebras jump!", " quick | zebras")
```

* `re2_locate(x, pattern)` locates the start and end of the first match in each string

```{r}
re2_locate(c("yellowgreen", "steelblue"), "l(b)?l")
```

* `re2_locate_all(x, pattern)` locates the start and end of all matches in each string

```{r}
re2_locate_all(c("yellowgreen", "steelblue"), "l")
```

In all of the above functions, the regular-expression pattern is vectorized.

A regular-expression pattern can be compiled using `re2_regexp(pattern, ...)`. Here are some of its options:

* `case_sensitive`: Match is case-sensitive
* `encoding`: UTF8 or Latin1
* `literal`: Interpret pattern as literal, not regexp
* `longest_match`: Search for longest match, not first match
* `posix_syntax`: Restrict regexps to POSIX egrep syntax

`help(re2_regexp)` lists all available options.

`re2_get_options(regexp_ptr)` returns a list of the options stored
in the compiled regular-expression object.

## Regexp Syntax

re2 supports Perl-style regular expressions (with extensions like
`\d`, `\w`, `\s`, ...) and provides most of the functionality of
PCRE, omitting only backreferences and look-around assertions.

See [RE2 Syntax](https://github.com/google/re2/wiki/Syntax) for the syntax supported by RE2, and a comparison with PCRE and Perl regexps.

For those not familiar with Perl's regular expressions,
here are some examples of the most commonly used extensions:

| Pattern | Meaning |
| --- | --- |
| `"hello (\\w+) world"`  | `\w` matches a "word" character |
| `"version (\\d+)"`      | `\d` matches a digit |
| `"hello\\s+world"`      | `\s` matches any whitespace character |
| `"\\b(\\w+)\\b"`        | `\b` matches at a word boundary |
| `"(?i)hello"`           | `(?i)` turns on case-insensitive matching |
| `"/\\*(.*?)\\*/"`       | `.*?` matches `.` as few times as possible (non-greedy) |

The double backslashes are needed in ordinary R string literals.
Raw string literals (available since R 4.0.0) take single backslashes instead:

| Pattern | Meaning |
| --- | --- |
| `r"(hello (\w+) world)"`  | `\w` matches a "word" character |
| `r"(version (\d+))"`      | `\d` matches a digit |
| `r"(hello\s+world)"`      | `\s` matches any whitespace character |
| `r"(\b(\w+)\b)"`          | `\b` matches at a word boundary |
| `r"((?i)hello)"`          | `(?i)` turns on case-insensitive matching |
| `r"(/\*(.*?)\*/)"`        | `.*?` matches `.` as few times as possible (non-greedy) |


## References

* [Regular Expression Matching Can Be Simple And Fast](https://swtch.com/~rsc/regexp/regexp1.html)
* [Regular Expression Matching: the Virtual Machine Approach](https://swtch.com/~rsc/regexp/regexp2.html)
* [Regular Expression Matching in the Wild](https://swtch.com/~rsc/regexp/regexp3.html)
* [RE2 Syntax](https://github.com/google/re2/wiki/Syntax)