{"id":13423829,"url":"https://github.com/qinwf/re2r","last_synced_at":"2025-12-25T12:38:48.183Z","repository":{"id":81815073,"uuid":"49948299","full_name":"qinwf/re2r","owner":"qinwf","description":"RE2 Regular Expression in R. ","archived":false,"fork":false,"pushed_at":"2020-03-13T13:25:34.000Z","size":1104,"stargazers_count":99,"open_issues_count":20,"forks_count":15,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-10-26T23:12:41.639Z","etag":null,"topics":["r","re2","regular-expression"],"latest_commit_sha":null,"homepage":"https://qinwenfeng.com/re2r_doc","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qinwf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-01-19T11:29:20.000Z","updated_at":"2024-09-29T22:36:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"9f87559e-3ef1-4d16-a0c6-2e909d04c682","html_url":"https://github.com/qinwf/re2r","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinwf%2Fre2r","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinwf%2Fre2r/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinwf%2Fre2r/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinwf%2Fre2r/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qinwf","download_url":"https://codeload.github.com/qinwf/re2r/tar.gz/refs/heads/master","host":{"name":"G
itHub","url":"https://github.com","kind":"github","repositories_count":243767370,"owners_count":20344913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","re2","regular-expression"],"created_at":"2024-07-31T00:00:43.427Z","updated_at":"2025-12-25T12:38:48.134Z","avatar_url":"https://github.com/qinwf.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"re2r\n====\n\n[![Build Status](https://travis-ci.org/qinwf/re2r.svg?branch=master)](https://travis-ci.org/qinwf/re2r) [![Build status](https://ci.appveyor.com/api/projects/status/n34unrvurpv18si5/branch/master?svg=true)](https://ci.appveyor.com/project/qinwf/re2r/branch/master) [![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version/re2r)](http://cran.r-project.org/package=re2r) [![codecov](https://codecov.io/gh/qinwf/re2r/branch/master/graph/badge.svg)](https://codecov.io/gh/qinwf/re2r)\n\nRE2 is a primarily DFA based regexp engine from Google that is very fast at matching large amounts of text.\n\nInstallation\n------------\n\nFrom CRAN:\n\n``` r\ninstall.packages(\"re2r\")\n```\n\nFrom GitHub:\n\n``` r\nlibrary(devtools)\ninstall_github(\"qinwf/re2r\", build_vignettes = T)\n```\n\nTo learn how to use, you can check out the [vignettes](https://qinwenfeng.com/re2r_doc/).\n\nRelated Work\n------------\n\n[Google Summer of Code](https://github.com/rstats-gsoc/gsoc2016/wiki/re2-regular-expressions) - re2 regular expressions.\n\nBrief Intro\n-----------\n\n### 1. 
Search a string for a pattern\n\n`re2_detect(string, pattern)` searches a string for a pattern and returns a logical (`TRUE`/`FALSE`) result.\n\n``` r\ntest_string = \"this is just one test\";\nre2_detect(test_string, \"(o.e)\")\n```\n\n    ## [1] TRUE\n\nHere is an example of an email pattern.\n\n``` r\nshow_regex(\"\\\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\\\.[a-zA-Z]{2,4}\\\\b\", width = 670, height = 280)\n```\n\n![email pattern](https://raw.githubusercontent.com/qinwf/re2r/master/inst/img/email.png)\n\n``` r\nre2_detect(\"test@gmail.com\", \"\\\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\\\.[a-zA-Z]{2,4}\\\\b\")\n```\n\n    ## [1] TRUE\n\n`re2_match(string, pattern)` returns the contents of the capture groups in `()`.\n\n``` r\n(res = re2_match(test_string, \"(o.e)\"))\n```\n\n    ##      .match .1   \n    ## [1,] \"one\"  \"one\"\n\nThe result is a character matrix. `.1` is the first capture group, which is unnamed.\n\nCreate a named capture group with the `(?P\u003cname\u003epattern)` syntax.\n\n``` r\n(res = re2_match(test_string, \"(?P\u003ctestname\u003ethis)( is)\"))\n```\n\n    ##      .match    testname .2   \n    ## [1,] \"this is\" \"this\"   \" is\"\n\n``` r\nis.matrix(res)\n```\n\n    ## [1] TRUE\n\n``` r\nis.character(res)\n```\n\n    ## [1] TRUE\n\n``` r\nres$testname\n```\n\n    ## testname \n    ##   \"this\"\n\nIf the pattern has no capture group, only the matched substrings are returned (`NA` where there is no match).\n\n``` r\ntest_string = c(\"this is just one test\", \"the second test\");\n(res = re2_match(test_string, \"is\"))\n```\n\n    ##      .match\n    ## [1,] \"is\"  \n    ## [2,] NA\n\n`re2_match_all()` returns all matches of the pattern in each string instead of just the first one.\n\n``` r\nres = re2_match_all(c(\"this is test\", \n            \"this is test, and this is not test\", \n            \"they are tests\"), \n          pattern = \"(?P\u003ctestname\u003ethis)( is)\")\nprint(res)\n```\n\n    ## [[1]]\n    ##      .match    testname .2   \n    ## [1,] \"this is\" \"this\"   \" is\"\n   
 ## \n    ## [[2]]\n    ##      .match    testname .2   \n    ## [1,] \"this is\" \"this\"   \" is\"\n    ## [2,] \"this is\" \"this\"   \" is\"\n    ## \n    ## [[3]]\n    ##      .match testname .2\n\n``` r\nis.list(res)\n```\n\n    ## [1] TRUE\n\nmatch all numbers\n\n``` r\ntexts = c(\"pi is 3.14529..\",\n          \"-15.34 °F\",\n          \"128 days\",\n          \"1.9e10\",\n          \"123,340.00$\",\n          \"only texts\")\n(number_pattern = re2(\".*?(?P\u003cnumber\u003e-?\\\\d+(,\\\\d+)*(\\\\.\\\\d+(e\\\\d+)?)?).*?\"))\n```\n\n    ## re2 pre-compiled regular expression\n    ## \n    ## pattern: .*?(?P\u003cnumber\u003e-?\\d+(,\\d+)*(\\.\\d+(e\\d+)?)?).*?\n    ## number of capturing subpatterns: 4\n    ## capturing names with indices: \n    ## .match number .2 .3 .4\n    ## expression size: 56\n\n``` r\n(res = re2_match(texts, number_pattern))\n```\n\n    ##      .match          number       .2     .3       .4   \n    ## [1,] \"pi is 3.14529\" \"3.14529\"    NA     \".14529\" NA   \n    ## [2,] \"-15.34\"        \"-15.34\"     NA     \".34\"    NA   \n    ## [3,] \"128\"           \"128\"        NA     NA       NA   \n    ## [4,] \"1.9e10\"        \"1.9e10\"     NA     \".9e10\"  \"e10\"\n    ## [5,] \"123,340.00\"    \"123,340.00\" \",340\" \".00\"    NA   \n    ## [6,] NA              NA           NA     NA       NA\n\n``` r\nres$number\n```\n\n    ## [1] \"3.14529\"    \"-15.34\"     \"128\"        \"1.9e10\"     \"123,340.00\"\n    ## [6] NA\n\n``` r\nshow_regex(number_pattern)\n```\n\n![number pattern](https://raw.githubusercontent.com/qinwf/re2r/master/inst/img/number.png)\n\n### 2. 
Replace a substring\n\n``` r\nre2_replace(string, pattern, rewrite)\n```\n\nSearches `string` for occurrence(s) of a substring that matches `pattern` and replaces the matched substrings with `rewrite`.\n\n``` r\ninput_string = \"this is just one test\";\nnew_string = \"my\"\nre2_replace(input_string, \"(o.e)\", new_string)\n```\n\n    ## [1] \"this is just my test\"\n\nmask the middle three digits of a US phone number\n\n``` r\ntexts = c(\"415-555-1234\",\n          \"650-555-2345\",\n          \"(416)555-3456\",\n          \"202 555 4567\",\n          \"4035555678\",\n          \"1 416 555 9292\")\n\nus_phone_pattern = re2(\"(1?[\\\\s-]?\\\\(?\\\\d{3}\\\\)?[\\\\s-]?)(\\\\d{3})([\\\\s-]?\\\\d{4})\")\n\nre2_replace(texts, us_phone_pattern, \"\\\\1***\\\\3\")\n```\n\n    ## [1] \"415-***-1234\"   \"650-***-2345\"   \"(416)***-3456\"  \"202 *** 4567\"  \n    ## [5] \"403***5678\"     \"1 416 *** 9292\"\n\n### 3. Extract a substring\n\n``` r\nre2_extract(string, pattern, replacement)\n```\n\nExtract matching patterns from a string.\n\n``` r\nre2_extract(\"yabba dabba doo\", \"(.)\")\n```\n\n    ## [1] \"y\"\n\n``` r\nre2_extract(\"test@me.com\", \"(.*)@([^.]*)\")\n```\n\n    ## [1] \"test@me\"\n\n### 4. `Regular Expression Object` for better performance\n\nWe can create a regular expression object (RE2 object) from a string. Reusing it avoids parsing the same pattern again on every call.\n\nIt also gives us more options for the pattern. 
Run `help(re2)` for more details.\n\n``` r\nregexp = re2(\"test\", case_sensitive = FALSE)\nprint(regexp)\n```\n\n    ## re2 pre-compiled regular expression\n    ## \n    ## pattern: test\n    ## number of capturing subpatterns: 0\n    ## capturing names with indices: \n    ## .match\n    ## expression size: 10\n\n``` r\nregexp = re2(\"test\", case_sensitive = FALSE)\nre2_match(\"TEST\", regexp)\n```\n\n    ##      .match\n    ## [1,] \"TEST\"\n\n``` r\nre2_replace(\"TEST\", regexp, \"ops\")\n```\n\n    ## [1] \"ops\"\n\n### 5. Multithread\n\nUse the `parallel` option to enable multithreading. It can improve performance for large inputs on a multi-core CPU.\n\n``` r\nre2_match(string, pattern, parallel = TRUE)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqinwf%2Fre2r","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqinwf%2Fre2r","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqinwf%2Fre2r/lists"}