{"id":15642919,"url":"https://github.com/sgreben/regex-builder","last_synced_at":"2025-04-30T10:05:41.326Z","repository":{"id":56208903,"uuid":"63005155","full_name":"sgreben/regex-builder","owner":"sgreben","description":"Write regular expressions in pure Java","archived":false,"fork":false,"pushed_at":"2020-11-20T12:21:32.000Z","size":195,"stargazers_count":61,"open_issues_count":5,"forks_count":10,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-25T06:43:23.604Z","etag":null,"topics":["builder","capture-groups","expression-builder","fluent","java","regex","wrapper"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sgreben.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-07-10T15:25:33.000Z","updated_at":"2023-10-29T01:49:58.000Z","dependencies_parsed_at":"2022-08-15T14:40:53.430Z","dependency_job_id":null,"html_url":"https://github.com/sgreben/regex-builder","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgreben%2Fregex-builder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgreben%2Fregex-builder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgreben%2Fregex-builder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgreben%2Fregex-builder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sgreben","download_url":"https://codeload.github.com/sgreben/regex-builder/tar.gz/refs/heads/master","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242635331,"owners_count":20161437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["builder","capture-groups","expression-builder","fluent","java","regex","wrapper"],"created_at":"2024-10-03T11:58:07.891Z","updated_at":"2025-03-09T02:30:39.038Z","avatar_url":"https://github.com/sgreben.png","language":"Java","readme":"# Java Regex Builder\n\nWrite regexes as **plain Java code**. In contrast to opaque regex strings, this makes it straightforward to comment your expressions and reuse regex fragments.\n\nThe **regex-builder** library is implemented as a lightweight wrapper around `java.util.regex`. It consists of three main components: the expression builder `Re`, its fluent API equivalent `FluentRe`, and the character class builder `CharClass`. 
The components are introduced in the examples below as well as in the API overview tables at the end of this document.\n\nThere's a [discussion](https://www.reddit.com/r/java/comments/4tyk90/github_sgrebenregexbuilder_write_regular/) of this project over on the Java subreddit.\n\n- [Maven dependency](#maven-dependency)\n- [Examples](#examples)\n  - [Apache log](#apache-log)\n  - [Apache log (fluent API)](#apache-log-fluent-api)\n  - [Date (DD/MM/YYYY HH:MM:SS)](#date-ddmmyyyy-hhmmss)\n  - [Hex color](#hex-color)\n- [Reusing expressions](#reusing-expressions)\n  - [Reusable Apache log expression](#reusable-apache-log-expression)\n- [API](#api)\n  - [Expression builder](#expression-builder)\n  - [CharClass builder](#charclass-builder)\n\n## Maven dependency\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.github.sgreben\u003c/groupId\u003e\n  \u003cartifactId\u003eregex-builder\u003c/artifactId\u003e\n  \u003cversion\u003e1.2.1\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n## Examples\n\nImports:\n```java\nimport com.github.sgreben.regex_builder.CaptureGroup;\nimport com.github.sgreben.regex_builder.Expression;\nimport com.github.sgreben.regex_builder.Matcher;\nimport com.github.sgreben.regex_builder.Pattern;\nimport static com.github.sgreben.regex_builder.CharClass.*;\nimport static com.github.sgreben.regex_builder.Re.*;\n```\n\n### Apache log\n\n- Regex string: `(\\\\S+) (\\\\S+) (\\\\S+) \\\\[([\\\\w:/]+\\\\s[+\\\\-]\\\\d{4})\\\\] \\\"(\\\\S+) (\\\\S+) (\\\\S+)\\\" (\\\\d{3}) (\\\\d+)`\n- Java code:\n```java\nCaptureGroup ip, client, user, dateTime, method, request, protocol, responseCode, size;\nExpression token = repeat1(nonWhitespaceChar());\n\nip = capture(token);\nclient = capture(token);\nuser = capture(token);\ndateTime = capture(sequence(\n  repeat1(union(wordChar(), ':', '/')), whitespaceChar(), oneOf(\"+\\\\-\"), repeat(digit(), 4)\n));\nmethod = capture(token);\nrequest = capture(token);\nprotocol = capture(token);\nresponseCode = capture(repeat(digit(), 3));\nsize = 
capture(number());\n\nPattern p = Pattern.compile(sequence(\n  ip, ' ', client, ' ', user, \" [\", dateTime, \"] \\\"\", method, ' ', request, ' ', protocol, \"\\\" \", responseCode, ' ', size\n));\n```\n\nNote that capture groups are plain Java objects: there is no need to juggle numeric group indices or string group names. You can use the expression like this:\n```java\nString logLine = \"127.0.0.1 - - [21/Jul/2014:9:55:27 -0800] \\\"GET /home.html HTTP/1.1\\\" 200 2048\";\nMatcher m = p.matcher(logLine);\n\nassertTrue(m.matches());\n\nassertEquals(\"127.0.0.1\", m.group(ip));\nassertEquals(\"-\", m.group(client));\nassertEquals(\"-\", m.group(user));\nassertEquals(\"21/Jul/2014:9:55:27 -0800\", m.group(dateTime));\nassertEquals(\"GET\", m.group(method));\nassertEquals(\"/home.html\", m.group(request));\nassertEquals(\"HTTP/1.1\", m.group(protocol));\nassertEquals(\"200\", m.group(responseCode));\nassertEquals(\"2048\", m.group(size));\n```\n\nOr, if you'd like to rewrite the log to a simpler \"ip - request - response code\" format, you can do:\n```java\nString result = m.replaceFirst(replacement(ip, \" - \", request, \" - \", responseCode));\n```\n\n### Apache log (fluent API)\n\nThe above example can also be expressed using the fluent API implemented in `FluentRe`. 
To use it, import it as follows:\n\n```java\nimport static com.github.sgreben.regex_builder.CharClass.*;\nimport com.github.sgreben.regex_builder.FluentRe;\n```\n\n```java\nCaptureGroup ip, client, user, dateTime, method, request, protocol, responseCode, size;\nFluentRe nonWhitespace = FluentRe.match(nonWhitespaceChar()).repeat1();\n\nip = nonWhitespace.capture();\nclient = nonWhitespace.capture();\nuser = nonWhitespace.capture();\ndateTime = FluentRe\n    .match(union(wordChar(), oneOf(\":/\"))).repeat1()\n    .then(whitespaceChar())\n    .then(oneOf(\"+\\\\-\"))\n    .then(FluentRe.match(digit()).repeat(4))\n    .capture();\nmethod = nonWhitespace.capture();\nrequest = nonWhitespace.capture();\nprotocol = nonWhitespace.capture();\nresponseCode = FluentRe.match(digit()).repeat(3).capture();\nsize = FluentRe.match(digit()).repeat1().capture();\n\nPattern p = FluentRe.match(beginInput())\n    .then(ip).then(' ')\n    .then(client).then(' ')\n    .then(user).then(\" [\")\n    .then(dateTime).then(\"] \\\"\")\n    .then(method).then(' ')\n    .then(request).then(' ')\n    .then(protocol).then(\"\\\" \")\n    .then(responseCode).then(' ')\n    .then(size)\n    .then(endInput())\n    .compile();\n```\n\n### Date (DD/MM/YYYY HH:MM:SS)\n\n- Regex string: `(\\d\\d)/(\\d\\d)/(\\d\\d\\d\\d) (\\d\\d):(\\d\\d):(\\d\\d)`\n- Java code:\n```java\nExpression twoDigits = repeat(digit(), 2);\nExpression fourDigits = repeat(digit(), 4);\nCaptureGroup day = capture(twoDigits);\nCaptureGroup month = capture(twoDigits);\nCaptureGroup year = capture(fourDigits);\nCaptureGroup hour = capture(twoDigits);\nCaptureGroup minute = capture(twoDigits);\nCaptureGroup second = capture(twoDigits);\nExpression dateExpression = sequence(\n  day, '/', month, '/', year, ' ', // DD/MM/YYYY\n  hour, ':', minute, ':', second   // HH:MM:SS\n);\n```\n\nUse the expression like this:\n```java\nPattern p = Pattern.compile(dateExpression);\nMatcher m = p.matcher(\"01/05/2015 
12:30:22\");\nassertTrue(m.find());\nassertEquals(\"01\", m.group(day));\nassertEquals(\"05\", m.group(month));\nassertEquals(\"2015\", m.group(year));\nassertEquals(\"12\", m.group(hour));\nassertEquals(\"30\", m.group(minute));\nassertEquals(\"22\", m.group(second));\n```\n\n### Hex color\n\n- Regex string: `#([a-fA-F0-9]{3}(?:[a-fA-F0-9]{3})?)`\n- Java code:\n```java\nExpression threeHexDigits = repeat(hexDigit(), 3);\nCaptureGroup hexValue = capture(\n    threeHexDigits,              // #FFF\n    optional(threeHexDigits)     // #FFFFFF\n);\nExpression hexColor = sequence(\n  '#', hexValue\n);\n```\n\nUse the expression like this:\n```java\nPattern p = Pattern.compile(hexColor);\nMatcher m = p.matcher(\"#0FAFF3 and #1bf\");\nassertTrue(m.find());\nassertEquals(\"0FAFF3\", m.group(hexValue));\nassertTrue(m.find());\nassertEquals(\"1bf\", m.group(hexValue));\n```\n\n## Reusing expressions\n\nTo reuse an expression cleanly, package it as a class. To make the capture groups contained in the expression accessible,\nexpose each capture group as a final field or method.\n\nTo allow the resulting object to be used as an expression, `regex-builder` provides a utility class `ExpressionWrapper`,\nwhich exposes a method `setExpression(Expression expr)` and implements the `Expression` interface.\n\n```java\nimport com.github.sgreben.regex_builder.ExpressionWrapper;\n```\n\nTo use the class, extend it and call `setExpression` in your constructor or initialization block.\nYou can then pass it to any `regex-builder` method that expects an `Expression`.\n\n### Reusable Apache log expression\n\nUsing `ExpressionWrapper`, we can package the Apache log\nexample above as follows:\n\n```java\npublic class ApacheLog extends ExpressionWrapper {\n    public final CaptureGroup ip, client, user, dateTime, method, request, protocol, responseCode, size;\n\n    {\n        Expression nonWhitespace = repeat1(nonWhitespaceChar());\n        ip = capture(nonWhitespace);\n        client = 
capture(nonWhitespace);\n        user = capture(nonWhitespace);\n        dateTime = capture(sequence(\n            repeat1(union(wordChar(), ':', '/')),\n            whitespaceChar(),\n            oneOf(\"+\\\\-\"),\n            repeat(digit(), 4)\n        ));\n        method = capture(nonWhitespace);\n        request = capture(nonWhitespace);\n        protocol = capture(nonWhitespace);\n        responseCode = capture(repeat(digit(), 3));\n        size = capture(repeat1(digit()));\n\n        Expression expression = sequence(\n            ip, ' ', client, ' ', user, \" [\", dateTime, \"] \\\"\", method, ' ', request, ' ', protocol, \"\\\" \", responseCode, ' ', size\n        );\n        setExpression(expression);\n    }\n}\n```\n\nWe can then use instances of the packaged expression like this:\n\n```java\npublic static boolean sameIP(String twoLogs) {\n    ApacheLog log1 = new ApacheLog();\n    ApacheLog log2 = new ApacheLog();\n    Pattern p = Pattern.compile(sequence(\n        log1, ' ', log2\n    ));\n    Matcher m = p.matcher(twoLogs);\n    // Check the match first: m.group() would throw if find() found nothing\n    return m.find() \u0026\u0026 m.group(log1.ip).equals(m.group(log2.ip));\n}\n```\n\n## API\n\n### Expression builder\n\n| Builder method              | `java.util.regex` syntax |\n| --------------------------- | ------------------------ |\n| repeat(e, N)                | e{N}                     |\n| repeat(e)                   | e*                       |\n| repeat(e).possessive()      | e*+                      |\n| repeatPossessive(e)         | e*+                      |\n| repeat1(e)                  | e+                       |\n| repeat1(e).possessive()     | e++                      |\n| repeat1Possessive(e)        | e++                      |\n| optional(e)                 | e?                       
|\n| optional(e).possessive()    | e?+                      |\n| optionalPossessive(e)       | e?+                      |\n| capture(e)                  | (e)                      |\n| positiveLookahead(e)        | (?=e)                    |\n| negativeLookahead(e)        | (?!e)                    |\n| positiveLookbehind(e)       | (?\u003c=e)                   |\n| negativeLookbehind(e)       | (?\u003c!e)                   |\n| backReference(g)            | \\g                       |\n| separatedBy(sep, e)         | (?:e(?:(?:sep)(?:e))*)?  |\n| separatedBy1(sep, e)        | e(?:(?:sep)(?:e))*       |\n| choice(e1,...,eN)           | (?:e1\\|...\\|eN)          |\n| sequence(e1,...,eN)         | e1...eN                  |\n| string(s)                   | \\Qs\\E                    |\n| word()                      | \\w+                      |\n| number()                    | \\d+                      |\n| whitespace()                | \\s*                      |\n| whitespace1()               | \\s+                      |\n| CaptureGroup g = capture(e) | (?g e)                   |\n\n### CharClass builder\n\n| Builder method                | `java.util.regex` syntax |\n| ----------------------------- | ------------------------ |\n| range(from, to)               | [from-to]                |\n| range(f1, t1, ..., fN, tN)    | [f1-t1f2-t2...fN-tN]     |\n| oneOf(\"abcde\")                | [abcde]                  |\n| union(class1, ..., classN)    | [[class1]...[classN]]    |\n| complement(class1)            | [\\^class1]               |\n| anyChar()                     | .                        
|\n| digit()                       | \\d                       |\n| nonDigit()                    | \\D                       |\n| hexDigit()                    | [a-fA-F0-9]              |\n| nonHexDigit()                 | [\\^a-fA-F0-9]            |\n| wordChar()                    | \\w                       |\n| nonWordChar()                 | \\W                       |\n| wordBoundary()                | \\b                       |\n| nonWordBoundary()             | \\B                       |\n| whitespaceChar()              | \\s                       |\n| nonWhitespaceChar()           | \\S                       |\n| verticalWhitespaceChar()      | \\v                       |\n| nonVerticalWhitespaceChar()   | \\V                       |\n| horizontalWhitespaceChar()    | \\h                       |\n| nonHorizontalWhitespaceChar() | \\H                       |\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsgreben%2Fregex-builder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsgreben%2Fregex-builder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsgreben%2Fregex-builder/lists"}