{"id":13656758,"url":"https://github.com/openresty/sregex","last_synced_at":"2025-04-23T22:31:23.375Z","repository":{"id":5090012,"uuid":"6252484","full_name":"openresty/sregex","owner":"openresty","description":"A non-backtracking NFA/DFA-based Perl-compatible regex engine matching on large data streams","archived":false,"fork":false,"pushed_at":"2021-11-01T05:04:16.000Z","size":685,"stargazers_count":617,"open_issues_count":11,"forks_count":111,"subscribers_count":72,"default_branch":"master","last_synced_at":"2024-11-02T03:35:42.160Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"sephiroth74/HorizontalVariableListView","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openresty.png","metadata":{"files":{"readme":"README.markdown","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-10-16T22:03:29.000Z","updated_at":"2024-10-31T16:23:06.000Z","dependencies_parsed_at":"2022-07-29T01:39:26.409Z","dependency_job_id":null,"html_url":"https://github.com/openresty/sregex","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openresty%2Fsregex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openresty%2Fsregex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openresty%2Fsregex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openresty%2Fsregex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openresty","download_url":"https://codeload.github.com/openresty/sregex/tar.gz/refs/heads/master","host":{"name":"GitHu
b","url":"https://github.com","kind":"github","repositories_count":223936158,"owners_count":17228105,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T05:00:31.952Z","updated_at":"2024-11-10T09:31:45.928Z","avatar_url":"https://github.com/openresty.png","language":"C","readme":"Name\n====\n\nlibsregex - A non-backtracking NFA/DFA-based Perl-compatible regex engine library for matching on large data streams\n\nTable of Contents\n=================\n\n* [Name](#name)\n* [Status](#status)\n* [Syntax Supported](#syntax-supported)\n* [API](#api)\n    * [Constants](#constants)\n    * [Memory pool API](#memory-pool-api)\n        * [sre_create_pool](#sre_create_pool)\n        * [sre_destroy_pool](#sre_destroy_pool)\n        * [sre_reset_pool](#sre_reset_pool)\n    * [Regex parsing and compilation API](#regex-parsing-and-compilation-api)\n        * [sre_regex_parse](#sre_regex_parse)\n        * [sre_regex_parse_multi](#sre_regex_parse_multi)\n        * [sre_regex_compile](#sre_regex_compile)\n    * [Regex execution API](#regex-execution-api)\n        * [Thompson VM](#thompson-vm)\n            * [sre_vm_thompson_create_ctx](#sre_vm_thompson_create_ctx)\n            * [sre_vm_thompson_exec](#sre_vm_thompson_exec)\n            * [Just-In-Time Support for Thompson VM](#just-in-time-support-for-thompson-vm)\n                * [sre_vm_thompson_jit_compile](#sre_vm_thompson_jit_compile)\n                * [sre_vm_thompson_jit_get_handler](#sre_vm_thompson_jit_get_handler)\n                * [sre_vm_thompson_jit_create_ctx](#sre_vm_thompson_jit_create_ctx)\n        * 
[Pike VM](#pike-vm)\n            * [sre_vm_pike_create_ctx](#sre_vm_pike_create_ctx)\n            * [sre_vm_pike_exec](#sre_vm_pike_exec)\n* [Examples](#examples)\n* [Installation](#installation)\n* [Test Suite](#test-suite)\n* [TODO](#todo)\n* [Author](#author)\n* [Copyright and License](#copyright-and-license)\n* [See Also](#see-also)\n\nStatus\n======\n\nThis library is already quite usable and some people are already using it in production.\n\nNevertheless this library is still under heavy development. The API is still in flux\nand may be changed quickly without notice.\n\nThis is a pure C library that is designed to have zero dependencies.\n\nNo pathological regexes exist for this regex engine because it does not\nuse a backtracking algorithm at all.\n\nAlready rewrote the code base of Russ Cox's re1 library using the nginx coding style (yes, I love it!), also incorporated a clone of the nginx memory pool into it for memory management.\n\nAlready ported the Thompson and Pike VM backends to sregex. The former is just for yes-or-no matching, and the latter also supports sub-match capturing.\n\nImplemented the case-insensitive matching mode via the `SRE_REGEX_CASELESS` flag.\n\nThe full streaming matching API for the sregex engine has already been implemented,\nfor both the Pike and Thompson regex VMs. 
The sub-match capturing also supports streaming processing.\nWhen the state machine is yielded (that is, returning `SRE_AGAIN` on the current input data chunk),\nsregex will always output the current value range for the `$\u0026` sub-match capture in the user-supplied\n`ovector` array.\n\nAlmost all the relevant test cases for PCRE 8.32 and Perl 5.16.2 have been imported into sregex's test suite\nand all tests are passing right now.\n\nAlready implemented an API for assembling multiple user regexes and\nreturning an ID indicating exactly which regex is matched\n(first), as well as the corresponding sub-match captures.\n\nThere is also a Just-in-Time (JIT) compiler targeting `x86_64` for the Thompson VM.\n\nSyntax Supported\n================\n\nThe following Perl 5 regex syntax features have already been implemented.\n\n    ^             match the beginning of lines\n    $             match the end of lines\n\n    \\A            match only at beginning of stream\n    \\z            match only at end of stream\n\n    \\b            match a word boundary\n    \\B            match except at a word boundary\n\n    .             
match any char\n\n    [ab0-9]       character classes (positive)\n    [^ab0-9]      character classes (negative)\n\n    \\d            match a digit character ([0-9])\n    \\D            match a non-digit character ([^0-9])\n\n    \\s            match a whitespace character ([ \\f\\n\\r\\t])\n    \\S            match a non-whitespace character ([^ \\f\\n\\r\\t])\n\n    \\h            match a horizontal whitespace character\n    \\H            match a character that isn't horizontal whitespace\n\n    \\v            match a vertical whitespace character\n    \\V            match a character that isn't vertical whitespace\n\n    \\w            match a \"word\" character ([A-Za-z0-9_])\n    \\W            match a non-\"word\" character ([^A-Za-z0-9_])\n\n    \\cK           control char (example: VT)\n\n    \\N            match a character that isn't a newline\n\n    ab            concatenation; first match a, and then b\n    a|b           alternation; match a or b\n\n    (a)           capturing parentheses\n    (?:a)         non-capturing parentheses\n\n    a?            match 1 or 0 times, greedily\n    a*            match 0 or more times, greedily\n    a+            match 1 or more times, greedily\n\n    a??           match 1 or 0 times, not greedily\n    a*?           match 0 or more times, not greedily\n    a+?           match 1 or more times, not greedily\n\n    a{n}          match exactly n times\n    a{n,m}        match at least n but not more than m times, greedily\n    a{n,}         match at least n times, greedily\n\n    a{n}?         match exactly n times, not greedily (redundant)\n    a{n,m}?       match at least n but not more than m times, not greedily\n    a{n,}?        
match at least n times, not greedily\n\nThe following escape sequences are supported:\n\n    \\t          tab\n    \\n          newline\n    \\r          return\n    \\f          form feed\n    \\a          alarm\n    \\e          escape\n    \\b          backspace (in character class only)\n    \\x{}, \\x00  character whose ordinal is the given hexadecimal number\n    \\o{}, \\000  character whose ordinal is the given octal number\n\nEscaping a regex meta character yields the literal character itself, like `\\{` and `\\.`.\n\nOnly the octet mode is supported; no multi-byte character encoding love (yet).\n\nAPI\n===\n\nThis library provides a pure C API. This API is still in flux and may change in the near future\nwithout notice.\n\n[Back to TOC](#table-of-contents)\n\nConstants\n---------\n\nThis library provides the following public constants for use in the various API functions.\n\n* `SRE_OK`\n* `SRE_DECLINED`\n* `SRE_AGAIN`\n* `SRE_ERROR`\n\nThe actual meanings of these constants depend on the concrete API functions using them.\n\n[Back to TOC](#table-of-contents)\n\nMemory pool API\n---------------\n\nThis library utilizes a memory pool to simplify memory management. Most of the low-level API\nfunctions provided by this library accept a memory pool pointer as an argument.\n\nThe operations on the memory pool on the user side are limited to\n\n1. creating a memory pool,\n2. destroying a memory pool, and\n3. resetting a memory pool.\n\n[Back to TOC](#table-of-contents)\n\n### sre_create_pool\n\n```C\nsre_pool_t *sre_create_pool(size_t size);\n```\n\nCreates a memory pool with a page size of `size`. Returns the pool as an opaque pointer type `sre_pool_t`.\n\nUsually the page size you specify should not be too large. 
Usually 1KB or 4KB should be sufficient.\nOptimal values depend on your actual regexes and input data pattern involved and should be\ntuned empirically.\n\nThe returned memory pool pointer is usually fed into other API functions provided by this library\nas an argument.\n\nIt is your responsibility to destroy the pool when you no longer need it via the [sre_destroy_pool](#sre_destroy_pool) function. Failing to destroy the pool will result in memory leaks.\n\n[Back to TOC](#table-of-contents)\n\n### sre_destroy_pool\n\n```C\nvoid sre_destroy_pool(sre_pool_t *pool);\n```\n\nDestroys the memory pool created by the [sre_create_pool](#sre_create_pool) function.\n\n[Back to TOC](#table-of-contents)\n\n### sre_reset_pool\n\n```C\nvoid sre_reset_pool(sre_pool_t *pool);\n```\n\n[Back to TOC](#table-of-contents)\n\nRegex parsing and compilation API\n---------------------------------\n\nBefore running a regex (or set of multiple regexes), you need to parse and compile them first, such that\nyou can run the compiled form of the regex(es) over and over again at maximum speed.\n\n[Back to TOC](#table-of-contents)\n\n### sre_regex_parse\n\n```C\ntypedef uint8_t     sre_char;\ntypedef uintptr_t   sre_uint_t;\ntypedef intptr_t    sre_int_t;\n\nsre_regex_t *sre_regex_parse(sre_pool_t *pool, sre_char *regex,\n    sre_uint_t *ncaps, int flags, sre_int_t *err_offset);\n```\n\nParses the string representation of the user regex specified by the `regex` parameter (as a null-terminated string).\n\nReturns a parsed regex object of the opaque pointer type `sre_regex_t` if no error happens. 
Otherwise returns a NULL pointer and sets the output parameter `err_offset` to the offset in the `regex` string where the parse failure happens.\n\nThe parsed regex object pointer is an Abstract-Syntax-Tree (AST) representation of the string regex.\nIt can later be fed into API function calls like [sre_regex_compile](#sre_regex_compile) as an argument.\n\nThe first parameter, `pool`, is a memory pool created by the [sre_create_pool](#sre_create_pool) API function.\n\nThe `ncaps` parameter is used to output the number of sub-match captures found in the regex. This integer can later be used to extract sub-match captures.\n\nThe `flags` parameter specifies additional regex compiling flags like below:\n\n* `SRE_REGEX_CASELESS`\n    case-insensitive matching mode.\n\n[Back to TOC](#table-of-contents)\n\n### sre_regex_parse_multi\n\n```C\ntypedef uint8_t     sre_char;\ntypedef uintptr_t   sre_uint_t;\ntypedef intptr_t    sre_int_t;\n\nsre_regex_t *sre_regex_parse_multi(sre_pool_t *pool, sre_char **regexes,\n    sre_int_t nregexes, sre_uint_t *max_ncaps, int *multi_flags,\n    sre_int_t *err_offset, sre_int_t *err_regex_id);\n```\n\nSimilar to the [sre_regex_parse](#sre_regex_parse) API function but works on multiple\nregexes at once.\n\nThese regexes are specified by the C string array `regexes`, whose size is determined by the `nregexes` parameter.\n\nAll these input regexes are combined into a single parsed regex object, returned as the opaque\npointer of the type `sre_regex_t`, just like [sre_regex_parse](#sre_regex_parse). These regexes are\nlogically connected via the alternation regex operator (`|`), so the order of these regexes determines\ntheir relative precedence in a tie. 
Despite being connected by `|` logically, the\n[regex execution API](#regex-execution-api) can still signify which of these regexes is matched\nby returning the regex ID, which is the offset of the regex in the `regexes` input array.\n\nUpon failure, returns the NULL pointer and sets\n\n* the output parameter `err_regex_id` to the ID of the regex that has syntax errors\n(i.e., the 0-based offset of the regex in the `regexes` input parameter array), and\n* the output parameter `err_offset` to the string offset in the guilty regex where the failure happens.\n\nThe output parameter `max_ncaps` returns the maximum number of sub-match captures in all these regexes.\nNote that this is the maximum instead of the sum.\n\nThe `multi_flags` parameter is an input array consisting of the regex flags for every regex specified in the `regexes` array.\nThe size of this array must be no shorter than the size specified by `nregexes`. For what\nregex flags you can use, just check out the documentation for the [sre_regex_parse](#sre_regex_parse) API function.\n\n[Back to TOC](#table-of-contents)\n\n### sre_regex_compile\n\n```C\nsre_program_t *sre_regex_compile(sre_pool_t *pool, sre_regex_t *re);\n```\n\nCompiles the parsed regex object (returned by [sre_regex_parse](#sre_regex_parse)) into a bytecode\nrepresentation of the regex, of the opaque pointer type `sre_program_t`.\n\nReturns the NULL pointer in case of failures.\n\nThe memory pool specified by the `pool` parameter does not have to be the same as the one used\nby the earlier [sre_regex_parse](#sre_regex_parse) call. But you could use the same memory pool if you want.\n\nThe compiled regex form (or bytecode form) returned can be fed into one of the regex backend VMs\nprovided by this library for execution. 
See [regex execution API](#regex-execution-api) for more\ndetails.\n\n[Back to TOC](#table-of-contents)\n\nRegex execution API\n-------------------\n\nThe regex execution API provides various different virtual machines (VMs) for running\nthe compiled regexes by different algorithms.\n\nCurrently the following VMs are supported:\n\n* [Thompson VM](#thompson-vm)\n* [Pike VM](#pike-vm)\n\n[Back to TOC](#table-of-contents)\n\n### Thompson VM\n\nThe Thompson VM uses the Thompson NFA simulation algorithm to execute the compiled regex(es) by\nmatching against an input string (or input stream).\n\n[Back to TOC](#table-of-contents)\n\n#### sre_vm_thompson_create_ctx\n\n```C\nsre_vm_thompson_ctx_t *sre_vm_thompson_create_ctx(sre_pool_t *pool,\n    sre_program_t *prog);\n```\n\nCreates and returns a context structure (of the opaque type `sre_vm_thompson_ctx_t`) for\nthe Thompson VM. Returns NULL in case of failure (like running out of memory).\n\nThis return value can later be used by the [sre_vm_thompson_exec](#sre_vm_thompson_exec) function as an argument.\n\nThe `prog` parameter accepts the compiled bytecode form of the regex(es) returned by the [sre_regex_compile](#sre_regex_compile)\nfunction. This compiled regex(es) is embedded into the resulting context structure.\n\nAccepts a memory pool created by the [sre_create_pool](#sre_create_pool) function as the first argument. This memory pool does not have to be the same as the pool used for parsing or compiling the regex(es).\n\n[Back to TOC](#table-of-contents)\n\n#### sre_vm_thompson_exec\n\n```C\ntypedef intptr_t    sre_int_t;\ntypedef uint8_t     sre_char;\n\nsre_int_t sre_vm_thompson_exec(sre_vm_thompson_ctx_t *ctx, sre_char *input,\n    size_t size, unsigned int eof);\n```\n\nExecutes the compiled regex(es) on the input string data atop the Thompson VM (without Just-In-Time optimizations).\n\nThe `ctx` argument value is returned by the [sre_vm_thompson_create_ctx](#sre_vm_thompson_create_ctx)\nfunction. 
The compiled (bytecode) form of the regex(es) is already embedded in this `ctx` value.\nThis `ctx` argument can be changed by this function call and must be preserved for all the `sre_vm_thompson_exec` calls\non the same data stream. Different data streams MUST use different `ctx` instances. When a data stream is completely processed, the corresponding `ctx` instance MUST be discarded and cannot be reused again.\n\nThe input data is specified by a character data chunk in a data stream. The `input` parameter specifies the starting address of the data\nchunk, the `size` parameter specifies the size of the chunk, while the `eof` parameter identifies\nwhether this chunk is the last chunk in the stream. If you just want to match on a single\nC string, then always specify 1 as the `eof` argument and exclude the NULL string terminator in your C string while computing the `size` argument value.\n\nThis function may return one of the following values:\n\n* `SRE_OK`\n    A match is found.\n* `SRE_DECLINED`\n    No match can be found. This value can never be returned when the `eof` parameter is unset (because\n    a match MAY get found when seeing more input string data).\n* `SRE_AGAIN`\n    More data (in a subsequent call) is needed to obtain a match. The current data chunk can\n    be discarded after this call returns. This value can only be returned when the `eof` parameter is\n    not set.\n* `SRE_ERROR`\n    A fatal error has occurred (like running out of memory).\n\nThis function does not return the regex ID of the matched regex when multiple regexes are\nspecified at once via the [sre_regex_parse_multi](#sre_regex_parse_multi) function. This\nmay change in the future.\n\nSub-match captures are not supported in this Thompson VM by design. You should use the [Pike VM](#pike-vm) instead if you want that.\n\n[Back to TOC](#table-of-contents)\n\n#### Just-In-Time Support for Thompson VM\n\nThe Thompson VM comes with a Just-In-Time compiler. 
Currently only the x86_64 architecture is supported.\nSupport for other architectures may come in the future.\n\n[Back to TOC](#table-of-contents)\n\n##### sre_vm_thompson_jit_compile\n\n```C\ntypedef intptr_t    sre_int_t;\n\nsre_int_t sre_vm_thompson_jit_compile(sre_pool_t *pool, sre_program_t *prog,\n    sre_vm_thompson_code_t **pcode);\n```\n\nCompiles the bytecode form of the regex(es) created by [sre_regex_compile](#sre_regex_compile)\ndown into native code.\n\nIt returns one of the following values:\n\n* `SRE_OK`\n    Compilation is successful.\n* `SRE_DECLINED`\n    The current architecture is not supported.\n* `SRE_ERROR`\n    A fatal error occurs (like running out of memory).\n\nThe `pool` parameter specifies a memory pool created by [sre_create_pool](#sre_create_pool).\nThis pool is used for the JIT compilation.\n\nThe `prog` parameter is the compiled bytecode form of the regex(es) created by the [sre_regex_compile](#sre_regex_compile)\nfunction call.\n\nThe resulting JIT compiled native code along with the runtime information is saved in the output\nargument `pcode` of the opaque type `sre_vm_thompson_code_t`. 
This structure is allocated by this\nfunction in the provided memory pool.\n\nThis `sre_vm_thompson_code_t` object can later be executed by running the C function pointer\nfetched from this object via the [sre_vm_thompson_jit_get_handler](#sre_vm_thompson_jit_get_handler) call.\n\n[Back to TOC](#table-of-contents)\n\n##### sre_vm_thompson_jit_get_handler\n\n```C\ntypedef uint8_t     sre_char;\ntypedef intptr_t    sre_int_t;\ntypedef sre_int_t (*sre_vm_thompson_exec_pt)(sre_vm_thompson_ctx_t *ctx,\n    sre_char *input, size_t size, unsigned int eof);\n\nsre_vm_thompson_exec_pt sre_vm_thompson_jit_get_handler(\n    sre_vm_thompson_code_t *code);\n```\n\nFetches a C function pointer from the JIT compiled form of the regex(es) generated via an\nearlier [sre_vm_thompson_jit_compile](#sre_vm_thompson_jit_compile) call.\n\nThe C function pointer has exactly the same function prototype as the interpreter entry\nfunction [sre_vm_thompson_exec](#sre_vm_thompson_exec). The only difference is that the\n`sre_vm_thompson_ctx_t` object MUST be created via the [sre_vm_thompson_jit_create_ctx](#sre_vm_thompson_jit_create_ctx)\nfunction instead of the [sre_vm_thompson_create_ctx](#sre_vm_thompson_create_ctx) function. Apart from that, the resulting C function pointer can be used in the same way as [sre_vm_thompson_exec](#sre_vm_thompson_exec).\n\n[Back to TOC](#table-of-contents)\n\n##### sre_vm_thompson_jit_create_ctx\n\n```C\nsre_vm_thompson_ctx_t *sre_vm_thompson_jit_create_ctx(sre_pool_t *pool,\n    sre_program_t *prog);\n```\n\nAllocates a context structure for executing the compiled native code form of the regex(es) generated\nby the Just-In-Time compiler of the Thompson VM.\n\nThis context object should only be used by the C function returned by the [sre_vm_thompson_jit_get_handler](#sre_vm_thompson_jit_get_handler)\nfunction call. 
Use of this object in [sre_vm_thompson_exec](#sre_vm_thompson_exec) is prohibited.\n\n[Back to TOC](#table-of-contents)\n\n### Pike VM\n\nThe Pike VM uses an enhanced version of the Thompson NFA simulation algorithm that supports sub-match\ncaptures.\n\n[Back to TOC](#table-of-contents)\n\n#### sre_vm_pike_create_ctx\n\n```C\ntypedef intptr_t    sre_int_t;\n\nsre_vm_pike_ctx_t *sre_vm_pike_create_ctx(sre_pool_t *pool, sre_program_t *prog,\n    sre_int_t *ovector, size_t ovecsize);\n```\n\nCreates and returns a context structure (of the opaque type `sre_vm_pike_ctx_t`) for the\nPike VM. Returns NULL in case of failure (like running out of memory).\n\nThis return value can later be used by the [sre_vm_pike_exec](#sre_vm_pike_exec) function as an argument.\n\nThe `prog` parameter accepts the compiled bytecode form of the regex(es) returned by the [sre_regex_compile](#sre_regex_compile)\nfunction. This compiled regex(es) is embedded into the resulting context structure.\n\nAccepts a memory pool created by the [sre_create_pool](#sre_create_pool) function as the first argument. This memory pool does not have to be the same as the pool used for parsing or compiling the regex(es).\n\nThe `ovector` parameter specifies an array for outputting the beginning and end offsets of the (sub-)match captures.\nThe elements of the array are used like below:\n\n1. The 1st element of the array holds the beginning offset of the whole match,\n2. the 2nd element holds the end offset of the whole match,\n3. the 3rd element holds the beginning offset of the 1st sub-match capture,\n4. the 4th element holds the end offset of the 1st sub-match capture,\n5. the 5th element holds the beginning offset of the 2nd sub-match capture,\n6. the 6th element holds the end offset of the 2nd sub-match capture,\n7. and so on...\n\nThe size of the `ovector` array is specified by the `ovecsize` parameter, in bytes. 
The size of the array\ncan be computed as follows:\n\n```\n    ovecsize = 2 * (ncaps + 1) * sizeof(sre_int_t)\n```\n\nwhere `ncaps` is the value previously output by the [sre_regex_parse](#sre_regex_parse) \nor [sre_regex_parse_multi](#sre_regex_parse_multi) function.\n\nThe `ovector` array is allocated by the caller and filled by this function call.\n\n[Back to TOC](#table-of-contents)\n\n#### sre_vm_pike_exec\n\n```C\ntypedef uint8_t     sre_char;\ntypedef intptr_t    sre_int_t;\n\nsre_int_t sre_vm_pike_exec(sre_vm_pike_ctx_t *ctx, sre_char *input, size_t size,\n    unsigned eof, sre_int_t **pending_matched);\n```\n\nExecutes the compiled regex(es) on the input string data atop the Pike VM (without Just-In-Time optimizations).\n\nThe `ctx` argument value is returned by the [sre_vm_pike_create_ctx](#sre_vm_pike_create_ctx)\nfunction. The compiled (bytecode) form of the regex(es) is already embedded in this `ctx` value.\nThis `ctx` argument can be changed by this function call and must be preserved for all the `sre_vm_pike_exec` calls\non the same data stream. Different data streams MUST use different `ctx` instances. When a data stream is completely processed, the corresponding `ctx` instance MUST be discarded and cannot be reused again.\n\nThe input data is specified by a character data chunk in a data stream. The `input` parameter specifies the starting address of the data\nchunk, the `size` parameter specifies the size of the chunk, while the `eof` parameter identifies\nwhether this chunk is the last chunk in the stream. 
If you just want to match on a single\nC string, then always specify 1 as the `eof` argument and exclude the NULL string terminator in your C string while computing the `size` argument value.\n\nThe `pending_matched` parameter outputs an array holding all the pending matched captures (whole-match only, no sub-matches) if\nno complete matches have been found yet (i.e., this call returns `SRE_AGAIN`).\nThis is very useful for doing regex substitutions on (large) data streams where the caller\ncan use the info in `pending_matched` to decide exactly how much data in the current to-be-thrown data chunk needs to be buffered.\nThe caller should never allocate the space for this array, rather,\nthis function call takes care of it and just sets the (double) pointer to point to its internal (read-only) storage.\n\nThis function may return one of the following values:\n\n* a non-negative value\n    A match is found and the value is the ID of the (first) matched regex if multiple regexes are\n    parsed at once via the [sre_regex_parse_multi](#sre_regex_parse_multi) function. A regex ID\n    is the 0-based index of the corresponding regex in the regexes array fed into the [sre_regex_parse_multi](#sre_regex_parse_multi)\n    function.\n* `SRE_DECLINED`\n    No match can be found. This value can never be returned when the `eof` parameter is unset (because\n    a match MAY get found when seeing more input string data).\n* `SRE_AGAIN`\n    More data (in a subsequent call) is needed to obtain a match. The current data chunk can\n    be discarded after this call returns. 
This value can only be returned when the `eof` parameter is\n    not set.\n* `SRE_ERROR`\n    A fatal error has occurred (like running out of memory).\n\n[Back to TOC](#table-of-contents)\n\nExamples\n========\n\nPlease check out the sregex-cli command-line utility's source for usage:\n\nhttps://github.com/agentzh/sregex/blob/master/src/sre_cli.c#L1\n\nThe `sregex-cli` command-line interface can be used as a convenient way to exercise the engine:\n\n    ./sregex-cli 'a|ab' 'blab'\n\nIt also supports the `--flags` option which can be used to enable case-insensitive matching:\n\n    ./sregex-cli --flags i 'A|AB' 'blab'\n\nAnd also the `--stdin` option for reading data chunks from stdin:\n\n    # one single data chunk to be matched:\n    perl -e '$s=\"foobar\";print length($s),\"\\n$s\"' \\\n        | ./sregex-cli --stdin foo\n\n    # 3 data chunks (forming a single input stream) to be matched:\n    perl -e '$s=\"foobar\";print length($s),\"\\n$s\" for 1..3' \\\n        | ./sregex-cli --stdin foo\n\nA real-world application of this library is the ngx_replace_filter module:\n\nhttps://github.com/agentzh/replace-filter-nginx-module\n\n[Back to TOC](#table-of-contents)\n\nInstallation\n============\n\n    make\n    make install\n\nGNU make and gcc are required. 
(On operating systems like FreeBSD and Solaris, you should type `gmake` instead of `make` here.)\n\nIt will build `libsregex.so` (or `libsregex.dylib` on Mac OS X), `libsregex.a`, and the command-line utility `sregex-cli` and install\n    them into the prefix `/usr/local/` by default.\n\nIf you want to install into a custom location, then just specify the `PREFIX` variable like this:\n\n    make PREFIX=/opt/sregex\n    make install PREFIX=/opt/sregex\n\nIf you are building a binary package (like an RPM package), then\nyou will find the `DESTDIR` variable handy, as in\n\n    make PREFIX=/opt/sregex\n    make install PREFIX=/opt/sregex DESTDIR=/path/to/my/build/root\n\nIf you run `make distclean` before `make`, then you also need bison 2.7+\nfor generating the regex parser files.\n\n[Back to TOC](#table-of-contents)\n\nTest Suite\n==========\n\nThe test suite is driven by Perl 5.\n\nTo run the test suite\n\n    make test\n\nGNU make, perl 5.16.2, and the following Perl CPAN modules are required:\n\n* Cwd\n* IPC::Run3\n* Test::Base\n* Test::LongString\n\nIf you already have `perl` installed on your system, you can use the following\ncommand to install these CPAN modules (you may need to run it as `root`):\n\n    cpan Cwd IPC::Run3 Test::Base Test::LongString\n\nYou can also run the test suite using the Valgrind Memcheck tool to check\nmemory issues in sregex:\n\n    make valtest\n\nBecause we have a huge test suite, to run the test suite in parallel, you can specify\nthe parallelism level with the `jobs` `make` variable, as in\n\n    make test jobs=8\n\nor similarly\n\n    make valtest jobs=8\n\nSo the test suite will run in 8 parallel jobs (assuming you have 8 CPU cores).\n\nThe streaming matching API is much more thoroughly exercised by the test suite of\nthe [ngx_replace_filter](https://github.com/agentzh/replace-filter-nginx-module) module.\n\n[Back to TOC](#table-of-contents)\n\nTODO\n====\n\n* implement the `(?i)` and `(?-i)` regex syntax.\n* implement a 
simplified version of the backreferences.\n* implement the comment notation `(?#comment)`.\n* implement the POSIX character class notation.\n* allow '\\0' to be used in both the regex and the subject string.\n* add a bytecode optimizer to the regex VM (which also generates minimized DFAs for the Thompson VM).\n* add a JIT compiler for the Pike VM targeting x86_64.\n* port the existing x86_64 JIT compiler for the Thompson VM to other architectures like i386.\n* implement the generalized look-around assertions like `(?=pattern)`, `(?!pattern)`, `(?\u003c=pattern)`, and `(?\u003c!pattern)`.\n* implement the UTF-8, GBK, and Latin1 matching modes.\n\n[Back to TOC](#table-of-contents)\n\nAuthor\n======\n\nYichun \"agentzh\" Zhang (章亦春) \u003cagentzh@gmail.com\u003e, OpenResty Inc.\n\n[Back to TOC](#table-of-contents)\n\nCopyright and License\n=====================\n\nPart of this code is from the NGINX open source project: http://nginx.org/LICENSE\n\nThis library is licensed under the BSD license.\n\nCopyright (C) 2012-2017, by Yichun \"agentzh\" Zhang (章亦春), OpenResty Inc.\n\nCopyright (C) 2007-2009 Russ Cox, Google Inc. All rights reserved.\n\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:\n\n* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.\n\n* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.\n* Neither the name of Google, Inc. 
nor the names of its contributors may be used to endorse or promote products derived from\nthis software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n\n[Back to TOC](#table-of-contents)\n\nSee Also\n========\n* Slides for my talk \"sregex: matching Perl 5 regexes on data streams\": http://agentzh.org/misc/slides/yapc-na-2013-sregex.pdf\n* The ngx_replace_filter module: https://github.com/agentzh/replace-filter-nginx-module\n* \"Implementing Regular Expressions\" http://swtch.com/~rsc/regexp/\n* The re1 project: http://code.google.com/p/re1/\n* The re2 project: http://code.google.com/p/re2/\n\n[Back to TOC](#table-of-contents)\n","funding_links":[],"categories":["Regular Expression","C (61)","正则表达式","Regex ##"],"sub_categories":["物理学","Web Frameworks ###"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenresty%2Fsregex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenresty%2Fsregex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenresty%2Fsregex/lists"}