{"id":13440665,"url":"https://github.com/laurikari/tre","last_synced_at":"2025-10-21T04:57:08.278Z","repository":{"id":4740127,"uuid":"5889454","full_name":"laurikari/tre","owner":"laurikari","description":"The approximate regex matching library and agrep command line tool.","archived":false,"fork":false,"pushed_at":"2025-07-28T20:46:51.000Z","size":658,"stargazers_count":861,"open_issues_count":42,"forks_count":140,"subscribers_count":36,"default_branch":"master","last_synced_at":"2025-10-21T04:57:03.408Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/laurikari.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog.old","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-09-20T16:25:28.000Z","updated_at":"2025-10-16T17:28:56.000Z","dependencies_parsed_at":"2023-07-06T01:52:36.655Z","dependency_job_id":"9422239a-c442-44db-8a60-b486e23c7e01","html_url":"https://github.com/laurikari/tre","commit_stats":{"total_commits":128,"total_committers":13,"mean_commits":9.846153846153847,"dds":0.21875,"last_synced_commit":"6092368aabdd0dbb0fbceb2766a37b98e0ff6911"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/laurikari/tre","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laurikari%2Ftre","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laurikari%2Ftre/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laurikari%2Ftre/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/laurikari%2Ftre/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/laurikari","download_url":"https://codeload.github.com/laurikari/tre/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laurikari%2Ftre/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280207211,"owners_count":26290616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-21T02:00:06.614Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:24.902Z","updated_at":"2025-10-21T04:57:08.247Z","avatar_url":"https://github.com/laurikari.png","language":"C","readme":"## Looking for maintainers!\n\nThis project is looking for maintainers.  
For starters, there are a few\npull requests waiting for review.\n\nLet me know at ville@laurikari.net if you wish to step up!\n\nIntroduction\n============\n\nTRE is a lightweight, robust, and efficient POSIX compliant regexp\nmatching library with some exciting features such as approximate\n(fuzzy) matching.\n\nThe matching algorithm used in TRE uses linear worst-case time in\nthe length of the text being searched, and quadratic worst-case\ntime in the length of the used regular expression.\n\nIn other words, the time complexity of the algorithm is O(M^2N), where\nM is the length of the regular expression and N is the length of the\ntext.  The used space is also quadratic on the length of the regex,\nbut does not depend on the searched string.  This quadratic behaviour\noccurs only on pathological cases which are probably very rare in\npractice.\n\n\nHacking\n=======\n\nHere's how to work with this code.\n\nPrerequisites\n-------------\n\nYou will need the following tools installed on your system:\n\n  - autoconf\n  - automake\n  - gettext (including autopoint)\n  - libtool\n  - zip (optional)\n\n\nBuilding\n--------\n\nFirst, prepare the tree.  Change to the root of the source directory\nand run\n\n    ./utils/autogen.sh\n\nThis will regenerate various things using the prerequisite tools so\nthat you end up with a buildable tree.\n\nAfter this, you can run the configure script and build TRE as usual:\n\n    ./configure\n    make\n    make check\n    make install\n\n\nBuilding a source code package\n------------------------------\n\nIn a prepared tree, this command creates a source code tarball:\n\n    ./configure \u0026\u0026 make dist\n\nAlternatively, you can run\n\n    ./utils/build-sources.sh\n\nwhich builds the source code packages and puts them in the `dist`\nsubdirectory.  This script needs a working `zip` command.\n\n\nFeatures\n========\n\nTRE is not just yet another regexp matcher. 
TRE has some features\nwhich are not there in most free POSIX compatible implementations.\nMost of these features are not present in non-free implementations\neither, for that matter.\n\nApproximate matching\n--------------------\n\nApproximate pattern matching allows matches to be approximate, that\nis, allows the matches to be close to the searched pattern under some\nmeasure of closeness.  TRE uses the edit-distance measure (also known\nas the Levenshtein distance) where characters can be inserted,\ndeleted, or substituted in the searched text in order to get an exact\nmatch.\n\nEach insertion, deletion, or substitution adds the distance, or cost,\nof the match.  TRE can report the matches which have a cost lower than\nsome given threshold value.  TRE can also be used to search for\nmatches with the lowest cost.\n\nTRE includes a version of the agrep (approximate grep) command line\ntool for approximate regexp matching in the style of grep. Unlike\nother agrep implementations (like the one by Sun Wu and Udi Manber\nfrom University of Arizona) TRE agrep allows full regexps of any\nlength, any number of errors, and non-uniform costs for insertion,\ndeletion and substitution.\n\nStrict standard conformance\n---------------------------\n\nPOSIX defines the behaviour of regexp functions precisely.  TRE\nattempts to conform to these specifications as strictly as possible.\nTRE always returns the correct matches for subpatterns, for example.\nVery few other implementations do this correctly.  In fact, the only\nother implementations besides TRE that I am aware of (free or not)\nthat get it right are Rx by Tom Lord, Regex++ by John Maddock, and the\nAT\u0026T ast regex by Glenn Fowler and Doug McIlroy.\n\nThe standard TRE tries to conform to is the IEEE Std 1003.1-2001, or\nOpen Group Base Specifications Issue 6, commonly referred to as\n“POSIX”.  
The relevant parts are the base specifications on regular\nexpressions (and the rationale) and the description of the `regcomp()`\nAPI.\n\nFor an excellent survey on POSIX regexp matchers, see the testregex\npages by Glenn Fowler of AT\u0026T Labs Research.\n\nPredictable matching speed\n--------------------------\n\nBecause of the matching algorithm used in TRE, the maximum time\nconsumed by any `regexec()` call is always directly proportional to\nthe length of the searched string. There is one exception: if back\nreferences are used, the matching may take time that grows\nexponentially with the length of the string.  This is because matching\nback references is an NP complete problem, and almost certainly\nrequires exponential time to match in the worst case.\n\nPredictable and modest memory consumption\n-----------------------------------------\n\nA `regexec()` call never allocates memory from the heap. TRE allocates\nall the memory it needs during a `regcomp()` call, and some temporary\nworking space from the stack frame for the duration of the `regexec()`\ncall.  The amount of temporary space needed is constant during\nmatching and does not depend on the searched string. For regexps of\nreasonable size TRE needs less than 50K of dynamically allocated\nmemory during the `regcomp()` call, less than 20K for the compiled\npattern buffer, and less than two kilobytes of temporary working space\nfrom the stack frame during a `regexec()` call.  There is no time /\nmemory tradeoff. TRE is also small in code size; statically linking\nwith TRE increases the executable size less than 30K (gcc-3.2, x86,\nGNU/Linux).\n\nWide character and multibyte character set support\n--------------------------------------------------\n\nTRE supports multibyte character sets.  This makes it possible to use\nregexps seamlessly with, for example, Japanese locales.  
TRE also\nprovides a wide character API.\n\nBinary pattern and data support\n-------------------------------\n\nTRE provides APIs which allow binary zero characters both in regexps\nand searched strings.  The standard API cannot be easily used to, for\nexample, search for printable words from binary data (although it is\npossible with some hacking).  Searching for patterns which contain\nbinary zeroes embedded is not possible at all with the standard API.\n\nCompletely thread safe\n----------------------\n\nTRE is completely thread safe.  All the exported functions are\nre-entrant, and a single compiled regexp object can be used\nsimultaneously in multiple contexts; e.g. in `main()` and a signal\nhandler, or in many threads of a multithreaded application.\n\nPortable\n--------\n\nTRE is portable across multiple platforms.  Below is a table of\nplatforms and compilers used to develop and test TRE:\n\n\u003ctable\u003e\n\t\u003ctr\u003e\u003cth\u003ePlatform\u003c/th\u003e\t\t\t\t\u003cth\u003eCompiler\u003c/th\u003e\u003c/tr\u003e\n\t\u003ctr\u003e\u003ctd\u003eFreeBSD 14.1\u003c/td\u003e\t\t\t\u003ctd\u003eClang 18\u003c/td\u003e\u003c/tr\u003e\n\t\u003ctr\u003e\u003ctd\u003eUbuntu 22.04\u003c/td\u003e\t\t\t\u003ctd\u003eGCC 11\u003c/td\u003e\u003c/tr\u003e\n\t\u003ctr\u003e\u003ctd\u003emacOS 14.6\u003c/td\u003e\t\t\t\t\u003ctd\u003eClang 14\u003c/td\u003e\u003c/tr\u003e\n\t\u003ctr\u003e\u003ctd\u003eWindows 11\u003c/td\u003e\t\t\t\t\u003ctd\u003eMicrosoft Visual Studio 2022\u003c/td\u003e\u003c/tr\u003e\n\u003c/table\u003e\n\nTRE should compile without changes on most modern POSIX-like\nplatforms, and be easily portable to any platform with a hosted C\nimplementation.\n\nDepending on the platform, you may need to install libutf8 to get\nwide character and multibyte character set support.\n\nFree\n----\n\nTRE is released under a license which is essentially the same as the\n“2 clause” BSD-style license used in NetBSD.  
See the file LICENSE for\ndetails.\n\nRoadmap\n-------\n\nThere are currently two features, both related to collating elements,\nmissing from 100% POSIX compliance.  These are:\n\n* Support for collating elements (e.g. `[[.\\\u003cX\u003e.]]`, where `\\\u003cX\u003e` is a\n  collating element).  It is not possible to support multi-character\n  collating elements portably, since POSIX does not define a way to\n  determine whether a character sequence is a multi-character\n  collating element or not.\n\n* Support for equivalence classes, for example `[[=\\\u003cX\u003e=]]`, where\n  `\\\u003cX\u003e` is a collating element.  An equivalence class matches any\n  character which has the same primary collation weight as `\\\u003cX\u003e`.\n  Again, POSIX provides no portable mechanism for determining the\n  primary collation weight of a collating element.\n\nNote that other portable regexp implementations don't support\ncollating elements either.  The single exception is Regex++, which\ncomes with its own database for collating elements for different\nlocales.  Support for collating elements and equivalence classes has\nnot been widely requested and is not very high on the TODO list at the\nmoment.\n\nThese are other features I'm planning to implement real soon now:\n\n* All the missing GNU extensions enabled in GNU regex, such as\n  `[[:\u003c:]]` and `[[:\u003e:]]`.\n\n* A `REG_SHORTEST` `regexec()` flag for returning the shortest match\n  instead of the longest match.\n\n* Perl-compatible syntax:\n  * `[:^class:]`\n    Matches anything but the characters in class.  
Note that\n    `[^[:class:]]` already works; this would be just a convenience\n    shorthand.\n\n  * `\\A`\n    Match only at beginning of string.\n\n  * `\\Z`\n    Match only at end of string, or before newline at the end.\n\n  * `\\z`\n    Match only at end of string.\n\n  * `\\l`\n    Lowercase next char (think vi).\n\n  * `\\u`\n    Uppercase next char (think vi).\n\n  * `\\L`\n    Lowercase till `\\E` (think vi).\n\n  * `\\U`\n    Uppercase till `\\E` (think vi).\n\n  * `(?=pattern)`\n    Zero-width positive look-ahead assertions.\n\n  * `(?!pattern)`\n    Zero-width negative look-ahead assertions.\n\n  * `(?\u003c=pattern)`\n    Zero-width positive look-behind assertions.\n\n  * `(?\u003c!pattern)`\n    Zero-width negative look-behind assertions.\n\nDocumentation, especially for the nonstandard features of TRE such as\napproximate matching, is a work in progress (with “progress” loosely\ndefined...).  If you want to find an extension to use, reading the\n`include/tre/tre.h` header might provide some additional hints if you\nare comfortable with C source code.\n","funding_links":[],"categories":["Regular Expression","C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaurikari%2Ftre","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaurikari%2Ftre","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaurikari%2Ftre/lists"}