{"id":24107603,"url":"https://github.com/bclehmann/regexcompiler","last_synced_at":"2026-05-11T13:06:59.533Z","repository":{"id":250100501,"uuid":"831243878","full_name":"bclehmann/RegexCompiler","owner":"bclehmann","description":"A simple LLVM-based compiler for regular expressions","archived":false,"fork":false,"pushed_at":"2024-07-27T02:49:42.000Z","size":67,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-10T22:42:10.471Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bclehmann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-20T02:59:10.000Z","updated_at":"2024-07-25T15:05:50.000Z","dependencies_parsed_at":"2025-01-10T22:40:55.701Z","dependency_job_id":"0aec0688-f1a5-4589-a969-546d84b6b204","html_url":"https://github.com/bclehmann/RegexCompiler","commit_stats":null,"previous_names":["bclehmann/regexcompiler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bclehmann%2FRegexCompiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bclehmann%2FRegexCompiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bclehmann%2FRegexCompiler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bclehmann%2FRegexCompiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bclehmann",
"download_url":"https://codeload.github.com/bclehmann/RegexCompiler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241089058,"owners_count":19907677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T22:39:46.799Z","updated_at":"2026-05-11T13:06:54.501Z","avatar_url":"https://github.com/bclehmann.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RegexCompiler\n\nThis is a compiler based on LLVM (version 14.0.6, from June 2022) that compiles regular expressions ahead of time. This gives a ~70% speedup over conventional regex tools, at the cost of requiring that each regex be precompiled.\nI have not tested this project on newer versions of LLVM.\n\nThe code isn't amazing; this project was mostly a way to get my toes back into compilers.\n\n## Building\n\nEnsure you have LLVM installed (or compile it yourself, though note that this takes several hours even on powerful hardware). If you go down the route of compiling LLVM yourself, I highly recommend\nmaking Ninja your configured build system in cmake; it will make compilation significantly faster (though still quite slow).\n\nOn Linux, LLVM may be preinstalled or easily available through your package manager. On Windows I recommend going to https://github.com/vovkos/llvm-package-windows, as the LLVM GitHub repo only provides\nLLVM-based tools like clang and binaries for the LLVM-C API, not the C++ API used by this project. 
I have not tried building this on macOS, but I don't anticipate any problems.\n\nOnce you have LLVM installed, you should be able to run `cmake -B your_build_dir`, though on Windows you may need to set some environment variables so that cmake can find your LLVM directory.\nThis cmake command will create build files for your configured build system (Makefiles by default on Linux, Visual Studio solutions on Windows). This project is known to compile with both MSVC\nand gcc.\n\ncmake is configured to set the following flags:\n- `-Wall -Wextra -Wpedantic` on non-MSVC compilers\n- `/W4` on MSVC\n\nI target zero warnings with `-Wall -Wextra -Wpedantic`, as I'm mostly developing this on Linux. MSVC seems to be stricter with these warning settings, though I'm not particularly familiar with MSVC warning\nlevels.\n\nFurther, the `ENV` cmake variable (defaults to `DEBUG`, and can be set with e.g. `cmake -B your_build_dir -DENV=RELEASE`) adds these flags based on its value:\n- `ENV=DEBUG` sets `-g` on most compilers and `/DEBUG` on MSVC\n- `ENV=RELEASE` sets `-O3` on most compilers and `/O2` on MSVC\n\nIf you're using Makefiles, I strongly recommend passing `-jN` to run N jobs in parallel, which significantly improves compile times. They're not too bad, but a single-threaded build takes nearly 9 seconds,\nlikely due to all the LLVM header files that need to be included; with parallel jobs this can be reduced to about 2.5 seconds. Ninja should handle this for you, and I don't know enough to comment on\nother build systems.\n\nNote that Ninja will try to use as many threads as you have, which can cause it to run out of memory and fail when building LLVM itself. You might want to manually set `-jN` to a more\nconservative value if you run into issues. Fortunately, you can restart the build from where it left off, but it's unfortunate that it needs some babysitting.\n\n## Use\n\nOnce you've built a binary, run `./RegexCompiler abc` to compile a regex. 
This will produce an `out.ll` LLVM IR file, which can be compiled with `clang -x ir ./out.ll` plus any additional flags\nyou might want (e.g. `-O3` for optimization, since the whole point of this project is to be faster than regex interpreters). The result is an `a.out` (or `a.exe` on Windows) executable that reads its input from stdin.\nIt exits with code 0 if the input matches the regex, 1 if it does not, and some other non-zero code if an error occurs.\n\nThe LLVM IR can also be interpreted with `lli ./out.ll`, which is handy for tracking down bugs in codegen.\n\nCurrently, the following metacharacters are supported:\n- `^` for start of input\n- `$` for end of input\n- `\\d` for any digit\n- A preceding `\\` for escaping metacharacters (and backslashes themselves)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbclehmann%2Fregexcompiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbclehmann%2Fregexcompiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbclehmann%2Fregexcompiler/lists"}