Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/doyensec/regexploit
Find regular expressions which are vulnerable to ReDoS (Regular Expression Denial of Service)
- Host: GitHub
- URL: https://github.com/doyensec/regexploit
- Owner: doyensec
- License: apache-2.0
- Created: 2020-11-24T19:24:26.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2024-02-09T18:52:05.000Z (9 months ago)
- Last Synced: 2024-04-28T04:35:12.808Z (6 months ago)
- Language: Python
- Homepage:
- Size: 326 KB
- Stars: 763
- Watchers: 14
- Forks: 52
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hacking-lists - doyensec/regexploit - Find regular expressions which are vulnerable to ReDoS (Regular Expression Denial of Service) (Python)
README
# Regexploit
![regexploit_logo](https://user-images.githubusercontent.com/6027823/110626827-7f46db80-81a1-11eb-9a3d-3e3376bd9a4f.png)
Find regexes which are vulnerable to Regular Expression Denial of Service (ReDoS).
**More info on [the Doyensec blog](https://blog.doyensec.com/2021/03/11/regexploit.html)**
Many default regular expression parsers have unbounded worst-case complexity. Regex matching may be quick when presented with a matching input string. However, certain non-matching input strings can make the regular expression matcher go into crazy backtracking loops and take ages to process. This can cause denial of service, as the CPU will be stuck trying to match the regex.
This tool is designed to:
* find regular expressions which are vulnerable to ReDoS
* give an example malicious string which will cause catastrophic backtracking

## Worst-case complexity
This reflects the complexity of the regular expression matcher's backtracking procedure with respect to the length of the entered string.
Cubic complexity here means that if the vulnerable part of the string is doubled in length, the execution time should be about 8 times longer (2^3).
For exponential ReDoS with starred stars, e.g. `(a*)*$`, a fudge factor is used and the reported complexity will be greater than 10.

For exploitability, cubic complexity or higher is typically required unless truly giant strings are allowed as input.
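
To make the "double the length, about 8 times the work" claim concrete, here is a minimal sketch that counts attempts in a simplified backtracking model. It assumes a pattern with three overlapping repeated groups separated by two literal underscores (the shape of `\w*_\w*_\w*$`) run against `n` underscores followed by a character that can never match. The function name `count_failed_attempts` is hypothetical, and the model is an illustration only: it is not how regexploit or Python's `re` engine counts steps.

```python
def count_failed_attempts(n: int) -> int:
    # Simplified model: each of the three groups can absorb underscores,
    # and two underscores must be left over for the literal separators.
    attempts = 0
    for first in range(n):                       # underscores taken by group 1
        for second in range(n - first - 1):      # underscores taken by group 2
            remaining = n - first - second - 2   # what is left for group 3
            # Group 3 tries every length from `remaining` down to 0,
            # and each try fails at the end-of-string check.
            attempts += remaining + 1
    return attempts


for n in (10, 20, 40):
    print(n, count_failed_attempts(n))
```

In this model the count is `(n + 1) * n * (n - 1) / 6`, so each doubling of `n` multiplies the work by roughly 8, which is what a worst-case complexity of 3 means.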
## Example
Run `regexploit` and enter the regular expression `v\w*_\w*_\w*$` at the command line.
```
$ regexploit
v\w*_\w*_\w*$
Pattern: v\w*_\w*_\w*$
---
Worst-case complexity: 3 ⭐⭐⭐ (cubic)
Repeated character: [5f:_]
Final character to cause backtracking: [^WORD]
Example: 'v' + '_' * 3456 + '!'
```

The part `\w*_\w*_\w*` contains three overlapping repeating groups (`\w` matches letters, digits *and underscores*). As shown in the line `Repeated character: [5f:_]`, a long string of `_` (0x5f) will match this section in many different ways. The worst-case complexity is 3 as there are 3 infinitely repeating groups. An example to cause ReDoS is given: it consists of the required prefix `v`, a long string of `_` and then a `!` (non-word character) to cause backtracking. Not all ReDoSes require a particular character at the end, but in this case, a long string of `_` will match the regex successfully and won't backtrack. The line `Final character to cause backtracking: [^WORD]` shows that a non-matching character (not a word character) is required at the end to prevent matching and cause ReDoS.
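
The role of the final character can be checked with Python's built-in `re` module. This is a small sketch, not part of regexploit: the names `PATTERN`, `matching` and `malicious` are illustrative, and the run of underscores is kept short (50 instead of 3456) so the failing case still finishes quickly.

```python
import re

PATTERN = re.compile(r"v\w*_\w*_\w*$")

run = "_" * 50  # deliberately short; the tool's example uses 3456

# Without a trailing non-word character the regex matches.
matching = PATTERN.search("v" + run)

# With a trailing "!" no match is possible, so a backtracking engine
# has to try every way of splitting the underscores before giving up.
malicious = PATTERN.search("v" + run + "!")

print("without '!':", matching is not None)
print("with '!':   ", malicious is not None)
```

Increasing the length of `run` leaves the first search fast, while the second slows down roughly cubically.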
As another example, install a module version vulnerable to ReDoS such as `pip install ua-parser==0.9.0`.
To scan the installed Python modules, run `regexploit-python-env`.

```
Importing ua_parser.user_agent_parser
Vulnerable regex in /somewhere/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py #183
Pattern: \bSmartWatch *\( *([^;]+) *; *([^;]+) *;
Context: self.user_agent_re = re.compile(self.pattern)
---
Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [20]
Example: 'SmartWatch(' + ' ' * 3456

Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [20]
Example: 'SmartWatch(0;' + ' ' * 3456

Vulnerable regex in /somewhere/.env/lib/python3.9/site-packages/ua_parser/user_agent_parser.py #183
Pattern: ; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)
Context: self.user_agent_re = re.compile(self.pattern)
---
Worst-case complexity: 3 ⭐⭐⭐
Repeated character: [[0-9]]
Example: ';0 Build/HuaweiA' + '0' * 3456
...
```

For each vulnerable regular expression it prints one or more malicious strings to trigger ReDoS. Setting your user agent to `;0 Build/HuaweiA000000000000000...` and browsing a website using an old version of ua-parser may cause the server to take a long time to process your request, probably ending in status 502.
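
The slowdown can be observed locally without installing ua-parser by copying the Huawei pattern from the output above into Python's `re` module. This is a rough sketch: `HUAWEI` and `time_search` are hypothetical names, the input lengths are far smaller than 3456 to keep the run short, and the timings depend on the machine and Python version.

```python
import re
import time

# Pattern copied from the regexploit output above.
HUAWEI = re.compile(
    r"; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)"
)


def time_search(n: int):
    """Search a malicious user agent with n repeated zeros; return (match, seconds)."""
    user_agent = ";0 Build/HuaweiA" + "0" * n
    start = time.perf_counter()
    result = HUAWEI.search(user_agent)
    return result, time.perf_counter() - start


for n in (100, 200, 400):
    result, seconds = time_search(n)
    print(f"n={n}: matched={result is not None}, {seconds:.4f}s")
```

The search never matches because the string contains no closing `)`, and the elapsed time should grow much faster than the input length.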
# Installation
Python 3.8+ is required. To extract regexes from JavaScript / TypeScript code, NodeJS 12+ is also required.
Optionally make a virtual environment
```bash
python3 -m venv .env
source .env/bin/activate
```

Now actually install with pip
```
pip install regexploit
```

# Usage
## Regexploit with a list of regexes
Enter regular expressions via stdin (one per line) into `regexploit`.
```bash
regexploit
```

or via a file
```bash
cat myregexes.txt | regexploit
```

## Extract regexes automatically
There is built-in support for parsing regexes out of Python, JavaScript, TypeScript, C#, YAML and JSON.
### Python code

Parses Python code (without executing it) via the AST to find regexes. The regexes are then analysed for ReDoS.
```bash
regexploit-py my-project/
regexploit-py "my-project/**/*.py" --glob
```
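
To illustrate the idea of finding regexes via the AST without executing any code, here is a minimal sketch using the standard-library `ast` module. It is not regexploit's actual implementation: the helper `find_regex_literals`, the `REGEX_FUNCTIONS` set and the `SAMPLE` source are all hypothetical, and the sketch only recognises plain string literals passed directly to `re.<function>(...)`.

```python
import ast

# Assumed set of `re` functions whose first argument is a pattern.
REGEX_FUNCTIONS = {
    "compile", "match", "search", "fullmatch",
    "findall", "finditer", "sub", "subn", "split",
}


def find_regex_literals(source: str):
    """Return sorted (line, pattern) pairs for literals passed to re.* calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_re_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in REGEX_FUNCTIONS
        )
        first = node.args[0]
        if is_re_call and isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.append((node.lineno, first.value))
    return sorted(found)


SAMPLE = r'''
import re
NAME = re.compile(r"v\w*_\w*_\w*$")
def check(s):
    return re.search("^a+$", s)
DYNAMIC = re.compile("v" + SUFFIX)  # built at runtime, skipped by this sketch
'''

print(find_regex_literals(SAMPLE))
```

The sample source is only parsed, never run, so the undefined name `SUFFIX` is harmless. The patterns found this way could then be written one per line and piped into `regexploit`.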
### Javascript / Typescript

This will use the bundled NodeJS package in `regexploit/bin/javascript` which parses your JavaScript as an AST with [eslint](https://github.com/typescript-eslint/typescript-eslint/tree/master/packages/parser) and prints out all regexes.
Those regexes are fed into the python ReDoS finder.
```bash
regexploit-js my-module/my-file.js another/file.js some/folder/
regexploit-js "my-project/node_modules/**/*.js" --glob
```

N.B. there are differences between JavaScript and Python regex parsing, so there may be some errors. I'm [not sure I want](https://hackernoon.com/the-madness-of-parsing-real-world-javascript-regexps-d9ee336df983) to write a JS regex AST!
### Python imports
Search for regexes in all the Python modules currently installed in your path / env. This means you can `pip install` whatever modules you are interested in and they will be analysed. CPython code is included.
```bash
regexploit-python-env
```

N.B. this doesn't parse the Python code to an AST and will only find regexes compiled automatically on module import. Modules are actually imported, **so code in the modules will be executed**. This is helpful for finding regexes which are built up from smaller strings on load, e.g. [CVE-2021-25292 in Pillow](https://github.com/python-pillow/Pillow/commit/3bce145966374dd39ce58a6fc0083f8d1890719c).
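
One way to see why import-time scanning catches regexes that are assembled from smaller strings is to record every call to `re.compile` while module code runs. The sketch below is an assumption about the general technique, not a description of how `regexploit-python-env` works internally. `record_compiled_patterns` and `MODULE_SOURCE` are hypothetical, and `exec` stands in for importing a real module.

```python
import re
from contextlib import contextmanager


@contextmanager
def record_compiled_patterns():
    """Temporarily wrap re.compile and collect every pattern passed to it."""
    seen = []
    original = re.compile

    def recording_compile(pattern, flags=0):
        seen.append(pattern)
        return original(pattern, flags)

    re.compile = recording_compile
    try:
        yield seen
    finally:
        re.compile = original  # always restore the real function


# Hypothetical module whose regex is built from pieces at load time.
MODULE_SOURCE = r'''
import re
WORD = r"\w*"
SEP = "_"
NAME_RE = re.compile("v" + WORD + SEP + WORD + SEP + WORD + "$")
'''

with record_compiled_patterns() as patterns:
    # Running the module body is what executes its code, as the note above warns.
    exec(compile(MODULE_SOURCE, "<hypothetical_module>", "exec"), {})

print(patterns)
```

The recorded pattern is the fully assembled string, which a purely static scan for string literals would not see in one piece.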
### JSON / YAML
YAML support requires pyyaml, which can be installed with `pip install regexploit[yaml]`.
```bash
regexploit-json *.json
regexploit-yaml *.yaml
```
### C# (.NET)

```bash
regexploit-csharp something.cs
```
# :trophy: Bugs reported :trophy:

* [CVE-2020-5243: uap-core](https://github.com/ua-parser/uap-core/security/advisories/GHSA-cmcx-xhr8-3w9p) affecting uap-python, [uap-ruby](https://github.com/ua-parser/uap-ruby/security/advisories/GHSA-pcqq-5962-hvcw), etc. (User-Agent header parsing)
* [CVE-2020-8492: cpython's urllib.request](https://github.com/python/cpython/commit/0b297d4ff1c0e4480ad33acae793fbaf4bf015b4) (WWW-Authenticate header parsing)
* [CVE-2021-21236: CairoSVG](https://github.com/advisories/GHSA-hq37-853p-g5cf) (SVG parsing)
* [CVE-2021-21240: httplib2](https://github.com/httplib2/httplib2/security/advisories/GHSA-93xj-8mrv-444m) (WWW-Authenticate header parsing)
* [CVE-2021-25292: python-pillow](https://github.com/python-pillow/Pillow/commit/3bce145966374dd39ce58a6fc0083f8d1890719c) (PDF parsing)
* [CVE-2021-26813: python-markdown2](https://github.com/trentm/python-markdown2/pull/387) (Markdown parsing)
* [CVE-2021-27290: npm/ssri](https://doyensec.com/resources/Doyensec_Advisory_ssri_redos.pdf) (SRI parsing)
* [CVE-2021-27291: pygments](https://github.com/pygments/pygments/commit/2e7e8c4a7b318f4032493773732754e418279a14) lexers for ADL, CADL, Ceylon, Evoque, Factor, Logos, Matlab, Octave, ODIN, Scilab & Varnish VCL (Syntax highlighting)
* [CVE-2021-27292: ua-parser-js](https://github.com/faisalman/ua-parser-js/commit/809439e20e273ce0d25c1d04e111dcf6011eb566) (User-Agent header parsing)
* [CVE-2021-27293: RestSharp](https://github.com/restsharp/RestSharp/issues/1556) (JSON deserialisation in a .NET C# package)
* [bpo-38804: cpython's http.cookiejar](https://github.com/python/cpython/pull/17157) (Set-Cookie header parsing)
* [SimpleCrawler (archived)](https://doyensec.com/resources/Doyensec_Advisory_simplecrawler_redos.pdf) (HTML parsing)
* [CVE-2021-28092: is-svg](https://github.com/sindresorhus/is-svg/commit/01f8a087fab8a69c3ac9085fbb16035907ab6a5b) (SVG parsing)
* [nuget.org, NuGetGallery](https://github.com/NuGet/NuGetGallery/commit/25d2d3b32b2d9f0b1ca6e0a105b0210c2c4820f4) and [NuGet.Client](https://github.com/NuGet/NuGet.Client/commit/a0671e946ce71dc59def5cc8a67c6457d66f33bf) (Parsing NuGet package IDs)
* [markdown (python)](https://github.com/Python-Markdown/markdown/pull/1130) (Markdown parsing)
* [ansi-html (nodejs)](https://github.com/Tjatse/ansi-html/issues/19) (ANSI parsing)
* Plus unpublished bugs in a handful of pypi, npm, ruby and nuget packages

## Credits
This tool has been created by Ben Caller of [Doyensec LLC](https://www.doyensec.com) during research time.
![alt text](https://doyensec.com/images/logo.svg "Doyensec Logo")