Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/mariomka/regex-benchmark

It's just a simple regex benchmark of different programming languages.
https://github.com/mariomka/regex-benchmark

bench benchmark languages regex regexp

Last synced: 4 days ago
JSON representation

It's just a simple regex benchmark of different programming languages.

Awesome Lists containing this project

README

        

# Languages Regex Benchmark

It's just a simple regex benchmark for different programming languages.

Measures how long it takes to find and count non-overlapping occurrences with **default settings**.

> All benchmarks are wrong, but some are useful - [Szilard](https://github.com/szilard), [benchm-ml](https://github.com/szilard/benchm-ml)

I hope this benchmark can be helpful, but it's not only about performance, but each language also has its engine and offers different features (like UTF support, backreferences, capturing groups ...)

## Input text

The [input text](input-text.txt) is a concatenation of [Learn X in Y minutes](https://github.com/adambard/learnxinyminutes-docs) repository.

*Maybe isn't the best representative text. I'm searching other texts to add to the benchmark.*

## Regex patterns

- Email: ``[\w\.+-]+@[\w\.-]+\.[\w\.-]+``
- URI: ``[\w]+://[^/\s?#]+[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?``
- IPv4: ``(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])``

The above regex patterns aren't the best or the optimal. The focus is the benchmark, not the matching.

The patterns are applied to the whole file.

## Measure

Measuring is done inside the programs to avoid include startup, reading and writing times on results.

Elapsed time include pattern compilation, find and count occurrences.

## Performance

Docker image was run on: MacBook Pro (16-inch, 2019), 2.4 GHz Intel Core i9, 32 GB 2667 Mhz DDR4 with macOS Big Sur 11.2.3.

Language | Email(ms) | URI(ms) | IP(ms) | Total(ms)
--- | ---: | ---: | ---: | ---:
**Nim Regex** | 1.32 | 26.92 | 7.84 | 36.09
**Nim** | 22.70 | 21.49 | 6.75 | 50.94
**Rust** | 26.66 | 25.70 | 5.28 | 57.63
**PHP** | 42.87 | 46.30 | 5.17 | 94.33
**C++ Boost** | 44.97 | 44.13 | 15.13 | 104.23
**Javascript** | 59.00 | 47.23 | 1.50 | 107.73
**Perl** | 94.92 | 63.96 | 20.37 | 179.25
**Julia** | 104.58 | 86.55 | 5.01 | 196.14
**C PCRE2** | 126.10 | 112.17 | 13.10 | 251.37
**Crystal** | 128.19 | 112.70 | 13.18 | 254.07
**C# .Net Core** | 115.05 | 106.05 | 42.71 | 263.81
**Dart** | 104.10 | 107.64 | 76.51 | 288.25
**D ldc** | 165.46 | 165.20 | 4.85 | 335.51
**D dmd** | 187.94 | 189.92 | 5.32 | 383.18
**Ruby** | 233.88 | 208.85 | 43.14 | 485.86
**Python PyPy2** | 158.34 | 139.70 | 253.77 | 551.81
**Dart Native** | 278.54 | 307.54 | 5.77 | 591.85
**Python 2** | 197.92 | 131.74 | 294.42 | 624.08
**Kotlin** | 186.20 | 223.05 | 287.49 | 696.74
**Java** | 198.33 | 221.87 | 287.81 | 708.01
**Python PyPy3** | 258.78 | 221.89 | 257.35 | 738.03
**Python 3** | 273.86 | 190.79 | 319.13 | 783.78
**Go** | 248.14 | 241.28 | 360.90 | 850.32
**C++ STL** | 433.09 | 344.74 | 245.66 | 1023.49
**C# Mono** | 2859.05 | 2431.87 | 145.82 | 5436.75

### Optimized

> The following results are for the [optimized version](https://github.com/mariomka/regex-benchmark/tree/optimized).

Language | Email(ms) | URI(ms) | IP(ms) | Total(ms)
--- | ---: | ---: | ---: | ---:
**Rust** | 11.43 | 11.40 | 5.11 | 27.94
**Nim Regex** | 1.37 | 25.51 | 7.27 | 34.15
**Nim** | 22.79 | 21.64 | 6.77 | 51.21
**C PCRE2** | 46.22 | 36.92 | 4.73 | 87.87
**PHP** | 43.18 | 46.71 | 5.23 | 95.12
**C++ Boost** | 44.68 | 44.50 | 15.10 | 104.28
**Javascript** | 59.20 | 47.67 | 1.61 | 108.48
**C# .Net Core** | 61.76 | 47.86 | 11.63 | 121.25
**Perl** | 96.00 | 63.39 | 20.59 | 179.99
**Julia** | 104.31 | 87.98 | 5.16 | 197.45
**Crystal** | 129.52 | 116.33 | 13.12 | 258.97
**Dart** | 105.82 | 107.78 | 78.18 | 291.78
**D ldc** | 167.60 | 165.71 | 5.07 | 338.37
**D dmd** | 187.66 | 192.16 | 5.55 | 385.37
**Ruby** | 236.93 | 206.51 | 43.70 | 487.14
**Python PyPy2** | 161.33 | 143.56 | 258.06 | 562.96
**Dart Native** | 273.17 | 306.14 | 5.89 | 585.20
**Python 2** | 200.54 | 132.89 | 290.26 | 623.69
**Kotlin** | 184.13 | 220.31 | 273.76 | 678.21
**Java** | 190.74 | 223.77 | 275.24 | 689.75
**Python PyPy3** | 268.41 | 226.74 | 261.17 | 756.32
**Python 3** | 273.70 | 194.09 | 322.09 | 789.88
**Go** | 244.14 | 238.40 | 365.27 | 847.81
**C++ STL** | 433.18 | 341.07 | 246.85 | 1021.10
**C# Mono** | 1400.04 | 1189.50 | 145.73 | 2735.28

- **Language**: Indicates the language.
- **Email(ms)**, **URI(ms)**, **IP(ms)**: Indicates the time elapsed in milliseconds for finding and counting non-overlapping occurrences for the pattern.
- **Total(ms)**: Indicates the sum of the above times.

### Versions and notes

- **C**: gcc 7.5.0 & PCRE2 10.36-2
- **Crystal**: crystal 0.35.1 - LLVM: 8.0.0
- **C++**: g++ 7.5.0 | Boost 1.65.1.0
- **C#**: dotnet 5.0.201 | Mono 6.12.0.122
- **D**: DMD v2.089.0 | LDC 1.8.0
- **Dart**: Dart 2.12.2
- **Go**: go 1.16.2
- **Java**: OpenJDK 11.0.10
- **Javascript**: node v15.13.0
- **Julia**: Julia 1.6.0
- **Kotlin**: kotlinc-jvm 1.4.32
- **Nim**: Nim 1.4.4
- **Perl**: perl v5.26.1
- **PHP**: PHP 8.0.3
- **Python**: Python 2.7.17 | Python 3.6.9 | PyPy 7.3.3
- **Ruby**: ruby 2.5.1p57
- **Rust**: rustc 1.51.0 & regex 1.4.5

# How to run

The easiest way to run the benchmark is by using Docker.

```sh
git clone https://github.com/mariomka/regex-benchmark.git
cd regex-benchmark
docker run --rm -v $(pwd):/var/regex mariomka/regex-benchmark:1.6
```

# Contributing

All contributions are welcome, from tiny optimizations to new implementations.

There are only a few requirements:
- Follow the style of the current implementations
- Use the default settings for the regex engine
- Update `Dockerfile` if it's necessary

# Kudos

- Heng Li's for his work on [Benchmark of Regex Libraries](http://lh3lh3.users.sourceforge.net/reb.shtml).
- A "challenge" on [Madrid Devs](http://madriddevs.org/) group inspired me.
- [Programming subreddit](https://www.reddit.com/r/programming/), helped me to improve the benchmark.

# License

MIT © [Mario Juárez](https://github.com/mariomka).