large hashcat rulesets generated from real-world compromised passwords
https://github.com/rarecoil/pantagrule


# Pantagrule
### gargantuan hashcat rulesets generated from compromised passwords

> **Project maintenance warning**: This project is deemed **completed**. No pull requests or changes will be accepted in the future unless they fix actual bugs or migrate these rules to work with newer versions of hashcat.

Pantagrule is a series of rules for the [hashcat](https://hashcat.net/hashcat/) password cracker generated from large amounts of real-world password compromise data. While Pantagrule rule files can be large, the rules are both tunable and perform better than many existing rule sets.

Pantagrule was generated using [PACK](https://github.com/iphelix/pack/blob/master/rulegen.py)'s Levenshtein Reverse Path algorithm for automated rule generation (Kacherginsky, 2013). PACK's output was then sorted by the number of times PACK generated each rule to make the base ruleset. This process is similar to the one [_NSAKEY](https://github.com/NSAKEY/nsa-rules) used to generate rules for password cracking competitions in 2014 (_NSAKEY, 2014); however, Pantagrule was generated from a significantly larger set of passwords.
Version 2 of Pantagrule was built from the publicly available [hashes.org](https://hashes.org) "founds" corpus, a best-in-class public wordlist. This yields more transparent results than the original variant, which used a proprietary corpus containing 842,643,513 unique passwords.

When such large corpora are fed through PACK, millions of rules result. However, most of the generated rules appear only a handful of times, so the useful rules are mostly the ones the algorithm generates most often. This repository contains a subset of the rules generated by PACK whilst iterating through the existing corpus.
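The frequency-based selection described above can be sketched as follows. This is a minimal illustration, not the script used to build Pantagrule: the function name `top_rules_by_frequency` and the sample rule strings are hypothetical, and it assumes the generator's output is available as one rule string per generated occurrence.

```python
from collections import Counter


def top_rules_by_frequency(generated_rules, limit):
    """Return the `limit` most frequently generated rules, most common first.

    `generated_rules` is assumed to be an iterable with one rule string per
    time the rule generator emitted that rule (an assumption of this sketch).
    """
    counts = Counter(rule for rule in generated_rules if rule)
    return [rule for rule, _ in counts.most_common(limit)]


# Hypothetical sample: "c $1" was generated three times, "$1" twice.
sample = ["c $1", "$1", "c $1", "l", "$1", "c $1", "u $!"]
print(top_rules_by_frequency(sample, 2))
```

Rules that appear only once or twice fall off the end of the ranking, which is how a multi-million-rule output is reduced to a fixed-size base ruleset.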

### Optimised variants

To generate a second-pass optimisation of the rules against real-world data, the top one million generated rules were run against the Pwned Passwords NTLM list using the rockyou wordlist. Any rule that cracked a password was added to its own list, and poorer-performing rules were discarded.
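The idea of keeping only rules that crack at least one target can be shown with a toy model. The sketch below is not how the optimisation was actually run (that used hashcat against NTLM hashes); it compares plaintexts directly and implements only a small subset of hashcat rule functions (`:`, `l`, `u`, `c`, `r`, `$X`, `^X`). The names `apply_rule` and `useful_rules` and the sample data are hypothetical.

```python
def apply_rule(rule, word):
    """Apply a tiny subset of hashcat rule functions to `word`.

    Supported: ':' (no-op), 'l' (lowercase), 'u' (uppercase),
    'c' (capitalise first letter, lowercase the rest), 'r' (reverse),
    '$X' (append character X), '^X' (prepend character X).
    """
    out = word
    i = 0
    while i < len(rule):
        op = rule[i]
        if op == " " or op == ":":
            pass  # separator or no-op
        elif op == "l":
            out = out.lower()
        elif op == "u":
            out = out.upper()
        elif op == "c":
            out = out[:1].upper() + out[1:].lower()
        elif op == "r":
            out = out[::-1]
        elif op == "$" or op == "^":
            i += 1  # the next character is the argument
            if i >= len(rule):
                raise ValueError(f"missing argument in rule: {rule!r}")
            out = out + rule[i] if op == "$" else rule[i] + out
        else:
            raise ValueError(f"unsupported rule function: {op!r}")
        i += 1
    return out


def useful_rules(rules, wordlist, targets):
    """Keep the rules that turn at least one dictionary word into a target."""
    targets = set(targets)
    return [
        rule
        for rule in rules
        if any(apply_rule(rule, word) in targets for word in wordlist)
    ]


# Hypothetical data: only "c $1" and "u" produce a target plaintext.
rules = ["c $1", "u", "r", "$! $!"]
wordlist = ["password", "dragon"]
targets = {"Password1", "DRAGON", "letmein!!"}
print(useful_rules(rules, wordlist, targets))
```

In this toy run, `r` and `$! $!` crack nothing and are discarded, which mirrors how poorer-performing rules were dropped from the optimised lists.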

Four optimisation types were created:

* `popular.rule`: pantagrule.1m run against the top 25,000,000 passwords of the HIBP set.
* `random.rule`: pantagrule.1m run against 25,000,000 randomly selected passwords from the HIBP set.
* `hybrid.rule`: A sorted list of a combination of the most successful `popular` and `random` rules, then cut in half, in an attempt to make a lighter, "balanced" ruleset that works across a larger sample set.
* `one.rule`: A version of _OneRuleToRuleThemAll_ in which the top-performing `hybrid` rules are appended, and the list is truncated to the size of the `dive` rule set. Interestingly, there is only an overlap of a couple of thousand rules between _OneRuleToRuleThemAll_ and the Pantagrule rules, making the two strategies complementary. Pantagrule's `one` performs better than other known lists of this size, and it is recommended that you start with this ruleset before attempting one of the larger variants.
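As a rough illustration of the `hybrid` construction, the sketch below merges two rule lists, ranks the result, and keeps the top half. The source does not state the sort key, so ranking by summed crack counts is an assumption of this sketch, as are the name `build_hybrid` and the sample numbers.

```python
from collections import Counter


def build_hybrid(popular_hits, random_hits):
    """Merge two {rule: crack_count} mappings and keep the best-ranked half.

    Ranking by the summed crack count is an assumption; the original
    description only says the combined list was sorted and cut in half.
    """
    combined = Counter(popular_hits)
    combined.update(random_hits)  # adds counts for rules present in both
    ranked = [rule for rule, _ in combined.most_common()]
    return ranked[: len(ranked) // 2]


# Hypothetical crack counts for each optimisation run.
popular = {"c $1": 900, "$1": 400, "u": 50}
random_run = {"c $1": 300, "r": 120, "$1": 100, "l $2": 10}
print(build_hybrid(popular, random_run))
```

Cutting the merged list in half trades a little coverage for a much smaller search space, which is the stated goal of a lighter, "balanced" ruleset.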

#### Pantagrule hashorg.v6

After the success of these large rulesets, the inverse of the `royce` variant was attempted: the original Pantagrule methodology was kept, but both sets of data were changed. Pantagrule now uses the public hashes.org "founds" list as its wordlist base for rule generation, with an optimisation pass against the *V6* NTLM list from Have I Been Pwned. Because all of the data used is public, raw reproducibility data can also be published, including `pantagrule.v2.1m.rule`, which contains the top one million rules generated by this methodology. The data for V5 and V6 is the same for the top 25 million passwords.

For this version, the way `one` is generated has changed. To generate `one`, the full one-million-rule list was appended to `OneRuleToRuleThemAll.rule` and the entire set was then calibrated on Pwned Passwords V6, rather than just appending rules and truncating.

Naming conventions for the rules have now changed to the format `pantagrule.${corpus}.${trainingversion}.${extension}`. This makes it easier to understand what each rule was optimised for. For example, `pantagrule.hashorg.v6.random` used the `random` methodology with hashes.org as the basis for rule generation, optimised on Pwned Passwords V6.
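A small parser makes the naming convention concrete. This is an illustrative sketch only: `RuleName` and `parse_rule_name` are hypothetical names, and the parser handles only the four-part format given above, so five-part names such as the `royce` variants are rejected.

```python
from typing import NamedTuple


class RuleName(NamedTuple):
    corpus: str            # wordlist the rules were generated from
    training_version: str  # data set the rules were optimised on
    extension: str         # optimisation type, e.g. "random" or "one"


def parse_rule_name(name):
    """Split a `pantagrule.<corpus>.<trainingversion>.<extension>` name."""
    parts = name.split(".")
    if len(parts) != 4 or parts[0] != "pantagrule":
        raise ValueError(f"not a four-part pantagrule name: {name!r}")
    return RuleName(*parts[1:])


print(parse_rule_name("pantagrule.hashorg.v6.random"))
```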

#### Original rules

Original rules were trained using the proprietary wordlist alongside the Pwned Passwords NTLM V5 set, using `rockyou.txt` as a base. Since the "training data" and the validation data are the same, it is to be expected that these rules are optimised for the V5 dataset.

#### The `royce` variants

[Upon request](https://github.com/rarecoil/pantagrule/issues/1) of hashcat contributor [Royce Williams](https://github.com/roycewilliams), optimisations of the top one million rules were also run with the [hashes.org founds list](https://github.com/rarecoil/hashes.org-list). This is due to the HIBP corpus being relatively dirty, and the hashes.org founds list being likely to yield a more practical ruleset for real-world cracking. These have been added as the `royce` variants. The `royce` optimisations appear to consist of marginally fewer rules overall, and `random.royce` is substantially more effective on a long tail of passwords than the original `random`. Performance did not increase over the existing rules on some variants, but given that the training and validation data of the original Pantagrule are both from the _Pwned Passwords_ dataset, this does not seem surprising. Pantagrule `royce` variants exist in the `rules/royce` folder.

## Performance vs. other commonly-used rules

To compare the Pantagrule strategy against other rulesets, each ruleset was validated against the top 25 million passwords and the top 100 million passwords of Pwned Passwords V5. This gives an understanding of how effective each ruleset is at cracking the "long tail". The canonical `rockyou.txt` is the dictionary and the baseline.

Original variant generation was done on an 8x 1070Ti rig running hashcat v5.1.0. The `royce` Pantagrule variants were created on a [4x Radeon VII rig](https://gist.github.com/rarecoil/54340280d81528dcb024ef5df2535c86) running hashcat git build `v5.1.0-1774-gf96594ef`. The hashorg.v6 variants were created and validated (very slowly) on a single [NVIDIA Tesla M4](https://www.techpowerup.com/gpu-specs/tesla-m4.c2770), a single 1070Ti, and hashcat `v6.1.0`.

To show rule performance against very common passwords, 0-25M is broken out into its own column. The _RPP_ column is the _rules per percent_ on the 100M dataset. It is calculated with the formula `rpp = Math.round(num_rules / (0_100m_percent - 6.450))`, where 6.450 is the percentage cracked by `rockyou.txt` alone with no rules. The higher this number, the more rules are run per percentage point cracked. This helps reveal the diminishing returns in rulesets and gives an idea of the amplified cost of running the rules on slower hashes.
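The RPP formula can be reproduced in a few lines. The sketch below is a direct transcription of the formula above; the function name `rules_per_percent` is hypothetical, and the inputs in the example are taken from the results table in this section. `math.floor(x + 0.5)` is used to mimic `Math.round`, because Python's built-in `round` rounds exact halves to the nearest even number.

```python
import math

# Percentage of the 100M set cracked by rockyou.txt alone (no rules).
BASELINE_PERCENT = 6.450


def rules_per_percent(num_rules, percent_100m, baseline=BASELINE_PERCENT):
    """Number of rules run per percentage point gained over the baseline."""
    gain = percent_100m - baseline
    if gain <= 0:
        raise ValueError("ruleset did not improve on the baseline")
    return math.floor(num_rules / gain + 0.5)


# Values from the results table: (number of rules, V5 100M percentage).
print(rules_per_percent(99_092, 69.417))  # pantagrule.private.v5.one
print(rules_per_percent(52_014, 64.541))  # OneRuleToRuleThemAll
print(rules_per_percent(64, 24.985))      # best64
```

A small ruleset such as best64 scores very low because each of its rules contributes a large share of its gain, while the largest Pantagrule variants spend thousands of rules on each additional percentage point.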

| Rules | Number of Rules | V5 25M | V5 100M | RPP |
|----------------|-----------------|-----------|--------|-----|
| No Rules (just rockyou.txt) | 0 | 16.549% | 6.450% | N/A |
| pantagrule.private.v5.one | 99,092 | 79.814% | 69.417% | 1,574 |
| pantagrule.private.v5.hybrid | 355,205 | 81.346% | 73.372% | 5,308 |
| pantagrule.private.v5.popular | 478,736 | **81.792%** | 73.544% | 7,135 |
| pantagrule.private.v5.random | 616,236 | 81.687% | 69.805% | 8,828 |
| pantagrule.hashorg.v6.one | 99,092 | 74.500% | 60.573% | 1,831 |
| pantagrule.hashorg.v6.hybrid | 339,953 | 77.649% | 68.341% | 5,493 |
| pantagrule.hashorg.v6.popular | 514,416 | 80.668% | 72.377% | 6,931 |
| pantagrule.hashorg.v6.random | 638,773 | 80.603% | 72.713% | 8,614 |
| pantagrule.private.hashorg.one.royce | 99,092 | 79.618% | 69.092% | 1,582 |
| pantagrule.private.hashorg.hybrid.royce | 314,268 | 81.068% | 73.082% | 4,716 |
| pantagrule.private.hashorg.popular.royce | 420,984 | 81.386% | 73.102% | 6,316 |
| pantagrule.private.hashorg.random.royce | 592,235 | 81.659% | **74.010%** | 8,766 |
| [best64](https://github.com/hashcat/hashcat/blob/master/rules/best64.rule) | 64 | 45.117% | 24.985% | 3 |
| [hob064](https://github.com/praetorian-code/Hob0Rules) | 68 | 37.786% | 19.773% | 5 |
| [OneRuleToRuleThemAll](https://github.com/NotSoSecure/password_cracking_rules) | 52,014 | 78.058% | 64.541% | 895 |
| [d3adhob0](https://github.com/praetorian-code/Hob0Rules) | 57,548 | 51.274% | 34.800% | 2,030 |
| [dive](https://github.com/hashcat/hashcat/blob/master/rules/dive.rule) | 99,092 | 77.111% | 63.314% | 1,743 |
| [_NSAKEY V1](https://github.com/NSAKEY/nsa-rules/blob/master/_NSAKEY.v1.dive.rule) | 123,289 | 76.42% | 64.121% | 2,138 |
| [_NSAKEY V2](https://github.com/NSAKEY/nsa-rules/blob/master/_NSAKEY.v2.dive.rule) | 123,289 | 76.882% | 64.472% | 2,124 |

## Conclusion

This work confirms the limitations of the PACK LRP algorithm originally witnessed by _NSAKEY on modern data sets when using the rockyou dictionary. While the LRP algorithm does generate rules that increase cracking percentage, it does so at a large increase in search space. For this reason, Pantagrule is most useful in cases where difficult cracking requires exotic rules.

It is important to note that if you can use PACK to generate rules from a specific corpus and then target your remaining hashes with them, you are likely to achieve a greater cracking percentage than with one of these large rulesets. For example, Pantagrule V2 does not perform as well on Pwned Passwords V5 as the V5-calibrated ruleset.

Since the original Pantagrule release, these rules have proven themselves on multiple red team engagements at large technology companies and consultancies alike. The original `pantagrule.1m` list cracked 8% of the remaining HIBP hashes that had stood up to the corpus used to generate Pantagrule, the above common rule sets, a 7-character alphanumeric brute force, and KoreLogic's [PathWell topologies](https://blog.korelogic.com/blog/2014/04/04/pathwell_topologies).

As even the author of the _One Rule to Rule Them All_ (Hunt, 2017) meta-rule states, there is no such thing as a rule that works better than all others in every case. Every use case is different, and any given rule source may help you more than another on a specific hash dump or with a specific wordlist. Note that this data does not show _what_ has been cracked; some rules have cracked hashes that other rules have not.

## Citations

0. Rabelais, F. (1532). _[Pantagruel](https://en.wikipedia.org/wiki/Gargantua_and_Pantagruel)_. Paris: Libr. générale française.
1. Kacherginsky, P. (2013). _Automatic Password Rule Analysis and Generation_. [online] Available at: https://medium.com/@iphelix/automatic-password-rule-analysis-and-generation-7d2574516e48 [Accessed 4 Oct. 2019].
2. \_NSAKEY. (2014). _NSAKEY/nsa-rules._ [online] Available at: https://github.com/NSAKEY/nsa-rules [Accessed 4 Oct. 2019].
3. Hunt, W. (2017). _One Rule to Rule Them All_. [online] Available at: https://www.notsosecure.com/one-rule-to-rule-them-all/ [Accessed 4 Oct. 2019].

## License

Pantagrule rules are released under the MIT license. Feel free to integrate them into your own tooling.