https://github.com/vialab/semantic-guesser

Training and testing of linguistic password models.

Host: GitHub
URL: https://github.com/vialab/semantic-guesser
Owner: vialab
License: other
Created: 2012-02-14T14:27:12.000Z (over 14 years ago)
Default Branch: master
Last Pushed: 2022-06-21T21:30:53.000Z (about 4 years ago)
Last Synced: 2024-03-26T08:11:59.985Z (over 2 years ago)
Topics: cracking, nlp, passwords, pcfg, security
Language: Python
Homepage:
Size: 159 MB
Stars: 24
Watchers: 26
Forks: 7
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# Semantic Password Guesser

Tools for training probabilistic context-free grammars on password lists. The
models encode syntactic and semantic linguistic patterns and can be used to
generate guesses.

[Read the paper](http://vialab.dc-uoit.net/wordpress/wp-content/papercite-data/pdf/ver2014a.pdf)

Cite:

```
@inproceedings{Veras2014,
title={On Semantic Patterns of Passwords and their Security Impact.},
author={Veras, Rafael and Collins, Christopher and Thorpe, Julie},
booktitle={NDSS},
year={2014}
}
```

## Basic Usage

To train a grammar with a password list:

```
cd semantic_guesser
python -m learning.train password_list.txt ~/grammars/test_grammar -vv
```

A password list has one password per line:

```
$ head password_list.txt
@fl!pm0de@
pass
steveol
chotzi
lb2512
scotch
passwerd
flipmode
flipmode
alden2
```

The resulting folder contains a number of tab-separated, human-readable files:

- `rules.txt` - the grammar's base structures, in descending order of probability.
- `nonterminals/*.txt` - each file lists the terminal strings generated by a nonterminal symbol. For instance, `jj.txt` lists all strings classified as adjectives, along with their probabilities.
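
To illustrate how base structures and terminal probabilities combine, here is a minimal sketch of scoring one derivation under a probabilistic context-free grammar. The toy `rules` and `terminals` dictionaries and the function `derivation_probability` are hypothetical; the exact column layout of `rules.txt` and `nonterminals/*.txt` is not shown here, so the sketch uses in-memory values instead of parsing those files.

```python
# Hypothetical toy grammar. Real values would come from rules.txt and
# nonterminals/*.txt produced by training.
rules = {("jj", "nn"): 0.6, ("nn", "number"): 0.4}
terminals = {
    "jj": {"red": 0.7, "blue": 0.3},
    "nn": {"car": 0.5, "dog": 0.5},
    "number": {"1": 0.9, "2": 0.1},
}


def derivation_probability(structure, segments):
    """Return P(structure) times the product of P(segment | nonterminal)."""
    if len(structure) != len(segments):
        return 0.0
    probability = rules.get(tuple(structure), 0.0)
    for tag, segment in zip(structure, segments):
        # Unknown tags or unseen terminals contribute probability 0.
        probability *= terminals.get(tag, {}).get(segment, 0.0)
    return probability


if __name__ == "__main__":
    print(derivation_probability(("jj", "nn"), ("red", "car")))
```

A password that can be derived in more than one way would need its derivations combined; how the trained models handle that case is not covered by this sketch.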

### Options

```
usage: train.py [-h] [--estimator {mle,laplace}] [-a ABSTRACTION] [-v]
                [--tags {pos_semantic,pos,backoff,word}] [-w NUM_WORKERS]
                [passwords] output_folder

positional arguments:
  passwords             a password list
  output_folder         a folder to store the grammar model

optional arguments:
  -h, --help            show this help message and exit
  --estimator {mle,laplace}
  -a ABSTRACTION, --abstraction ABSTRACTION
                        Detail level of the grammar. An integer > 0
                        proportional to the desired specificity.
  -v                    verbose level (e.g., -vvv)
  --tags {pos_semantic,pos,backoff,word}
  -w NUM_WORKERS, --num_workers NUM_WORKERS
                        number of cores available for parallel work
```
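
The `--estimator` option names two standard ways of turning counts into probabilities. The sketch below shows the textbook definitions of maximum likelihood estimation and Laplace (add-one) smoothing. It is an assumption that `train.py` follows these definitions exactly; the functions `mle` and `laplace` and the sample vocabulary are made up for illustration.

```python
from collections import Counter


def mle(counts):
    """Maximum likelihood: relative frequency of each observed item."""
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def laplace(counts, vocabulary):
    """Add-one smoothing: unseen vocabulary items get a nonzero probability."""
    total = sum(counts.values())
    size = len(vocabulary)
    return {item: (counts.get(item, 0) + 1) / (total + size) for item in vocabulary}


if __name__ == "__main__":
    observed = Counter(["love", "love", "star"])
    print(mle(observed))
    print(laplace(observed, {"love", "star", "moon"}))
```

With MLE, a string never seen in training has probability 0. Laplace smoothing trades some probability mass from seen strings to cover unseen ones.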

## Sampling from a grammar

Sample 1,000 passwords from `mygrammar`:

```
python -m guessing.sample 1000 mygrammar
```
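
Conceptually, sampling from a probabilistic context-free grammar means drawing a base structure according to its probability and then drawing one terminal for each nonterminal in it. The following is a minimal sketch of that idea with a hypothetical toy grammar. It does not reproduce the implementation of `guessing.sample`, and `sample_password` is an illustrative name.

```python
import random

# Hypothetical toy grammar, not read from a trained model.
rules = {("jj", "nn"): 0.6, ("nn", "number"): 0.4}
terminals = {
    "jj": {"red": 0.7, "blue": 0.3},
    "nn": {"car": 0.5, "dog": 0.5},
    "number": {"1": 0.9, "2": 0.1},
}


def sample_password(rules, terminals, rng):
    """Draw a base structure, then one terminal per nonterminal."""
    structures = list(rules)
    weights = [rules[s] for s in structures]
    structure = rng.choices(structures, weights=weights)[0]
    parts = []
    for tag in structure:
        words = list(terminals[tag])
        word_weights = [terminals[tag][w] for w in words]
        parts.append(rng.choices(words, weights=word_weights)[0])
    return "".join(parts)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(5):
        print(sample_password(rules, terminals, rng))
```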

## Generating guesses

The guess generator is a C++ program that you need to compile first.

```
cd guessing
make
```

Then run it with a trained grammar model.

```
guessmaker -g /path/to/my/grammar --mangle
```

guessmaker implements the algorithms described in [Matt Weir's dissertation][1]: next and deadbeat. Deadbeat is the default.

The grammars have only lowercase strings. By passing `--mangle` in the above command we derive uppercase, lowercase, capitalized, and camelcase (when applicable) versions of every guess.
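
As an illustration of what deriving case variants can look like, here is a minimal Python sketch. It assumes the guess is available as a list of lowercase word segments and that "camelcase" means capitalizing every segment. Both are assumptions, and `mangle` is a hypothetical helper, not part of `guessmaker`.

```python
def mangle(segments):
    """Return case variants of a guess given its lowercase word segments."""
    lower = "".join(segments)
    variants = [lower, lower.upper(), lower.capitalize()]
    if len(segments) > 1:
        # Camelcase is only applicable when there is more than one segment.
        variants.append("".join(s.capitalize() for s in segments))
    # Drop duplicates (e.g. all-digit guesses) while keeping order.
    return list(dict.fromkeys(variants))


if __name__ == "__main__":
    print(mangle(["flip", "mode"]))
    print(mangle(["1234"]))
```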

## Password probability

You can calculate the probability of a password given a grammar:

```
python -m guessing.score \
--uppercase \
--camelcase \
--capitalized \
path_to_my_grammar \
a_list_of_passwords.txt
```

If you plan to generate guesses with `guessmaker --mangle`, pass `--uppercase`, `--camelcase`, and/or `--capitalized` to `guessing.score`. Otherwise, it assumes that non-lowercase passwords cannot be guessed by the grammar (_p=0_).

## Calculating password strength

We can calculate the strength of a password given a grammar using Dell'Amico and Filippone's [Monte Carlo strength evaluation](http://www.dcs.gla.ac.uk/~maurizio/Publications/ccs15.pdf). The strength is an estimate of how many passwords would need to be output (using the guess generation procedure above) before the password is guessed. This requires a large sample from the grammar (see "Sampling from a grammar" above). The larger the sample, the more accurate the estimates.

```
python -m guessing.sample 1000 path_to_grammar/ > sample.txt
python -m guessing.score path_to_grammar/ passwords.txt > scored_passwords.txt
python -m guessing.strength sample.txt scored_passwords.txt
```
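
The idea behind the Monte Carlo estimate can be sketched as follows: given the model probabilities of `n` sampled passwords, the estimated number of guesses for a password with probability `p` is the sum of `1 / (n * q)` over every sampled probability `q` greater than `p`. The code below is a minimal sketch of that estimator working on plain lists of probabilities. The function name `make_strength_estimator` is hypothetical, and the file formats and internals of `guessing.strength` are not reproduced here.

```python
import bisect
from itertools import accumulate


def make_strength_estimator(sample_probs):
    """Build a function mapping a password probability to an estimated guess count."""
    n = len(sample_probs)
    descending = sorted(sample_probs, reverse=True)
    # Negate so the keys are ascending, as bisect requires.
    keys = [-q for q in descending]
    # running[i] is the sum of 1 / (n * q) over the i + 1 most probable samples.
    running = list(accumulate(1.0 / (n * q) for q in descending))

    def estimate(p):
        # k = number of samples strictly more probable than p.
        k = bisect.bisect_left(keys, -p)
        return running[k - 1] if k else 0.0

    return estimate


if __name__ == "__main__":
    estimate = make_strength_estimator([0.5, 0.25, 0.25, 0.5])
    print(estimate(0.25))
```

Sorting once and precomputing the running sums makes each lookup a binary search, which matters when scoring many passwords against a large sample.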

## Environment Setup

venv is preferred:

```
cd semantic_guesser
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```

Then download NLTK data:

```
python -m nltk.downloader wordnet wordnet_ic
```

[1]: http://purl.flvc.org/fsu/fd/FSU_migr_etd-1213 "Weir, C. M. (2010). Using Probabilistic Techniques to Aid in Password Cracking Attacks."
