{"id":40992701,"url":"https://github.com/vialab/semantic-guesser","last_synced_at":"2026-01-22T07:41:59.533Z","repository":{"id":2468047,"uuid":"3440457","full_name":"vialab/semantic-guesser","owner":"vialab","description":"Training and testing of linguistic passwords models.","archived":false,"fork":false,"pushed_at":"2022-06-21T21:30:53.000Z","size":166448,"stargazers_count":24,"open_issues_count":5,"forks_count":7,"subscribers_count":26,"default_branch":"master","last_synced_at":"2024-03-26T08:11:59.985Z","etag":null,"topics":["cracking","nlp","passwords","pcfg","security"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vialab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-02-14T14:27:12.000Z","updated_at":"2024-02-09T11:39:05.000Z","dependencies_parsed_at":"2022-08-24T17:50:13.327Z","dependency_job_id":null,"html_url":"https://github.com/vialab/semantic-guesser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vialab/semantic-guesser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vialab%2Fsemantic-guesser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vialab%2Fsemantic-guesser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vialab%2Fsemantic-guesser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vialab%2Fsemantic-guesser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vialab","download_url":"https://codeload.github.com/vialab/s
emantic-guesser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vialab%2Fsemantic-guesser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28658160,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T01:17:37.254Z","status":"online","status_checked_at":"2026-01-22T02:00:07.137Z","response_time":144,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cracking","nlp","passwords","pcfg","security"],"created_at":"2026-01-22T07:41:59.475Z","updated_at":"2026-01-22T07:41:59.527Z","avatar_url":"https://github.com/vialab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Semantic Password Guesser\n\nTools for training probabilistic context-free grammars on password lists. 
The\nmodels encode syntactic and semantic linguistic patterns and can be used to\ngenerate guesses.\n\n[Read the paper](http://vialab.dc-uoit.net/wordpress/wp-content/papercite-data/pdf/ver2014a.pdf)\n\nCite:\n\n```\n@inproceedings{Veras2014,\n  title={On Semantic Patterns of Passwords and their Security Impact.},\n  author={Veras, Rafael and Collins, Christopher and Thorpe, Julie},\n  booktitle={NDSS},\n  year={2014}\n}\n```\n\n\n## Basic Usage\n\nTo train a grammar with a password list:\n\n```\ncd semantic_guesser  \npython -m learning.train password_list.txt ~/grammars/test_grammar -vv\n```\n\nA password list has one password per line:\n\n```\n$ head password_list.txt\n@fl!pm0de@\npass\nsteveol\nchotzi\nlb2512\nscotch\npasswerd\nflipmode\nflipmode\nalden2\n```\n\nThe resulting folder has a number of tab-separated, human readable files:\n\n- `rules.txt` - grammar's base structures in highest probability order.\n- `nonterminals/*.txt` - each file lists the terminal strings generated by a nonterminal symbol. For instance, `jj.txt` lists all strings classified as adjective along with their probabilities.\n\n### Options\n\n```\nusage: train.py [-h] [--estimator {mle,laplace}] [-a ABSTRACTION] [-v]\n                [--tags {pos_semantic,pos,backoff,word}] [-w NUM_WORKERS]\n                [passwords] output_folder\n\npositional arguments:\n  passwords             a password list\n  output_folder         a folder to store the grammar model\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --estimator {mle,laplace}\n  -a ABSTRACTION, --abstraction ABSTRACTION\n                        Detail level of the grammar. 
An integer \u003e 0\n                        proportional to the desired specificity.\n  -v                    verbose level (e.g., -vvv)\n  --tags {pos_semantic,pos,backoff,word}\n  -w NUM_WORKERS, --num_workers NUM_WORKERS\n                        number of cores available for parallel work\n\n```\n\n## Sampling from a grammar\n\nSample 1,000 passwords from `mygrammar`:\n\n```\npython -m guessing.sample 1000 mygrammar\n```\n\n## Generating guesses\n\nThe guess generator is a C++ program, so you need to compile it first.\n\n```\ncd guessing\nmake\n```\n\nThen run it with a trained grammar model.\n\n```\nguessmaker -g /path/to/my/grammar --mangle\n```\n\nguessmaker implements the algorithms described in [Matt Weir's dissertation][1]: next and deadbeat. Deadbeat is the default.\n\nThe grammars have only lowercase strings. By passing `--mangle` in the above command we derive uppercase, lowercase, capitalized, and camelcase (when applicable) versions of every guess.\n\n## Password probability\n\nYou can calculate the probability of a password given a grammar:\n\n```\npython -m guessing.score \\\n  --uppercase            \\\n  --camelcase            \\\n  --capitalized          \\\n  path_to_my_grammar     \\\n  a_list_of_passwords.txt\n```\n\nIf you generate guesses with `guessmaker --mangle`, pass `--uppercase`, `--camelcase` and/or `--capitalized` to `guessing.score`. Otherwise, it will assume that non-lowercase passwords cannot be guessed by the grammar (_p=0_).\n\n## Calculating password strength\n\nWe can calculate the strength of a password given a grammar using Filippone and Dell'Amico's [Monte Carlo strength evaluation](http://www.dcs.gla.ac.uk/~maurizio/Publications/ccs15.pdf). The strength is an estimate of how many passwords would need to be output (using the guess generation procedure above) before the password is guessed. We need a large sample (see how to generate samples above) from the grammar. 
The larger the sample, the more accurate the estimates.\n\n```\npython -m guessing.sample 1000 path_to_grammar/ \u003e sample.txt\npython -m guessing.score path_to_grammar/ passwords.txt \u003e scored_passwords.txt\npython -m guessing.strength sample.txt scored_passwords.txt\n```\n\n\n## Environment Setup\n\nvenv is preferred:\n\n```\ncd semantic_guesser\npython3 -m venv env\nsource env/bin/activate\npip install -r requirements.txt\n```\n\nThen download NLTK data:\n\n```\npython -m nltk.downloader wordnet wordnet_ic\n```\n\n[1]: http://purl.flvc.org/fsu/fd/FSU_migr_etd-1213 \"Weir, C. M. (2010). Using Probabilistic Techniques to Aid in Password Cracking Attacks.\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvialab%2Fsemantic-guesser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvialab%2Fsemantic-guesser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvialab%2Fsemantic-guesser/lists"}