{"id":17166665,"url":"https://github.com/willf/entropy","last_synced_at":"2026-06-15T16:32:49.141Z","repository":{"id":66569792,"uuid":"117068398","full_name":"willf/entropy","owner":"willf","description":"Character-based ngram entropy model for text","archived":false,"fork":false,"pushed_at":"2018-02-19T21:47:38.000Z","size":1565,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-11-23T18:06:45.521Z","etag":null,"topics":["entropy","go"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/willf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-01-11T07:55:40.000Z","updated_at":"2020-09-04T02:04:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"01398b3c-0333-4d88-a7e6-07bf745820ee","html_url":"https://github.com/willf/entropy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/willf/entropy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fentropy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fentropy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fentropy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fentropy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/willf","download_url":"https://codeload.github.com/willf/entropy/tar.gz/refs/heads/maste
r","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fentropy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34372121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entropy","go"],"created_at":"2024-10-14T23:06:20.069Z","updated_at":"2026-06-15T16:32:49.119Z","avatar_url":"https://github.com/willf.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Entropy\n\n[![Build Status](https://travis-ci.org/willf/entropy.svg?branch=master)](https://travis-ci.org/willf/entropy)\n[![Coverage Status](https://coveralls.io/repos/github/willf/entropy/badge.svg?branch=master)](https://coveralls.io/github/willf/entropy?branch=master)\n[![Go Report Card](https://goreportcard.com/badge/github.com/willf/entropy)](https://goreportcard.com/report/github.com/willf/entropy)\n[![GoDoc](https://godoc.org/github.com/willf/entropy?status.svg)](http://godoc.org/github.com/willf/entropy)\n\nReally, a character N-gram entropy modeller\n\nThis learns an n-gram model from a set of strings and can then predict\nthe entropy of other strings.\n\nFor example, it has been noted that (good) passwords have high entropy,\nand we should be able to use that fact to find (good) passwords in code (where 
they shouldn't be).\n\nTo build the executable (the output directory can be anything you want):\n\n```bash\ngo build -o bin/string_entropy cmd/string_entropy/string_entropy.go\n```\n\nYou can do a similar build for `create_google_books_ngram_model.go`.\n\nTo train:\n\n-  Get some (source) code to train on, and train on it.\n\nThe following trains on the 1.7.3 Go distribution code, after removing some crypto files, as well as test files.\n\nThe resulting model can be found in the `data` directory.\n\n```bash\nfind /usr/local/Cellar/go/1.7.3/libexec/src/ | grep \"\\.go\" | grep -v \"crypto\" | grep -v \"_test\" | xargs cat \u003e /tmp/go_text\nbin/string_entropy -train -in /tmp/go_text -model data/go-3.tsv -ngram_size 3\n```\n\nTo predict:\n\n- Use the model to predict on some source code, for example,\nthe source for this program, which has some high-entropy\npasswords in it, looking at lines at least 10 characters long (after compressing spaces):\n\n```bash\ncat src/cmd/string_entropy/string_entropy.go |  bin/string_entropy -predict -model data/go-3.tsv -min 10  | sort -g | head -5\n-16.095489\t-997.920341\t62\t // magic_password := \"PXKXoyThngGrjCgBLuf2ivrpFFNKA9UgBHrxpLaW\"\n-14.334451\t-1576.789572\t110\t outf.Write([]byte(fmt.Sprintf(\"%f\\t%f\\t%v\\t%s\\n\", p.LogProbAverage, p.LogProbTotal, p.NumberOfNGrams, p.Text)))\n-14.186484\t-113.491869\t8\t modf = f2\n-14.186484\t-113.491869\t8\t modf = f2\n-14.107883\t-211.618242\t15\t model.Dump(modf)\n```\n\nColumns are, for each line: average log probability (negate it to get the entropy), total\nlog probability, number of n-grams, and the line itself.\n\nThe `Sprintf` line reminds me that format strings always look like line noise, and now we have the science to prove 
it!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillf%2Fentropy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillf%2Fentropy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillf%2Fentropy/lists"}