{"id":17129622,"url":"https://github.com/janos/compromised","last_synced_at":"2025-04-13T07:15:59.655Z","repository":{"id":57550769,"uuid":"309114902","full_name":"janos/compromised","owner":"janos","description":"Compromised/Pwned Passwords API On-premisses","archived":false,"fork":false,"pushed_at":"2022-12-11T13:07:10.000Z","size":187,"stargazers_count":24,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-13T07:15:53.731Z","etag":null,"topics":["go","golang","password","pwnedpasswords","service"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/janos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-01T14:34:20.000Z","updated_at":"2025-03-18T09:02:12.000Z","dependencies_parsed_at":"2023-01-26T23:46:09.780Z","dependency_job_id":null,"html_url":"https://github.com/janos/compromised","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janos%2Fcompromised","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janos%2Fcompromised/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janos%2Fcompromised/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janos%2Fcompromised/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/janos","download_url":"https://codeload.github.com/janos/compromised/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com
","kind":"github","repositories_count":248675396,"owners_count":21143768,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","password","pwnedpasswords","service"],"created_at":"2024-10-14T19:10:07.610Z","updated_at":"2025-04-13T07:15:59.234Z","avatar_url":"https://github.com/janos.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Compromised\n\n[![Go](https://github.com/janos/compromised/workflows/Go/badge.svg)](https://github.com/janos/compromised/actions)\n[![PkgGoDev](https://pkg.go.dev/badge/resenje.org/compromised)](https://pkg.go.dev/resenje.org/compromised)\n[![NewReleases](https://newreleases.io/badge.svg)](https://newreleases.io/github/janos/compromised)\n\n**Validate if a password has already been compromised with on-premises service.**\n\nThis service is meant for people and organizations that want to protect their users from using already compromised passwords without exposing any information (password hash or even a part of it) to a third-party service, such is https://haveibeenpwned.com/. 
The same dataset is used as on haveibeenpwned, but only locally, as haveibeenpwned provides the complete dataset for download.\n\nThis service is created for and used in production by [NewReleases](https://newreleases.io).\n\nFor any online service, NIST [SP 800-63B guidelines](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63b.pdf) state that user-provided passwords should be checked against existing data breaches.\n\nThis service provides a CLI interface to run an HTTP API service to validate if a specific password has been compromised and how many times.\n\nIts initial setup is not trivial as it requires a database to be generated from a publicly available data collection, while providing various options to reduce the database size.\n\n## Installation\n\nCompromised service binaries have no external dependencies and can just be copied and executed locally.\n\nBinary downloads of the Compromised service can be found on the [Releases page](https://github.com/janos/compromised/releases/latest).\n\nTo install on Linux:\n\n```sh\nwget https://github.com/janos/compromised/releases/latest/download/compromised-linux-amd64 -O /usr/local/bin/compromised\nchmod +x /usr/local/bin/compromised\n```\n\nYou may need additional privileges to write to `/usr/local/bin`, but the file can be saved at any location that you want.\n\nSupported operating systems and architectures:\n\n- macOS 64bit `darwin-amd64`\n- macOS 64bit `darwin-arm64`\n- Linux 64bit `linux-amd64`\n- Linux 32bit `linux-386`\n- Linux ARM 64bit `linux-arm64`\n- Linux ARM 32bit `linux-armv6`\n- Windows 64bit `windows-amd64`\n- Windows 32bit `windows-386`\n\nThis tool is implemented using the [Go programming language](https://golang.org) and can also be installed by issuing a `go install` command:\n\n```sh\ngo install resenje.org/compromised/cmd/compromised@latest\n```\n\n## Usage\n\nThis service does not distribute any passwords or password hashes. 
It relies on the validity of data provided by https://haveibeenpwned.com/Passwords and provides a command to generate a searchable database from that data.\n\nIt provides an HTTP server with a JSON-encoded API endpoint to be used to validate if a password has been compromised and how many times.\n\nTo use the service, first generate the database and then start the service with that database loaded.\n\n### Getting help\n\nDescriptions of available commands and flags can be printed with:\n\n```sh\ncompromised -h\n```\n\n```console\nUSAGE\n\n  compromised [options...] [command]\n\n  Executing the program without specifying a command will start a process in\n  the foreground and log all messages to stderr.\n\nCOMMANDS\n\n  daemon\n    Start program in the background.\n\n  stop\n    Stop program that runs in the background.\n\n  status\n    Display status of a running process.\n\n  config\n    Print configuration that program will load on start. This command is\n    dependent of -config-dir option value.\n\n  debug-dump\n    Send to a running process USR1 signal to log debug information in the log.\n\n  index-passwords\n    Generate passwords database from pwned passwords sha1 file.\n\n  version\n    Print version to Stdout.\n\nOPTIONS\n\n  -config-dir string\n        Directory that contains configuration files.\n  -h    Show program usage.\n```\n\nAnd flags of the `index-passwords` command:\n\n```sh\ncompromised index-passwords -h\n```\n\n```console\nUSAGE\n\n  index-passwords [input filename] [output directory]\n\nOPTIONS\n\n  -h    Show program usage.\n  -hash-counting string\n        Store approximate hash counts. Possible values: exact, approx, none. (default \"exact\")\n  -min-hash-count uint\n        Skip hashes with counts lower than specified with this flag. (default 1)\n  -shard-count int\n        Split hashes into a several files. Possible values: 1, 2, 4, 8, 16, 32, 64, 128, 256. 
(default 32)\n```\n\n### Indexing password hashes\n\nDownload the Pwned Passwords SHA1 ordered-by-hash 7z file from https://haveibeenpwned.com/Passwords. This file is several gigabytes in size (version 6 is 10.1GB), so make sure that you have enough disk space.\n\n```sh\nwget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v8.7z\n```\n\nExtract the text file from the downloaded 7z archive. This file is roughly twice the size of the 7z archive that contains it, around 24GB for version 6. Feel free to remove the 7z archive afterwards.\n\nGenerate the database with the following command, adjusting the file name to the version that you downloaded:\n\n```sh\ncompromised index-passwords \\\n    pwned-passwords-sha1-ordered-by-hash-v6.txt \\\n    compromised-passwords-db\n```\n\nThis command will read the content of the `pwned-passwords-sha1-ordered-by-hash-v6.txt` file (make sure that you enter the correct path to it) and store indexes in a fast searchable database in the `compromised-passwords-db` directory. The `index-passwords` command will create the directory itself and will stop execution if it already exists. The expected database size is around 12GB.\n\nBy default, all hashes are stored and indexed into 32 files called shards. It is possible to reduce the database size with two optional CLI flags, `--hash-counting` and `--min-hash-count`.\n\nFor example:\n\n```sh\ncompromised index-passwords \\\n    --hash-counting approx \\\n    --min-hash-count 10 \\\n    pwned-passwords-sha1-ordered-by-hash-v6.txt \\\n    compromised-passwords\n```\n\nThe `--hash-counting` flag with the `approx` value stores approximate hash counts: counts up to around 17 are exact, while larger counts are less precise (with a variance of around 5%), but close enough to estimate password popularity. With this option, the complete database is 9.7GB.\n\nThe `--hash-counting` flag with the `none` value does not store hash counts, and the API always returns 1 as the count for compromised passwords. 
With this option, the complete database is 9.3GB.\n\nThe `--min-hash-count` flag takes a numerical value and filters out all password hashes whose count is lower than that value. This way it is possible to reduce the size of the database by excluding less frequently used passwords. For example, `--min-hash-count 2`, which excludes only passwords with count 1, reduces the database size to 7.6GB, `--min-hash-count 5` to 1.9GB, and `--min-hash-count 10` to 800MB.\n\nYou can combine these two options according to available capacity and the level of security and information that you want to provide.\n\n### Configuration\n\nService configuration is stored in the `compromised.yaml` configuration file, in the `/etc/compromised` directory by default. You can change the directory with the `--config-dir` flag:\n\n```sh\ncompromised --config-dir /data/config/compromised\n```\n\nAll available options and their default values can be printed with:\n\n```sh\ncompromised config\n```\n\n```console\n# compromised\n---\nlisten: :8080\nlisten-instrumentation: 127.0.0.1:6060\nheaders:\n  Server: compromised/0.1.0-6ed439e-dirty\n  X-Frame-Options: SAMEORIGIN\npasswords-db: \"\"\nlog-dir: \"\"\nlog-level: DEBUG\nsyslog-facility: \"\"\nsyslog-tag: compromised\nsyslog-network: \"\"\nsyslog-address: \"\"\naccess-log-level: DEBUG\naccess-syslog-facility: \"\"\naccess-syslog-tag: compromised-access\ndaemon-log-file: daemon.log\ndaemon-log-file-mode: \"644\"\npid-file: /var/folders/l4/tn9ytbgs5xx76lshwgx5bj1w0000gn/T/compromised.pid\n\n# config directories\n---\n- /etc/compromised\n- /Users/janos/Library/Application Support/compromised\n```\n\n#### Environment variables\n\nThe service can be configured with environment variables as well. Variable names can be constructed based on the keys in configuration files.\n\nFor variables in `compromised.yaml`, capitalize all letters, replace `-` with `_` and prepend the `COMPROMISED_` prefix. 
For example, to set `passwords-db`, the environment variable is `COMPROMISED_PASSWORDS_DB`:\n\n```sh\nCOMPROMISED_PASSWORDS_DB=/path/to/passwords-db compromised\n```\n\n### Starting the service\n\nExecuting the program without specifying a command will start a process in the foreground and log all messages to stderr:\n\n```sh\ncompromised\n```\n\nService requires `passwords-db` directory to be specified:\n\n```sh\ncat /etc/compromised/compromised.yaml\n```\n\n```yaml\npasswords-db: /data/storage/compromised/passwords\n```\n\nTo write logs to files on local filesystem:\n\n```sh\ncat /etc/compromised/compromised.yaml\n```\n\n```yaml\npasswords-db: /data/storage/compromised/passwords\nlog-dir: /data/log/compromised\n```\n\nPaths in configuration files are given only as examples.\n\n### Running in the background\n\nThe service can be run in the background and managed by itself with commands:\n\n```sh\ncompromised daemon\n```\n\n```sh\ncompromised status\n```\n\n```sh\ncompromised stop\n```\n\nOr you can choose a process manager to manage it. 
For example, this is a systemd service file:\n\n```\n[Unit]\nDescription=Compromised\nAfter=network.target\n\n[Service]\nExecStart=/usr/local/bin/compromised\nExecStop=/bin/kill $MAINPID\nKillMode=none\nRestart=on-failure\nRestartPreventExitStatus=255\nLimitNOFILE=65536\nPrivateTmp=true\nNoNewPrivileges=true\n\n[Install]\nWantedBy=default.target\n```\n\n### Using the API\n\nIn order to minimize the exposure of passwords that are checked, only the SHA1 hash of a password is accepted by the API.\n\nFirst calculate the hash (use `printf`, not `echo`, as `echo` appends a newline):\n\n```sh\nprintf 12345678 | sha1\n```\n\n```console\n7c222fb2927d828af22f592134e8932480637c0d\n```\n\nThen make an HTTP request like this one:\n\n```sh\ncurl http://localhost:8080/v1/passwords/7c222fb2927d828af22f592134e8932480637c0d\n```\n\n```json\n{\"compromised\":true,\"count\":2996082}\n```\n\nMake sure that the port is the same as the one configured with the `listen` option.\n\nOr, if you choose a very strong password:\n\n```sh\nprintf \"my not compromised password\" | sha1sum\n```\n\n```console\nd391477a0849048fc28e62850a25518d72afd013\n```\n\nThen the HTTP response will look like this:\n\n```sh\ncurl http://localhost:8080/v1/passwords/d391477a0849048fc28e62850a25518d72afd013\n```\n\n```json\n{\"compromised\":false}\n```\n\n### Instrumentation API\n\nBesides the main API, there is another API endpoint, by default available on port `6060` only on `localhost`, which exposes some of the instrumentation information about the service:\n\n- Prometheus metrics `http://localhost:6060/metrics`\n- Most basic health check endpoint `http://localhost:6060/status`\n- Most basic JSON health check endpoint `http://localhost:6060/api/status`\n- Go pprof `http://localhost:6060/debug/pprof/`\n\nThe instrumentation API can be disabled with an empty value for the `listen-instrumentation` configuration option in `/etc/compromised/compromised.yaml`:\n\n```yaml\nlisten-instrumentation: \"\"\n```\n\n## Using the Go library\n\nAs 
this service is written in the Go programming language, an HTTP client package is provided, but also a package that allows loading the database in your own application if you do not want to manage the `compromised` service.\n\n### HTTP Client\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"crypto/sha1\"\n\t\"fmt\"\n\n\thttppasswords \"resenje.org/compromised/pkg/passwords/http\"\n)\n\nfunc main() {\n\t// url with host and port where compromised service is listening\n\ts, err := httppasswords.New(\"http://localhost:8080\", nil)\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\t// only the SHA1 hash of the password is passed to the service\n\tc, err := s.IsPasswordCompromised(context.Background(), sha1.Sum([]byte(\"my password\")))\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\tfmt.Println(\"this password has been compromised\", c, \"times\")\n}\n```\n\n### Embed DB\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"crypto/sha1\"\n\t\"fmt\"\n\n\tfilepasswords \"resenje.org/compromised/pkg/passwords/file\"\n)\n\nfunc main() {\n\t// path to the database directory generated by the index-passwords command\n\ts, err := filepasswords.New(\"/path/to/passwords-db\")\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\tdefer s.Close()\n\n\tc, err := s.IsPasswordCompromised(context.Background(), sha1.Sum([]byte(\"my password\")))\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\tfmt.Println(\"this password has been compromised\", c, \"times\")\n}\n```\n\n## Database format\n\nThe database stores SHA1 hashes in binary format and count values associated with them. A database is generated once and can be used only in read-only mode.\n\nSHA1 hashes are 20 bytes long and they are split into 3 bytes long _partitions_ and 17 bytes long _remainders_. This allows hashes to be categorized into 16777216 partitions (the number of distinct 3-byte values).\n\nAll hash _remainders_ are stored in multiple files called shards named _hashes-*.db_, where _*_ is a base36-encoded positive integer. Shard count _shardCount_ is configurable and can be set to 1, 2, 4, 8, 16, 32, 64, 128 or 256. 
Shard file number for a particular hash is determined by its first byte with formula _byte/256*shardCount_, which ensures that every shard contains the same number of _partitions_ distributed in a serial manner.\n\nDatabase files are _db.json_, _index.db_ and a series of _hashes-*.db_.\n\nFile _db.json_ stores JSON-encoded meta information about the database.\n\nFile _index.db_ stores information where a _partition_ of hashes with a common prefix can be found in a particular _hashes-*.db_ shard.\n\nFiles _hashes-*.db_ store hash _remainders_ and count values associated for every hash.\n\nFile _index.db_ stores a total of 16777216 + _shardCount_ 32bit integers in an array. Each representing either a shard start or a single partition. In other words, _index.db_ associates a number for every possible partition and that number is the index of partition's last hash in the shard file that it belongs to.\n\n### index.db structure\n\nBinary file _index.db_ consists of an array of big endian encoded 32bit unsigned integers. Each integer represents a start of a shard as value 0x00000000 or a last hash index in a particular partition in a particular shard file.\n\n```\n  4 bytes\n+----------+\n\n+----------+\n|0x00000000|  shard 0 start\n+----------+\n|          |  shard 0, partition 0 end\n+----------+\n|   ...    |\n+----------+\n|          |  shard 0, partition n end\n+----------+\n|   ...    |\n+----------+\n|          |  shard 0, partition 16777216/shardCount end\n+----------+\n|0x00000000|  shard 1 start\n+----------+\n|          |  shard 1, partition (16777216/shardCount)+1 end\n+----------+\n|   ...    |\n+----------+\n|          |  shard 1, partition (16777216/shardCount)+n end\n+----------+\n|   ...    |\n+----------+\n|          |  shard 1, partition (16777216/shardCount)*2 end\n+----------+\n|   ...    
|\n+----------+\n|          |  shard shardCount, partition 16777215 end\n+----------+\n```\n\nThis structure makes the _index.db_ file length range from 64MB plus 4 bytes to 64MB plus 1024 bytes (4 bytes for each of the 16777216 + _shardCount_ integers), depending on the _shardCount_, and the length is independent of the number of hashes.\n\nThis structure is justified as every _partition_ contains at least one compromised password hash.\n\nThe limitation is that every shard can contain up to 4,294,967,296 hashes (the number of values of an unsigned 32 bit integer), so with the maximal _shardCount_ of 256, the database can contain up to 1,099,511,627,776 hashes. These values are sufficiently larger than the number of compromised hashes, which is currently 572,611,621, to assume that the format will support the growth of the database in the foreseeable future.\n\n### hashes-*.db structure\n\nBinary files _hashes-*.db_ consist of an array of two part elements. The first part is a fixed size 17 bytes long SHA1 _remainder_. The second part holds information about the count of the hash that this _remainder_ belongs to; its size is fixed for a given database but configurable at the indexing stage, based on the precision that is needed:\n\n- exact - big endian encoded 32 bit unsigned integers - _countSize_ is 4 bytes\n- approx - 8 bits long approximation value - _countSize_ is 1 byte\n- none - count value is not stored - _countSize_ is 0 bytes\n\n```\n  17 bytes    countSize\n+-----------+-----------+\n\n+-----------+-----------+\n| remainder |   count   |  hash 1\n+-----------+-----------+\n| remainder |   count   |  ...\n+-----------+-----------+\n| remainder |   count   |  hash n\n+-----------+-----------+\n```\n\n### Performing a query\n\nPerforming a query on the database means finding out whether a particular SHA1 hash is in the database and what count value is associated with it.\n\nThe uniform distribution of SHA1 hashes allows the described database structure to be efficient in finding if the hash is present in the database or not.\n\nThe query for a particular hash starts with identifying 
which _shard_ and _partition_ that hash should belong to.\n\nThe _shard_ is calculated with the formula _byte/256*shardCount_, where _byte_ is the first byte of the hash, _shardCount_ is read from the _db.json_ file and _256_ is the number of values of a byte (unsigned 8 bit integer), which is also the maximal number of shards that is supported.\n\nThe partition number is the 24 bit unsigned integer decoded from the first 3 bytes of the hash.\n\nFile _index.db_ is read at the position of the partition number and the next one, getting the range of positions of remainders in that partition in the shard file.\n\nThe shard number is used to identify which shard file should be read at the remainder positions. Every remainder in that range is read sequentially and checked against the last 17 bytes of the hash. On average, 34 check iterations should be made. A _partition_ size of 3 bytes is chosen as optimal for the number of hashes in the Pwned Passwords hashes list, as it leaves an average of 34 hashes per partition. If a match is found, the count is decoded from the second part of the hashes file element.\n\n## Versioning\n\nTo see the current version of the binary, execute:\n\n```sh\ncompromised version\n```\n\nEach version is tagged and the version is updated accordingly in the `version.go` file.\n\n## Contributing\n\nRead the [contribution guidelines](CONTRIBUTING.md).\n\n## License\n\nThis application is distributed under the BSD-style license found in the [LICENSE](LICENSE) file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjanos%2Fcompromised","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjanos%2Fcompromised","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjanos%2Fcompromised/lists"}