{"id":23328002,"url":"https://github.com/andy-byers/hmm","last_synced_at":"2026-06-23T16:31:18.288Z","repository":{"id":103711198,"uuid":"580937112","full_name":"andy-byers/hmm","owner":"andy-byers","description":"A small C++ library for working with hidden Markov models","archived":false,"fork":false,"pushed_at":"2022-12-21T22:29:27.000Z","size":21,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-19T18:50:48.301Z","etag":null,"topics":["cpp","hidden-markov-model","hmm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andy-byers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-21T20:51:01.000Z","updated_at":"2025-02-11T13:54:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"a2420884-4600-478d-b41b-ccbc06bf49e7","html_url":"https://github.com/andy-byers/hmm","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andy-byers/hmm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andy-byers%2Fhmm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andy-byers%2Fhmm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andy-byers%2Fhmm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andy-byers%2Fhmm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andy-byers","download_url":"https://c
odeload.github.com/andy-byers/hmm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andy-byers%2Fhmm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34698687,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","hidden-markov-model","hmm"],"created_at":"2024-12-20T20:54:09.335Z","updated_at":"2026-06-23T16:31:18.269Z","avatar_url":"https://github.com/andy-byers.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# hmm\n`hmm` is a small C++ library for working with hidden Markov models (HMMs).\n\n## Features\n+ Implements HMM training, decoding, and evaluation\n+ Performs calculations in log space to avoid floating-point underflow\n+ Supports pseudocounts to avoid zero probabilities during parameter estimation\n+ Certain model parameters can be fixed during training\n+ Models can be serialized\n\n## Caveats\n+ Uses dense data structures to represent the model parameters\n  + Causes wasted space when there are relatively few possible transitions\n  + Makes this implementation unfeasible for high-order models\n\n## Description\nWe represent a HMM with 3 model parameters: $A$, $B$, and $\\pi$.\n$A$ represents the transition probabilities, where 
$A_{ij} = P(Y_{t+1}=j|Y_t=i)$ is the probability of transitioning into state $j$ from state $i$.\n$A$ is of size $N\\times N$, where $N$ is the number of available states.\n$B$ represents the emission probabilities, where $B_j(k) = P(X_t=k|Y_t=j)$ is the probability of emitting symbol $k$ from state $j$.\n$B$ is of size $N\\times M$, where $M$ is the number of available symbols.\nFinally, $\\pi$ represents the initial state distribution, where $\\pi_{i}$ is the probability of starting in state $i$.\n$\\pi$ contains an element for each of the $N$ states.\n\n`hmm` uses the `hmm::model` class to represent a HMM.\n`hmm::model` can be constructed in several ways:\n1. Model parameters $\\theta = (A, B, \\pi)$ supplied as `hmm::model_parameters`\n2. Parameters read as text from a `std::istream` (see [Text format](#text-format))\n3. Parameters estimated from training examples\n\nOnce we have a `hmm::model`, we can use it to:\n1. Generate observation and state sequences given the model parameters\n2. Decode the forward, backward, and posterior probabilities\n3. Predict the most-likely state sequence corresponding to the observations (Viterbi)\n4. Train the model using example observations (Baum-Welch)\n\n## Build\n`hmm` is built using CMake.\nIn the project root directory, run\n```bash\nmkdir -p build \u0026\u0026 cd ./build\n```\n\nfollowed by\n```bash\ncmake .. \u0026\u0026 cmake --build .\n```\n\nto build the library and tests.\nThe tests must be built with assertions enabled.\nTo build the library in release mode without tests, the last command would look like:\n```bash\ncmake -DHMM_BUILD_TESTS=Off .. \u0026\u0026 cmake --build . --config Release\n```\n\nFinally, a cleaned-up install can be created at `PREFIX` using:\n```bash\ncmake --install . 
--prefix \"${PREFIX}\"\n```\n\n## Text format\n`hmm` supports a simple serialization protocol.\nThe model parameters, along with size parameters $N$ and $M$, are converted into text using C++ Standard Library functions and concatenated in the following order:\n```\nN M A B pi\n```\n\nThe spaces can be replaced with any number of whitespace characters, including newlines, and multiple models can be saved to the same stream.\nFor serialization and deserialization, respectively, `hmm::model` provides a `save(std::ostream \u0026)` method, and a constructor that takes a `std::istream \u0026`.\nHere's an example of what this text format might look like for a model with 2 states and 3 symbols. \nNote that each data element $p$ is a probability ($0\\le p\\le1$) and each row a discrete probability distribution ($\\sum_j p_j = 1$).\n```\n2 3\n\n0.8 0.2\n0.2 0.8\n\n0.5 0.4 0.1\n0.4 0.5 0.1\n\n0.5 0.5\n```\n\n## TODO\n+ Not sure if the initial distribution update is correct, needs to be tested\n+ Sparse matrices for handling large models (higher-order models converted to first-order with a ton of states)\n+ More testing\n+ Examples/use cases\n+ Documentation\n\n## References\n1. Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (2010). Biological Sequence Analysis.\n2. MATLAB. (2010). version 7.10.0 (R2010a). Natick, Massachusetts: The MathWorks Inc.\n3. Hasegawa-Johnson, M. (2020). ECE 417: Multimedia Signal Processing. The Grainger College of Engineering. Retrieved December 21, 2022, from https://courses.engr.illinois.edu/ece417/fa2020/\n4. Wikimedia Foundation. (2022). List of logarithmic identities. Wikipedia. Retrieved December 21, 2022, from https://en.wikipedia.org/wiki/List_of_logarithmic_identities ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandy-byers%2Fhmm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandy-byers%2Fhmm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandy-byers%2Fhmm/lists"}