{"id":13655956,"url":"https://github.com/mattjj/pyhsmm","last_synced_at":"2025-05-16T10:05:46.516Z","repository":{"id":2758755,"uuid":"3756692","full_name":"mattjj/pyhsmm","owner":"mattjj","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-25T05:40:47.000Z","size":5250,"stargazers_count":557,"open_issues_count":46,"forks_count":177,"subscribers_count":55,"default_branch":"master","last_synced_at":"2025-05-16T10:04:41.341Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mattjj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-MIT","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2012-03-18T17:40:13.000Z","updated_at":"2025-04-27T04:11:10.000Z","dependencies_parsed_at":"2022-08-06T12:31:18.633Z","dependency_job_id":"cf0d4089-e8f5-4e01-b187-cad8ea7a4fb6","html_url":"https://github.com/mattjj/pyhsmm","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattjj%2Fpyhsmm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattjj%2Fpyhsmm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattjj%2Fpyhsmm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattjj%2Fpyhsmm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mattjj","download_url":"https://codeload.github.com/mattjj/pyhsmm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:
//github.com","kind":"github","repositories_count":254509476,"owners_count":22082891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T04:00:43.028Z","updated_at":"2025-05-16T10:05:46.499Z","avatar_url":"https://github.com/mattjj.png","language":"Python","funding_links":[],"categories":["Resources and Frameworks","Probabilistic Methods","Python","概率统计","Uncategorized"],"sub_categories":["Others","General-Purpose Machine Learning","NLP","Uncategorized"],"readme":"[![Build\nStatus](https://travis-ci.org/mattjj/pyhsmm.svg?branch=master)](https://travis-ci.org/mattjj/pyhsmm)\n\n⚠️ Warning: this package is not maintained anymore ⚠️\n\n# Bayesian inference in HSMMs and HMMs #\n\nThis is a Python library for approximate unsupervised inference in\nBayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov\nModels (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM\nand HDP-HSMM, mostly with weak-limit approximations.\n\nThere are also some extensions:\n\n* [autoregressive models](https://github.com/mattjj/pyhsmm-autoregressive)\n* [switching linear dynamical systems](https://github.com/mattjj/pyhsmm-slds)\n* [factorial models](https://github.com/mattjj/pyhsmm-factorial)\n\n## Installing from PyPI ##\n\nGive this a shot:\n\n```bash\npip install pyhsmm\n```\n\nYou may need to install a compiler with `-std=c++11` support, like gcc-4.7 or higher.\n\nTo install manually from the git repo, you'll need `cython`. 
Then try this:\n\n```bash\npython setup.py install\n```\n\nIt might also help to look at the [travis file](https://raw.githubusercontent.com/mattjj/pyhsmm/master/.travis.yml) to\nsee how to set up a working install from scratch. The last Python version this package has been tested with is Python 3.7.\n\n## Running ##\n\nSee the examples directory.\n\nFor the Python interpreter to be able to import pyhsmm, you'll need it on your\nPython path. Since the current working directory is usually included in the\nPython path, you can probably run the examples from the same directory in which\nyou run the git clone with commands like `python pyhsmm/examples/hsmm.py`. You\nmight also want to add pyhsmm to your global Python path (e.g. by copying it to\nyour site-packages directory).\n\n## A Simple Demonstration ##\n\nHere's how to draw from the HDP-HSMM posterior over HSMMs given a sequence of\nobservations. (The same example, along with the code to generate the synthetic\ndata loaded in this example, can be found in `examples/basic.py`.)\n\nLet's say we have some 2D data in a data.txt file:\n\n```bash\n$ head -5 data.txt\n-3.711962552600095444e-02 1.456401745267922598e-01\n7.553818775915704942e-02 2.457422192223903679e-01\n-2.465977987699214502e+00 5.537627981813508793e-01\n-7.031638516485749779e-01 1.536468304146855757e-01\n-9.224669847039665971e-01 3.680035337673161489e-01\n```\n\nIn Python, we can plot the data in a 2D plot, collapsing out the time dimension:\n\n```python\nimport numpy as np\nfrom matplotlib import pyplot as plt\n\ndata = np.loadtxt('data.txt')\nplt.plot(data[:,0],data[:,1],'kx')\n```\n\n![2D data](https://raw.githubusercontent.com/mattjj/pyhsmm/master/images/data.png)\n\nWe can also make a plot of time versus the first principal component:\n\n```python\nfrom pyhsmm.util.plot import pca_project_data\nplt.plot(pca_project_data(data,1))\n```\n\n![Data first principal component vs 
time](https://raw.githubusercontent.com/mattjj/pyhsmm/master/images/data_vs_time.png)\n\nTo learn an HSMM, we'll use `pyhsmm` to create a `WeakLimitHDPHSMM` instance\nusing some reasonable hyperparameters. We'll ask this model to infer the number\nof states as well, so we'll give it an `Nmax` parameter:\n\n```python\nimport pyhsmm\nimport pyhsmm.basic.distributions as distributions\n\nobs_dim = 2\nNmax = 25\n\nobs_hypparams = {'mu_0':np.zeros(obs_dim),\n                'sigma_0':np.eye(obs_dim),\n                'kappa_0':0.3,\n                'nu_0':obs_dim+5}\ndur_hypparams = {'alpha_0':2*30,\n                 'beta_0':2}\n\nobs_distns = [distributions.Gaussian(**obs_hypparams) for state in range(Nmax)]\ndur_distns = [distributions.PoissonDuration(**dur_hypparams) for state in range(Nmax)]\n\nposteriormodel = pyhsmm.models.WeakLimitHDPHSMM(\n        alpha=6.,gamma=6., # better to sample over these; see concentration-resampling.py\n        init_state_concentration=6., # pretty inconsequential\n        obs_distns=obs_distns,\n        dur_distns=dur_distns)\n```\n\n(The first two arguments set the \"new-table\" proportionality constant for the\nmeta-Chinese Restaurant Process and the other CRPs, respectively, in the HDP\nprior on transition matrices. For this example, they really don't matter at\nall, but on real data it's much better to infer these parameters, as in\n`examples/concentration_resampling.py`.)\n\nThen, we add the data we want to condition on:\n\n```python\nposteriormodel.add_data(data,trunc=60)\n```\n\nThe `trunc` parameter is an optional argument that can speed up inference: it\nsets a truncation limit on the maximum duration for any state. If you don't\npass in the `trunc` argument, no truncation is used and all possible state\nduration lengths are considered. 
(pyhsmm has fancier ways to speed up message\npassing over durations, but they aren't documented.)\n\nIf we had multiple observation sequences to learn from, we could add them to the\nmodel just by calling `add_data()` for each observation sequence.\n\nNow we run a resampling loop. For each iteration of the loop, all the latent\nvariables of the model will be resampled by Gibbs sampling steps, including the\ntransition matrix, the observation means and covariances, the duration\nparameters, and the hidden state sequence. We'll also copy some samples so that\nwe can plot them.\n\n```python\nimport copy\nfrom pyhsmm.util.text import progprint_xrange\n\nmodels = []\nfor idx in progprint_xrange(150):\n    posteriormodel.resample_model()\n    # keep a snapshot of every 10th sample for plotting\n    if (idx+1) % 10 == 0:\n        models.append(copy.deepcopy(posteriormodel))\n```\n\nNow we can plot our saved samples:\n\n```python\nfig = plt.figure()\nfor idx, model in enumerate(models):\n    plt.clf()\n    model.plot()\n    plt.gcf().suptitle('HDP-HSMM sampled after %d iterations' % (10*(idx+1)))\n    plt.savefig('iter_%.3d.png' % (10*(idx+1)))\n```\n\n![Sampled models](https://raw.githubusercontent.com/mattjj/pyhsmm/master/images/posterior_animation.gif)\n\nI generated these data from an HSMM that looked like this:\n\n![Randomly-generated model and data](https://raw.githubusercontent.com/mattjj/pyhsmm/master/images/truth.png)\n\nSo the posterior samples look pretty good!\n\nA convenient shortcut to build a list of sampled models is to write\n\n```python\nmodel_samples = [model.resample_and_copy() for itr in progprint_xrange(150)]\n```\n\nThat will build a list of model objects (each of which can be inspected,\nplotted, pickled, etc., independently) in a way that won't duplicate data that\nisn't changed (like the observations or hyperparameter arrays) so that memory\nusage is minimized. 
It also minimizes file size if you save samples like\n\n```python\nimport pickle\n# binary mode is required when pickling with protocol=-1\nwith open('sampled_models.pickle','wb') as outfile:\n    pickle.dump(model_samples,outfile,protocol=-1)\n```\n\n## Extending the Code ##\n\nTo add your own observation or duration distributions, implement the interfaces\ndefined in `basic/abstractions.py`. To get a flavor of\nthe style, see [pybasicbayes](https://github.com/mattjj/pybasicbayes).\n\n## References ##\n* Matthew J. Johnson. [Bayesian Time Series Models and Scalable\n  Inference](http://www.mit.edu/~mattjj/thesis.pdf). MIT PhD Thesis, May 2014.\n\n* Matthew J. Johnson and Alan S. Willsky. [Bayesian Nonparametric Hidden\n  Semi-Markov Models](http://www.jmlr.org/papers/volume14/johnson13a/johnson13a.pdf).\n  Journal of Machine Learning Research (JMLR), 14:673–701, February 2013.\n\n* Matthew J. Johnson and Alan S. Willsky. [The Hierarchical Dirichlet Process\n  Hidden Semi-Markov Model](http://www.mit.edu/~mattjj/papers/uai2010.pdf). 26th\n  Conference on Uncertainty in Artificial Intelligence (UAI 2010), Avalon,\n  California, July 2010.\n\n```bibtex\n@article{johnson2013hdphsmm,\n    title={Bayesian Nonparametric Hidden Semi-Markov Models},\n    author={Johnson, Matthew J. 
and Willsky, Alan S.},\n    journal={Journal of Machine Learning Research},\n    pages={673--701},\n    volume={14},\n    month={February},\n    year={2013},\n}\n```\n\n## Authors ##\n\n[Matt Johnson](https://github.com/mattjj), [Alex Wiltschko](https://github.com/alexbw), [Yarden Katz](https://github.com/yarden), [Chia-ying (Jackie) Lee](https://github.com/jacquelineCelia), [Scott Linderman](https://github.com/slinderman), [Kevin Squire](https://github.com/kmsquire), [Nick Foti](https://github.com/nfoti).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmattjj%2Fpyhsmm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmattjj%2Fpyhsmm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmattjj%2Fpyhsmm/lists"}