{"id":32204580,"url":"https://github.com/chenhaotian/bayesian-bricks","last_synced_at":"2025-10-22T04:59:19.436Z","repository":{"id":56936366,"uuid":"238021293","full_name":"chenhaotian/Bayesian-Bricks","owner":"chenhaotian","description":"Basic building blocks in Bayesian statistics.","archived":false,"fork":false,"pushed_at":"2021-01-16T05:43:26.000Z","size":5617,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-22T04:58:53.148Z","etag":null,"topics":["bayesian-inference","bayesian-network","bayesian-statistics","em-algorithm","graphical-models","hierarchical-mixture-models","hierarchical-models","hierarchical-topic-models","mcmc-methods","rpackage","statistical-models"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chenhaotian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-03T17:26:15.000Z","updated_at":"2022-05-11T12:47:33.000Z","dependencies_parsed_at":"2022-08-21T06:20:46.934Z","dependency_job_id":null,"html_url":"https://github.com/chenhaotian/Bayesian-Bricks","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/chenhaotian/Bayesian-Bricks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenhaotian%2FBayesian-Bricks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenhaotian%2FBayesian-Bricks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenhaotian%2FBayesian-Bricks/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/chenhaotian%2FBayesian-Bricks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chenhaotian","download_url":"https://codeload.github.com/chenhaotian/Bayesian-Bricks/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenhaotian%2FBayesian-Bricks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280382997,"owners_count":26321423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","bayesian-network","bayesian-statistics","em-algorithm","graphical-models","hierarchical-mixture-models","hierarchical-models","hierarchical-topic-models","mcmc-methods","rpackage","statistical-models"],"created_at":"2025-10-22T04:59:16.830Z","updated_at":"2025-10-22T04:59:19.429Z","avatar_url":"https://github.com/chenhaotian.png","language":"R","readme":"# bbricks \n\n[![Build Status](https://travis-ci.org/chenhaotian/Bayesian-Bricks.svg?branch=master)](https://travis-ci.com/chenhaotian/Bayesian-Bricks) [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/bbricks)](https://cran.r-project.org/package=bbricks) [![metacran downloads](https://cranlogs.r-pkg.org/badges/bbricks)](https://cran.r-project.org/package=bbricks) 
\n\n[bbricks](https://github.com/chenhaotian/Bayesian-Bricks) provides a collection of frequently used Bayesian parametric and nonparametric model *structures*, as well as a set of tools for common analytical *tasks*. \n\n+ *Structures* include linear Gaussian systems, Gaussian and Gaussian conjugate structures, Gaussian and Inverse-Wishart conjugate structures, Gaussian and Normal-Inverse-Wishart conjugate structures, Gaussian and Normal-Inverse-Gamma conjugate structures, Categorical and Dirichlet conjugate structures, Dirichlet Process on positive integers, Dirichlet Process in general, Hierarchical Dirichlet Process ... \n+ *Tasks* include updating posteriors, calculating marginal likelihoods, calculating posterior predictive densities, sampling from the posterior distribution, sampling from the posterior predictive distribution, calculating MAP estimates ... \n\nSee [Mindset](#mindset) for the idea behind **bbricks** and [Examples](#examples) to get started. \n\n**Note:** This package is designed for statisticians and for anyone who wants to learn the basic Bayesian mindset. 
\n\n**----Installation----** \n\n```R \n# install from CRAN: \ninstall.packages(\"bbricks\") \n# or install development version from GitHub: \n# install.packages(\"devtools\") \ndevtools::install_github(\"chenhaotian/Bayesian-Bricks\") \n```\n\n \n\n## Table of Contents \n\n[Installation](#bbricks) \n\n[Mindset](#mindset) \n\n[Examples](#examples) \n\n+ [Hierarchical Bayesian Linear Regression](#hierarchical-bayesian-linear-regression) \n+ [Estimate Cancer Mortality Rates with Hierarchical Bayesian](#estimate-cancer-mortality-rates-with-hierarchical-bayesian) \n+ [Mixture of Gaussian](#mixture-of-gaussian) \n  + [Dirichlet Process Mixture Model](#dirichlet-process-mixture-model) \n  + [Mixture Model with Partially Observed Cluster Labels](#mixture-model-with-partially-observed-cluster-labels) \n+ [Hierarchical Mixture Models](#hierarchical-mixture-models) \n  + [Topic Modeling with HDP](#topic-modeling-with-hdp) \n+ [Hierarchical Mixture Models with Two Layers of Hierarchies](#hierarchical-mixture-models-with-two-layers-of-hierarchies) \n  + [Hierarchical Topic Modeling with HDP2](#hierarchical-topic-modeling-with-hdp2) \n+ [Bayesian Linear Regression](#bayesian-linear-regression) \n+ [Hidden Markov Model (HMM)](#hidden-markov-model-hmm) \n+ [Infinite States Hidden Markov Model (HDP-HMM)](#infinite-states-hidden-markov-model-hdp-hmm) \n\n[References](#references) \n\n \n\n## Mindset \n\nThe idea behind **bbricks** is that modeling in Bayesian statistics is nothing more than applying a set of **tasks** to a specific **model structure**. \n\nThe most frequently encountered **tasks** are: \n\n+ Update the prior into a posterior when new samples are observed. \n+ Sample from the posterior distribution. \n+ Calculate the marginal likelihood of the data set. \n+ Calculate posterior predictive densities from the posterior distribution. \n+ ... 
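\n\nThe tasks above can be sketched on a single structure. The snippet below is only a minimal illustration, assuming that `CatDirichlet()`, `sufficientStatistics()`, `posterior()`, `MAP()` and `posteriorDiscard()` behave as they are used in the [Examples](#examples) section; the observations and the prior `alpha=2` are made up for this sketch: \n\n```R \nlibrary(bbricks) \n## hypothetical observations, made up for this sketch \nx \u003c- c(\"a\",\"b\",\"a\",\"a\",\"c\") \n## a Categorical-Dirichlet \"prior-posterior\" structure \nobj \u003c- CatDirichlet(gamma = list(alpha=2,uniqueLabels=c(\"a\",\"b\",\"c\"))) \nss \u003c- sufficientStatistics(obj = obj,x=x)   #summarize the observations \nposterior(obj = obj,ss = ss)                #task: update the prior into a posterior, obj is modified in place \nMAP(obj)                                    #task: MAP estimate of the category probabilities \nposteriorDiscard(obj = obj,ss = ss)         #remove the information in ss from obj again \n``` 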
\n\nThe **model structure**s are always built from ![](./notes_pictures/4d6fd63efeec1ee031fad293ef1c7e85.png) basic Bayesian modeling structures: \n![](./notes_pictures/basicStructures.png) \nWhere \n\n+ ![](./notes_pictures/1cfadaec6045bfd821038dce18a6e149.png) is the most basic \"parameter-observation\" structure. Models like Gaussian, Gamma and Exponential are in this category. \n+ ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png) is the \"prior-posterior\" structure. Models like Gaussian-NIW (Gaussian observations with NIW prior) and Categorical-Dirichlet (Categorical observations with Dirichlet prior) are in this category. \n+ ![](./notes_pictures/aa6d91472b50a7d81415d654a507a89b.png) is the \"hierarchical-Bayesian\" structure, and ![](./notes_pictures/b7e28f0b4f6a8e33b1515023a2046ad8.png) is the same hierarchical structure but with more hierarchies. Models like Hierarchical Dirichlet Process (HDP) and HDP with additional hidden layers are in this category. \n\n**bbricks** tries to provide a `type/class` for each basic **model structure** and a `function/method` for each **task**. \n\nSee [Examples](#examples) for details. \n\n## Examples \n\n \n\n### Hierarchical Bayesian Linear Regression \n\nThis is an example from Hoff (2009), where we want to examine the relationship between math score and another variable, socioeconomic status (SES), of students from ![](./notes_pictures/7ef3015399686627df09e012720f47ac.png) different schools. 
The Conditional Probability Distributions (**CPD**s) of the model are defined as: \n\n![](./notes_pictures/2ea4ad39856898964cec43686e9026f6.png)\n\n\nWhere ![](./notes_pictures/6ee74a8824b43daec497b09ef0cee1d2.png) is the math score, ![](./notes_pictures/dde0892a5df571319f311da0451a3eb9.png) is a length ![](./notes_pictures/ff950401b8b80226c5def26d682366cf.png) row vector ![](./notes_pictures/8ebd4360e26975d64a1887b8ae953176.png) corresponding to each math score observation in ![](./notes_pictures/6ee74a8824b43daec497b09ef0cee1d2.png), and NIW is the Normal-Inverse-Wishart distribution, whose density function is defined as: \n\n![](./notes_pictures/f7a665cfa84aa060bb9c7617a4de2195.png)\n\n\nThe graphical model structure for this model is: \n\n![](./notes_pictures/hierarchicalBayesianLinearRegression.png) \n\nWhere ![](./notes_pictures/85079d770ad84e4f3c1a18f2a64f473b.png), ![](./notes_pictures/17864a7502663257beeaf5c731361825.png). \n\nTo enable sampling from this model, we first look at the CPDs of the random variables and their **Markov blanket**s: \n\n1. The Markov blanket for ![](./notes_pictures/0318b929d455b8e643c4929c087dac22.png) is ![](./notes_pictures/ba533bc2f2af7979e4ab0f9120a66354.png); the corresponding CPD of the blanket is a linear Gaussian system. **bbricks** provides an object of type `\"LinearGaussianGaussian\"` (see `?LinearGaussianGaussian` in R for details) to encode such a structure. \n2. The Markov blanket for ![](./notes_pictures/ffb5b4fbe04d8065b4e96f5946dac982.png) is ![](./notes_pictures/1e148cbf661280d725fb914a2683aa5b.png); the corresponding CPDs of the blanket form a Gaussian-NIW conjugate structure. **bbricks** provides an object of type `\"GaussianNIW\"` (see `?GaussianNIW` in R for details) to encode such a structure. \n3. 
The Markov blanket for ![](./notes_pictures/55a1ffd931f53e76a212204926aa8590.png) is ![](./notes_pictures/882ff87703a111f8cb4afffb8ca00d30.png); the corresponding CPDs of the blanket form a Gaussian-InvWishart conjugate structure. **bbricks** provides an object of type `\"GaussianInvWishart\"` (see `?GaussianInvWishart` in R for details) to encode such a structure. \n\nNote that `\"LinearGaussianGaussian\"`, `\"GaussianNIW\"` and `\"GaussianInvWishart\"` are all basic prior-posterior structures as shown in [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png). In **bbricks**, all objects representing structures that are the same as graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png) are also of type `\"BasicBayesian\"`. For example, a `\"LinearGaussianGaussian\"` object is also a `\"BasicBayesian\"` object. \n\nTo estimate the posterior parameters ![](./notes_pictures/7adecfd6c6f5b1b720f7770ea1a0c9f2.png), the Gibbs sampling procedure goes as follows: \n\n1. Sample ![](./notes_pictures/0318b929d455b8e643c4929c087dac22.png) from a `LinearGaussianGaussian` object which encodes the distribution of ![](./notes_pictures/cb243b4776d9ce547b03d3d49a8bf579.png). \n2. Sample ![](./notes_pictures/ffb5b4fbe04d8065b4e96f5946dac982.png) from a `GaussianNIW` object which encodes the distribution of ![](./notes_pictures/1d28cb7c9d17427fe8b5bc0bf77c7dc8.png). \n3. 
Sample ![](./notes_pictures/55a1ffd931f53e76a212204926aa8590.png) from a `GaussianInvWishart` object which encodes the distribution of ![](./notes_pictures/0c321066a790b068fee979ff394457dd.png). \n\nR code: \n\n```R \n## Gibbs sampling for hierarchical linear regression \n \nlibrary(bbricks) \n## load some hierarchical linear regression data \n## hlrData is a list of 3 numeric vectors \n## see ?hlrData for details \ndata(hlrData) \nx \u003c- hlrData$mathScore                    #math score as the dependent variable \nX \u003c- cbind(1,hlrData$socioeconomicStatus) #socioeconomic status as the independent variable \njs \u003c- hlrData$schoolID                    #school ID as the group IDs. \nJ \u003c- max(js) \n## Initialization---------------------------------------------- \n## initialize the Markov blanket of mu and Sigma \n## the prior parameters are: m=c(mean(mathScore),0), k=0.0001, v=3, S=diag(2) \nobjmS \u003c- GaussianNIW(gamma = list(m =c(mean(hlrData$mathScore),0),k=0.0001,v=3,S=diag(2))) \n## initialize the Markov blanket of sigma^2 \n## the prior parameters are: vs=2, Ss=diag(1) \nobjs \u003c- GaussianInvWishart(gamma = list(mu=0,v=2,S=diag(1))) \n## initialize mu and Sigma by sampling from the prior \nmuSigma \u003c- rPosterior(objmS) \n## initialize sigma^2 by sampling from the prior \nsigma2 \u003c- rPosterior(objs) \nbetaJ \u003c- matrix(0,J,2)                  #place-holder for beta_j, j=1:J \nepsilon \u003c- x                            #place-holder for the random noises \n## Main Gibbs loop--------------------------------------------- \nmaxit \u003c- 100                           #number of sampling iterations \nburnin \u003c- 50                           #number of burn-in samples \nmeanBeta \u003c- betaJ                      #place-holder for the sample means of beta \nit \u003c- 1 \npb \u003c- txtProgressBar(min = 0,max = maxit,style = 3) \nwhile(it\u003c=maxit){ \n    ## Step1: sample beta_j, j in 1:J \n    for(j in 1L:J){ \n        objb \u003c- 
LinearGaussianGaussian(gamma=list(Sigma=sigma2,m=muSigma$mu,S=muSigma$Sigma)) \n        idx \u003c- js == j \n        ss \u003c- sufficientStatistics(obj = objb,x=x[idx],A=X[idx,,drop=FALSE]) \n        posterior(obj = objb,ss = ss) \n        betaJ[j,] \u003c- rPosterior(objb) \n    } \n    ## calculate the sample mean \n    if(it\u003eburnin) meanBeta \u003c- meanBeta+betaJ/(maxit-burnin) \n    ## Step2: sample mu and Sigma \n    ssmS \u003c- sufficientStatistics(obj = objmS,x=betaJ) \n    posterior(obj = objmS,ss = ssmS) \n    muSigma \u003c- rPosterior(obj = objmS) \n    ## Step3: sample sigma^2 \n    for(j in 1L:J){ \n        idx \u003c- js == j \n        epsilon[idx] \u003c- x[idx]-X[idx,,drop=FALSE]%*%betaJ[j,] \n    } \n    sss \u003c- sufficientStatistics(obj = objs,x=epsilon) \n    posterior(obj = objs,ss = sss) \n    sigma2 \u003c- rPosterior(objs) \n    ## increase iteration counter  \n    it \u003c- it+1 \n    setTxtProgressBar(pb,it) \n    ## if sampling continues, discard the information in objmS and objs \n    ## to make room for the new information in the next iteration. \n    if(it \u003c= maxit){ \n        posteriorDiscard(obj = objmS,ss = ssmS) \n        posteriorDiscard(obj = objs,ss = sss) \n    } \n} \n## plot the result \n## gray lines are the betas of each school \n## black line is the beta for all the data as a whole \nplot(x=0, xlim = range(0.2,0.8),ylim = c(20,35),xlab = \"socioeconomic status\",ylab = \"math score\") \nfor(j in 1L:J) \n    abline(a=betaJ[j,2],b=betaJ[j,1],col=\"gray\") \nallSchools \u003c- lm(x~X-1)$coefficients \nabline(a=allSchools[2],b=allSchools[1],lwd=3) \n \n```\n\n \n\n### Estimate Cancer Mortality Rates with Hierarchical Bayesian \n\nThis is an example from Johnson and Albert (2006), where we want to estimate the cancer mortality rates of multiple cities with a hierarchical Bayesian method. 
\n\nThe model's graph structure is: \n\n![](./notes_pictures/cancer.png) \n\nWhere ![](./notes_pictures/6ee74a8824b43daec497b09ef0cee1d2.png) is a categorical random sample that takes one of two values: \"death\" or \"no death\". There are ![](./notes_pictures/9292e0002d94681feaa2ed052f11995d.png) cities in total, and ![](./notes_pictures/9328ddcbfaa6213dc52bd17ed1c7937a.png) are the mortality rates of the cities. \n\nThe CPDs are: \n\n![](./notes_pictures/2dec12d0ff950ae097862b582b882165.png)\n\nTo enable sampling from this model, we first look at the CPDs of the random variables and their **Markov blanket**s: \n\n1. The Markov blanket for ![](./notes_pictures/0b40cc5805033a7aaa9f0e10066a7485.png) is ![](./notes_pictures/407ac0c1c3da08af34c81b2fc0684eea.png); the corresponding CPD of the blanket is a Categorical-Dirichlet conjugate structure. **bbricks** provides an object of type `\"CatDirichlet\"` (see `?CatDirichlet` in R for details) to encode such a structure. \n2. The Markov blanket for ![](./notes_pictures/8c087632bdd26f21799b9723227cabd4.png) is ![](./notes_pictures/6c2cafa2950a07e2722c2b5d8cba0544.png); the corresponding CPD of the blanket is not a common one and is therefore not provided in **bbricks**. Instead, we can use the Metropolis-Hastings algorithm to sample from it. \n\nNote that `\"CatDirichlet\"` is a basic prior-posterior structure as shown in [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png). In **bbricks**, all objects representing structures that are the same as graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png) are also of type `\"BasicBayesian\"`. For example, a `\"CatDirichlet\"` object is also a `\"BasicBayesian\"` object. \n\nTo estimate ![](./notes_pictures/76f098f3b4f412094bb8d52dcb452505.png), we use the following Gibbs sampling procedure: \n\n1. 
Sample ![](./notes_pictures/76f098f3b4f412094bb8d52dcb452505.png) from a `CatDirichlet` object which encodes the distribution of ![](./notes_pictures/f39bd2dee478d7c65fe0c0fa62c20e5d.png). \n2. Sample ![](./notes_pictures/8c087632bdd26f21799b9723227cabd4.png) from the distribution of ![](./notes_pictures/06509278492f8f9caf88c6fd5c60090c.png) with an independent Metropolis-Hastings algorithm. See `?MetropolisHastings` for details. \n\nR code: \n\n```R \n## Estimate cancer mortality rates using Gibbs sampling \n \nlibrary(bbricks) \n \n## see ?cancerData for details \ndata(cancerData) \n## Step1: Initialization---------------------------------------------- \nK \u003c- length(cancerData)                          #the number of cities \neta \u003c- 1                                         #assume eta is known, eta=1 \n## initialize alpha, PI, and sufficient statistics \na \u003c- rexp(2,rate = eta)                 #initialize alpha \nPI \u003c- matrix(0,K,2L)                    #initialize pi \ncityPrior \u003c- CatDirichlet(gamma = list(alpha=a,uniqueLabels=c(\"death\",\"no death\"))) \ncitySS \u003c- lapply(cancerData,function(x){sufficientStatistics(obj = cityPrior,x=x)}) #sufficient statistics of each city \n## initialize functions used in Metropolis-Hastings, see ?MetropolisHastings for details \n## density of the target distribution \ndp \u003c- function(a){ \n    if(any(a\u003c0)) -Inf \n    else sum(dDir(x=PI,alpha = a,LOG = TRUE))+sum(dexp(x=a,rate = eta,log = TRUE)) \n} \n## density of the proposal distribution \ndq \u003c- function(anew,a){1}                #use an independent proposal \n## random sample generator of the proposal distribution \nrq \u003c- function(x){ \n    c(runif(1,x[1]-1,x[1]+1), \n      runif(1,x[2]-1,x[2]+1)) \n} \n## Step2: main Gibbs sampling loop between alpha and pi -------------- \nmaxit \u003c- 1000 \nburnin \u003c- 500                            #number of burn-in samples \nmeanPI \u003c- numeric(K)                     
#place-holder for the sample mean \nit \u003c- 1 \nwhile(it\u003c=maxit){ \n    ## Step1: sample pi from p(pi|a,x)------------- \n    for(k in 1L:K){ \n        posterior(obj = cityPrior,ss=citySS[[k]]) \n        PI[k,] \u003c- rDir(n=1,alpha = cityPrior$gamma$alpha) \n        posteriorDiscard(obj = cityPrior,ss=citySS[[k]]) \n    } \n    ## calculate the sample mean \n    if(it\u003eburnin) meanPI \u003c- meanPI+PI[,1]/(maxit-burnin) \n    ## Step2: sample a from p(a|pi,g)-------------- \n    ## use Metropolis-Hastings \n    a \u003c- MetropolisHastings(nsamples = 1,xini = a,dp=dp,dq=dq,rq=rq) \n    ## increase iteration counter  \n    it \u003c- it+1 \n} \n## Step3: plot the result--------------------------------------------- \n## black bars are the sample mean from the hierarchical Bayesian model \n## blue bars are the MLE of the mortality rates. \nplot(1:K,meanPI,type = \"h\",xlab = \"city\",ylab = \"mortality rate\",lwd=3) \nlines(1:K+0.2,sapply(cancerData,function(l){sum(l==\"death\")/length(l)}),type = \"h\",col = \"blue\",lwd = 3) \nlegend(1, 0.005, legend=c(\"Sample Mean\", \"MLE\"),col=c(\"black\", \"blue\"), lty=c(1,1), cex=1,lwd = 3) \n```\n\n \n\n### Mixture of Gaussian \n\nA mixture of Gaussians has the following graph structure: \n\n![](./notes_pictures/mixtureModel.png) \n\nWhere there are ![](./notes_pictures/9292e0002d94681feaa2ed052f11995d.png) Gaussian components/groups. ![](./notes_pictures/6ee74a8824b43daec497b09ef0cee1d2.png) is a Gaussian observation, ![](./notes_pictures/6001bd300c1a1befc3da0d7f02b5438f.png) is the hidden group label, ![](./notes_pictures/702bcfc93e86474877319c79449f2f39.png) are the component weights (or the group label distribution). ![](./notes_pictures/ad322b84f9d0c5cf02c3aeb4fe5781d9.png) are the observation distribution parameters. ![](./notes_pictures/8c087632bdd26f21799b9723227cabd4.png) and ![](./notes_pictures/3030fb4bd381ae861c94ec34459cf6bc.png) are prior parameters.  
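\n\nTo make the roles of these variables concrete, the generative process can be simulated in base R. This is only a sketch under simplifying assumptions: the observations are one-dimensional, the component means `mu` and the shared standard deviation `sigma` are fixed, made-up values instead of being drawn from an NIW prior, and all variable names are hypothetical: \n\n```R \nset.seed(1) \nK \u003c- 3L                                 #number of components \nN \u003c- 200L                               #number of observations \nalpha \u003c- rep(1,K)                       #Dirichlet prior parameter \n## draw the component weights from Dirichlet(alpha) by normalizing Gamma draws \ng \u003c- rgamma(K,shape = alpha,rate = 1) \nPI \u003c- g/sum(g) \n## made-up component parameters (assumption: fixed, not sampled from NIW) \nmu \u003c- c(-4,0,4) \nsigma \u003c- 1 \n## draw the hidden labels from Categorical(PI), then the observations given the labels \nz \u003c- sample.int(K,size = N,replace = TRUE,prob = PI) \nx \u003c- rnorm(N,mean = mu[z],sd = sigma) \ntable(z)                                #component sizes \n``` 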
\n\nThe CPDs are: \n\n![](./notes_pictures/021aa5494602009ccb000e2e2f72ece3.png)\n\n\nWhere ![](./notes_pictures/8388b0d36d13c7dfd811fbb8496fe1eb.png) is the Normal-Inverse-Wishart distribution with parameter ![](./notes_pictures/1db1b69d55e01859084d2d413d15b124.png). ![](./notes_pictures/1ad7791ac511a6757ec0f9b5ea3a3f15.png) is a numeric vector representing the \"location parameter\", ![](./notes_pictures/2811f81f916c5df8b5c165dc20b419aa.png) is a symmetric positive definite matrix representing the \"scale parameter\", ![](./notes_pictures/7fbcbab321e830bfeb617b6d679e8719.png) and ![](./notes_pictures/bf33aa428e818d7fec11a721452d05e9.png) are degrees of freedom. For an NIW sample ![](./notes_pictures/ad322b84f9d0c5cf02c3aeb4fe5781d9.png), its density function is defined as: \n\n\n![](./notes_pictures/f7a665cfa84aa060bb9c7617a4de2195.png)\n\n\nA mixture model can be seen as a combination of two \"prior-posterior\" structures (as shown in [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png)): one Categorical-Dirichlet structure ![](./notes_pictures/766a9d58d567ce613bf07667d3f04b10.png) for the hidden cluster labels, and one Gaussian-NIW structure ![](./notes_pictures/2517357a50a9803cc8e65e6c84aa4699.png) for the observation distribution. \n\nIn **bbricks**, these two structures are initialized with a `\"CatDirichlet\"` object and a `\"GaussianNIW\"` object. To estimate ![](./notes_pictures/702bcfc93e86474877319c79449f2f39.png) and ![](./notes_pictures/288038e6b7d24534046146439af3fb71.png), we use the following EM procedure: \n\n1. E-step: calculate ![](./notes_pictures/7b3305185d668f7809a7ef8043057d2e.png) and the expected sufficient statistics ![](./notes_pictures/b7a79bfbd68c5d360fdb345ad6dfc017.png) and ![](./notes_pictures/33dd49c5f43c8d68d0bbe71bb51b0bf3.png). \n2. 
M-step: use the expected sufficient statistics to get the MAP estimates of ![](./notes_pictures/288038e6b7d24534046146439af3fb71.png) and ![](./notes_pictures/702bcfc93e86474877319c79449f2f39.png). \n\nR code: \n\n```R \n## Get the MAP estimate of pi and theta using EM algorithm. \n \nlibrary(bbricks) \n \n## load some mixture of Gaussian samples. \n## mmData is a numeric matrix with 2 columns, each row is a sample \n## see ?mmData for details \ndata(mmData) \nK \u003c- 4L                                 #number of clusters(mixture components) \nz \u003c- matrix(runif(nrow(mmData)*K),nrow(mmData),K) #the expected cluster label of each observation \nallK \u003c- 1L:K    #temp variable, all component labels \nallZ \u003c- rep(allK,each=nrow(mmData))     #temp variable, all possible cluster labels for all observations \n## z, pi and alpha are distributed as a Categorical-Dirichlet structure: \nmc \u003c- CatDirichlet(gamma = list(alpha=0.5,uniqueLabels=allK)) # create a CatDirichlet object to track the posterior info, see ?CatDirichlet for details \n## each component distribution is a Gaussian-NIW structure: \nec \u003c- replicate(K,GaussianNIW(gamma = list(m=c(0,0),k=0.00001,v=2,S=diag(2)))) # create a GaussianNIW object to track the posterior info of each mixture component, see ?GaussianNIW for details \nmcMAP \u003c- MAP(mc)                        #initialize the MAP estimate of pi \necMAP \u003c- replicate(K,list(muMAP=runif(2),sigmaMAP=diag(2)),simplify = FALSE) #initialize the MAP estimate of theta \n## The main EM loop \nmaxit \u003c- 100                            #number of EM loops \nit \u003c- 1 \nwhile(it\u003c=maxit){ \n    ## E-step--------------------------------------------------------- \n    ## calculate the expected cluster labels: p(z|pi,theta) \n    for(k in allK) z[,k] \u003c- dGaussian(x=mmData,mu = ecMAP[[k]]$muMAP,Sigma=ecMAP[[k]]$sigmaMAP)+log(mcMAP[k]) \n    z \u003c- exp(z-logsumexp(z))            #use logsumexp() to avoid numerical 
underflow \n    ## calculate the expected sufficient statistics \n    ssComponents \u003c- lapply(allK,function(k){ \n        sufficientStatistics_Weighted(obj = ec[[k]],x=mmData,w=z[,k]) \n    })                                  #the expected sufficient statistics of each Gaussian component \n    ssPi \u003c- sufficientStatistics_Weighted(obj = mc,x=allZ,w=as.vector(z)) #the expected sufficient statistics of the cluster label distribution \n    ## M-step--------------------------------------------------------- \n    ## use the sufficient statistics to update the prior distributions: \n    for(k in allK) posterior(obj = ec[[k]],ss=ssComponents[[k]]) #update component distributions \n    posterior(obj = mc,ss = ssPi)                                #update cluster label distribution \n    ## calculate the MAP estimates from posterior: \n    mcMAP \u003c- MAP(mc) \n    ecMAP \u003c- lapply(ec,MAP) \n    ## Reset the priors for next EM loop----------------------------------------- \n    ## to prepare for the next EM iteration, discard the sufficient statistics info from the posteriors: \n    for(k in allK) posteriorDiscard(obj = ec[[k]],ss=ssComponents[[k]]) \n    posteriorDiscard(obj = mc,ss = ssPi) \n    ## increase the iteration counter \n    it \u003c- it+1 \n} \n \nplot(mmData,col=apply(z,1,which.max)) #plot the best estimates \nmcMAP                                 #the MAP estimate of pi \necMAP                                 #the MAP estimate of theta_z \n```\n\n \n\n#### Dirichlet Process Mixture Model \n\nThe graph structure of Dirichlet Process Mixture Model(DP-MM) is exactly the same as a standard mixture model, except that the number of mixture components is not predetermined: \n\n![](./notes_pictures/mixtureModelDP.png) \n\nThe CPDs of a DP-MM is similar to the ones shown in [Mixture of Gaussian](#mixture-of-gaussian), the only difference is the distribution of ![](./notes_pictures/fb84d7e0ac23c9ae0d924dc072ed5951.png) is a Dirichlet process rather than 
a Dirichlet distribution. For example, if the observations are Gaussian distributed, the CPDs will be: \n\n![](./notes_pictures/b4360645e2b2df390b1e00b49aa73741.png)\n\nWhere ![](./notes_pictures/973bf3270f5fa1867068ed9f3c9629b3.png) is a Dirichlet process on positive integers with \"concentration parameter\" ![](./notes_pictures/8c087632bdd26f21799b9723227cabd4.png); the \"base measure\", which is a uniform distribution on positive integers, is omitted from the formula. ![](./notes_pictures/8388b0d36d13c7dfd811fbb8496fe1eb.png) is the Normal-Inverse-Wishart distribution with parameter ![](./notes_pictures/1db1b69d55e01859084d2d413d15b124.png). ![](./notes_pictures/1ad7791ac511a6757ec0f9b5ea3a3f15.png) is a numeric vector representing the \"location parameter\", ![](./notes_pictures/2811f81f916c5df8b5c165dc20b419aa.png) is a symmetric positive definite matrix representing the \"scale parameter\", ![](./notes_pictures/7fbcbab321e830bfeb617b6d679e8719.png) and ![](./notes_pictures/bf33aa428e818d7fec11a721452d05e9.png) are degrees of freedom. \n\nA DP-MM can be seen as a combination of two \"prior-posterior\" structures (as shown in [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png)): one Categorical-DirichletProcess structure for the hidden cluster label distribution ![](./notes_pictures/766a9d58d567ce613bf07667d3f04b10.png), which we call a \"DP on positive integers\", and one structure for the observation distribution ![](./notes_pictures/e4a78df2fa27323c74d002b1ad89d277.png). \n\nIn **bbricks**, \"DP on positive integers\" is represented by an object of type `\"CatDP\"`. \n\nTo further simplify the calculations, **bbricks** also provides a `\"DP\"` type to represent all Dirichlet process structures. An object of type `\"DP\"` is in essence a combination of a `\"CatDP\"` object, which encodes the ![](./notes_pictures/766a9d58d567ce613bf07667d3f04b10.png) structure, i.e. 
a Dirichlet process on positive integers, and an arbitrary `\"BasicBayesian\"` object, which encodes the ![](./notes_pictures/e4a78df2fa27323c74d002b1ad89d277.png) structure. (In **bbricks**, all models with the same structure as [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png) are `\"BasicBayesian\"`s, such as `\"GaussianNIW\"`, `\"GaussianNIG\"`, `\"CatDirichlet\"` and even `\"CatDP\"`.) \n\nTo estimate ![](./notes_pictures/c43241284b491b3c59a9f17294572c88.png), we use the following collapsed Gibbs sampling procedure: \n\n1. Sample ![](./notes_pictures/3ecc9f164107d2f1822fa26ea2f42a04.png) from ![](./notes_pictures/20991bf82fbce7e76c077e33f7d41f85.png). \n\nR code: \n\n```R \n## Learn DP-MM posteriors using Gibbs sampling \n \nlibrary(bbricks) \n \n## load some mixture of Gaussian samples. \n## mmData is a numeric matrix with 2 columns, each row is a sample \n## see ?mmData for details \ndata(mmData) \nmaxit \u003c- 100                            #number of total samples \nburnin \u003c- 50                            #number of burnin samples \n## Step1: Initialization ----------------------------------------- \nobj \u003c- DP(gamma = list(alpha=10,H0aF=\"GaussianNIW\",parH0=list(m=c(0,0),k=0.001,v=2,S=diag(2)))) #create a DP object to track all the changes, the DP object in this case is a combination of a CatDP object and a GaussianNIW object \nz \u003c- matrix(1L,nrow(mmData),maxit-burnin)    #place-holder for the sampled z \nss \u003c- sufficientStatistics(obj,x=mmData,foreach = TRUE) #sufficient statistics of each observed sample \nN \u003c- nrow(mmData) \nfor(i in 1L:N){ # initialize labels before Gibbs sampling \n    z[i,1] \u003c- rPosteriorPredictive(obj = obj,n=1,x=mmData[i,,drop=FALSE]) \n    posterior(obj = obj,ss = ss[[i]], z = z[i,1]) \n} \n## Step2: Main Gibbs sampling loop-------------------------------- \nit \u003c- 1                                 #iteration tracker \npb \u003c- txtProgressBar(min = 0,max = 
maxit,style = 3) \nwhile(it\u003c=maxit){ \n    if(it\u003eburnin) colIdx \u003c- it-burnin \n    else colIdx \u003c- 1 \n    for(i in 1L:N){ \n        ## remove the sample information from the posterior \n        posteriorDiscard(obj = obj,ss = ss[[i]],z=z[i,colIdx]) \n        ## get a new sample \n        z[i,colIdx] \u003c- rPosteriorPredictive(obj = obj,n=1,x=mmData[i,,drop=FALSE]) \n        ## add the new sample information to the posterior \n        posterior(obj = obj,ss = ss[[i]],z=z[i,colIdx]) \n    } \n    if(it\u003eburnin \u0026 colIdx\u003cncol(z)) z[,colIdx+1] \u003c- z[,colIdx] #copy result of previous iteration \n    it \u003c- it+1 \n    setTxtProgressBar(pb,it) \n    if(it\u003e=maxit){cat(\"\\n\");break} \n    plot(x=mmData[,1],y=mmData[,2],col=z[,colIdx]) #to see how the labels change in each iteration \n} \n## Step3: Estimate group labels of each observation--------------- \n## pick the most frequent z as the best estimate \nzBest \u003c- apply(z,1,function(l){ \n    tmp \u003c- table(l) \n    names(tmp)[which.max(tmp)] \n}) \nplot(x=mmData[,1],y=mmData[,2],col=zBest) \n```\n\n \n\n#### Mixture Model with Partially Observed Cluster Labels \n\nIn the dataset `mmData` of the previous example, what if we know that the 50th, 100th, 150th and 200th samples belong to 4 different clusters (they are shown with different colors and shapes in the graph below)? How should we incorporate this information into the model? \n\n![](./notes_pictures/mixtureModelPO.png) \n\nWith DP-MM, one only needs to **1.** update the DP prior (as defined in the previous R example) with the information of the 4 observed samples, and **2.** use the updated prior as the prior of the Gibbs sampling procedure. These 2 steps can be achieved by adding the following code after `obj \u003c- DP(...)` in the previous R example: \n\n```R \n## 1. 
add the information of the 4 observed samples to the DP object \nssObserved \u003c- sufficientStatistics(obj=obj,x=mmData[c(50,100,150,200),,drop=FALSE],foreach = TRUE) \nfor(i in 1L:4L) posterior(obj = obj,ss = ssObserved[[i]], z = i) # the choice of cluster labels 'z' for the 4 observed samples is arbitrary, as long as they are different from each other. In this example I simply use z=1L:4L. \n## 2. remove the 4 samples from the upcoming Gibbs sampling procedure \nmmData \u003c- mmData[-c(50,100,150,200),] \n```\n\nRun the code, and the result will be: \n\n![](./notes_pictures/mixtureModelPO2.png) \n\n \n\n### Hierarchical Mixture Models \n\nIn a hierarchical mixture model, the observations ![](./notes_pictures/6ee74a8824b43daec497b09ef0cee1d2.png) are generated by some unknown mixture components and are split into ![](./notes_pictures/99ef749f3b359c9935e5f9806f12f7ea.png) groups. All ![](./notes_pictures/99ef749f3b359c9935e5f9806f12f7ea.png) groups share the same set of mixture components but with different mixture weights. \n\nThe Hierarchical Dirichlet Process (HDP) is a natural representation of a hierarchical mixture model. It has the following graph structure: \n\n![](./notes_pictures/hierarchicalMixtureModel.png) \n\nIf the component distribution is Gaussian, the CPDs will be: \n\n![](./notes_pictures/07789b5a2ceae981ef5306cbeab71b40.png)\n\nWhere ![](./notes_pictures/bf864399590af71694003acc86b1177c.png) is a Dirichlet process on positive integers with \"concentration parameter\" ![](./notes_pictures/3030fb4bd381ae861c94ec34459cf6bc.png); the \"base measure\", which is a uniform distribution on positive integers, is omitted from the formula.  ![](./notes_pictures/746bc1338258258044ed695cb3893b47.png) is a Dirichlet process with concentration parameter ![](./notes_pictures/8c087632bdd26f21799b9723227cabd4.png) and base measure ![](./notes_pictures/e2dfb9de34646ef66c3ad131580c8ede.png). 
![](./notes_pictures/633fecc6c70e1dbb0c3765b388c9607b.png) is the Normal-Inverse-Wishart distribution with parameter ![](./notes_pictures/89ddf9f6873006f15de5f63f5aba8c89.png). ![](./notes_pictures/1ad7791ac511a6757ec0f9b5ea3a3f15.png) is a numeric vector representing the \"location parameter\", ![](./notes_pictures/2811f81f916c5df8b5c165dc20b419aa.png) is a symmetric positive definite matrix representing the \"scale parameter\", ![](./notes_pictures/7fbcbab321e830bfeb617b6d679e8719.png) and ![](./notes_pictures/bf33aa428e818d7fec11a721452d05e9.png) are degrees of freedom. The distribution of ![](./notes_pictures/5de00bb6a0cc1c18e6bee6e5f0b86210.png) is a \"HDP on positive integers\".  \n\nIn most of the literature, HDPs are represented in a much simpler and more compact way (though one that is not easier to use in practice, especially when generating random samples from an HDP): \n\n![](./notes_pictures/HDP.png) \n\nFrom the compact representation we can see that the HDP follows the \"Hierarchical Bayesian\" structure shown in [Mindset](#mindset) graph ![](./notes_pictures/aa6d91472b50a7d81415d654a507a89b.png).  \n\nIn **bbricks**, \"HDP on positive integers\" is represented by an object of type `\"CatHDP\"`. To further simplify the mixture model calculations, **bbricks** also provides an `\"HDP\"` type to represent more general hierarchical Dirichlet process models. An object of type  `\"HDP\"` is in essence a combination of a `\"CatHDP\"` object, which encodes the distribution of ![](./notes_pictures/5de00bb6a0cc1c18e6bee6e5f0b86210.png), i.e. a HDP on positive integers; and an arbitrary `\"BasicBayesian\"` object, which encodes the ![](./notes_pictures/a146e4268e395dd62364466c366ce5cb.png) structure. 
(in **bbricks**, all models with same structure as [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png) are `\"BasicBayesian\" `s, such as `\"GaussianNIW\"`, `\"GaussianNIG\"` ,`\"CatDirichlet\"` and even `\"CatDP\"`)  \n\nTo estimate ![](./notes_pictures/7fbcbab321e830bfeb617b6d679e8719.png), we use the following Gibbs sampling procedure: \n\n1. sample ![](./notes_pictures/528e7c42fc24bbeab23d4e4897c2320b.png) from ![](./notes_pictures/ae4dc5138f5dd4968dc86cf15ee30afc.png). \n\nR code: \n\n```R \n## Learn HDP-MM posteriors using Gibbs sampling \n \nlibrary(bbricks) \n \n## load some mixture of Gaussian samples. \n## mmhData is a list of two elements. mmhData$x is a matrix of Gaussian observations, each row is an observation; mmhData$groupLabel is the group label of each observation. \n## see ?mmhData for details \ndata(mmhData) \nx \u003c- mmhData$x \njs \u003c- mmhData$groupLabel \n## Step1: Initialization------------------------------------------ \nmaxit \u003c- 50                             #iterative for maxit times \nburnin \u003c- 30                            #number of burn in samples \n## create a HDP object to track all the changes, the HDP object in this case is a combination of a CatHDP object and a GaussianNIW object: \nobj \u003c- HDP(gamma = list(gamma=1,j=max(js),alpha=1, \n                        H0aF=\"GaussianNIW\", \n                        parH0=list(m=c(0,0),k=0.001,v=2,S=diag(2)*0.01))) \nss \u003c- sufficientStatistics(obj$H,x=x,foreach = TRUE) #sufficient statistics \nset.seed(1) \nz \u003c- rep(1L,nrow(x)) \nk \u003c- matrix(1L,nrow(x),maxit-burnin)    #place-holder for the sampled k \nN \u003c- length(ss) \nfor(i in 1L:N){# initialize k and z \n    tmp \u003c- rPosteriorPredictive(obj = obj,n=1,x=x[i,,drop=FALSE],j=js[i]) \n    z[i] \u003c- tmp[\"z\"] \n    k[i,1] \u003c- tmp[\"k\"] \n    posterior.HDP(obj = obj,ss = ss[[i]],ss1 = k[i],ss2 = z[i],j = js[i]) \n} \n## Step2: main Gibbs 
loop--------------------------------------------- \nit \u003c- 1                                 #iteration tracker \npb \u003c- txtProgressBar(min = 0,max = maxit,style = 3) \nwhile(it\u003c=maxit){ \n    if(it\u003eburnin) colIdx \u003c- it-burnin \n    else colIdx \u003c- 1 \n    for(i in 1L:N){ \n        ## remove the sample from the posterior info \n        posteriorDiscard(obj = obj,ss = ss[[i]],ss1=k[i,colIdx],ss2=z[i],j=js[i]) \n        ## resample a new partition \n        tmp \u003c- rPosteriorPredictive(obj = obj,n=1,x=x[i,,drop=FALSE],j=js[i]) \n        z[i] \u003c- tmp[\"z\"] \n        k[i,colIdx] \u003c- tmp[\"k\"] \n        ## add the information of the new sample \n        posterior(obj = obj,ss = ss[[i]], ss1=k[i,colIdx],ss2 = z[i],j=js[i]) \n    } \n    if(it\u003eburnin \u0026 colIdx\u003cncol(k)) k[,colIdx+1] \u003c- k[,colIdx] #copy result of previous iteration \n    it \u003c- it+1 \n    plot(x=x[,1],y=x[,2],col=k[,colIdx])         #to visualize the group label dynamics \n    setTxtProgressBar(pb,it) \n} \n## Step3: Estimate group labels of each observation--------------- \n## pick the most frequent k as the best estimate \nkBest \u003c- apply(k,1,function(l){ \n    tmp \u003c- table(l) \n    names(tmp)[which.max(tmp)] \n}) \nplot(x=x[,1],y=x[,2],col=kBest) \n```\n\n \n\n#### Topic Modeling with HDP \n\nA topic model is a hierarchical mixture model(See [Hierarchical Mixture Models](#hierarchical-mixture-models)) with categorical component distribution: \n\n![](./notes_pictures/e6522aa186381143de48538a8ea25442.png)\n\nThe Gibbs sampling procedure on this model is exactly the same as the one in [Hierarchical Mixture Models](#hierarchical-mixture-models) \n\nR code: \n\n```R \n## Learn HDP-LDA posteriors of the farm-ads corpus \n \n## load a subset of farm ads data from https://archive.ics.uci.edu/ml/datasets/Farm+Ads \n## see ?farmadsData for details \ndata(farmadsData) \nword \u003c- farmadsData$word \ndocument \u003c- farmadsData$document \n## 
Step1: Initialization------------------------------------------ \nset.seed(1) \nmaxit \u003c- 30                            #iterative for maxit times \nz \u003c- rep(1L,length(word)) \nk \u003c- rep(1L,length(word)) \n## initialize \nuniqueWords \u003c- unique(word) \nobj \u003c- HDP(gamma = list(gamma=1,j=max(document),alpha=1,H0aF=\"CatDirichlet\",parH0=list(alpha=rep(0.5,length(uniqueWords)),uniqueLabels=uniqueWords))) #create a HDP object to track all the changes, the HDP object in this case is a combination of a CatHDP object and a CatDrirchlet object \nN \u003c- length(word) \n## initialize k and z \nfor(i in 1L:N){ \n    tmp \u003c- rPosteriorPredictive(obj = obj,n=1,x=word[i],j=document[i]) \n    z[i] \u003c- tmp[\"z\"] \n    k[i] \u003c- tmp[\"k\"] \n    posterior(obj = obj,ss = word[i], ss2 = z[i],j=document[i],ss1=k[i]) \n} \n## Step2: main Gibbs loop--------------------------------------------- \nit \u003c- 1                                 #iteration tracker \npb \u003c- txtProgressBar(min = 0,max = maxit,style = 3) \nwhile(it\u003c=maxit){ \n    for(i in 1L:N){ \n        posteriorDiscard.HDP(obj = obj,ss = word[i],ss1=k[i],ss2=z[i],j=document[i]) #remove the sample information from the posterior \n        tmp \u003c- rPosteriorPredictive(obj = obj,n=1,x=word[i],j=document[i])   #get a new sample \n        z[i] \u003c- tmp[\"z\"] \n        k[i] \u003c- tmp[\"k\"] \n        posterior(obj = obj,ss = word[i],ss1=k[i], ss2 = z[i],j=document[i]) #add the new sample information to the posterior \n    } \n    it \u003c- it+1 \n    setTxtProgressBar(pb,it) \n} \n## Step3: plot the result -------------------------------------------- \n## see which topics are most frequently appeared: \norder(sapply(obj$X,function(l){sum(l$gamma$alpha)}),decreasing = TRUE) \n## seems topic 2 and 1 appear the most, let's plot them: \n## install.packages(\"wordcloud\") # for word-cloud \n## install.packages(\"RColorBrewer\") # color palettes \n## print topic 1 \nwordcloud:: 
wordcloud(words = obj$X[[1]]$gamma$uniqueLabels, \n                      freq = obj$X[[1]]$gamma$alpha, \n                      min.freq = 1, \n                      max.words=100, \n                      random.order=FALSE, rot.per=0.35, \n                      colors=RColorBrewer::brewer.pal(5, \"Set1\")) \n## print topic 2 \nwordcloud:: wordcloud(words = obj$X[[2]]$gamma$uniqueLabels, \n                      freq = obj$X[[2]]$gamma$alpha, \n                      min.freq = 1, \n                      max.words=100, \n                      random.order=FALSE, rot.per=0.35, \n                      colors=RColorBrewer::brewer.pal(5, \"Set1\")) \n```\n\n### Hierarchical Mixture Models with Two Layers of Hierarchies \n\nBy extending the [Hierarchical Mixture Models](#hierarchical-mixture-models) with an additional layer of Dirichlet process on top, the model is then able to capture longer-term and cross-group dependencies. \nIn this hierarchical mixture model, the observation ![](./notes_pictures/6ee74a8824b43daec497b09ef0cee1d2.png) are generated by some unknown mixture components and are split into ![](./notes_pictures/e434d6480f41e67ddb646279d7706030.png) groups, each group has ![](./notes_pictures/49ecf1bdd94f40959c1cda6710617950.png) sub-groups. All ![](./notes_pictures/e434d6480f41e67ddb646279d7706030.png) groups share the same set of mixture components but with different mixture weights, all ![](./notes_pictures/8659a5b0c07b0499cdc736872f6d1ac8.png) sub-groups within each group ![](./notes_pictures/1ad7791ac511a6757ec0f9b5ea3a3f15.png) also share the same set of mixture components but different mixture weights. 
The graphical model structure is: \n![](./notes_pictures/hierarchicalMixtureModel2.png) \nIf the component distribution is Gaussian, the CPDs will be: \n\n![](./notes_pictures/3ca2652eeb24429c29d5c4871de18207.png)\n\n\nWhere ![](./notes_pictures/5ea3a7b5edfb5c973ee6160f48d1195b.png) is a Dirichlet process on positive integers with \"concentration parameter\" ![](./notes_pictures/17f2575568e80a424c85237bb02bd9c0.png); the \"base measure\", which is a uniform distribution on positive integers, is omitted from the formula. ![](./notes_pictures/38e522a4b5183cfec24467b2ad4f97f1.png) is a Dirichlet process with concentration parameter ![](./notes_pictures/3030fb4bd381ae861c94ec34459cf6bc.png) and base measure ![](./notes_pictures/e2dfb9de34646ef66c3ad131580c8ede.png). ![](./notes_pictures/a45cf561b43c589cb37c1481c00c3cbd.png) is a Dirichlet process with concentration parameter ![](./notes_pictures/8c087632bdd26f21799b9723227cabd4.png) and base measure ![](./notes_pictures/9b7eb01b3e83539bdbdcdd4293c3af66.png). ![](./notes_pictures/633fecc6c70e1dbb0c3765b388c9607b.png) is the Normal-Inverse-Wishart distribution with parameter ![](./notes_pictures/89ddf9f6873006f15de5f63f5aba8c89.png). ![](./notes_pictures/1ad7791ac511a6757ec0f9b5ea3a3f15.png) is a numeric vector representing the \"location parameter\", ![](./notes_pictures/2811f81f916c5df8b5c165dc20b419aa.png) is a symmetric positive definite matrix representing the \"scale parameter\", ![](./notes_pictures/7fbcbab321e830bfeb617b6d679e8719.png) and ![](./notes_pictures/bf33aa428e818d7fec11a721452d05e9.png) are degrees of freedom. \n\nThe distribution of ![](./notes_pictures/981245f95d706bfa900fda8ed3fd6e07.png) is a \"HDP on positive integers with two layers of hierarchies\". Like the `\"CatHDP\"` object mentioned in  [Hierarchical Mixture Models](#hierarchical-mixture-models), **bbricks** uses a `\"CatHDP2\"` object to represent a \"HDP on positive integers with two layers of hierarchies\". 
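As a rough illustration of what a Dirichlet process on positive integers means in practice, the sketch below draws labels one at a time with the Chinese restaurant process: an existing label is chosen with probability proportional to its current count, and a new label with probability proportional to the concentration parameter. This is a base-R sketch of a single (non-hierarchical) Dirichlet process only. The function name `rCRP` and its arguments are hypothetical and not part of **bbricks**, and the sketch does not attempt to reproduce the internal bookkeeping of a `CatHDP2` object.

```R
## A minimal sketch, assuming the standard Chinese restaurant process.
## rCRP() is a hypothetical helper, not a bbricks function.
rCRP <- function(n, alpha) {
    z <- integer(n)          # sampled labels
    counts <- integer(0)     # counts[k] = number of draws with label k so far
    for (i in seq_len(n)) {
        ## unnormalized probabilities: existing labels first, a new label last
        probs <- c(counts, alpha)
        k <- sample.int(length(probs), size = 1, prob = probs)
        if (k > length(counts)) {
            counts <- c(counts, 1L)        # open a new label
        } else {
            counts[k] <- counts[k] + 1L    # reuse an existing label
        }
        z[i] <- k
    }
    z
}

set.seed(1)
z <- rCRP(n = 200L, alpha = 1)
table(z)                     # label sizes; the number of labels is random
```

Larger values of `alpha` tend to produce more distinct labels. In the hierarchical models of this section the same mechanism is, roughly speaking, stacked: each lower-level process draws its labels from the process one level above it.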
 \n\nTo simplify the mixture model calculations, **bbricks** also provides an `\"HDP2\"` type to represent all hierarchical Dirichlet processes with two layers of hierarchies. An object of type  `\"HDP2\"` is in essence a combination of a `\"CatHDP2\"` object, which encodes the distribution of ![](./notes_pictures/981245f95d706bfa900fda8ed3fd6e07.png), i.e. a HDP on positive integers with two layers of hierarchies; and an arbitrary `\"BasicBayesian\"` object, which encodes the ![](./notes_pictures/b2116efc097ef12c669479890d894c1f.png) structure. (In **bbricks**, all models with the same structure as the [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png) are `\"BasicBayesian\"`s, such as `\"GaussianNIW\"`, `\"GaussianNIG\"`, `\"CatDirichlet\"` and even `\"CatDP\"`.)   \n\nTo estimate ![](./notes_pictures/f42ab172f591be89e7c5b52dd2c7c922.png), we use the following Gibbs sampling procedure: \n\n1. sample ![](./notes_pictures/71725cc1861b1ecd402ea72b74a51787.png) from ![](./notes_pictures/3c95514f9821bc5d1925a485c39a425d.png). \n\nR code: \n\n```R \n## Learn HDP2-MM posteriors using Gibbs sampling \n \nlibrary(bbricks) \n \n## load some mixture of Gaussian samples. \n## mmhhData is a list of 3 elements. mmhhData$x is a matrix of Gaussian observations, each row is an observation; mmhhData$groupLabel is the group label of each observation. mmhhData$subGroupLabel is the subgroup label of each observation. \n## there are 2 groups, group1 has 10 subgroups, group2 has 20 subgroups. 
\n## see ?mmhhData for details \ndata(mmhhData) \nx \u003c- mmhhData$x \ngroupLabel \u003c- mmhhData$groupLabel \nsubGroupLabel \u003c- mmhhData$subGroupLabel \n## Step1: Initialization------------------------------------------ \nmaxit \u003c- 50                            #iterative for maxit times \nburnin \u003c- 20                           #number of burn in samples \n## create a HDP2 object to track all the changes, the HDP2 object in this case is a combination of a CatHDP2 object and a GaussianNIW object \nobj \u003c- HDP2(gamma = list(eta=1,gamma=1,alpha=1,m=2L,j=c(10L,20L),H0aF=\"GaussianNIW\",parH0=list(m=c(0,0),k=0.001,v=1.1,S=diag(2)*0.001))) \nss \u003c- sufficientStatistics(obj$H,x=x,foreach = TRUE) #sufficient statistics \nz \u003c- rep(1L,nrow(x)) \nk \u003c- rep(1L,nrow(x)) \nu \u003c- matrix(1L,nrow(x),maxit-burnin) \nN \u003c- length(ss) \n## initialization k, z and u \nfor(i in 1L:N){ \n    tmp \u003c- rPosteriorPredictive(obj = obj,n=1,x=x[i,,drop=FALSE],m=groupLabel[i],j=subGroupLabel[i]) \n    z[i] \u003c- tmp[\"z\"] \n    k[i] \u003c- tmp[\"k\"] \n    u[i,1] \u003c- tmp[\"u\"] \n    posterior.HDP2(obj = obj,ss = ss[[i]],ss1 = u[i,1],ss2 = k[i],ss3 = z[i],m=groupLabel[i],j = subGroupLabel[i]) \n} \n## Step2: main Gibbs loop--------------------------------------------- \nit \u003c- 1                                 #iteration counter \npb \u003c- txtProgressBar(min = 0,max = maxit,style = 3) \nwhile(it\u003c=maxit){ \n    if(it\u003eburnin) colIdx \u003c- it-burnin \n    else colIdx \u003c- 1 \n    for(i in 1L:N){ \n        ## remove the sample from the posterior info \n        posteriorDiscard(obj = obj,ss = ss[[i]],ss1=u[i,colIdx],ss2=k[i],ss3 = z[i],m=groupLabel[i],j=subGroupLabel[i]) \n        ## resample a new partition \n        tmp \u003c- rPosteriorPredictive(obj = obj,n=1L,x=x[i,,drop=FALSE],m=groupLabel[i],j=subGroupLabel[i]) \n        z[i] \u003c- tmp[\"z\"] \n        k[i] \u003c- tmp[\"k\"] \n        u[i,colIdx] \u003c- tmp[\"u\"] 
 \n        ## add the information of the new sample \n        posterior(obj = obj,ss = ss[[i]], ss1=u[i,colIdx],ss2 = k[i],ss3 = z[i],m=groupLabel[i],j=subGroupLabel[i]) \n    } \n    if(it\u003eburnin \u0026 colIdx\u003cncol(u)) u[,colIdx+1] \u003c- u[,colIdx] #copy result of previous iteration \n    it \u003c- it+1 \n    plot(x=x[,1],y=x[,2],col=u[,colIdx]) \n    setTxtProgressBar(pb,it) \n} \n## Step3: Estimate group labels of each observation--------------- \n## pick the most frequent u as the best estimate \nuBest \u003c- apply(u,1,function(l){ \n    tmp \u003c- table(l) \n    names(tmp)[which.max(tmp)] \n}) \nplot(x=x[,1],y=x[,2],col=uBest) \n```\n\n \n\n#### Hierarchical Topic Modeling with HDP2 \n\nTo extract topics from multiple corpora that share the same set of topics, we can use a model called the \"hierarchical topic model\" (Griffiths, Thomas L., et al. \"Hierarchical topic models and the nested Chinese restaurant process.\" *Advances in neural information processing systems*. 2004.). A hierarchical topic model is an instance of the [Hierarchical Mixture Models with Two Layers of Hierarchies](#hierarchical-mixture-models-with-two-layers-of-hierarchies). The model can be built by simply replacing the `\"GaussianNIW\"` object with a `\"CatDirichlet\"` object in the previous R example. \n\n### Bayesian Linear Regression \n\nA Bayesian linear regression model has the following graph structure: \n\n![](./notes_pictures/bayesianLinearRegression.png) \n\nThe CPDs are: \n\n![](./notes_pictures/b167759334b433c84b8389a57bd921ac.png)\n\nSince the combination of \"Gaussian\" and \"InvGamma\" is a commonly used conjugate structure in Bayesian linear regression, the combination is known as the \"Normal-Inverse-Gamma\" (NIG) distribution. 
With NIG, the CPDs are usually represented as: \n\n![](./notes_pictures/508ec68424acb56238c076b777b7d42d.png)\n\nWhere ![](./notes_pictures/096669b8081d581cad6f1fc7759b8e5d.png) is the Normal-Inverse-Gamma distribution with parameter ![](./notes_pictures/9960743dface0fd79d9eace6dc51fca8.png); ![](./notes_pictures/1ad7791ac511a6757ec0f9b5ea3a3f15.png) and ![](./notes_pictures/1867a22a1d35934814239657946c8740.png) are the \"location\" and \"scale\" parameters; ![](./notes_pictures/e3eb72208322518fc9c033393f70ecfd.png) and ![](./notes_pictures/dcb85b995ab480787962e5fb09c8a04e.png) are the \"shape\" and \"rate\" parameters. \n\nThe distribution of ![](./notes_pictures/9ebab259439894b9afff4c770602240d.png) is a basic prior-posterior structure as shown in [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png). **bbricks** provides an object of type `\"GaussianNIG\"` to represent such a structure.  \n\nSee the R example below for MAP estimation, posterior predictive, and marginal likelihood calculations on the `\"GaussianNIG\"` object: \n\n```R \n## Learn posteriors of the Bayesian linear regression model \n \nlibrary(bbricks) \n \n## lrData is a list of two elements. lrData$x is the sample set of the dependent variable; lrData$X is the sample set of the independent variable \n## see ?lrData for details \ndata(lrData) \nX \u003c- lrData$X                           #a matrix of 1 column \nx \u003c- lrData$x                           #a numeric vector \n## task 1. update the prior into posterior using X and x \nobj \u003c- GaussianNIG(gamma=list(m=0,V=1,a=1,b=0)) #create a GaussianNIG object \nss \u003c- sufficientStatistics(obj = obj,X=X,x=x)   #the sufficient statistics of X and x \nposterior(obj = obj,ss = ss)                    #add the information to the posterior \n## task 2. 
get MAP estimate of beta and sigma^2 from the posterior \nbsMAP \u003c- MAP(obj)                               #get the MAP estimate of beta and sigma^2 \nbsMAP                                           #print the MAP estimate \n## plot the MAP estimate of the regression line \nplot(X,X%*%bsMAP$betaMAP,type = \"l\") \npoints(X,x,pch=20) \n## task 3. calculate marginal likelihood \n## generate some new data \n## note: the upper bound 'max' was left empty in the original example; runif's default of 1 is assumed here \nXnew \u003c- matrix(runif(3,min=0,max=1),ncol=1) \nxnew \u003c- Xnew*0.2+rnorm(3,sd=10) \n## evaluate the marginal likelihood of the new data \nmarginalLikelihood(obj = obj,X=Xnew,x=xnew,LOG = TRUE) \n## task 4. calculate the posterior prediction \n## say we want to predict x at the location X=101 \npredictedSamples \u003c- rPosteriorPredictive(obj = obj,X=matrix(101,ncol = 1),n=1000) \n## histogram of the prediction \nhist(predictedSamples) \n## the mean and standard deviation of the prediction \nmean(predictedSamples) \nsd(predictedSamples) \n \n```\n\n \n\n### Hidden Markov Model (HMM) \n\nA Hidden Markov Model (HMM) has the following graphical model structure: \n\n![](./notes_pictures/HMM.png) \n\nThe CPDs are: \n\n![](./notes_pictures/0b9c42bbdfa9dd83001e854d48626efb.png)\n\nWhere ![](./notes_pictures/9881787463e9660500f48e35324ca2a4.png) is the initial distribution, ![](./notes_pictures/1d20860c964e9eb814234b8d3ec6adab.png) are the transition distributions of each state. ![](./notes_pictures/71cfe4f8cb522a872b93181f5d014188.png) is the observation distribution, ![](./notes_pictures/9c98a87379f26becb2d1ce72907993a5.png) is the prior distribution of the ![](./notes_pictures/288038e6b7d24534046146439af3fb71.png)s.  
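The generative process behind such a model can be sketched in base R as follows. This is only an illustration under simplifying assumptions: the observation distribution is taken to be a univariate Gaussian with a known standard deviation, the parameters are fixed by hand rather than drawn from their priors, and all names (`K`, `pi0`, `P`, `mu`, `nT`) are hypothetical and not part of **bbricks**.

```R
## A minimal sketch of simulating from an HMM, assuming fixed parameters.
## Nothing here calls bbricks; all names and numbers are illustrative.
set.seed(1)
K <- 3L                                   # number of hidden states
pi0 <- rep(1/K, K)                        # initial state distribution
## transition matrix: row k is the transition distribution of state k
P <- matrix(c(0.8, 0.1, 0.1,
              0.1, 0.8, 0.1,
              0.1, 0.1, 0.8), nrow = K, byrow = TRUE)
mu <- c(-5, 0, 5)                         # observation mean of each state
nT <- 100L                                # length of the sequence

z <- integer(nT)                          # hidden states
x <- numeric(nT)                          # observations
z[1] <- sample.int(K, size = 1, prob = pi0)
x[1] <- rnorm(1, mean = mu[z[1]], sd = 1)
for (i in 2L:nT) {
    ## the next state depends only on the previous state
    z[i] <- sample.int(K, size = 1, prob = P[z[i - 1L], ])
    ## the observation depends only on the current state
    x[i] <- rnorm(1, mean = mu[z[i]], sd = 1)
}
table(z)                                  # how often each state was visited
```

Inference goes in the opposite direction: given only `x`, the goal is to recover the hidden states and the posterior distributions of the parameters.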
 \n\nA HMM with ![](./notes_pictures/99ef749f3b359c9935e5f9806f12f7ea.png) hidden states can be seen as a combination of ![](./notes_pictures/49a6f854ca4aa09e55c3f47f2dd47de7.png) \"prior-posterior\" structures (as shown in [Mindset](#mindset) graph ![](./notes_pictures/108ad30be304e27ce3a8b7dc70c74850.png)): ![](./notes_pictures/99ef749f3b359c9935e5f9806f12f7ea.png) Categorical-Dirichlet structures ![](./notes_pictures/e1c63975edf1e177c8eceb1fd9e622a7.png) for the hidden state transitions, one Categorical-Dirichlet structure ![](./notes_pictures/16adf2c37312da23d12be4a741ff9b6a.png) for the initial state distribution, and one `\"BasicBayesian\"` structure (such as Gaussian-NIW) ![](./notes_pictures/8fe6e2c42ba60c6c8e1a0ffc7b38250c.png) for the observation distribution.  \n\nIn **bbricks** the Categorical-Dirichlet structures are initialized with `\"CatDirichlet\"` objects. The choice of the `\"BasicBayesian\"` structure, i.e. the choice of ![](./notes_pictures/71cfe4f8cb522a872b93181f5d014188.png) and ![](./notes_pictures/9c98a87379f26becb2d1ce72907993a5.png), is flexible. For example, if we set ![](./notes_pictures/ec6afc41cce614b022bb105452957194.png) and ![](./notes_pictures/c0bdd3e25f16964a386309b216b58abf.png), then the structure ![](./notes_pictures/8fe6e2c42ba60c6c8e1a0ffc7b38250c.png) can be initialized with a `\"GaussianNIW\"` object. \n\nFor demonstration purposes, here is an example using the **single-move sampling** discussed in Robert et al. (1993) and Frühwirth-Schnatter (2006). In the example the observations are generated by 3 two-dimensional Gaussian distributions. 
The R code is: \n\n```R \n## Learn HMM posteriors using single-move sampling \n \nlibrary(bbricks) \n \n## load some hidden markov data \n## see ?hmmData for details \ndata(hmmData) \nx \u003c- hmmData$x \nNsegs \u003c- hmmData$Nsegs                  #number of segments \nbreaks \u003c- hmmData$breaks                #break index of the segments \n## Step1: Initialization------------------------------------------ \n## create Categorical-Dirichlet object for pi0 and transition matrix \nK \u003c- 3L \nallk \u003c- 1L:K \ntransitionObj \u003c- replicate(K,CatDirichlet(gamma = list(alpha=1,uniqueLabels=allk))) \npiObj \u003c- CatDirichlet(gamma = list(alpha=1,uniqueLabels=allk)) \n## create Gaussian-NIW object for observations \nobsObj \u003c- replicate(K,GaussianNIW(gamma=list(m=c(0,0),k=0.001,v=3,S=diag(2)))) \n## place holder for the sampled hidden states \nz \u003c- integer(nrow(x)) \n## observation sufficient statistics \nssx \u003c- sufficientStatistics(obj = obsObj[[1]],x=x,foreach = TRUE) \n## initialize states with random assignment \nz \u003c- sample(allk,length(z),replace = TRUE) \nfor(i in 1L:Nsegs){ \n    segStart \u003c- breaks[i]+1L \n    segEnd \u003c- breaks[i+1] \n    for(j in segStart:segEnd){ \n        if(j==segStart) posterior(piObj,ss = z[j]) \n        else posterior(transitionObj[[z[j-1L]]],ss = z[j]) \n        posterior(obsObj[[z[j]]],ss = ssx[[j]]) \n    } \n} \n## Step2: main Gibbs loop--------------------------------------------- \nmaxit \u003c- 20 \nit \u003c- 1 \npb \u003c- txtProgressBar(min = 0,max = maxit,style = 3) \nwhile(it\u003c=maxit){ \n    ## for all segments \n    for(i in 1L:Nsegs){ \n        segStart \u003c- breaks[i]+1L \n        segEnd \u003c- breaks[i+1] \n        for(j in segStart:segEnd){ \n            ## discard previous z \n            if(j==segStart){ \n                posteriorDiscard(piObj,ss = z[j]) \n            }else if(j==segEnd){ \n                posteriorDiscard(transitionObj[[z[j-1L]]],ss = z[j]) \n            
}else{ \n                posteriorDiscard(transitionObj[[z[j-1L]]],ss = z[j]) \n                posteriorDiscard(transitionObj[[z[j]]],ss = z[j+1L]) \n            } \n            posteriorDiscard(obsObj[[z[j]]],ss = ssx[[j]]) \n            ## sample new z \n            if(j==segStart) \n                pz \u003c- dPosteriorPredictive(obj = piObj,x=allk,LOG = TRUE) \n            else \n                pz \u003c- dPosteriorPredictive(obj = transitionObj[[z[j-1L]]],x=allk,LOG = TRUE) \n            if(j!=segEnd) \n                for(k in allk) pz[k] \u003c- pz[k]+dPosteriorPredictive(obj = obsObj[[k]],x=x[j,,drop=FALSE],LOG = TRUE)+dPosteriorPredictive(obj = transitionObj[[k]],x=z[j+1L],LOG = TRUE) \n            else \n                for(k in allk) pz[k] \u003c- pz[k]+dPosteriorPredictive(obj = obsObj[[k]],x=x[j,,drop=FALSE],LOG = TRUE) \n            pz \u003c- exp(pz-logsumexp(pz)) \n            currentK \u003c- sample.int(K,size=1,prob = pz) \n            z[j] \u003c- currentK \n            ## update prior with newly sampled z \n            if(j==segStart){ \n                posterior(piObj,ss = currentK) \n            }else if(j==segEnd){ \n                posterior(transitionObj[[z[j-1L]]],ss = currentK) \n            }else{ \n                posterior(transitionObj[[z[j-1L]]],ss = currentK) \n                posterior(transitionObj[[currentK]],ss = z[j+1]) \n            } \n            posterior(obsObj[[currentK]],ss = ssx[[j]]) \n        } \n    } \n    it \u003c- it+1 \n    setTxtProgressBar(pb,it) \n} \n## print the MAP estimates of the transition matrix \nfor(k in allk) print(round(MAP(transitionObj[[k]]),2)) \n## print the MAP estimates of the observation distributions \nfor(k in allk) print(MAP(obsObj[[k]])) \n## MAP estimates of the initial state distribution pi0 \nMAP(piObj) \n```\n\n### Infinite States Hidden Markov Model (HDP-HMM) \n\nAn infinite state hidden Markov model can be represented by a hierarchical Dirichlet process according to Teh (2005), 
the graphical model for HDP-HMM is: \n\n![](./notes_pictures/HDPHMM.png) \n\nThe CPDs are: \n\n![](./notes_pictures/4b0ce8ee44ebdd907a6ee7279639242d.png)\n\nLike the one described in [Hierarchical Mixture Models](#hierarchical-mixture-models), **bbricks** provides an object of type `\"HDP\"` to encode such structures. \n\nFor demonstration purposes I'll use the simple **single-move sampling** method discussed in Robert et al. (1993) and Frühwirth-Schnatter (2006) to learn the posterior distributions of the model parameters. See Van Gael et al. (2008) for a more efficient implementation. \n\nThe R code is: \n\n```R \n## Learn HDP-HMM posteriors using single-move sampling \n \nlibrary(bbricks) \n \n## load some hidden markov data \n## see ?hmmData for details \ndata(hmmData) \nx \u003c- hmmData$x \nNsegs \u003c- hmmData$Nsegs \nbreaks \u003c- hmmData$breaks \n## Step1: Initialization------------------------------------------ \ntransitionObj \u003c- HDP(gamma = list(gamma=1,alpha=1,j=1,H0aF=\"GaussianNIW\",parH0=list(m=c(0,0),k=0.001,v=3,S=diag(2)))) \n## place holder for the sampled hidden states \nzk \u003c- matrix(0L,ncol = 2,nrow = nrow(x)) \n## observation sufficient statistics \nssx \u003c- sufficientStatistics(obj = GaussianNIW(),x=x,foreach = TRUE) \n## sample k from p(k|gamma,x) \nsamplePi \u003c- function(x,zkAfter){ \n    allK \u003c- which(transitionObj$Z$Z1$gamma$nk\u003e0) #all possible k \n    if(length(allK)==0) return(transitionObj$Z$Z1$gamma$newLabel) \n    ## log p(k|pi) + log p(x|k) + log p(k_{2}|k); all three terms must be on the log scale \n    logp \u003c- dPosteriorPredictive(transitionObj$Z$Z1,x=allK,LOG = TRUE) + \n        vapply(allK,function(k){ \n            dPosteriorPredictive(transitionObj$X[[k]],x=x,LOG = TRUE)+ \n            dPosteriorPredictive(obj = transitionObj,z=zkAfter[1],k=zkAfter[2],j=k,LOG = TRUE) \n        },FUN.VALUE = numeric(1)) \n         \n    logp \u003c- 
exp(logp-max(logp)) \n    allK[sample.int(length(allK),size = 1,prob = logp)] \n} \n## sample z and k from p(z,k|gamma,alpha,x) \nsampleZK \u003c- function(kBefore,zkAfter,x){ \n    ## p(z,k | k_{t-1}) p(x|k) \n    probs \u003c- dAllIndicators(obj = transitionObj,j=kBefore,x=x) \n    ##  \n    allK \u003c- unique(probs$k) \n \n    ## p(k_{t+1} | k) \n    p \u003c- vapply(allK,function(k){ \n        if(k==transitionObj$Z$Z1$gamma$newLabel) \n            dPosteriorPredictive(obj = transitionObj$Z$Z1,x=zkAfter[2],LOG = FALSE) \n        else \n            dPosteriorPredictive(obj = transitionObj,z=zkAfter[1],k=zkAfter[2],j=k,LOG = FALSE) \n    },FUN.VALUE = numeric(1)) \n    idx \u003c- sample.int(nrow(probs),size = 1,prob = p[match(probs$k,allK)]*probs$p) \n    c(z=probs$z[idx],k=probs$k[idx]) \n} \n## initialize states with forward assignment \nfor(i in 1L:Nsegs){ \n    segStart \u003c- breaks[i]+1L \n    segEnd \u003c- breaks[i+1] \n    for(j in segStart:segEnd){ \n        if(j==segStart){ \n            k \u003c- rPosteriorPredictive(obj = transitionObj$Z$Z1,n=1) \n            ## update initial and observation \n            posterior(obj = transitionObj,ss=ssx[[j]],ss1=k,ss2=1L,j=k) \n            zk[j,2] \u003c- k \n        }else{ \n            ## sample from the j(or k) th DP \n            zk1 \u003c- rPosteriorPredictive(obj = transitionObj,n=1,j=zk[j-1,2],x=x[j,,drop=FALSE]) \n            ## update transition and observation \n            posterior(obj = transitionObj,ss = ssx[[j]],ss1=zk1[2],ss2=zk1[1],j=zk[j-1,2]) \n            ## write to book \n            zk[j,] \u003c- zk1 \n        } \n    } \n} \n## Step2: main Gibbs loop--------------------------------------------- \nmaxit \u003c- 20 \nit \u003c- 1 \npb \u003c- txtProgressBar(min = 0,max = maxit,style = 3) \nwhile(it\u003c=maxit){ \n    ## for all segments \n    for(i in 1L:Nsegs){ \n        segStart \u003c- breaks[i]+1L \n        segEnd \u003c- breaks[i+1] \n        for(j in segStart:segEnd){ \n         
   ## discard previous z \n            if(j==segStart){ \n                ## discard 1 obs 1 initial and 1 transition \n                posteriorDiscard(obj = transitionObj,ss=ssx[[j]],ss1=zk[j,2],ss2=1L,j=zk[j,2]) \n                posteriorDiscard(obj = transitionObj,ss=NULL, ss1=zk[j+1,2],ss2=zk[j+1,1],j=zk[j,2]) \n                ## sample new initial k \n                k \u003c- samplePi(x=x[j,,drop=FALSE],zkAfter = zk[j+1,,drop=FALSE]) \n                ## update prior with newly sampled k \n                posterior(obj = transitionObj,ss=ssx[[j]],ss1=k,ss2=1L,j=k) \n                posterior(obj = transitionObj,ss=NULL, ss1=zk[j+1,2],ss2=zk[j+1,1],j=k) \n                ## write to book \n                zk[j,2] \u003c- k \n            }else if(j==segEnd){ \n                ## discard 1 obs and 1 transition \n                posteriorDiscard(obj = transitionObj,ss=ssx[[j]],ss1=zk[j,2],ss2=zk[j,1],j=zk[j-1,2]) \n                ## sample new z k \n                zk1 \u003c- rPosteriorPredictive(obj = transitionObj,n=1,x=x[j,,drop=FALSE],j=zk[j-1,2]) \n                ## update prior with newly sampled z \n                posterior(obj = transitionObj,ss=ssx[[j]],ss1=zk1[2],ss2=zk1[1],j=zk[j-1,2]) \n                ## write to book \n                zk[j,] \u003c- zk1 \n            }else{ \n                ## discard 2 transitions and 1 obs \n                posteriorDiscard(obj = transitionObj,ss=ssx[[j]],ss1=zk[j,2],ss2=zk[j,1],j=zk[j-1,2]) \n                posteriorDiscard(obj = transitionObj,ss=NULL, ss1=zk[j+1,2],ss2=zk[j+1,1],j=zk[j,2]) \n                ## sample new z k \n                zk1 \u003c- sampleZK(kBefore = zk[j-1,2],zkAfter = zk[j+1,],x=x[j,,drop=FALSE]) \n                ## update prior with newly sampled z k \n                posterior(obj = transitionObj,ss=ssx[[j]],ss1=zk1[2],ss2=zk1[1],j=zk[j-1,2]) \n                posterior(obj = transitionObj,ss=NULL, ss1=zk[j+1,2],ss2=zk[j+1,1],j=zk1[2]) \n                ## write to book \n    
            zk[j,] \u003c- zk1 \n            } \n        } \n    } \n    it \u003c- it+1 \n    setTxtProgressBar(pb,it) \n} \nallK \u003c- which(transitionObj$Z$Z1$gamma$nk\u003e0) \n## print the MAP estimates of the state transitions \nfor(k in allK) print(vapply(split(MAP(transitionObj$Z$Z2[[k]]),f=transitionObj$Z$Z12map[[k]]),sum,numeric(1))) \n## print the MAP estimates of the observation parameters \nfor(k in allK) print(MAP(transitionObj$X[[k]])) \n## print the MAP estimate of the initial state distribution \nround(MAP(transitionObj$Z$Z1)[allK],2) \n \n```\n\n \n\n  \n\n## References \n\nAndrieu, Christophe, et al. \"An introduction to MCMC for machine learning.\" *Machine learning* 50.1-2 (2003): 5-43. \n\nJohnson, Valen E., and James H. Albert. *Ordinal data modeling*. Springer Science \u0026 Business Media, 2006. \n\nHoff, Peter D. *A first course in Bayesian statistical methods*. Vol. 580. New York: Springer, 2009. \n\nLi, Yuelin, Elizabeth Schofield, and Mithat Gönen. \"A tutorial on Dirichlet process mixture modeling.\" *Journal of Mathematical Psychology* 91 (2019): 128-144. \n\nMardia, K. V., J. T. Kent, and J. M. Bibby. *Multivariate analysis*. Academic Press, London, 1979. \n\nMcLachlan, Geoffrey J., and Thriyambakam Krishnan. *The EM algorithm and extensions*. Vol. 382. John Wiley \u0026 Sons, 2007. \n\nMurphy, Kevin P. *Machine learning: a probabilistic perspective*. MIT press, 2012. \n\nRobert, Christian P., Gilles Celeux, and Jean Diebolt. \"Bayesian estimation of hidden Markov chains: A stochastic implementation.\" *Statistics \u0026 Probability Letters* 16.1 (1993): 77-83. \n\nFrühwirth-Schnatter, Sylvia. *Finite mixture and Markov switching models*. Springer Science \u0026 Business Media, 2006. \n\nSmith, W. B., and R. R. Hocking. \"Algorithm AS 53: Wishart variate generator.\" *Journal of the Royal Statistical Society. Series C (Applied Statistics)* 21.3 (1972): 341-345. \n\nTeh, Yee Whye. \"Dirichlet Process.\" (2010): 280-287. 
\n\nTeh, Yee W., et al. \"Sharing clusters among related groups: Hierarchical Dirichlet processes.\" *Advances in neural information processing systems*. 2005. \n\nVan Gael, Jurgen, et al. \"Beam sampling for the infinite hidden Markov model.\" *Proceedings of the 25th international conference on Machine learning*. 2008. \n\nWishart, John. \"The generalized product moment distribution in samples from a normal multivariate population.\" *Biometrika* (1928): 32-52. \n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenhaotian%2Fbayesian-bricks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchenhaotian%2Fbayesian-bricks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenhaotian%2Fbayesian-bricks/lists"}