{"id":37081833,"url":"https://github.com/qwerty6191/projected-lmc","last_synced_at":"2026-01-14T09:58:14.442Z","repository":{"id":201223036,"uuid":"705738757","full_name":"QWERTY6191/projected-lmc","owner":"QWERTY6191","description":"A short package based on gpytorch implementing the Projected LMC model.","archived":false,"fork":false,"pushed_at":"2024-02-16T13:05:09.000Z","size":13193,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-06T00:47:51.992Z","etag":null,"topics":["gaussian-processes","multitask"],"latest_commit_sha":null,"homepage":"https://qwerty6191.github.io/projected-lmc/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QWERTY6191.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-16T15:43:32.000Z","updated_at":"2025-12-05T23:15:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"e04d9e8d-264c-4b91-bcec-85f4da812fba","html_url":"https://github.com/QWERTY6191/projected-lmc","commit_stats":null,"previous_names":["qwerty6191/projected-lmc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/QWERTY6191/projected-lmc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QWERTY6191%2Fprojected-lmc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QWERTY6191%2Fprojected-lmc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QWERTY6191%2Fprojected-lmc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QWERTY6191%2Fprojected-lmc/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QWERTY6191","download_url":"https://codeload.github.com/QWERTY6191/projected-lmc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QWERTY6191%2Fprojected-lmc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28416299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:38:59.149Z","status":"ssl_error","status_checked_at":"2026-01-14T08:38:43.588Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gaussian-processes","multitask"],"created_at":"2026-01-14T09:58:13.677Z","updated_at":"2026-01-14T09:58:14.436Z","avatar_url":"https://github.com/QWERTY6191.png","language":"Python","readme":"# projected-lmc\n\n## Requirements and installation\n\nThe package only requires a recent version of gpytorch and of its dependencies (in particular torch). 
Some auxiliary functions are imported from scikit-learn to perform efficient SVD.\nPackages `pandas` and `seaborn` are listed in the requirements, but are only used to reproduce graphs from the article.\n\nTo install, simply do:\n```\npip install projectedlmc\n```\nIn case of an installation problem, you can also place the file `projected_lmc.py` from the repository in your working directory and import everything from it.\n\nIf the associated GitHub Pages site doesn't work, the documentation can be built locally by downloading the `docs` folder and running the following command inside it (on Linux; requires the `sphinx` and `sphinx_rtd_theme` Python modules to be installed):\n```\nmake html\n```\nand then opening _build/html/index.html.\n\n## Model construction and usage\n\nModels are built in the standard gpytorch way. File `experiments.py` (reproducing results from the article) gives all necessary examples, but we briefly restate them below.\n\n### Exact single-output model\n\nNot displayed in `experiments.py`, but this model is the building block from which the exact LMC/IMC and Projected LMC models inherit.\nFirst create a likelihood with:\n```\nlikelihood = gp.likelihoods.GaussianLikelihood()\n```\nThen create the model:\n```\nmodel = ExactGPModel(X, Y, likelihood, mean_type=Mean, kernel_type=kernel, decomp=decomp, ker_kwargs=ker_kwargs)\n```\n(All fields are described in the documentation.)\nNote that this class can also generate an independent multitask GP (i.e. a batch of independent single-task GPs trained simultaneously) with the optional argument `n_tasks`.\n\nTo go into training mode, do:\n```\nmodel.train()\nlikelihood.train()\n```\nSpecify a loss function, an optimizer, and optionally a scheduler, with:\n```\nmll = gp.mlls.ExactMarginalLogLikelihood(likelihood, model)\noptimizer = torch.optim.AdamW(model.parameters(), lr=lr)\nscheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=np.exp(np.log(lr_min / lr) / n_iter))\n```\n\nThe training loop then looks 
like:\n```\nfor i in range(n_iter):\n    optimizer.zero_grad()\n    with gp.settings.cholesky_jitter(1e-5):\n        output_train = model(X)\n        loss = -mll(output_train, Y)\n    loss.backward()\n    optimizer.step()\n    scheduler.step()\n```\nPredictions can then be made through:\n```\nmodel.eval()\nlikelihood.eval()\nobserved_pred = likelihood(model(X_test))\npred_y = observed_pred.mean\nlower, upper = observed_pred.confidence_region()\n```\nNote that, as for all other models, the helper methods `.lscales()` and `.outputscale()` help inspect the optimized parameters of the GP:\n```\nprint(model.lscales())\nprint(model.outputscale())\n```\n\n### Exact LMC or IMC model\n\nThe data arrays `X` and `Y` must have shape `n_points x n_dim` and `n_points x n_tasks` respectively, and predictions will follow the same shape convention. All syntax is identical to that of the above single-output GP; the only difference is in the likelihood, which is now a multitask one:\n```\nlikelihood = gp.likelihoods.MultitaskGaussianLikelihood(num_tasks=n_tasks, rank=lik_rank)\n```\nThe input `rank` is the rank of the cross-task noise covariance matrix; `rank=0` corresponds to a diagonal matrix.\n\nThe model also has extra inputs compared to the previous case:\n```\nmodel = MultitaskGPModel(X, Y, likelihood, n_tasks=n_tasks, n_latents=n_lat, model_type='LMC', \n                    mean_type=Mean, kernel_type=kernel, decomp=decomp,\n                    init_lmc_coeffs=True, fix_diagonal=True, ker_kwargs=ker_kwargs)\n```\nOne of course has to specify the number of tasks and of latent functions, but also the type of model between `LMC` and `IMC` (i.e. whether or not latent processes have different kernels), and the additional optional parameters `init_lmc_coeffs` and `fix_diagonal`, covered in the documentation.
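For intuition, the cross-task covariance structure that such a model assumes can be sketched in plain numpy. This is an illustrative sketch of the standard LMC prior, not the package's internal implementation; all names below are made up:

```python
import numpy as np

# Illustrative sketch of the covariance structure an LMC assumes (generic
# GP math, NOT the package's internal code; all names here are made up).
# Each of n_lat latent processes q contributes a Kronecker product between
# the rank-one task covariance b_q b_q^T and its own kernel matrix K_q.
# With a single shared kernel for all latents, this reduces to an IMC.

def rbf(x, lengthscale=1.0):
    """Squared-exponential kernel matrix on a 1-D input grid."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
n_points, n_tasks, n_lat = 5, 3, 2
x = np.linspace(0.0, 1.0, n_points)
B = rng.standard_normal((n_tasks, n_lat))  # LMC coefficient matrix
# LMC: one kernel per latent process (an IMC would reuse the same one)
K_lat = [rbf(x, lengthscale=0.5 + 0.5 * q) for q in range(n_lat)]

# Full (n_tasks * n_points) x (n_tasks * n_points) prior covariance
K_full = sum(np.kron(np.outer(B[:, q], B[:, q]), K_lat[q]) for q in range(n_lat))
```

Being a sum of Kronecker products of positive semi-definite factors, `K_full` is itself symmetric positive semi-definite, which is what the exact model's Cholesky factorization relies on.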
\n\n### Variational model\n\nThe variational LMC model functions a bit differently. As before, one starts by defining a multitask likelihood:\n```\nlikelihood = gp.likelihoods.MultitaskGaussianLikelihood(num_tasks=n_tasks, rank=lik_rank)\n```\nAnd a model:\n```\nmodel = VariationalMultitaskGPModel(X, n_tasks=n_tasks, n_latents=n_lat_mod, train_ind_ratio=1.5, seed=0, \n                    distrib=gpytorch.variational.CholeskyVariationalDistribution,\n                    init_lmc_coeffs=True, train_y=Y, \n                    mean_type=Mean, kernel_type=kernel,  decomp=decomp, ker_kwargs=ker_kwargs)\n```\nHere, a `train_ind_ratio` (the ratio between the number of training points and the smaller number of inducing points) must be specified, as the inducing-point approximation is used. Inducing point locations are learned, and initialized with a Sobol' sequence whose random scrambling is controlled by the parameter `seed`. If the ratio is set to 1, the behavior is different: inducing points are fixed at the locations of the input points.\nA variational distribution `distrib` must also be specified; see the gpytorch documentation on this topic. The default `gpytorch.variational.CholeskyVariationalDistribution` is a safe pick in all cases.\nAnother difference with the previous cases is that the `train_y` input is not strictly necessary: it is only specified in the above example to initialize the LMC coefficients. In the same vein, the model doesn't take a likelihood as an input: model and likelihood remain separate.\n\nThe marginal log-likelihood is replaced by a lower bound, the ELBO:\n```\nmll = gp.mlls.VariationalELBO(likelihood, model, num_data=n_points)\n```\nAnd likelihood and model parameters are optimized separately:\n```\noptimizer = torch.optim.AdamW([{'params': model.parameters()}, {'params': likelihood.parameters()}], lr=lr)\n```\n\n### Projected models\n\nProjected LMC models, introduced in the associated article, present other subtleties. 
First, the likelihood here is a batched Gaussian likelihood, whose dimension is the number of **latent processes** instead of that of tasks:\n```\nproj_likelihood = gpytorch.likelihoods.GaussianLikelihood(batch_shape=torch.Size([n_lat]))\n```\nOne is automatically generated by the model if none is provided at instantiation. This latent-level likelihood, independent over latent processes, corresponds to the *projected noise* in the article.\nIt is stored in the model attribute `.likelihood`, while the full task-level likelihood can be generated by the method `.full_likelihood()`.\n\nThe model has extra options, corresponding to the simplifications described in the article (see the documentation):\n```\nmodel = ProjectedGPModel(X, Y, n_tasks, n_lat, proj_likelihood=proj_likelihood,\n                                   mean_type=Mean,  kernel_type=kernel, decomp=decomp,\n                                   BDN=False, diagonal_B=False, scalar_B=False, diagonal_R=False,  \n                                   init_lmc_coeffs=True, ker_kwargs=ker_kwargs)\n```\n\nThe MLL function here is a custom one, composed of the independent single-output losses of the latent processes plus additional projection-related terms:\n```\nmll = ProjectedLMCmll(proj_likelihood, model)\n```\n\nAt prediction time in `gpytorch`, modelled noise is usually added to the GP covariance by calling `likelihood(model(X_test))`. Here, the full likelihood (as opposed to the projected one) has to be called instead:\n```\nfull_likelihood = model.full_likelihood()\nobserved_pred = full_likelihood(model(X_test))\n```\n\nFinally, note that the various quantities described in the paper ($\mathbf{H, TY, \Sigma_{P}}$...) can be accessed through helper methods: `.projection_matrix()`, `.project_data(train_Y)`, etc.\n\n\n## Experiments reproduction\n\n### Results generation\n\nFile `experiments.py` has been used to generate all experiments on synthetic data. 
It allows one to inspect the effect of varying one parameter of this data (or two, or more with slight modifications) on several models trained side-by-side. These parameters, described in the article under the same names, are: `p, q, q_noise, n, mu_noise, mu_str, max_scale`. Two additional ones are `q_noise_guess` and `lik_rank`, controlling the number of latent functions of the *model* and of the *model noise* respectively (and not of the data and data noise), making it possible to assess the impact of model misspecification. Default parameter values are contained in the dictionary `v`, and their ranges of variation for parametric studies in the dictionary `v_vals`.\n\nTo perform a parametric study, just specify the above-described parameter default values and ranges, the included models in the list `models_to_run`, and the test parameter `v_test` (plus optionally a second parameter `v_test_2` for a cross-variable study). A csv file with an appropriate name will be output, detailing data specifications and performance metrics (defined in the function `compute_metrics`) for all models; its name can be adjusted through the field `appendix` or directly. **Predefined inputs corresponding to the paper's figures are given at the beginning of the script; just uncomment the corresponding line to reproduce one of them**. 
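Schematically, such a parametric study amounts to a loop like the following. This is a hypothetical sketch with stand-in names (`run_model` and this toy `compute_metrics` are placeholders, not the actual functions of `experiments.py`):

```python
import csv
import io

# Hypothetical sketch of a parametric-study loop: vary one parameter,
# run each model, and collect performance metrics into a csv table.

def compute_metrics(pred, truth):
    """Toy stand-in: a single RMSE metric."""
    err = [(p - t) ** 2 for p, t in zip(pred, truth)]
    return {"rmse": (sum(err) / len(err)) ** 0.5}

def run_model(name, q_noise):
    """Placeholder for 'train model `name`, predict on test data'."""
    truth = [0.0, 1.0, 2.0]
    pred = [t + 0.1 * q_noise for t in truth]  # fake predictions
    return pred, truth

v_vals = {"q_noise": [0, 1, 2]}    # range of the varied parameter
models_to_run = ["LMC", "projected"]

buf = io.StringIO()                # a real script would open a csv file
writer = csv.DictWriter(buf, fieldnames=["model", "q_noise", "rmse"])
writer.writeheader()
for q_noise in v_vals["q_noise"]:  # one row per (parameter value, model)
    for name in models_to_run:
        pred, truth = run_model(name, q_noise)
        row = {"model": name, "q_noise": q_noise}
        row.update(compute_metrics(pred, truth))
        writer.writerow(row)
```

The resulting table has one row per (parameter value, model) pair, which is the shape the graph-processing script expects to plot.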
Additional options are available:\n\n+ `loss_thresh` and `patience`: parameters of the stopping criterion (see annex D of the paper);\n+ `n_iters` and `lrs`: training parameters (maximal number of iterations and learning rate);\n+ `models_with_sched`: which models to endow with the suggested exponential-decay scheduler (otherwise, individualized schedulers for each model can be specified by directly editing the script);\n+ `print_metrics`: whether to display performance metrics in the console after each run;\n+ `print_loss`: whether to display the loss every 100 iterations of model training, to inspect convergence;\n+ `reject_nonconverged_runs`: in some extreme or misspecified setups, some models frequently jump out of local minima. In most cases they are able to recover afterwards, but sometimes they fail to do so within the prescribed iteration budget, leading to poor predictive performance, a situation which never happened in the experiments of the article. If this option is set to `True`, the output csv file is divided into two parts: one including all runs, and another including only converged runs and specifying the number of successful ones. Runs are rejected if accuracy is abnormally low (errors four times larger than the data noise);\n+ `n_random_runs`: the number of random repetitions of the test.\n\nData and model characteristics can of course be modified directly in the body of the script.\n\nFinally, `realdata_experiments.py` has been used to generate all experiments on real data. **To reproduce one of them, just fill in the number of the desired one (`experiment = experiments[i]`) at the beginning of the script, then run it**. You can also modify experimental settings (number of inducing points, data subsampling factors, parameters of the models, models to test...) in the preamble of the file or in the block of each experiment; these settings are defined in annex D of the paper and use the same syntax as in file `experiments.py`. 
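The stopping criterion controlled by `loss_thresh` and `patience` above can be sketched as follows. This is a minimal illustration of a patience-based rule under assumed semantics, not the scripts' exact implementation:

```python
# Minimal sketch of a patience-based stopping criterion like the one the
# `loss_thresh` / `patience` options describe (illustrative semantics,
# not the actual code of experiments.py).

def should_stop(losses, loss_thresh=1e-4, patience=10):
    """Stop once the loss has failed to improve by more than loss_thresh
    over the last `patience` iterations."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])   # best loss seen earlier
    recent_best = min(losses[-patience:])   # best loss in the window
    return best_before - recent_best < loss_thresh

# A plateaued loss triggers the criterion; a decreasing one does not.
flat = [1.0] * 20
decreasing = [1.0 / (i + 1) for i in range(20)]
```

With `patience=5`, `should_stop(flat, patience=5)` is `True` while `should_stop(decreasing, patience=5)` is `False`, so training continues as long as the loss keeps improving.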
\n\n\n### Graph processing\n\nScript `graph_processing.py` has been used to generate all graphs of the paper, and enables easy visualization of experimental results. To inspect a given simple parametric study, one can simply fill the self-explanatory fields `mods_to_plot`, `v` (the variable to plot against), and `metric`, and then run the script or call the function `make_plot`. All styles can of course be further customized. Once again, **predefined setups corresponding to the paper's figures are given at the beginning of the script; just uncomment the corresponding line to reproduce one of them**. The small script that generated figure 7 lies below the main part of the file.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqwerty6191%2Fprojected-lmc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqwerty6191%2Fprojected-lmc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqwerty6191%2Fprojected-lmc/lists"}