{"id":22215705,"url":"https://github.com/jobovy/extreme-deconvolution","last_synced_at":"2025-04-06T04:15:46.518Z","repository":{"id":13326903,"uuid":"16013773","full_name":"jobovy/extreme-deconvolution","owner":"jobovy","description":"Density estimation using Gaussian mixtures in the presence of noisy, heterogeneous and incomplete data","archived":false,"fork":false,"pushed_at":"2024-11-18T16:48:43.000Z","size":735,"stargazers_count":80,"open_issues_count":1,"forks_count":24,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-30T03:09:14.318Z","etag":null,"topics":["c","density-estimation","gaussian-mixture-models","machine-learning","python","uncertainty"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jobovy.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-01-17T22:37:24.000Z","updated_at":"2024-12-13T19:56:23.000Z","dependencies_parsed_at":"2024-12-24T10:21:28.556Z","dependency_job_id":"f9b06d27-167b-45c3-9fc6-6161fd2bc8e9","html_url":"https://github.com/jobovy/extreme-deconvolution","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jobovy%2Fextreme-deconvolution","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jobovy%2Fextreme-deconvolution/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jobovy%2Fextreme-deconvolution/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/jobovy%2Fextreme-deconvolution/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jobovy","download_url":"https://codeload.github.com/jobovy/extreme-deconvolution/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247430964,"owners_count":20937875,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","density-estimation","gaussian-mixture-models","machine-learning","python","uncertainty"],"created_at":"2024-12-02T21:42:30.791Z","updated_at":"2025-04-06T04:15:46.500Z","avatar_url":"https://github.com/jobovy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n%\n% README for extreme-deconvolution implementation:\n%    contains: installation instructions, usage instructions\n%\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\nCopyright 2008-2010 Jo Bovy, David W. Hogg, and Sam Roweis.\n\n\nINSTALLATION:\n--------------\n\nThe installation can produce two things: (1) a shared library which\ncan be called from other programs and which is called in the IDL\nwrapper provided with the distribution, (2) a \"main\" program which\nruns on input files. See usage below for instructions on how to use these.\n\nThe shared library will be installed in the current directory, and the\ninstallation command will also attempt to ensure that the path to the\nshared library in the IDL wrapper is correct. 
Please do check this\npath in the IDL wrapper if problems occur with the wrapper. If make\ninstall is not called, the location of the '.so' should be set manually\nin the IDL wrapper (which is located in the /pro directory).\n\nThis program needs the GNU Scientific Library\n(http://www.gnu.org/software/gsl/). The installation will not succeed\nif this library is not found.\n\nInstallation:\nmake\nsudo make install\nmake idlwrapper\nmake pywrapper\nmake rpackage\nmake testidl\nmake testpy\nmake clean\n\n1) make: make will create both the shared object library as well as\nthe \"main\" program. Run \"make extremedeconvolution\" to just compile\nthe main program, \"make extremedeconvolution.so\" to just compile the\nshared object library. These files will be deposited in the 'build/'\nsub-directory.\n\n2) sudo make install: will copy the shared object library to\n/usr/local/lib. To install somewhere else run\n\nsudo make install INSTALL_DIR=/path/to/install/dir/\n\nMake sure the path ends in '/'.\n\n3) make idlwrapper: will set the path of the shared object library in the\nIDL wrapper. If you installed the library by specifying INSTALL_DIR, also run\n\nmake idlwrapper INSTALL_DIR=/path/to/install/dir/\n\n4) make pywrapper: will set the path of the shared object library in\nthe Python wrapper (that this works is not critical if the library is\ninstalled in a standard place). If you installed the library by\nspecifying INSTALL_DIR, also run\n\nmake pywrapper INSTALL_DIR=/path/to/install/dir/\n\n5) make rpackage: will build and install the R package \"ExtremeDeconvolution\".\n\n6) make testidl: this will run the test program 'fit_tf.pro' from the\nexamples directory, fitting the Tully-Fisher relation in different\nbandpasses (see arXiv:0905.2979v1 for details). A plot TF.ps and\noutput file TF.tex will be produced in the examples directory. If the\ntest succeeds, 'Ouput of test agrees with given solution' will be\nprinted. 
If the test (diff of TF.tex and TF.out) fails, 'Output of test\ndoes not agree with given solution Manually diff the TF.tex and TF.out\n(given solution) file' will be printed. IMPORTANT: If your IDL\ninstallation is not 'idl', pass the variable IDL=/your/idl/bin\n\n7) make testpy: Tests whether the Python program can access the\nlibrary and gives the expected output for a test run. Uses Python's\ndoctest module. IMPORTANT: If your Python installation is not\n'python', pass the variable PYTHON=/your/python/bin\n\n8) make clean: cleans up intermediate files\n\n9*) make spotless: deletes all the intermediate files as well as the main\nprogram and the shared library.\n\n\n\nUSAGE I: THROUGH THE PYTHON OR IDL WRAPPERS\n--------------------------------------------\n\nUsage through the Python or IDL wrappers is encouraged. Indeed, the\n\"main\" program is lacking in many respects, was never used in any\nscientific analysis, and has not been tested. However, it might\nwork. More on this later.\n\nSee the Python and IDL wrapper headers for information on how to call\nthe routines. Various logfiles can be written: one of them logs\neverything (i.e., loglikelihoods after each iteration + input and output\nparameters). The other one only outputs the loglikelihood trajectory\nof the accepted split and merge moves.\n\nThe code given here is more general than the algorithm given in the\npaper in the documentation in that it includes the possibility of\nproviding a projection matrix which projects the *underlying* values\non the data, i.e., as in Eq. (3) of the paper. However, if this\nprojection matrix does not have full rank (e.g., when the data is\nincomplete), then the calculation of the split and merge hierarchy\nimplemented here is not correct. Therefore, if you want to employ the\nsplit and merge algorithm with incomplete data, we suggest that you use\nfull rank projection matrices and put in very large errors for the\nunobserved dimensions. This works. 
Some hints are given in the code on\nwhat the correct implementation would be for projection matrices that\ndon't have full rank.\n\nThe code here is also more general than the algorithm in that one can\nspecify a weight for each data point with which its contribution to\nthe log-likelihood should be weighted. Use this option with care!\n\nUSAGE II: THROUGH THE SHARED OBJECT LIBRARY\n-------------------------------------------\n\nThe C-code implementing the algorithm can be called from other\nC-programs by using the shared object library. The main function for\nthis purpose is the 'proj_gauss_mixtures' function. This function\ntakes as its input the data, the initial conditions, and some\nparameters describing the kind of solution one wants, and updates the\ninitial conditions to the result of the algorithm.\n\nThe inputs are as follows: \n\na. The data: an array of structures of type\n'datapoint'. Each 'datapoint' structure holds all of the information\npertaining to one datapoint: \nstruct datapoint{\n  gsl_vector *ww;\n  gsl_matrix *SS;\n  gsl_matrix *RR;\n};\nThe vector ww holds the actual measured data; the\nmatrix SS holds the uncertainty covariance matrix associated with this\ndata point, and the projection matrix RR projects from the space where\nthe model lives to the space of the data (this could be a rotation, or\njust simply a unit matrix if the model and the observations are given\nin the same coordinates). One can set RR to a null pointer if no\nprojection is necessary, in which case you also need to provide the\nbool noproj (see later). The data-\u003eSS matrix can be (di,1) dimensional\ninstead of (di,di) dimensional, when the uncertainty covariances are\ndiagonal. In this case, also set the diagerrs bool (see below). The\ninteger 'N' specifies the number of datapoints.\n\nb. 
The model parameters, i.e., the initial conditions: The initial\nconditions for the model parameters are given as an array of\nstructures of type 'gaussian'; each member of this array holds the\nparameters for one of the Gaussian components:\nstruct gaussian{\n  double alpha;\n  gsl_vector *mm;\n  gsl_matrix *VV;\n};\nHere, alpha holds the amplitude of this component\nin the mixture, the vector mm holds the mean of this Gaussian\ncomponent, and the matrix VV holds its covariance matrix.\nThe integer 'K' specifies the number of Gaussian components in the mixture.\n\nc. fixamp, fixmean, fixcovar: arrays of bools specifying for each\nGaussian component whether its amplitude, mean, or covariance,\nrespectively, should be held fixed during the optimization.\n\nd. avgloglikedata: this double holds the average log likelihood of the\ndata, returned after convergence.\n\ne. tol and maxiter: if the difference between two subsequent average\nlog likelihoods is smaller than tol, the algorithm is considered to\nhave converged. maxiter specifies the maximum number of iterations\n(for the initial convergence and each individual split and merge step)\nthat the algorithm will go through before giving up.\n\nf. likeonly: set this to true if you only want to calculate the\naverage log likelihood of the data given the model, without\noptimization.\n\ng. w, splitnmerge: w specifies the w parameter to be used in the\nregularization (see the accompanying paper for more on this);\nsplitnmerge specifies the number of steps to go down in the\nsplit-and-merge hierarchy before giving up (again, see the paper for\nmore on this).\n\nh. keeplog, logfile, convlogfile: set keeplog to true to print\ninformation to a logfile: this includes the initial conditions used,\nthe average log likelihood in every step, and, for the split-and-merge\npart, which Gaussians are split and which are merged and whether the\nsplit-and-merge step was successful or not. 
convlogfile holds only\nthe average loglikelihoods of the initial convergence (before the\nfirst split-and-merge step) and those of successful split-and-merge\nsteps.\n\ni. noproj: indicates that no projection from the model space to the\ndata space is necessary.\n\nj. diagerrs: indicates that the uncertainty covariances in data-\u003eSS\nare diagonal, such that data-\u003eSS is a (di,1) matrix instead of a\n(di,di) matrix.\n\n\nUSAGE III: THROUGH THE MAIN PROGRAM\n-----------------------------------\n\nAS OF v1.2, THE MAIN PROGRAM IS NO LONGER SUPPORTED\n\n(deprecated) The \"main\" program: since no effort was made to develop\nthis, this \"main\" program is very rudimentary. The input data and\ninitial conditions must be supplied in the files \"inputdata.dat\" and\n\"IC.dat\", respectively, and the results are written to the file\n\"result.dat\". The main advantage of the \"main\" program is that it can\ndeal with data that has unobserved dimensions that are different for\ndifferent data points, i.e., it can handle the situation in which the\nfirst data point has one unobserved dimension and the second data\npoint has two unobserved dimensions, etc. This is all rather irrelevant\ngiven that unobserved dimension == large errors, but it could be\nuseful. The main disadvantage of the main program is that it has not\nbeen tested extensively, but if it succeeds in getting the data and\nthe initial conditions to the main algorithm, the results should be\ntrusted (since the main algorithm has been tested extensively).\n\nThe structure of the initial conditions file is as follows (see\nexamples/IC.dat for an example): First various parameters of the model\nand the algorithm must be set, e.g., the number of Gaussians, the\nmaximum number of iterations, etc. 
Then the initial values for the\nparameters of the various Gaussians must be set: first the amplitude,\nthen the components of the mean, and the covariance matrix (in the\nformat xx, yy, zz, xy, xz, yz, and similar for higher/lower\ndimensions); see the file examples/IC.dat for a specific example of\nthis.\n\nThe structure of the data file is as follows (see\nexamples/inputdata.dat for an example of this): Each line holds the\ndata from one datapoint, with fields separated by '|': first the value\nof the data point w, then the covariance matrix S associated with this\nmeasurement (specified as xx, yy, xy, and similar for higher\ndimensions), followed by the projection matrix (in the format 11, 12,\n21, 22, and similar for higher dimensions).\n\n\nUSAGE IV: THROUGH R\n-------------------\n\nAfter installing the R package, open up an R console and type\n\n\u003e library(ExtremeDeconvolution)\n\u003e ?extreme_deconvolution\n\nto see the input arguments and an example.\n\n\nOUTSTANDING ISSUES\n------------------\n\nFor small data sets, the algorithm will very rarely decrease the\nlikelihood in a step (which it is proven not to do). This is most likely\ndue to a small numerical instability in the way the average log\nlikelihoods are calculated. These rare decreases are very small, and\ndo not seem to affect the good overall behavior of the algorithm.\n\n\nLICENSE\n-------\n\nThis code is free software licensed under the BSD-3-clause\nlicense. See the file LICENSE for the full terms.\n\n\nACKNOWLEDGING EXTREME-DECONVOLUTION\n-----------------------------------\n\nThe algorithm that the code implements is described in the paper\n\"Extreme deconvolution: inferring complete distribution functions from\nnoisy, heterogeneous and incomplete observations\"; a copy of the\nlatest draft of this paper is included in the \"doc/\" directory of the\nrepository or source archive. 
If you use the code, please cite this\npaper, e.g., \n\nExtreme deconvolution: inferring complete distribution functions from\nnoisy, heterogeneous and incomplete observations Jo Bovy, David\nW. Hogg, \u0026 Sam T. Roweis, Submitted to AOAS (2009) [arXiv/0905.2979]\n\nCONTACT\n-------\n\nQuestions, comments, extensions? jb2777@nyu.edu\n\n\nEND_OF_README\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjobovy%2Fextreme-deconvolution","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjobovy%2Fextreme-deconvolution","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjobovy%2Fextreme-deconvolution/lists"}