{"id":18507932,"url":"https://github.com/educationaltestingservice/cpd","last_synced_at":"2025-06-30T08:06:26.721Z","repository":{"id":155082959,"uuid":"489019174","full_name":"EducationalTestingService/cpd","owner":"EducationalTestingService","description":"Algorithms for Conditioned Positive Definite Matrix Under Constraints","archived":false,"fork":false,"pushed_at":"2022-06-23T16:58:48.000Z","size":508,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-14T09:17:39.243Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EducationalTestingService.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-05-05T15:04:56.000Z","updated_at":"2023-04-25T14:25:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"dbc4129f-d67d-4ff0-a387-bf69c0581fbe","html_url":"https://github.com/EducationalTestingService/cpd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EducationalTestingService/cpd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2Fcpd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2Fcpd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2Fcpd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/EducationalTestingService%2Fcpd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EducationalTestingService","download_url":"https://codeload.github.com/EducationalTestingService/cpd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2Fcpd/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262736600,"owners_count":23356146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T15:12:52.807Z","updated_at":"2025-06-30T08:06:26.713Z","avatar_url":"https://github.com/EducationalTestingService.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CPD: Algorithms for Conditioned Symmetric Positive Definite Matrix Estimation Under Constraints\nThis code implements several methods for solving the problem of finding a symmetric positive definite (SPD) matrix that\nminimizes an objective functional under the constraint of a bounded condition number. This work was motivated by a specific application: estimating the covariance matrix of the Empirical Best Linear\nPrediction (EBLP) model. Specific API calls are provided for this use case as well as for the general\n\"near-PD\" case.\n\n**CPD**: the problem of finding a symmetric positive definite (SPD) matrix that satisfies a condition (e.g., method of\nmoments equations for a covariance matrix satisfied to a given tolerance) and is well-conditioned. 
Typically, one cannot\nobtain both, so we need to find a point of optimal trade-off between accuracy (small constraint residual) and\nstability (small condition number).\n\n**Near PD**: a special case of CPD where the functional is the Frobenius norm of the difference between a given matrix and the desired SPD\nmatrix with bounded condition number. That is, the task is to find the conditioned SPD matrix nearest to a given matrix. The\nregularization parameter is either the condition number bound or a trade-off parameter as in the CPD formulation.\n\n**RCO (Regularized Cholesky Optimization)**: an algorithm for solving CPD that minimizes a regularized functional of the\nmatrix, balancing the constraint residual against the (approximate) condition number of the matrix. The\nminimization is performed over the entries of the Cholesky factor. The algorithm:\n* Uses a continuation method to compute the entire ROC curve quickly.\n* Solves each continuation step's optimization problem with Newton-CG.\n* Picks the optimal regularization parameter around the knee of the curve (by default, determined by a given residual\n  tolerance).\n\n**TN (Tanaka-Nakata)**: an implementation of the paper Tanaka, M., Nakata, K., \"Positive definite matrix approximation with\ncondition number constraint\", Optim. Lett. 
8(3), 2014, for finding the nearest SPD matrix with a bounded condition number.\nUses a new O(n) dynamic programming algorithm to solve the eigenvalue optimization problem.\n\n## Installation\n- Install conda.\n- Create a conda environment from the attached environment.yml:\n  `conda env create -f environment.yml`.\n- If you'd also like to develop/run unit tests, use the full environment file instead:\n  `conda env create -f environment.full.yml`.\n- Add `src` to your PYTHONPATH.\n\n### Full Environment for Testing \u0026 Development\n* Complete the installation steps above, but create a conda environment using\n`conda env create -f environment_full.yml` instead of `environment.yml`.\n* Install nodejs (e.g., on macOS, `brew install nodejs`).\n* Run `jupyter labextension install @jupyter-widgets/jupyterlab-manager plotlywidget`.\n\n## Testing\n\nThe project contains Pytest unit tests for the main modules. To run all tests, run `pytest test`.\n\n## Examples\n\n### Near PD\nGiven an n x n matrix ```a```, call ```near_pd()``` to get a conditioned matrix ```b``` near ```a``` in\nthe relative Frobenius norm ```norm(b - a)/norm(a)```.\n```python\nimport cpd\n\nb, info = cpd.near_pd.near_pd(a, \"rco\")      # Regularized Cholesky Optimization\nb, info = cpd.near_pd.near_pd(a, \"tn\")       # Tanaka-Nakata\nb, info = cpd.near_pd.near_pd(a, \"higham\")   # Higham near-PD with diagonal perturbation.\nb, info = cpd.near_pd.near_pd(a, \"tn\", leeway_factor=1.05)  # Stay within 5% Frobenius norm error\n```\n\n### EBLP\n\n```python\nimport cpd\nimport os\n\n# Load data from files, or create p, w, c directly.\nDATA_DIR = \"/path/to/input_files\"\na_file_name = os.path.join(DATA_DIR, \"A.txt\")\nc_file_name = os.path.join(DATA_DIR, \"C.txt\")\nw_file_name = os.path.join(DATA_DIR, \"Weights_BxB.txt\")\npi_file_name = os.path.join(DATA_DIR, \"Pi_s.txt\")\n_, c, w, _, p = cpd.data.load_data(a_file_name, c_file_name, w_file_name, pi_file_name)\n\n# Create optimizer; pass in an ArrayList of arrays p and an array c if 
not loading the\n# data from files as above. Only needs to be called once per RHS term list p.\n# Note: p is a cpd.linalg.ArrayList, not a list of matrices.\noptimizer = cpd.eblp.create_optimizer(\"rco\", p, w)\n# Now call the optimizer for a particular LHS matrix c.\ng, info = optimizer.optimize(c)\n\n# Objective function value^(1/2) = Relative error in satisfying the moment equations f(G) = C.\nprint(info[1] ** 0.5)  # 0.0855678\n```\n\nTo use the TN method instead:\n```python\noptimizer = cpd.eblp.create_optimizer(\"tn\", p, w)\ng, info = optimizer.optimize(c)\n```\n\n### General Constraints\n\nInputs:\n\n* Constraint functional.\n* Gradient of the constraint functional (for BFGS minimization).\n* Solution quality metric (e.g., the matrix condition number).\n\nOutput:\n\n* Optimal covariance matrix.\n\n## Contents\n\n- `data`: test data.\n- `notebooks`: Jupyter notebooks.\n- `src`: source code.\n- `test`: unit tests.\n\n## References\n\n* Dan, Katherine, JR, Improving Accuracy and Stability of Aggregate Student Growth Measures Using Empirical Best Linear\n  Prediction (JEBStats, in review).\n\n## TODO\n\n* Test another data set.\n* Integrate into R simulation - Katherine.\n* Write research memo/paper.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feducationaltestingservice%2Fcpd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feducationaltestingservice%2Fcpd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feducationaltestingservice%2Fcpd/lists"}