Official code for using / reproducing MDL-COMP from the paper "Revisiting complexity and the bias-variance tradeoff" ([arXiv link](https://arxiv.org/abs/2006.10189)). This code computes MDL Complexity (MDL-COMP) from training data and explores how well it informs generalization. MDL-COMP is a complexity measure based on Rissanen's principle of minimum description length (MDL). It has appealing theoretical properties and can be used for model selection, with results on par with cross-validation (and sometimes better when data are limited).

*Note: this repo is actively maintained. For any questions, please file an issue.*

# Reproducing the results in the paper
- most of the results can be produced by simply running the notebooks
- the experiments with real data are more in-depth: run `scripts/submit_real_data_jobs.py` (a script that calls `src/fit.py` with the appropriate hyperparameters) before running the notebook that displays the analysis

![](https://csinva.github.io/mdl-complexity/reports/fig_iid_mse.svg)

## Calculating MDL-COMP
Computing `Prac-MDL-Comp` is fairly straightforward: fit ridge regression and minimize the objective below over the regularization strength `l` (λ):

```python
import numpy as np
import numpy.linalg as npl
import scipy.optimize


def prac_mdl_comp(X_train, y_train, variance=1):
    '''Calculate Prac-MDL-Comp for this dataset.

    variance is the noise variance sigma^2 (treated as known).
    Returns the minimized objective, the optimal ridge penalty, and the ridge estimate.
    '''
    # X^T X is symmetric PSD: eigvalsh returns real eigenvalues; clip tiny negative round-off
    eigenvals = np.clip(npl.eigvalsh(X_train.T @ X_train), 0, None)

    def calc_thetahat(l):
        # ridge estimate with regularization strength l
        inv = npl.pinv(X_train.T @ X_train + l * np.eye(X_train.shape[1]))
        return inv @ X_train.T @ y_train

    def prac_mdl_comp_objective(l):
        thetahat = calc_thetahat(l)
        mse_norm = npl.norm(y_train - X_train @ thetahat)**2 / (2 * variance)
        theta_norm = npl.norm(thetahat)**2 / (2 * variance)
        eigensum = 0.5 * np.sum(np.log((eigenvals + l) / l))
        return (mse_norm + theta_norm + eigensum) / y_train.size

    # l must stay positive because the objective contains log((eigenvals + l) / l)
    opt_solved = scipy.optimize.minimize(prac_mdl_comp_objective, x0=1e-10,
                                         bounds=[(1e-12, None)])
    prac_mdl = opt_solved.fun
    lambda_opt = opt_solved.x[0]  # minimize returns a length-1 array
    thetahat = calc_thetahat(lambda_opt)

    return {
        'prac_mdl': prac_mdl,
        'lambda_opt': lambda_opt,
        'thetahat': thetahat
    }
```

# Reference

- feel free to use/share this code openly
- uses code for mdl-rs from [here](https://github.com/koheimiya/pymdlrs)
- uses fMRI data from [here](https://crcns.org/data-sets/vc/vim-2)
- if you find this code useful for your research, please cite the following:
```bibtex
@article{dwivedi2020revisiting,
  title={Revisiting complexity and the bias-variance tradeoff},
  author={Dwivedi, Raaz and Singh, Chandan and Yu, Bin and Wainwright, Martin},
  journal={arXiv preprint arXiv:2006.10189},
  year={2020}
}
```
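
MDL-COMP can be used for model selection. As a minimal sketch of that idea (our own illustration, not the paper's experimental protocol), one can compute Prac-MDL-Comp for several candidate models and prefer the smallest value, as the MDL principle suggests. The example assumes synthetic Gaussian data and uses nested feature subsets as the candidates. The helper `prac_mdl_value` is hypothetical: it re-implements the same three terms as `prac_mdl_comp` so the block is self-contained, but, unlike that function, it optimizes over log λ with `scipy.optimize.minimize_scalar` on a bounded interval.

```python
import numpy as np
import scipy.optimize


def prac_mdl_value(X, y, variance=1.0):
    """Hypothetical helper: minimized Prac-MDL-Comp objective for a ridge fit of y on X.

    Same terms as prac_mdl_comp (residual, squared norm, log-eigenvalue sum), but
    optimized over log(lambda) on a bounded interval so lambda is always positive.
    """
    gram = X.T @ X
    # X^T X is symmetric PSD; clip tiny negative eigenvalues caused by round-off
    eigenvals = np.clip(np.linalg.eigvalsh(gram), 0, None)
    xty = X.T @ y

    def objective(log_lam):
        lam = np.exp(log_lam)
        theta = np.linalg.solve(gram + lam * np.eye(X.shape[1]), xty)  # ridge estimate
        mse_norm = np.sum((y - X @ theta) ** 2) / (2 * variance)
        theta_norm = np.sum(theta ** 2) / (2 * variance)
        eigensum = 0.5 * np.sum(np.log((eigenvals + lam) / lam))
        return (mse_norm + theta_norm + eigensum) / y.size

    res = scipy.optimize.minimize_scalar(objective, bounds=(-20.0, 20.0), method="bounded")
    return res.fun


# Synthetic data (assumption for illustration): only the first 5 of 20 features matter
rng = np.random.default_rng(0)
n, d_total, d_true = 100, 20, 5
X = rng.standard_normal((n, d_total))
theta_true = np.zeros(d_total)
theta_true[:d_true] = 1.0
y = X @ theta_true + rng.standard_normal(n)

# Score nested models that use the first d features; a smaller value is preferred
scores = {d: prac_mdl_value(X[:, :d], y) for d in range(1, d_total + 1)}
best_d = min(scores, key=scores.get)
print(f"selected d = {best_d}, Prac-MDL-Comp = {scores[best_d]:.3f}")
```

Optimizing over log λ is a design choice of this sketch: it keeps λ positive without explicit constraints and spreads the search evenly across orders of magnitude.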