{"id":15920305,"url":"https://github.com/janmotl/cv","last_synced_at":"2025-04-03T12:14:50.584Z","repository":{"id":83811558,"uuid":"241371773","full_name":"janmotl/cv","owner":"janmotl","description":"Stratified cross-validation for multi-label classification","archived":false,"fork":false,"pushed_at":"2020-02-18T16:42:53.000Z","size":85,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-09T01:34:44.520Z","etag":null,"topics":["machine-learning","multi-label-classification"],"latest_commit_sha":null,"homepage":"","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/janmotl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-18T13:45:05.000Z","updated_at":"2020-05-27T01:33:50.000Z","dependencies_parsed_at":"2023-03-10T17:45:15.806Z","dependency_job_id":null,"html_url":"https://github.com/janmotl/cv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janmotl%2Fcv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janmotl%2Fcv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janmotl%2Fcv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janmotl%2Fcv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/janmotl","download_url":"https://codeload.github.com/janmotl/cv/tar.gz/refs/heads/master","host":{"name":"Gi
tHub","url":"https://github.com","kind":"github","repositories_count":246998218,"owners_count":20866696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","multi-label-classification"],"created_at":"2024-10-06T19:22:03.815Z","updated_at":"2025-04-03T12:14:50.558Z","avatar_url":"https://github.com/janmotl.png","language":"MATLAB","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n__Stratified cross-validation for multi-label classification__\n\nOne way to evaluate the accuracy of machine learning models is via cross-validation.\nWhen we are dealing with classification, we may want to use _stratified_ cross-validation,\nwhich preserves the class distribution of the whole data set in each individual fold.\nHowever, common implementations of stratified cross-validation work only with a single label. \nThis code performs stratified assignment of multi-label samples into folds, where the labels are all nominal.  \n\n\n__Assignment objectives__\n\n1. Preserve the distribution of individual class-values across folds (_1-way interaction_)\n2. Preserve the distribution of _2-way interactions_ between individual class-values across folds\n3. Preserve the distribution of _n-way interactions_ between individual class-values across folds, where _n_ is the count of labels  \n\n\n__Literature review__\n\nOne quick way to extend stratified cross-validation to multi-label data is to concatenate the class labels into a single label and then run standard stratified cross-validation. 
This approach takes care of preserving the _n-way interactions_ listed above, but nothing else.\n\nAnother approach is to maintain _1-way interactions_. \nThis was done by [(Sechidis, 2011)][1] and later extended by [(Szymański, 2017)][2] to optimize both _1-way_ and _2-way_ interactions.\nWe optimize all three criteria at once. \n\n\n__Why bother?__\n\nStratified cross-validation generally improves on plain cross-validation in the following respects:\n\n1. It makes sure that each class-value is present in the testing set. This is important for the evaluation of many performance measures.\n2. It maintains the same class prior distribution across all the folds. This increases the measured testing accuracy and reduces the variance of the testing accuracy.\n\n \n__Solution__\n\nWe use Integer Linear Programming (ILP) to reach the optimal solution. Hence, the solution is exact rather than an approximation.\nThe disadvantage, compared to the greedy solutions of [(Sechidis, 2011)][1] and [(Szymański, 2017)][2], is that the calculation is slow.\nHence, we provide pre-calculated assignments for 10-fold cross-validation for some common multi-label classification data sets at [Multi-Label Classification Dataset Repository](http://www.uco.es/kdis/mllresources/).      \n\n\n__Acknowledgements__\n1. The data are from [Multi-Label Classification Dataset Repository][3] by Moyano et al.\n2. The first published article about stratified multi-label cross-validation is [(Sechidis, 2011)][1].\n3. The evaluation metrics were implemented in [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn) by Szymański et al.\n\nWithout their work, this page would not exist.     
\n\n[1]: https://doi.org/10.1007/978-3-642-23808-6_10\n[2]: https://arxiv.org/abs/1704.08756 \n[3]: http://www.uco.es/kdis/mllresources/","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjanmotl%2Fcv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjanmotl%2Fcv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjanmotl%2Fcv/lists"}