{"id":50306881,"url":"https://github.com/cissagatto/crossvalidationmultilabel","last_synced_at":"2026-05-28T17:01:58.140Z","repository":{"id":43305615,"uuid":"339737166","full_name":"cissagatto/CrossValidationMultiLabel","owner":"cissagatto","description":"A code to execute and save cross-validation in multilabel classification","archived":false,"fork":false,"pushed_at":"2026-05-09T02:19:44.000Z","size":26959,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-09T04:30:04.347Z","etag":null,"topics":["cross-validation","machine-learning","multilabel-classification","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cissagatto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-02-17T13:45:25.000Z","updated_at":"2026-05-09T02:19:48.000Z","dependencies_parsed_at":"2023-02-09T16:15:34.556Z","dependency_job_id":"e119c158-1157-469d-a969-f06d3c1f87a5","html_url":"https://github.com/cissagatto/CrossValidationMultiLabel","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cissagatto/CrossValidationMultiLabel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FCrossValidationMultiLabel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FCrossValidationMultiLabel/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FCrossValidationMultiLabel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FCrossValidationMultiLabel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cissagatto","download_url":"https://codeload.github.com/cissagatto/CrossValidationMultiLabel/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FCrossValidationMultiLabel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33617718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cross-validation","machine-learning","multilabel-classification","r"],"created_at":"2026-05-28T17:01:53.575Z","updated_at":"2026-05-28T17:01:58.129Z","avatar_url":"https://github.com/cissagatto.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# X-Folds Cross Validation MultiLabel\nA code to execute and save cross-validation in multilabel classification. This code is part of my doctoral research.\n\n\n## How to Cite 📑\nIf you use this code in your research, please cite the following:\n\n```bibtex\n@misc{Gatto2025,\n  author = {Gatto, E. 
C.},\n  title = {Cross-Validation Multi-Label Classification},\n  year = {2025},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/cissagatto/CrossValidationMultiLabel}}\n}\n```\n\n## 🗂️ Project Structure\n\nThe codebase includes the following R scripts in the `/R` folder:\n\n* `config-files.R`\n* `libraries.R`\n* `utils.R`\n* `CrossValidationMultiLabel.R`\n* `main.R`\n* `cvm.R`\n\n## Multi-Label Datasets\n\nBelow are some reliable repositories where you can download datasets for multi-label classification tasks:\n\n- [COMETA Dataset Collection (University of Jaén)](https://cometa.ujaen.es/datasets/): A diverse collection of datasets for multi-label learning, designed for standardized benchmarking and experimentation.\n\n- [MLL Resources (University of Córdoba)](https://www.uco.es/kdis/mllresources/): A repository of widely used datasets in multi-label machine learning research, covering various domains.\n\n- [Extreme Classification Repository (Microsoft Research / Manik Varma)](http://manikvarma.org/downloads/XC/XMLRepository.html): A collection of large-scale datasets for extreme multi-label classification, with millions of labels, suitable for high-dimensional problems.\n\n\u003e 💡 These resources are useful for training and evaluating multi-label classification algorithms across a range of domains, such as text, image, and structured data.\n\n\n## ⚙️ How to Reproduce the Experiment\n\n### Step-1\nConfirm that the *Utils* folder contains the files *Clus.jar*, *R_csv_2_arff.jar*, and *weka.jar*, as well as the *lib* folder with *commons-math-1.0.jar*, *jgap.jar*, *weka.jar*, and *Clus.jar*. Without these jars, the code will not run.\n\n### Step-2\nCopy this code to a location of your choice. The code is configured to use the folder \"~/CrossValidationMultiLabel\".\n\n### Step-3 – Prepare the Dataset Metadata File\nA file called `datasets-original.csv` should be placed in the **root project folder**.\n
This file contains details for 90 multilabel datasets used in the code. To add a new dataset, include the following information in the file:\n\n\n| Parameter    | Status    | Description                                           |\n|------------- |-----------|-------------------------------------------------------|\n| Id           | mandatory | Integer number to identify the dataset                |\n| Name         | mandatory | Dataset name (please follow the benchmark)            |\n| Domain       | optional  | Dataset domain                                        |\n| Instances    | mandatory | Total number of dataset instances                     |\n| Attributes   | mandatory | Total number of dataset attributes                    |\n| Labels       | mandatory | Total number of labels in the label space             |\n| Inputs       | mandatory | Total number of dataset input attributes              |\n| Labelsets    | optional  |                                                       |\n| Single       | optional  |                                                       |\n| Max.freq     | optional  |                                                       |\n| Cardinality  | optional  |                                                       |\n| Density      | optional  |                                                       |\n| Mean.IR      | optional  |                                                       | \n| Scumble      | optional  |                                                       | \n| Scumble.CV   | optional  |                                                       | \n| TCS          | optional  |                                                       |\n| Diversity    | optional  |                                                       |\n| rDep         | optional  |                                                       |\n| ULD          | optional  |                                                       | \n| AttStart     | mandatory | Column number 
where the attribute space begins (see note 1) |\n| AttEnd       | mandatory | Column number where the attribute space ends          |\n| LabelStart   | mandatory | Column number where the label space begins            |\n| LabelEnd     | mandatory | Column number where the label space ends              |\n| xn           | optional  | Value for Dimension X of the Kohonen's map            |\n| yn           | optional  | Value for Dimension Y of the Kohonen's map            |\n| gridn        | optional  | X times Y value. Kohonen's map must be square         |\n| max.neigbors | optional  | The maximum number of neighbors is given by LABELS - 1 |\n\n\n1. The value of `AttStart` is always `1` because it refers to the first column.\n\n2. [Click here](https://link.springer.com/book/10.1007/978-3-319-41111-8) for detailed explanations of each property.\n\n\u003e ℹ️ In **R**, both columns and rows are indexed starting from `1`.  \n\u003e ⚠️ Be aware that in **Python**, indexing starts from `0`, which can lead to off-by-one errors when switching between the two languages.\n\n\n### Step-4 – Environment Setup 🔧\n\nBefore running the code, ensure that all required **Java**, **R**, and **Python** libraries are installed on your system.  \n\nYou can use a pre-configured [Conda environment](https://1drv.ms/u/s!Aq6SGcf6js1mw4hbhU9Raqarl8bH8Q?e=IA2aQs) created specifically for this experiment.\n
Download the environment files using the link above, then run the following command to set it up:\n\n```bash\nconda env create -f Teste.yml\n```\n\nFor more information on creating and managing Conda environments, refer to the official Conda documentation.\n\n\n\n### Step-5 – Configuration File ⚙️\n\nTo run this code, you will need a configuration file in **CSV** format containing the following information:\n\n| **Config**       | **Description**                                                                 |\n|------------------|---------------------------------------------------------------------------------|\n| `FolderScripts`  | Absolute path to the folder containing the R scripts                            |\n| `Dataset_Path`   | Absolute path to the folder where the dataset `.tar.gz` file is stored          |\n| `Temporary_Path` | Absolute path to the folder used for temporary data processing *                |\n| `Reports_Path`   | Absolute path to the reports folder                                             |\n| `Dataset_Name`   | Name of the dataset, as defined in the `datasets-original.csv` file             |\n| `Number_Dataset` | Numeric ID of the dataset, as defined in the `datasets-original.csv` file       |\n| `Validation`     | `1` to generate train, validation, and test sets; `0` otherwise                 |\n| `Number_Folds`   | Number of folds to be used in cross-validation                                  |\n\n\n\u003e 📝 *We recommend using high-speed temporary storage directories such as `/dev/shm`, `/tmp`, or `/scratch` for better performance during processing.*\n\nFor detailed guidance on setting up the configuration, please refer to the example CSV files provided.\n\n\n# Run\n\nTo run the code, open a terminal, change to the `~/CrossValidationMultiLabel/R` folder, and then type:\n\n```\nRscript cvm.R absolute_path_to_config_file\n```\n\n\nExample:\n\n```\nRscript cvm.R ~/CrossValidationMultiLabel/config-files/cvm-3sources_bbc1000.csv\n```\n\n\n## Folder Structure\n\u003cimg src=\"https://github.com/cissagatto/CrossValidationMultiLabel/blob/main/images/folder_strucutre_mlcv.png\" width=\"300\"\u003e\n\n## DOWNLOAD RESULTS\n[Click here](https://1drv.ms/u/s!Aq6SGcf6js1mrZJSfd6FpToCtGVqJw?e=NxaBfW)\n\n\n## Acknowledgment\n- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.\n- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.\n- The authors also thank the Brazilian research agency FAPESP for financial support.\n- (Belgium ....)\n\n\n\n\n## 📞 Contact\nElaine Cecília Gatto\n✉️ [elainececiliagatto@gmail.com](mailto:elainececiliagatto@gmail.com)\n\n\n\n\n## Links\n\n| [Site](https://sites.google.com/view/professor-cissa-gatto) | [Post-Graduate Program in Computer Science](http://ppgcc.dc.ufscar.br/pt-br) | [Computer Department](https://site.dc.ufscar.br/) |  [Biomal](http://www.biomal.ufscar.br/) | [CNPQ](https://www.gov.br/cnpq/pt-br) | [KU Leuven](https://kulak.kuleuven.be/) | [Embarcados](https://www.embarcados.com.br/author/cissa/) | [Read Prensa](https://prensa.li/@cissa.gatto/) | [LinkedIn Company](https://www.linkedin.com/company/27241216) | [LinkedIn Profile](https://www.linkedin.com/in/elainececiliagatto/) | [Instagram](https://www.instagram.com/cissagatto) | [Facebook](https://www.facebook.com/cissagatto) | [Twitter](https://twitter.com/cissagatto) | [Twitch](https://www.twitch.tv/cissagatto) | [YouTube](https://www.youtube.com/CissaGatto) |\n\n# Thanks\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fcrossvalidationmultilabel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcissagatto%2Fcrossvalidationmultilabel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fcrossvalidationmultilabel/lists"}