{"id":50306896,"url":"https://github.com/cissagatto/generate-partitions-random1","last_synced_at":"2026-05-28T17:01:57.783Z","repository":{"id":204510276,"uuid":"371392058","full_name":"cissagatto/Generate-Partitions-Random1","owner":"cissagatto","description":"This code is part of my doctoral research. The aim is to generate a specific version of random partitions for multilabel classification.  ","archived":false,"fork":false,"pushed_at":"2023-10-30T16:47:26.000Z","size":17670,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-10-30T17:42:51.695Z","etag":null,"topics":["classification","machine-learning","multi-label","multi-label-classification","multi-label-partitions","multi-label-random-partitions","partitions","r","random-partitions"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cissagatto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-05-27T13:56:59.000Z","updated_at":"2023-10-30T17:42:54.603Z","dependencies_parsed_at":null,"dependency_job_id":"59a3404d-86da-40d6-a9f3-03f27df85f2e","html_url":"https://github.com/cissagatto/Generate-Partitions-Random1","commit_stats":null,"previous_names":["cissagatto/generate-partitions-random1"],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/cissagatto/Generate-Partitions-Random1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FGenerate-Partitions-Random1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FGenerate-P
artitions-Random1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FGenerate-Partitions-Random1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FGenerate-Partitions-Random1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cissagatto","download_url":"https://codeload.github.com/cissagatto/Generate-Partitions-Random1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FGenerate-Partitions-Random1/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33617718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","machine-learning","multi-label","multi-label-classification","multi-label-partitions","multi-label-random-partitions","partitions","r","random-partitions"],"created_at":"2026-05-28T17:01:55.504Z","updated_at":"2026-05-28T17:01:57.752Z","avatar_url":"https://github.com/cissagatto.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Generate Random Partitions Version 1\nThis code is part of my PhD at PPG-CC/DC/UFSCar. 
The aim is to generate a specific type of random partition for multilabel classification.\n\n## How to Cite\n@misc{Gatto2022, author = {Gatto, E. C.}, title = {Generate Random Partitions Version 1 for Multilabel Classification}, year = {2022}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\\url{https://github.com/cissagatto/Generate-Partitions-Random1}}}\n\n## Source Code\nThe source code consists of an R project for the RStudio IDE and the following R scripts:\n\n1. libraries.R\n2. utils.R\n3. generateR1.R\n4. run.R\n5. random1.R\n6. gpr1_config_files.R\n\n\n## Preparing your experiment\n\n### STEP 1\nA file called _datasets-original.csv_ must be in the *root project folder*. The code reads the information about each dataset from this file, which already describes 90 multilabel datasets. If you want to use another dataset, please add the following information about it to the file:\n\n\n| Parameter    | Status    | Description                                           |\n|------------- |-----------|-------------------------------------------------------|\n| Id           | mandatory | Integer number to identify the dataset                |\n| Name         | mandatory | Dataset name (please follow the benchmark)            |\n| Domain       | optional  | Dataset domain                                        |\n| Instances    | mandatory | Total number of dataset instances                     |\n| Attributes   | mandatory | Total number of dataset attributes                    |\n| Labels       | mandatory | Total number of labels in the label space             |\n| Inputs       | mandatory | Total number of dataset input attributes              |\n| Cardinality  | optional  |                                                       |\n| Density      | optional  |                                                       |\n| Labelsets    | optional  |                                                      
 |\n| Single       | optional  |                                                       |\n| Max.freq     | optional  |                                                       |\n| Mean.IR      | optional  |                                                       | \n| Scumble      | optional  |                                                       | \n| TCS          | optional  |                                                       | \n| AttStart     | mandatory | Column number where the attribute space begins*       | \n| AttEnd       | mandatory | Column number where the attribute space ends          |\n| LabelStart   | mandatory | Column number where the label space begins            |\n| LabelEnd     | mandatory | Column number where the label space ends              |\n| Distinct     | optional  |                                                       |\n| xn           | mandatory | Value for dimension X of the Kohonen map              | \n| yn           | mandatory | Value for dimension Y of the Kohonen map              |\n| gridn        | mandatory | X times Y value. The Kohonen map must be square       |\n| max.neigbors | mandatory | The maximum number of neighbors, given by LABELS - 1  |\n\n\n\\* Because the attribute space starts at the first column, this value is always 1.\n\n\n### STEP 2\nTo run this experiment, you need the _X-Fold Cross-Validation_ files compressed in **tar.gz** format. You can download these files, already split into 10 folds for multiple multilabel datasets, by clicking [here](https://www.4shared.com/folder/ypgzwzjq/datasets-cross-validation.html). For a new dataset, in addition to including it in the **datasets-original.csv** file, you must also run the code available [here](https://github.com/cissagatto/crossvalidationmultilabel). That repository contains all the instructions needed to generate the files in the format required for this experiment. The **tar.gz** file can be placed in any folder on your computer or cluster. 
The absolute path of that folder must be set as *Dataset_Path* in the configuration file read by the **random1.R** script. The dataset will be loaded from there.\n\n### STEP 3\nAll R packages required by this code must be installed on your machine. They are listed in the file *libraries.R*. This code does not install any packages automatically!\n\n### STEP 4\nYou can use the Conda environment that I created to perform this experiment. Below are the links to download the files.\n\n| [download txt](https://www.4shared.com/s/fUCVTl13zea) | [download yml](https://www.4shared.com/s/f8nOZyxj9iq) | [download yaml](https://www.4shared.com/s/fk5Io4faLiq) |\n\nUse the command below to create the environment on your computer:\n\n```\nconda env create -f AmbienteTeste.yaml\n```\n\nSee more information about Conda environments [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) \n\n### STEP 5\nTo run this code, you need a configuration file in *csv* format containing the following information:\n\n| Config          | Value                                                                     | \n|-----------------|---------------------------------------------------------------------------| \n| Dataset_Path    | Absolute path to the folder where the dataset's tar.gz is stored          |\n| Temporary_Path  | Absolute path to the folder where temporary processing will be performed* |\n| dataset_name    | Dataset name as listed in the *datasets-original.csv* file                 |\n| number_dataset  | Dataset number as listed in the *datasets-original.csv* file               |\n| number_folds    | Number of folds used in cross-validation                                  |\n| number_cores    | Number of cores for parallel processing                                   |\n\n\\* Use folders such as */dev/shm*, */tmp* or *scratch* here.\n\nYou can save configuration files 
wherever you want. The absolute path of the configuration file is passed as a command-line argument.\n\n## Software Requirements\nThis code was developed in RStudio Version 1.4.1106 © 2009-2021 RStudio, PBC \"Tiger Daylily\" (2389bc24, 2021-02-11) for Ubuntu Bionic Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36. The R Language version was: R version 4.1.0 (2021-05-18) -- \"Camp Pontanezen\" Copyright (C) 2021 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit).\n\n## Hardware Requirements\nThis code can run either sequentially or in parallel, but running it in parallel is highly recommended. The number of cores is set by the *number_cores* field of the configuration file. If number_cores = 1, the code runs sequentially. In our experiments, we used 10 cores; for reproducibility, we recommend that you also use 10 cores. This code was tested with the birds dataset on the following machine:\n\n*System:*\n\nHost: bionote | Kernel: 5.8.0-53-generic | x86_64 bits: 64 | Desktop: Gnome 3.36.7 | Distro: Ubuntu 20.04.2 LTS (Focal Fossa)\n\n*CPU:*\n\nTopology: 6-Core | model: Intel Core i7-10750H | bits: 64 | type: MT MCP | L2 cache: 12.0 MiB | Speed: 800 MHz | min/max: 800/5000 MHz Core speeds (MHz): | 1: 800 | 2: 800 | 3: 800 | 4: 800 | 5: 800 | 6: 800 | 7: 800 | 8: 800 | 9: 800 | 10: 800 | 11: 800 | 12: 800 |\n\nThe full experiment was then executed on a cluster at UFSCar.\n\n## Results\nThe results stored in the _OUTPUT_ folder are used in the next phase: Best-Partition-Silhoute, Best-Partition-MacroF1 or Best-Partition-MicroF1. 
The results for each dataset must be placed in the _PARTITIONS_ folder of the corresponding project.\n\n\n## RUN\nTo run the code, open a terminal, change to the *~/Generate-Partitions-Random1/R* folder, and type:\n\n```\nRscript random1.R [absolute_path_to_config_file]\n```\n\nExample:\n\n```\nRscript random1.R \"~/Generate-Partitions-Random1/R1-Config-Files/R1-GpositiveGO.csv\"\n```\n\n## RESULTS\nThe results are stored in a folder called _reports_ in the project root.\n\n## DOWNLOAD RESULTS\n[Click here](https://www.4shared.com/folder/xCeyWh9j/Random1.html)\n\n## Acknowledgment\n- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.\n- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.\n- The authors also thank the Brazilian research agency FAPESP for its financial support.\n\n# Contact\nelainececiliagatto@gmail.com\n\n## Links\n\n| [Site](https://sites.google.com/view/professor-cissa-gatto) | [Post-Graduate Program in Computer Science](http://ppgcc.dc.ufscar.br/pt-br) | [Computer Department](https://site.dc.ufscar.br/) |  [Biomal](http://www.biomal.ufscar.br/) | [CNPQ](https://www.gov.br/cnpq/pt-br) | [Ku Leuven](https://kulak.kuleuven.be/) | [Embarcados](https://www.embarcados.com.br/author/cissa/) | [Read Prensa](https://prensa.li/@cissa.gatto/) | [Linkedin Company](https://www.linkedin.com/company/27241216) | [Linkedin Profile](https://www.linkedin.com/in/elainececiliagatto/) | [Instagram](https://www.instagram.com/cissagatto) | [Facebook](https://www.facebook.com/cissagatto) | [Twitter](https://twitter.com/cissagatto) | [Twitch](https://www.twitch.tv/cissagatto) | [Youtube](https://www.youtube.com/CissaGatto) |\n\n# 
Thanks\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fgenerate-partitions-random1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcissagatto%2Fgenerate-partitions-random1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fgenerate-partitions-random1/lists"}