{"id":29383006,"url":"https://github.com/cissagatto/hpml.cc","last_synced_at":"2026-05-19T14:10:01.268Z","repository":{"id":300631396,"uuid":"1006635162","full_name":"cissagatto/HPML.CC","owner":"cissagatto","description":"This code is part of my Ph.D. research. The R script runs in parallel the CC made in python. ","archived":false,"fork":false,"pushed_at":"2025-07-21T12:57:23.000Z","size":53895,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-21T14:50:01.009Z","etag":null,"topics":["classification","classifier-chains","machine-learning","multilabel-classification","python","r","supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cissagatto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-22T17:33:38.000Z","updated_at":"2025-07-21T12:57:26.000Z","dependencies_parsed_at":"2025-07-21T14:43:55.398Z","dependency_job_id":null,"html_url":"https://github.com/cissagatto/HPML.CC","commit_stats":null,"previous_names":["cissagatto/hpml.cc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cissagatto/HPML.CC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FHPML.CC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FHPML.CC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FHPML.CC/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FHPML.CC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cissagatto","download_url":"https://codeload.github.com/cissagatto/HPML.CC/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FHPML.CC/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278538328,"owners_count":26003362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","classifier-chains","machine-learning","multilabel-classification","python","r","supervised-learning"],"created_at":"2025-07-10T04:01:33.893Z","updated_at":"2025-10-06T00:12:55.436Z","avatar_url":"https://github.com/cissagatto.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Classifier Chains running with R and Python\nThis code is part of my PhD research at PPG-CC/DC/UFSCar in collaboration with Katholieke Universiteit Leuven Campus Kulak Kortrijk, Belgium. The R script runs the Python implementation of Classifier Chains (CC) in parallel.\n\n## How to Cite 📑\nIf you use this code in your research, please cite the following:\n\n```bibtex\n@misc{Gatto2025,\n  author = {Gatto, E. 
C.},\n  title = {Classifier Chains with R and Python},\n  year = {2025},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/cissagatto/HPML.CC}}\n}\n```\n\n\n\n## 🗂️ Project Structure\n\nThe codebase includes R and Python scripts that must be used together.\n\n### R Scripts (in `/R` folder):\n\n* `config-files.R`\n* `cc.R`\n* `run-python.R`\n* `cc-python.R`\n* `utils.R`\n* `libraries.R`\n\n\n### Python Scripts (in `/Python` folder):\n\n* `main.py`\n* `confusion_matrix.py`\n* `measures.py`\n* `evaluation.py`\n\n\n## ⚙️ How to Reproduce the Experiment\n\n\n### Step 1 – Prepare the Dataset Metadata File\nA file called _datasets-original.csv_ must be in the *root project directory*. The code reads the dataset information it needs from this file, which currently describes 90 multilabel datasets. To use another dataset, add a row with the following information to the file:\n\n\n| Parameter    | Status    | Description                                           |\n|------------- |-----------|-------------------------------------------------------|\n| Id           | mandatory | Integer number to identify the dataset                |\n| Name         | mandatory | Dataset name (please follow the benchmark)            |\n| Domain       | optional  | Dataset domain                                        |\n| Instances    | mandatory | Total number of dataset instances                     |\n| Attributes   | mandatory | Total number of dataset attributes                    |\n| Labels       | mandatory | Total number of labels in the label space             |\n| Inputs       | mandatory | Total number of dataset input attributes              |\n| Cardinality  | optional  | **                                                    |\n| Density      | optional  | **                                                    |\n| Labelsets    | optional  | **                                 
                   |\n| Single       | optional  | **                                                    |\n| Max.freq     | optional  | **                                                    |\n| Mean.IR      | optional  | **                                                    | \n| Scumble      | optional  | **                                                    | \n| TCS          | optional  | **                                                    | \n| AttStart     | mandatory | Column number where the attribute space begins * 1    | \n| AttEnd       | mandatory | Column number where the attribute space ends          |\n| LabelStart   | mandatory | Column number where the label space begins            |\n| LabelEnd     | mandatory | Column number where the label space ends              |\n| Distinct     | optional  | ** 2                                                  |\n| xn           | mandatory | Value for Dimension X of the Kohonen map              | \n| yn           | mandatory | Value for Dimension Y of the Kohonen map              |\n| gridn        | mandatory | X times Y value. Kohonen's map must be square         |\n| max.neigbors | mandatory | The maximum number of neighbors is given by LABELS -1 |\n\n\n1 - Because the attribute space starts at the first column, this value is always 1.\n\n2 - [Click here](https://link.springer.com/book/10.1007/978-3-319-41111-8) for an explanation of each property.\n\n\n### STEP 2: Cross-Validation Files\nThe experiment requires pre-processed cross-validation files in `.tar.gz` format. You can download the 10-fold files for multilabel datasets [here](https://1drv.ms/u/s!Aq6SGcf6js1mrZJSkZ3VEJ217rEd5A?e=IH73m3).\n\nFor new datasets, you can generate these files by following the instructions in [this repository](https://github.com/cissagatto/crossvalidationmultilabel). 
After generating the files, place the `.tar.gz` archive in any directory, and provide the absolute path in the configuration file for the `global.R` script.\n\n\n### STEP 3\nAll the Java, Python and R packages required to execute this code must already be installed on your machine or server. This code does not install any packages automatically!\n\nYou can use the [Conda Environment](https://1drv.ms/u/s!Aq6SGcf6js1mw4hbhU9Raqarl8bH8Q?e=IA2aQs) that I created to perform this experiment. Download the environment file from that link, then use the command below to create the environment on your computer:\n\n```\nconda env create --file AmbienteTeste.yaml\n```\n\nSee more information about Conda environments [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) \n\nYou can also run this code using the Apptainer [container](https://1drv.ms/u/s!Aq6SGcf6js1mw4hcVuz_IN8_Bh1oFQ?e=5NuyxX) that I'm using to run this code in a SLURM cluster. See this [tutorial](https://rpubs.com/cissagatto/apptainer-slurm-r) (in Portuguese) for instructions. 
\n\n\n\n### STEP 4\nTo run this code you will need a configuration file saved in *csv* format with the following information:\n\n| Config          | Value                                                                            | \n|-----------------|----------------------------------------------------------------------------------| \n| FolderScripts   | Absolute path to the folder containing the R scripts                             |\n| Dataset_Path    | Absolute path to the directory where the dataset's tar.gz is stored              |\n| Temporary_Path  | Absolute path to the directory where temporary processing will be performed * 1  |\n| Partitions_Path | Absolute path to the directory where the best partitions are                     |\n| Implementation  | Must be \"clus\", \"mulan\", \"python\" or \"utiml\"                                     |\n| Dataset_Name    | Dataset name according to *datasets-original.csv* file                           |\n| Number_Dataset  | Dataset number according to *datasets-original.csv* file                         |\n| Number_Folds    | Number of folds used in cross validation                                         |\n| Number_Cores    | Number of cores for parallel processing                                          |\n\n\n1 - Use directories like */dev/shm*, *tmp* or *scratch* here.\n\n\nYou can save configuration files wherever you want. 
The absolute path will be passed as a command line argument.\n\n\n## 🛠️ Software Requirements\nThis code was developed in RStudio 2024.12.0+467 \"Kousa Dogwood\" Release (cf37a3e5488c937207f992226d255be71f5e3f41, 2024-12-11) for Ubuntu Jammy Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) rstudio/2024.12.0+467 Chrome/126.0.6478.234 Electron/31.7.6 Safari/537.36, Quarto 1.5.57\n\n- R version 4.5.0 (2025-04-11) -- \"How About a Twenty-Six\", Copyright (C) 2025 The R Foundation for Statistical Computing, Platform: x86_64-pc-linux-gnu\n- Python 3.10\n- Conda 24.11.3\n\n## 💻 Hardware Recommendations\nThis code can run sequentially or in parallel, but running it in parallel is highly recommended. The number of cores can be configured via the command line (number_cores). If number_cores = 1 the code will run sequentially. In our experiments, we used 10 cores. For reproducibility, we recommend that you also use 10 cores. This code was tested with the emotions dataset on the following machine:\n\n- Linux 6.11.0-26-generic #26~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux\n- Distributor ID: Ubuntu, Description: Ubuntu 24.04.2 LTS, Release: 24.04, Codename: noble\n- Manufacturer: Acer, Product Name: Nitro ANV15-51, Version: V1.16, Wake-up Type: Power Switch, Family: Acer Nitro V 15\n\nThe experiment was then executed on a cluster at UFSC (Federal University of Santa Catarina Campus Blumenau).\n\n\n## Results\nThe results are stored in the _REPORTS_ directory in tar.gz format.\n\n\n## RUN\nTo run the code, open the terminal, enter the *~/HPML.CC/R* directory, and type:\n\n```\nRscript cc.R [absolute_path_to_config_file]\n```\n\nExample:\n\n```\nRscript cc.R \"~/HPML.CC/config-files/cc-emotions.csv\"\n```\n\n## DOWNLOAD RESULTS\n[Click here]\n\n\n## Acknowledgment\n- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.\n- This study was 
financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.\n- The authors also thank the Brazilian research agency FAPESP for its financial support.\n\n\n## 📞 Contact\nElaine Cecília Gatto\n✉️ [elainececiliagatto@gmail.com](mailto:elainececiliagatto@gmail.com)\n\n\n## Links\n\n| [Site](https://sites.google.com/view/professor-cissa-gatto) | [Post-Graduate Program in Computer Science](http://ppgcc.dc.ufscar.br/pt-br) | [Computer Department](https://site.dc.ufscar.br/) |  [Biomal](http://www.biomal.ufscar.br/) | [CNPQ](https://www.gov.br/cnpq/pt-br) | [Ku Leuven](https://kulak.kuleuven.be/) | [Embarcados](https://www.embarcados.com.br/author/cissa/) | [Read Prensa](https://prensa.li/@cissa.gatto/) | [Linkedin Company](https://www.linkedin.com/company/27241216) | [Linkedin Profile](https://www.linkedin.com/in/elainececiliagatto/) | [Instagram](https://www.instagram.com/cissagatto) | [Facebook](https://www.facebook.com/cissagatto) | [Twitter](https://twitter.com/cissagatto) | [Twitch](https://www.twitch.tv/cissagatto) | [Youtube](https://www.youtube.com/CissaGatto) |\n\n# Thanks\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fhpml.cc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcissagatto%2Fhpml.cc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fhpml.cc/lists"}