{"id":50306871,"url":"https://github.com/cissagatto/paircomparison","last_synced_at":"2026-05-28T17:01:53.516Z","repository":{"id":226642055,"uuid":"769212168","full_name":"cissagatto/pairComparison","owner":"cissagatto","description":"One-to-one comparison. Count how many datasets your method (algorithm) obtained the best result when compared to other method (or methods) in the experiment.","archived":false,"fork":false,"pushed_at":"2025-10-07T14:59:01.000Z","size":474,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-07T16:40:54.120Z","etag":null,"topics":["artificial-intelligence","comparison-methods","machine-learning","multi-label","multi-label-classification","multi-label-partition","multi-label-problems"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cissagatto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-08T15:16:39.000Z","updated_at":"2025-10-07T14:59:05.000Z","dependencies_parsed_at":"2025-09-07T17:24:08.174Z","dependency_job_id":"1076424c-28d0-4335-a932-3f8f5ed88033","html_url":"https://github.com/cissagatto/pairComparison","commit_stats":null,"previous_names":["cissagatto/pair-comparison","cissagatto/paircomparison"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cissagatto/pairComparison","repository_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/cissagatto%2FpairComparison","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FpairComparison/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FpairComparison/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FpairComparison/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cissagatto","download_url":"https://codeload.github.com/cissagatto/pairComparison/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cissagatto%2FpairComparison/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33617718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","comparison-methods","machine-learning","multi-label","multi-label-classification","multi-label-partition","multi-label-problems"],"created_at":"2026-05-28T17:01:52.325Z","updated_at":"2026-05-28T17:01:53.511Z","avatar_url":"https://github.com/cissagatto.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Paired Comparison of Methods in Machine Learning\n\nA paired comparison of methods in machine learning refers to a 
direct comparison between two models or algorithms across multiple tasks or datasets. The goal is to determine which model performs better in a head-to-head comparison on a set of metrics, such as accuracy, precision, recall, or any other relevant performance measure. In this context, each dataset serves as a paired observation, where the performance of one model is directly compared to the performance of another. This method is particularly useful for understanding the relative strengths and weaknesses of different models in specific scenarios.\n\n\n## How to cite\n\n```plaintext\n@misc{pairComparison2024,\n  author = {Elaine Cecília Gatto},\n  title = {pairComparison: A package to performing comparisons between pairs of methods. },  \n  year = {2024},\n  note = {R package version 0.1.0. Licensed under CC BY-NC-SA 4.0},\n  doi = {10.13140/RG.2.2.28587.04642},\n  url = {https://github.com/cissagatto/pairComparison}\n}\n```\n\n\n### Simplified Formalization of Pairwise Model Comparison\n\nThe goal of this approach is to generate a matrix $M \\times M$ where each element represents the total number of datasets in which a model $m_{i}$ outperforms another model $m_{j}$. If we have 10 models, the result will be a $10 \\times 10$ matrix, showing the pairwise comparisons between each model.\n\n#### 1. Metrics Where Higher Values are Better (Best Value = 1)\nFor evaluation metrics where a higher value indicates better performance (e.g., accuracy, F1-score):\n\n- **Comparison Rule**: If the value of $m_{i}$ is greater than $m_{j}$ on a specific dataset, then count that dataset as a win for $m_{i}$ (assign a score of 1). 
Otherwise, assign a score of 0.\n\nFormally, for each pair $(m_{i}, m_{j})$ across all datasets $D_{1}, D_{2}, \\dots, D_{N}$, with $P_{i,k}$ denoting the metric value of model $m_{i}$ on dataset $D_{k}$:\n\n```math\nC_{i,j} = \\sum_{k=1}^{N} \\text{I}(P_{i,k} \u003e P_{j,k})\n```\n\nWhere $\\text{I}$ is the indicator function:\n\n```math\n\\text{I}(P_{i,k} \u003e P_{j,k}) =\n\\begin{cases}\n1 \u0026 \\text{if } P_{i,k} \u003e P_{j,k} \\\\\n0 \u0026 \\text{otherwise}\n\\end{cases}\n```\n\nHere, $C_{i,j}$ represents the total number of datasets where model $m_{i}$ outperforms model $m_{j}$.\n\n#### 2. Metrics Where Lower Values are Better (Best Value = 0)\nFor evaluation metrics where a lower value indicates better performance (e.g., Hamming loss):\n\n- **Comparison Rule**: If the value of $m_{i}$ is less than $m_{j}$ on a specific dataset, then count that dataset as a win for $m_{i}$ (assign a score of 1). Otherwise, assign a score of 0.\n\nFormally, for each pair $(m_{i},m_{j})$ across all datasets $D_{1}, D_{2}, \\dots, D_{N}$, with $P_{i,k}$ denoting the metric value of model $m_{i}$ on dataset $D_{k}$:\n\n```math\nC_{i,j} = \\sum_{k=1}^{N} \\text{I}(P_{i,k} \u003c P_{j,k})\n```\n\nWhere $\\text{I}$ is the indicator function:\n\n```math\n\\text{I}(P_{i,k} \u003c P_{j,k}) =\n\\begin{cases}\n1 \u0026 \\text{if } P_{i,k} \u003c P_{j,k} \\\\\n0 \u0026 \\text{otherwise}\n\\end{cases}\n```\n\nHere, $C_{i,j}$ represents the total number of datasets where model $m_{i}$ outperforms model $m_{j}$ based on the metric where lower is better.\n\n### Additional Comparisons\nThe code can also perform additional comparisons to calculate:\n\n1. **$m_{i} \\geq m_{j}$**: The number of datasets where the performance value of $m_{i}$ is greater than or equal to that of $m_{j}$.\n2. **$m_{i} \\leq m_{j}$**: The number of datasets where the performance value of $m_{i}$ is less than or equal to that of $m_{j}$.\n3. 
**$m_{i} = m_{j}$**: The number of datasets where the performance value of $m_{i}$ and $m_{j}$ is equal.\n\nThese additional comparisons can be useful for other types of analysis, such as determining ties or dominance in a set of models.\n\n### Final Matrix\nThe final output is a comparison matrix $\\mathbf{C}$ of size $M \\times M$, where each entry $C_{i,j}$ contains the count of datasets in which model $m_{i}$ was better than model $m_{j}$ according to the specific metric being analyzed. This matrix serves as a comprehensive summary of the pairwise performance comparisons across all models and datasets, allowing for a detailed understanding of model performance in machine learning contexts.\n\n\n|         | Model_1 | Model_2 | Model_3 | Model_4 |\n|---------|---------|---------|---------|---------|\n| **Model_1** |       14 |        9 |        5 |        6 |\n| **Model_2** |        7 |       14 |        2 |        4 |\n| **Model_3** |        9 |       12 |       14 |        9 |\n| **Model_4** |        8 |       10 |        5 |       14 |\n\n- **Interpretation**: \n  - The value in the cell at row \"Model_1\" and column \"Model_2\" is **9**. This means that Model_1 outperforms Model_2 on **9 datasets**.\n  - Similarly, the value in the cell at row \"Model_2\" and column \"Model_3\" is **2**, indicating that Model_2 outperforms Model_3 on **2 datasets**.\n\n### Significance in Machine Learning\n\nIn machine learning, paired comparisons help in:\n\n1. **Model Selection**: By comparing models pairwise across datasets, you can identify which model is generally better or more consistent.\n2. **Understanding Performance Variability**: Some models may perform exceptionally well on certain datasets but poorly on others. Paired comparisons allow for the identification of such patterns.\n3. 
**Statistical Testing**: Paired comparisons are also the basis for statistical tests, such as the Wilcoxon signed-rank test or the Friedman test, which help to determine if the observed differences in performance are statistically significant.\n\nIn summary, paired comparisons provide a systematic way to evaluate and compare the performance of multiple models across different datasets, helping practitioners make informed decisions in model selection and evaluation.\n\n## How to Use the Code\n\n### 1. Package\n\nFirst, install the package from GitHub:\n\n```r\n# install.packages(\"devtools\")\nlibrary(\"devtools\")\ndevtools::install_github(\"https://github.com/cissagatto/pairComparison\")\nlibrary(pairComparison)\n```\n\n### 2. Computing\n\n\n#### A. For a single CSV file\n\n\n```r\n\n# FolderData and FolderResults must already be defined as folder paths\n# (see the Folder Structure section)\n\nnames.methods \u003c- c(\"Model_1\", \"Model_2\", \"Model_3\", \"Model_4\")\n\nfilename \u003c- \"~/pairComparison/data/accuracy.csv\"\n\nresults = pair.comparison(filename = filename, \n                FolderOrigin = FolderData,\n                FolderDestiny = FolderResults, \n                measure.name = \"accuracy\",\n                names.methods = names.methods)                \n\nprint(results$Accuracy$greater_or_equal)\nprint(results$Accuracy$greater)\nprint(results$Accuracy$less_or_equal)\nprint(results$Accuracy$less)\nprint(results$Accuracy$equal)\n\n\n```\n\n\n\n#### B. 
For more than one CSV file\n\n\n```r\n\n# Set working directory to the folder containing the CSV files\nsetwd(FolderData)\n\n# Get list of all CSV files in the directory\nfiles \u003c- list.files(pattern = \"\\\\.csv$\", full.names = TRUE)\n\n# Normalize file paths for consistency\nfull_paths \u003c- sapply(files, normalizePath)\n\n# Extract measure names from file paths\nextract_measure_names \u003c- function(file_paths) {\n  # Extract file names from paths\n  file_names \u003c- basename(file_paths)\n  # Remove file extensions to get measure names\n  measure_names \u003c- tools::file_path_sans_ext(file_names)\n  return(measure_names)\n}\n\nmeasure_names \u003c- extract_measure_names(full_paths)\n\n# Perform comparison for all measures\nresults  = pair.comparison.all.measures(names.csvs = full_paths,\n                             FolderOrigin = FolderData, \n                             FolderDestiny = FolderResults,\n                             names.methods = names.methods, \n                             names.measures = measure_names)\n\nprint(results$greater_or_equal)\nprint(results$greater)\nprint(results$less_or_equal)\nprint(results$less)\nprint(results$equal)\n                             \n```\n\n#### C. Plotting a Heatmap for Many CSV Files\nPlease check the script `example.R` in the example folder.\n\n#### Example Output\n\nFor a given CSV file, the result might look like this:\n\n```csv\n,Model_1,Model_2,Model_3,Model_4\nModel_1,14,9,5,6\nModel_2,7,14,2,4\nModel_3,9,12,14,9\nModel_4,8,10,5,14\n```\n\nIn this matrix:\n- The cell at row `Model_1` and column `Model_2` shows `9`, meaning `Model_1` outperforms `Model_2` on 9 datasets.\n\n\n## Function\n\n### `pair.comparison`\n\nThe `pair.comparison` function compares methods across a single CSV file by counting the datasets on which each method outperforms each other method. 
It processes a given CSV file and saves the comparison results in a structured manner in a results folder.\n\n#### Parameters\n\n- **`filename`**: A character string specifying the full path of the CSV file to be processed. The CSV should have a structure where each row represents a dataset and each column (except the first) represents a method's performance in that dataset.\n\n- **`FolderOrigin`**: A character string specifying the path to the folder where the CSV file is located. This parameter is currently unused but can be included for compatibility with other functions.\n\n- **`FolderDestiny`**: A character string specifying the path to the folder where results will be saved. The function will create a subfolder here for each measure.\n\n- **`measure.name`**: A character string specifying the name of the measure being processed. This name will be used to organize the results (see `pc.mesures()`).\n\n- **`names.methods`**: A character vector containing the names of the methods used as column names in the data. These names will be used for labeling the results.\n\n\n\n#### Returns\n\nThe function does not return any value. It writes multiple CSV files with the comparison results to the specified folder. 
The results are stored in the following files:\n\n- **`greater-or-equal-datasets.csv`**: Contains the number of datasets in which each method's performance value is greater than or equal to that of the other method.\n- **`greater-datasets.csv`**: Contains the number of datasets in which each method's performance value is greater than that of the other method.\n- **`less-or-equal-datasets.csv`**: Contains the number of datasets in which each method's performance value is less than or equal to that of the other method.\n- **`less-datasets.csv`**: Contains the number of datasets in which each method's performance value is less than that of the other method.\n- **`equal-datasets.csv`**: Contains the number of datasets in which each method's performance value is equal to that of the other method.\n\n\n### Folder Structure\n\nEnsure the following folder structure is set up:\n\n- `FolderRoot`: Root directory of the project.\n- `FolderData`: Directory where CSV data files are stored.\n- `FolderResults`: Directory where results and plots are saved.\n\n\n### Documentation\n\nFor more detailed documentation on each function, check out the `~/pairComparison/docs` folder.\n\n\n\n## 📚 **Contributing**\n\nWe welcome contributions from the community! If you have suggestions, improvements, or bug fixes, please submit a pull request or open an issue in the GitHub repository.\n\n\n\n## Acknowledgment\n- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.\n- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.\n- The authors also thank the Brazilian research agency FAPESP for financial support.\n\n## 📧 **Contact**\n\nFor any questions or support, please contact:\n- **Prof. 
Elaine Cecilia Gatto** (elainececiliagatto@gmail.com)\n  \n\n# Links\n\n| [Site](https://sites.google.com/view/professor-cissa-gatto) | [Post-Graduate Program in Computer Science](http://ppgcc.dc.ufscar.br/pt-br) | [Computer Department](https://site.dc.ufscar.br/) |  [Biomal](http://www.biomal.ufscar.br/) | [CNPQ](https://www.gov.br/cnpq/pt-br) | [Ku Leuven](https://kulak.kuleuven.be/) | [Embarcados](https://www.embarcados.com.br/author/cissa/) | [Read Prensa](https://prensa.li/@cissa.gatto/) | [Linkedin Company](https://www.linkedin.com/company/27241216) | [Linkedin Profile](https://www.linkedin.com/in/elainececiliagatto/) | [Instagram](https://www.instagram.com/cissagatto) | [Facebook](https://www.facebook.com/cissagatto) | [Twitter](https://twitter.com/cissagatto) | [Twitch](https://www.twitch.tv/cissagatto) | [Youtube](https://www.youtube.com/CissaGatto) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fpaircomparison","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcissagatto%2Fpaircomparison","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcissagatto%2Fpaircomparison/lists"}