https://github.com/cissagatto/multilabelsimilaritiesmeasures

Compute similarities measures (categorical data) for all labels in label space for a multilabel dataset

binary-coefficients categorial-data label-space machine-learning multilabel-classification multilabel-partitions partitions similarities-coefficients similarities-measures

# MultiLabel Similarities Measures
Compute similarity measures (categorical data) for all labels in the label space of a multilabel dataset.

## Multi-Label Datasets (original)
Click [here](https://cometa.ujaen.es/datasets/) to go to the Cometa datasets page.

## 10-Fold Cross Validation Multi-Label Datasets
Click [here](https://www.4shared.com/s/dYpGZWzjQ) to download

## Conda Environment
[download txt](https://www.4shared.com/s/fUCVTl13zea)

[download yml](https://www.4shared.com/s/f8nOZyxj9iq)

[download yaml](https://www.4shared.com/s/fk5Io4faLiq)

To set up a conda environment for running this experiment, please consult the conda documentation [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

## Tutorial

https://rpubs.com/cissagatto/MultiLabelSimilaritiesMeasures

## How to cite
@misc{Gatto2021, author = {Gatto, E. C.}, title = {Compute Similarities Measures for MultiLabel Classification}, year = {2021}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/cissagatto/MultiLabelSimilaritiesMeasures}}}

# Scripts
This code has the following scripts in the R folder:

1. functions_contingency_table_multilabel.R
2. functions_measures_binary_data.R
3. functions_multilabel_binary_measures.R
4. libraries.R
5. utils.R
6. runCV.R
7. runNCV.R
8. mlsm.R
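
The script names suggest that label-pair contingency tables are built first and binary measures are computed from them. The sketch below illustrates that general idea only; it is not code from this repository. The toy label matrix, the function names `contingency` and `jaccard`, and the choice of the Jaccard coefficient are all assumptions made for illustration.

```r
# Hypothetical toy label space: rows are instances, columns are labels.
labels <- matrix(
  c(1, 0, 1,
    1, 1, 0,
    0, 1, 0,
    1, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("L1", "L2", "L3"))
)

# 2x2 contingency counts for two binary label columns x and y.
contingency <- function(x, y) {
  c(a = sum(x == 1 & y == 1),  # both labels present
    b = sum(x == 1 & y == 0),  # only x present
    c = sum(x == 0 & y == 1),  # only y present
    d = sum(x == 0 & y == 0))  # both labels absent
}

# Jaccard coefficient a / (a + b + c); a zero denominator yields 0.
jaccard <- function(x, y) {
  ct <- contingency(x, y)
  den <- ct[["a"]] + ct[["b"]] + ct[["c"]]
  if (den == 0) 0 else ct[["a"]] / den
}

# Similarity matrix over every pair of labels.
n <- ncol(labels)
sim <- matrix(0, n, n, dimnames = list(colnames(labels), colnames(labels)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- jaccard(labels[, i], labels[, j])
  }
}
print(sim)
```

In this toy example L1 and L2 co-occur in two of the four instances in which at least one of them appears, so their Jaccard similarity is 0.5, while L2 and L3 never co-occur and get 0.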

## FLOWCHART

## Preparing your experiment

### Step-1
This code runs with X-fold cross-validation. First, obtain the X-fold cross-validation files using this [code](https://github.com/cissagatto/CrossValidationMultiLabel); all instructions for using it are in its GitHub repository. Then put the generated results, as "tar.gz" files, in the *datasets* folder of this project. The folder structure generated by the CrossValidation code is used here, and this code does not work without these files.

### Step-2
A file called _datasets.csv_ must be in the *root project* folder. The code reads the information about each dataset from this file. All 74 datasets available in *Cometa* are in this file. If you want to use another dataset, please add the following information about it to the file:

_Id, Name, Domain, Labels, Instances, Attributes, Inputs, Labelsets, Single, Max freq, Card, Dens, MeanIR, Scumble, TCS, AttStart, AttEnd, LabelStart, LabelEnd, xn, yn, gridn_

The *Id* of the dataset is a mandatory command-line parameter for running the code. The other fields are used by many internal functions, so please make sure this information is available before running the code. *xn* and *yn* are the dimensions of the quadrangular Kohonen map, and *gridn* is (xn * yn). Example: xn = 4, yn = 4, gridn = 16.
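
As a rough illustration of how a row of _datasets.csv_ might be looked up and sanity-checked, consider the sketch below. It is not the repository's code: the CSV excerpt keeps only a few of the columns, every value in it is made up, and the check that `LabelStart` to `LabelEnd` spans exactly `Labels` columns is an assumption about what those fields mean.

```r
# Hypothetical excerpt of datasets.csv: subset of columns, made-up values.
csv_text <- "Id,Name,Labels,AttStart,AttEnd,LabelStart,LabelEnd,xn,yn,gridn
1,toyA,6,1,72,73,78,4,4,16
2,toyB,14,1,103,104,117,3,3,9"

datasets <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Select the row for the requested Id (hard-coded here instead of read
# from the command line).
id <- 2
ds <- datasets[datasets$Id == id, ]
stopifnot(nrow(ds) == 1)

# gridn must equal xn * yn, as described above.
stopifnot(ds$gridn == ds$xn * ds$yn)

# Assumption: LabelStart..LabelEnd are column indices covering all labels.
stopifnot(ds$LabelEnd - ds$LabelStart + 1 == ds$Labels)

label_columns <- seq(ds$LabelStart, ds$LabelEnd)
cat("Dataset:", ds$Name, "with", length(label_columns), "label columns\n")
```

Running checks like these before a long experiment can catch a mistyped row early, when a new dataset is added to the file by hand.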

## RUN

To run the code, open the terminal, enter the */MultiLabelSimilaritiesMeasures/R/* folder, and type

```
Rscript mlsm.R [number_dataset] [number_cores] [number_folds] [name_folder_results]
```

Where:

_number_dataset_ is the dataset number in the _datasets.csv_ file.

_number_cores_ is the total number of cores you want to use in parallel execution.

_number_folds_ is the number of folds you want for cross-validation.

_name_folder_results_ is the name of the folder in which to save the results.

All parameters are mandatory. Example:

```
Rscript mlsm.R 17 10 10 "/dev/shm/results"
```

This will execute the code for dataset number 17 in _datasets.csv_, with 10 cores and 10 folds, and the results will be stored in _/dev/shm/results/_. The code automatically copies */dev/shm/results* to the *Reports* folder, which is in the root of the project. This way, you can run the code in a temporary folder, such as *scratch* or *shm*, to speed up the execution.
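
For readers unfamiliar with how an R script receives positional arguments, the sketch below shows one way the four parameters could be read and validated. `parse_args` is a hypothetical helper written for this example and is not taken from _mlsm.R_; only `commandArgs(trailingOnly = TRUE)` is standard R.

```r
# Hypothetical helper: validate the four positional arguments.
parse_args <- function(args) {
  if (length(args) != 4) {
    stop("Usage: Rscript mlsm.R [number_dataset] [number_cores] ",
         "[number_folds] [name_folder_results]")
  }
  parsed <- list(
    number_dataset = suppressWarnings(as.integer(args[1])),
    number_cores = suppressWarnings(as.integer(args[2])),
    number_folds = suppressWarnings(as.integer(args[3])),
    name_folder_results = args[4]
  )
  # The first three arguments must be positive integers.
  numeric_part <- unlist(parsed[1:3])
  if (any(is.na(numeric_part)) || any(numeric_part < 1)) {
    stop("The first three arguments must be positive integers")
  }
  parsed
}

# Inside a script this would be parse_args(commandArgs(trailingOnly = TRUE)).
params <- parse_args(c("17", "10", "10", "/dev/shm/results"))
str(params)
```

Failing fast on a missing or non-numeric argument avoids starting a long parallel run with a wrong dataset number or fold count.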

## IMPORTANT
I used the ABS function in all functions that use SQRT. Divisions by zero are treated as zero.
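
The two conventions above can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: it assumes ABS is applied to the argument of SQRT, the helper names `safe_div` and `safe_sqrt` are hypothetical, and the Ochiai coefficient is used only as a familiar binary measure that contains a square root.

```r
# Division that returns 0 when the denominator is 0.
safe_div <- function(num, den) {
  if (den == 0) 0 else num / den
}

# Square root of the absolute value (assumed placement of ABS).
safe_sqrt <- function(x) sqrt(abs(x))

# Ochiai coefficient from contingency counts: a / sqrt((a + b) * (a + c)),
# where a = both labels present, b and c = only one label present.
ochiai <- function(a, b, c) {
  safe_div(a, safe_sqrt((a + b) * (a + c)))
}

print(ochiai(2, 1, 1))  # 2 / sqrt(3 * 3)
print(ochiai(0, 0, 0))  # zero denominator, so the result is 0
print(safe_sqrt(-4))    # sqrt(abs(-4))
```

With these helpers a label that never occurs produces a similarity of 0 rather than `NaN`, which keeps the resulting similarity matrix free of missing values.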

## Video Demonstration
Click [here](https://youtu.be/rrSh7vF60bA) to watch a video that demonstrates how to run this code.

## Acknowledgment
- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.
- The authors also thank the Brazilian research agency FAPESP for its financial support.

# Contact
elainececiliagatto@gmail.com

## Links

| [Site](https://sites.google.com/view/professor-cissa-gatto) | [Post-Graduate Program in Computer Science](http://ppgcc.dc.ufscar.br/pt-br) | [Computer Department](https://site.dc.ufscar.br/) | [Biomal](http://www.biomal.ufscar.br/) | [CNPQ](https://www.gov.br/cnpq/pt-br) | [Ku Leuven](https://kulak.kuleuven.be/) | [Embarcados](https://www.embarcados.com.br/author/cissa/) | [Read Prensa](https://prensa.li/@cissa.gatto/) | [Linkedin Company](https://www.linkedin.com/company/27241216) | [Linkedin Profile](https://www.linkedin.com/in/elainececiliagatto/) | [Instagram](https://www.instagram.com/cissagatto) | [Facebook](https://www.facebook.com/cissagatto) | [Twitter](https://twitter.com/cissagatto) | [Twitch](https://www.twitch.tv/cissagatto) | [Youtube](https://www.youtube.com/CissaGatto) |

# Thanks