https://github.com/cissagatto/multilabelsimilaritiesmeasures

Compute similarities measures (categorical data) for all labels in label space for a multilabel dataset
Host: GitHub
URL: https://github.com/cissagatto/multilabelsimilaritiesmeasures
Owner: cissagatto
License: gpl-3.0
Created: 2021-10-15T01:01:50.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-10-30T16:45:05.000Z (over 2 years ago)
Last Synced: 2023-10-30T17:41:59.509Z (over 2 years ago)
Topics: binary-coefficients, categorial-data, label-space, machine-learning, multilabel-classification, multilabel-partitions, partitions, similarities-coefficients, similarities-measures
Language: R
Homepage:
Size: 1.75 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# MultiLabel Similarities Measures

Compute similarity measures (for categorical data) between all labels in the label space of a multilabel dataset.
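
As a rough illustration of what "similarity between labels" means here, the sketch below builds the 2x2 contingency counts for each pair of binary label columns and computes the Jaccard coefficient for every pair. The toy data and the function names (`contingency_counts`, `jaccard`, `pairwise_similarity`) are assumptions made for this example and are not taken from the repository's scripts.

```r
# Toy label space: 5 instances, 3 binary labels (made-up data).
labels <- data.frame(
  L1 = c(1, 1, 0, 0, 1),
  L2 = c(1, 0, 0, 1, 1),
  L3 = c(0, 0, 1, 1, 0)
)

# 2x2 contingency counts for two binary vectors:
# a = both 1, b = only x is 1, c = only y is 1, d = both 0.
contingency_counts <- function(x, y) {
  c(a = sum(x == 1 & y == 1),
    b = sum(x == 1 & y == 0),
    c = sum(x == 0 & y == 1),
    d = sum(x == 0 & y == 0))
}

# Jaccard coefficient a / (a + b + c); returns 0 when the denominator is 0.
jaccard <- function(x, y) {
  n <- contingency_counts(x, y)
  denom <- n[["a"]] + n[["b"]] + n[["c"]]
  if (denom == 0) 0 else n[["a"]] / denom
}

# Similarity matrix over every pair of labels (hypothetical helper).
pairwise_similarity <- function(label_df, measure) {
  k <- ncol(label_df)
  m <- matrix(0, k, k, dimnames = list(names(label_df), names(label_df)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      m[i, j] <- measure(label_df[[i]], label_df[[j]])
    }
  }
  m
}

sim <- pairwise_similarity(labels, jaccard)
print(sim)
```

Any other measure defined on the counts a, b, c, d can be passed as the `measure` argument in the same way.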

## Multi-Label Datasets (original)

Click [here](https://cometa.ujaen.es/datasets/) to go to the Cometa datasets page.

## 10-Fold Cross Validation Multi-Label Datasets

Click [here](https://www.4shared.com/s/dYpGZWzjQ) to download

## Conda Environment

[download txt](https://www.4shared.com/s/fUCVTl13zea)

[download yml](https://www.4shared.com/s/f8nOZyxj9iq)

[download yaml](https://www.4shared.com/s/fk5Io4faLiq)

To run this experiment in a conda environment, see the conda documentation on [managing environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

## Tutorial

https://rpubs.com/cissagatto/MultiLabelSimilaritiesMeasures

## How to cite 

@misc{Gatto2021, author = {Gatto, E. C.}, title = {Compute Similarities Measures for MultiLabel Classification}, year = {2021}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/cissagatto/MultiLabelSimilaritiesMeasures}}}

# Scripts

The R folder contains the following scripts:

1. functions_contingency_table_multilabel.R

2. functions_measures_binary_data.R

3. functions_multilabel_binary_measures.R

4. libraries.R

5. utils.R

6. runCV.R

7. runNCV.R

8. mlsm.R

## FLOWCHART

## Preparing your experiment

### Step-1

This code runs under X-fold cross-validation. First, generate the X-fold cross-validation files using this [code](https://github.com/cissagatto/CrossValidationMultiLabel); all the instructions for using it are in its GitHub repository. Then put the generated results, as "tar.gz" files, in the *datasets* folder of this project. The folder structure produced by the CrossValidation code is used as-is here. This code does not work without these files.

### Step-2

A file called _datasets.csv_ must be in the *root* folder of the project. The code reads information about each dataset from this file. All 74 datasets available in *Cometa* are already listed in it. If you want to use another dataset, add the following information about it to the file:

_Id, Name, Domain, Labels, Instances, Attributes, Inputs, Labelsets, Single, Max freq, Card, Dens, MeanIR, Scumble, TCS, AttStart, AttEnd, LabelStart, LabelEnd, xn, yn, gridn_

The *Id* of the dataset is a mandatory command-line parameter for running the code. The other fields are used by many internal functions, so make sure this information is available before running the code. *xn* and *yn* are the dimensions of the Kohonen map grid, and *gridn* is their product (xn * yn). Example: xn = 4, yn = 4, gridn = 16.
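
The lookup described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the helper name `read_dataset_info`, the toy rows, and the choice to validate `gridn` are all assumptions, and the toy file includes only a few of the columns listed above.

```r
# Hypothetical helper: return the single row of datasets.csv matching an Id.
read_dataset_info <- function(csv_path, id) {
  datasets <- read.csv(csv_path, stringsAsFactors = FALSE)
  info <- datasets[datasets$Id == id, , drop = FALSE]
  if (nrow(info) != 1) {
    stop("Expected exactly one row with Id = ", id, ", found ", nrow(info))
  }
  # gridn should be the product of the two map dimensions.
  if (info$gridn != info$xn * info$yn) {
    stop("gridn must equal xn * yn for dataset Id = ", id)
  }
  info
}

# Toy file with a subset of the columns, written to a temporary path.
# All values are made up for illustration.
toy <- data.frame(Id = c(1, 2), Name = c("toyA", "toyB"),
                  LabelStart = c(11, 21), LabelEnd = c(16, 24),
                  xn = c(4, 2), yn = c(4, 2), gridn = c(16, 4))
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

info <- read_dataset_info(csv_path, 2)
print(info[, c("Name", "LabelStart", "LabelEnd", "gridn")])
```

Failing early on a missing Id or an inconsistent `gridn` keeps a bad row in the file from surfacing later as an obscure error inside the internal functions.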

## RUN

To run the code, open the terminal, enter the */MultiLabelSimilaritiesMeasures/R/* folder, and type

```

Rscript mlsm.R [number_dataset] [number_cores] [number_folds] [name_folder_results]

```

Where:

_number_dataset_ is the dataset number (Id) in the datasets.csv file.

_number_cores_ is the total number of cores to use for parallel execution.

_number_folds_ is the number of folds to use for cross-validation.

_name_folder_results_ is the name of the folder in which to save the results.

All parameters are mandatory. Example:

```

Rscript mlsm.R 17 10 10 "/dev/shm/results"

```

This runs the code for dataset number 17 in _datasets.csv_, with 10 cores and 10 folds, and stores the results in */dev/shm/results/*. The code automatically copies */dev/shm/results* to the *Reports* folder in the root of the project. This lets you run the code in a temporary folder, such as *scratch* or *shm*, to speed up execution.
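
For readers who want to see how four positional arguments like these are typically handled in an R script, here is a minimal sketch. The function `parse_mlsm_args` and its validation rules are assumptions for illustration; the actual parsing in `mlsm.R` may differ.

```r
# Hypothetical parser for the four positional arguments of mlsm.R.
parse_mlsm_args <- function(args) {
  if (length(args) != 4) {
    stop("Usage: Rscript mlsm.R [number_dataset] [number_cores] ",
         "[number_folds] [name_folder_results]")
  }
  parsed <- list(
    number_dataset = as.integer(args[1]),
    number_cores   = as.integer(args[2]),
    number_folds   = as.integer(args[3]),
    folder_results = args[4]
  )
  # The three numeric arguments must be positive integers.
  numeric_fields <- c("number_dataset", "number_cores", "number_folds")
  for (field in numeric_fields) {
    value <- parsed[[field]]
    if (is.na(value) || value < 1) {
      stop(field, " must be a positive integer")
    }
  }
  parsed
}

# In a script, the vector would come from commandArgs(trailingOnly = TRUE).
params <- parse_mlsm_args(c("17", "10", "10", "/dev/shm/results"))
str(params)
```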

## IMPORTANT

The ABS function is applied in every function that uses SQRT. Divisions by zero are treated as zero.
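
The two conventions can be sketched as small helpers. This is an illustration under assumptions: the names `safe_div` and `safe_sqrt` are hypothetical, ABS is assumed to be applied to the argument of SQRT, and the Ochiai coefficient is used only as a familiar example of a measure with a square root in its denominator.

```r
# Division that returns 0 wherever the denominator is 0.
safe_div <- function(num, den) {
  ifelse(den == 0, 0, num / den)
}

# Square root of the absolute value, so negative inputs do not yield NaN.
# Assumption: ABS wraps the argument of SQRT.
safe_sqrt <- function(x) {
  sqrt(abs(x))
}

# Ochiai coefficient from 2x2 contingency counts:
# a / sqrt((a + b) * (a + c)).
ochiai <- function(a, b, c) {
  safe_div(a, safe_sqrt((a + b) * (a + c)))
}

ochiai(2, 1, 1)   # 2 / sqrt(3 * 3)
ochiai(0, 0, 0)   # denominator is 0, so the result is 0
```

With these conventions a pair of labels that never occurs gets a similarity of 0 rather than NaN, which is worth keeping in mind when comparing results against other implementations.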

## Video Demonstration

Click [here](https://youtu.be/rrSh7vF60bA) to watch a video that demonstrates how to run this code.

## Acknowledgment

- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.

- The authors also thank the Brazilian research agency FAPESP for financial support.

# Contact

elainececiliagatto@gmail.com

## Links

| [Site](https://sites.google.com/view/professor-cissa-gatto) | [Post-Graduate Program in Computer Science](http://ppgcc.dc.ufscar.br/pt-br) | [Computer Department](https://site.dc.ufscar.br/) |  [Biomal](http://www.biomal.ufscar.br/) | [CNPQ](https://www.gov.br/cnpq/pt-br) | [Ku Leuven](https://kulak.kuleuven.be/) | [Embarcados](https://www.embarcados.com.br/author/cissa/) | [Read Prensa](https://prensa.li/@cissa.gatto/) | [Linkedin Company](https://www.linkedin.com/company/27241216) | [Linkedin Profile](https://www.linkedin.com/in/elainececiliagatto/) | [Instagram](https://www.instagram.com/cissagatto) | [Facebook](https://www.facebook.com/cissagatto) | [Twitter](https://twitter.com/cissagatto) | [Twitch](https://www.twitch.tv/cissagatto) | [Youtube](https://www.youtube.com/CissaGatto) |

# Thanks
