https://github.com/ritzvik/oversampling-techniques
https://github.com/ritzvik/oversampling-techniques
data machine-learning machine-learning-algorithms oversampling python r smote smote-ipf smoteipf spider spider2
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/ritzvik/oversampling-techniques
- Owner: ritzvik
- Created: 2018-10-04T17:25:57.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-02-01T14:59:18.000Z (about 6 years ago)
- Last Synced: 2024-12-29T12:43:36.586Z (4 months ago)
- Topics: data, machine-learning, machine-learning-algorithms, oversampling, python, r, smote, smote-ipf, smoteipf, spider, spider2
- Language: Python
- Size: 64.5 KB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# oversampling-techniques
## SMOTE
Input file should contain Yes/No Class and header. The header of Yes/No Column should be named 'Class'. Entries in Yes/No Column should be either 'Y' or 'N'. Input File should be in csv format.Change k in KNN for SMOTE by changing the variable 'k_knn' in smote.r . Change 'YNcolumn' variable to indicate column number of Yes/No Column. Column numbers start from 1.
File Usage : rscript smote.r
## SMOTE-IPF
Input file should contain Yes/No Class and header. The header of Yes/No Column should be named 'Class'. Entries in Yes/No Column should be either 'Y' or 'N'. Input File should be in csv format.Change k in KNN for SMOTE by changing the variable 'k_knn' in smoteipf.r . Change 'YNcolumn' variable to indicate column number of Yes/No Column. Column numbers start from 1.
Also, variable 'n', 'k', 'voting', 'p' can be changed accordingly. For information on these variables visit : https://www.sciencedirect.com/science/article/pii/S0020025514008561
File Usage : rscript smoteipf.r
## SPIDER2
Input file should contain Yes/No Class and header. The header of Yes/No Column should be named 'Class'. Entries in Yes/No Column should be either 'Y' or 'N'. Input File should be in csv format.Change k in KNN changing the variable 'k' in spider.py . Change 'YNcolumn' variable to indicate column number of Yes/No Column. Column numbers start from 1.
Other variables can be changed below the comment "#change below parameters according to requirment". For info on these variables visit : https://link.springer.com/chapter/10.1007/978-3-642-13529-3_18
File Usage : python3 spider.py
Output : k-r-a0.csv, k-r-a1.csv, k-r-a2.csv where a0, a1 and a2 represent no aplification, weak amplification and strong amplification respectively.
#### A sample file named 'sample.csv' is uploaded for reference.