https://github.com/timgasser/acm_imbalanced_learning

Slides and code for the ACM Imbalanced Learning talk on 27th April 2016
# acm_imbalanced_learning

This repo contains slides and code for the [ACM Imbalanced Learning talk](http://www.meetup.com/Austin-ACM-SIGKDD/events/230200840/) on 27th April 2016 in Austin, TX.

## File listing

The files in the repo are listed below, with an explanation of what they're used for.

* ```acm_imbalance_algorithms.ipynb``` - Jupyter notebook with scikit-learn classifiers trained on the Kaggle dataset.
* ```acm_imbalance_datasets.{pdf, pptx}``` - PowerPoint presentation (with a PDF export) explaining the dataset processing and algorithms.
* ```acm_imbalance_sampling.ipynb``` - Jupyter notebook with a set of routines to pre-process imbalanced data.
* ```acm_imbalanced_dataset.R``` - R script that uses the 'unbalanced' package to pre-process the data and reduce class imbalance.
* ```datasets.zip``` - A zip file containing the datasets used in the talk:
    * ```cs-training.csv``` - Training data from the Kaggle 'Give Me Some Credit' competition.
    * ```cs-test.csv``` - Test data from the Kaggle competition.
    * ```sampleEntry.csv``` - Sample entry format for the Kaggle competition.
    * ```cs-training-{CNN, OSS, smote, tomek}.csv``` - Processed training data (generated by ```acm_imbalanced_dataset.R```) using the algorithm named in each filename.
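
To give a rough idea of what "pre-processing to remove imbalance" means, here is a minimal sketch of random undersampling on toy data. It is not taken from the notebooks or the R script in this repo, and random undersampling is a simpler technique than the CNN, OSS, SMOTE, and Tomek methods named above. The function name `random_undersample` and the toy arrays are hypothetical, chosen only for illustration.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not part of this repo.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()  # size of the rarest class
    # For each class, pick n_min row indices without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Toy data (an assumption): 95 majority-class rows and 5 minority-class rows.
y = np.array([0] * 95 + [1] * 5)
X = np.arange(200).reshape(100, 2)

X_bal, y_bal = random_undersample(X, y)
print(np.bincount(y_bal))  # [5 5]
```

Undersampling discards majority-class rows, so it trades information for balance. The processed CSV files in ```datasets.zip``` were produced with the other algorithms named in their filenames.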

## Feedback

If you have any comments, questions, or feedback, please submit a pull request!