https://github.com/eeddaann/data-mining-project
- Host: GitHub
- URL: https://github.com/eeddaann/data-mining-project
- Owner: eeddaann
- Created: 2017-11-02T18:10:20.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-01-27T22:43:43.000Z (over 7 years ago)
- Last Synced: 2025-04-01T13:51:14.987Z (7 months ago)
- Topics: boosting, confusion-matrix, data-mining, feature-engineering, feature-selection, gaussian-naive-bayes-implementation, knn-classifier, machine-learning, mlp-classifier, pca-analysis, roc-curve
- Language: Jupyter Notebook
- Size: 7.04 MB
- Stars: 3
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data Mining Project
This notebook is the final project for a data-mining course.
In this project we applied data-mining techniques using Python's scikit-learn library.
The project consists of:
1. **Data Exploration:**
- Summary statistics for the features
- Scatter plots of the features
- Correlation matrix
- Violin plots
2. **Feature Engineering:**
- LDA
- PCA
- Modification for PCA
- Feature Generation
3. **KNN:**
- Hyperparameter optimization
- Applied with different preprocessing
4. **Gaussian Naive Bayes:**
- Applied with different preprocessing
5. **Multilayer Perceptron:**
- Applied with different preprocessing
6. **Boosting:**
- Based on decision trees
- Based on Gaussian Naive Bayes
7. **Evaluation:**
- Confusion matrix
- Receiver operating characteristic (ROC)
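
As a rough illustration of how several of the steps above fit together in scikit-learn, here is a minimal sketch. It is not the notebook's actual code. The dataset (`load_breast_cancer`), the helper `make_steps`, the two preprocessing variants, the `n_neighbors` grid, and the number of PCA components are all assumptions chosen for the example, since this README does not name the project's data or settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): the project's own data is not named here.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_steps(use_pca):
    """Hypothetical helper: build fresh preprocessing steps for a pipeline."""
    steps = [("scale", StandardScaler())]
    if use_pca:
        steps.append(("pca", PCA(n_components=5)))
    return steps


results = {}
for variant, use_pca in [("scaled", False), ("scaled+pca", True)]:
    # KNN with a small grid search over the number of neighbours.
    knn = GridSearchCV(
        Pipeline(make_steps(use_pca) + [("knn", KNeighborsClassifier())]),
        param_grid={"knn__n_neighbors": [3, 5, 7, 9]},
        cv=5,
    )
    # Gaussian Naive Bayes on the same preprocessing.
    gnb = Pipeline(make_steps(use_pca) + [("gnb", GaussianNB())])

    for model_name, model in [("knn", knn), ("gnb", gnb)]:
        model.fit(X_train, y_train)
        # Probability of the positive class, used for the ROC-based score.
        scores = model.predict_proba(X_test)[:, 1]
        results[(variant, model_name)] = {
            "confusion_matrix": confusion_matrix(y_test, model.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, scores),
        }

for key, res in results.items():
    print(key, "ROC AUC = %.3f" % res["roc_auc"])
    print(res["confusion_matrix"])
```

Wrapping the preprocessing and the classifier in a single `Pipeline` means the scaler and PCA are refit on each cross-validation fold, so the grid search does not see statistics from the held-out data.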