https://github.com/dangnh0611/data_mining_projects


# data_mining_projects
Some mini projects for Introduction to Data mining techniques
> Note that not all notebooks focus on solving the challenge; some are experiments and comparisons of models, algorithms, and engineering techniques.

The following mini projects are included:
- [1. bbc_text_categorization](#1-bbc_text_categorization)
- [2. emnist_handwriten_character_digits_recognition](#2-emnist_handwriten_character_digits_recognition)
- [3. fake_jobs_classification](#3-fake_jobs_classification)
- [4. market_basket_association_rules](#4-market_basket_association_rules)
- [5. mushrrom_classification](#5-mushrrom_classification)
- [6. people_interest_clustering](#6-people_interest_clustering)
- [7. red_wine_quality](#7-red_wine_quality)

---
## 1. bbc_text_categorization
- Notebooks can be found at [bbc_text_categorization/](bbc_text_categorization/).
- Dataset from this Kaggle Challenge: [BBC articles fulltext and category](https://www.kaggle.com/yufengdev/bbc-fulltext-and-category)
- Testing several Naive Bayes algorithms on document categorization with different feature extraction methods: binary vectorization, count vectorization, and TF-IDF.
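
The three feature extraction methods can be sketched in plain Python as follows. This is a minimal illustration, not the notebooks' code: the toy corpus is made up, the helper names (`tokenize`, `count_vector`, `binary_vector`, `tfidf_vectors`) are hypothetical, and the IDF is the unsmoothed textbook form `log(N / df)`, which may differ from what a library computes.

```python
import math
from collections import Counter

# Toy corpus, made up for illustration (not taken from the BBC dataset).
docs = [
    "stocks rise as markets rally",
    "team wins the cup final",
    "markets fall as stocks slide",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

# Fixed, sorted vocabulary built from the corpus.
vocab = sorted({tok for doc in docs for tok in tokenize(doc)})

def count_vector(text):
    """Count vectorization: how many times each vocabulary term occurs."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]

def binary_vector(text):
    """Binary vectorization: 1 if the term occurs at all, else 0."""
    return [1 if c > 0 else 0 for c in count_vector(text)]

def tfidf_vectors(corpus):
    """TF-IDF: term count weighted by log(N / document frequency)."""
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(tokenize(doc)))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    return [
        [tf * idf[term] for tf, term in zip(count_vector(doc), vocab)]
        for doc in corpus
    ]

tfidf = tfidf_vectors(docs)
```

Binary features are usually paired with a Bernoulli Naive Bayes model and count features with a multinomial one, which is one reason to compare the representations side by side.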

## 2. emnist_handwriten_character_digits_recognition
- Notebooks can be found at [emnist_handwriten_character_recognition/](emnist_handwriten_character_recognition/)
- Data is sampled from a part of the [EMNIST handwritten character digits dataset](https://www.nist.gov/itl/products-and-services/emnist-dataset)
- Experimenting with CNNs and some deep learning techniques using the Keras framework.
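
For readers new to CNNs, the core operations of a convolutional block (convolution, ReLU, max pooling) can be sketched without any framework. This is only an illustration of what such layers compute, not the Keras models in the notebooks; the function names, the toy image, and the kernel are all assumptions.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (what DL frameworks call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU: negative activations become 0."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (drops a trailing odd row/column)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

# Toy 5x5 "image" with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical edge detector; a CNN would learn such weights.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)
pooled = max_pool2x2(relu(feature_map))
```

In a trained network the kernel weights are learned from data rather than written by hand, and many kernels are applied in parallel.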

## 3. fake_jobs_classification
- Notebooks can be found at [fake_jobs_classification/](fake_jobs_classification/).
- Dataset from this Kaggle Challenge: [[Real or Fake] Fake JobPosting Prediction](https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction)

## 4. market_basket_association_rules
- Notebooks can be found at [market_basket_association_rules/](market_basket_association_rules/).
- Dataset from this Kaggle Challenge: [market_basket](https://www.kaggle.com/luckysan/market-basket)
- Association rule mining with the Apriori algorithm.
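
A compact sketch of Apriori and rule generation is shown below. It is a from-scratch illustration under stated assumptions, not the notebook's implementation: the transactions are made up, and `support`, `apriori`, and `rules` are hypothetical helper names.

```python
from itertools import combinations

# Toy transactions, made up for illustration (not the Kaggle dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built level by level."""
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for each strong rule."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), confidence))
    return out

frequent = apriori(transactions, min_support=0.6)
strong_rules = rules(frequent, min_confidence=0.9)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded before its support is counted if any of its subsets is already known to be infrequent.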

## 5. mushrrom_classification
- Notebooks can be found at [mushroom_classification/](mushroom_classification/).
- Dataset from this Kaggle Challenge: [Mushroom Classification](https://www.kaggle.com/uciml/mushroom-classification)
- Testing and visualizing the Decision Tree algorithm.
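
To show how a decision tree picks a split on categorical features like those in the mushroom data, here is a small sketch of entropy and information gain. It is an assumption that this criterion matches the notebook (libraries such as scikit-learn default to Gini impurity instead); the rows, labels, and function names are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy samples, made up for illustration (not rows from the Kaggle dataset).
rows = [
    {"odor": "foul", "cap": "flat"},
    {"odor": "foul", "cap": "bell"},
    {"odor": "none", "cap": "flat"},
    {"odor": "none", "cap": "bell"},
]
labels = ["poisonous", "poisonous", "edible", "edible"]

# The root split is the feature with the highest information gain.
best_feature = max(rows[0], key=lambda f: information_gain(rows, labels, f))
```

In this toy data `odor` separates the classes perfectly while `cap` carries no information, so the tree would split on `odor` first.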

## 6. people_interest_clustering
- Notebooks can be found at [people_interest_clustering/](people_interest_clustering/).
- Dataset from this Kaggle Challenge: [Clustering Categorical Peoples Interests](https://www.kaggle.com/rainbowgirl/clustering-categorical-peoples-interests)
- Clustering with K-means and DBSCAN, plus cluster visualization.
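
The K-means loop (assign each point to its nearest centroid, then recompute centroids) can be sketched as below. This is a minimal from-scratch version with made-up 2D points and fixed initial centroids, not the notebook's code; the `kmeans` signature is hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain K-means (Lloyd's algorithm) with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Toy 2D points forming two obvious groups (made up for illustration).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
final_centroids, clusters = kmeans(points, [points[0], points[-1]])
```

Unlike K-means, DBSCAN needs no cluster count up front and can mark points as noise, which is why comparing the two on the same data is informative.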

## 7. red_wine_quality
- Notebooks can be found at [red_wine_quality/](red_wine_quality/).
- Dataset from this Kaggle Challenge: [Red Wine Quality](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009)
- Testing and visualizing the Decision Tree algorithm.