https://github.com/dangnh0611/data_mining_projects
Some mini projects for Introduction to Data mining techniques
- Host: GitHub
- URL: https://github.com/dangnh0611/data_mining_projects
- Owner: dangnh0611
- Created: 2020-06-12T05:52:21.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-08-19T18:08:54.000Z (about 5 years ago)
- Last Synced: 2025-01-08T01:35:21.927Z (9 months ago)
- Topics: data-mining, kaggle-challenge
- Language: Jupyter Notebook
- Homepage:
- Size: 8.45 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# data_mining_projects
Some mini projects for Introduction to Data mining techniques
> Note: not all notebooks focus on solving the challenge. Many are experiments with, and comparisons of, various models, algorithms, and engineering techniques.

The following mini projects are included:
- [1. bbc_text_categorization](#1-bbc_text_categorization)
- [2. emnist_handwriten_character_digits_recognition](#2-emnist_handwriten_character_digits_recognition)
- [3. fake_jobs_classification](#3-fake_jobs_classification)
- [4. market_basket_association_rules](#4-market_basket_association_rules)
- [5. mushrrom_classification](#5-mushrrom_classification)
- [6. people_interest_clustering](#6-people_interest_clustering)
- [7. red_wine_quality](#7-red_wine_quality)

---
## 1. bbc_text_categorization
- Notebooks can be found at [bbc_text_categorization/](bbc_text_categorization/).
- Dataset from this Kaggle Challenge: [BBC articles fulltext and category](https://www.kaggle.com/yufengdev/bbc-fulltext-and-category)
- Tests several Naive Bayes variants for document categorization using different feature-extraction methods: binary vectorization, count vectorization, and TF-IDF.

## 2. emnist_handwriten_character_digits_recognition
- Notebooks can be found at [emnist_handwriten_character_recognition/](emnist_handwriten_character_recognition/).
- Data is a subset sampled from the [EMNIST handwritten character and digit dataset](https://www.nist.gov/itl/products-and-services/emnist-dataset).
- Experiments with CNNs and other deep-learning techniques using the Keras framework.

## 3. fake_jobs_classification
- Notebooks can be found at [fake_jobs_classification/](fake_jobs_classification/).
- Dataset from this Kaggle Challenge: [[Real or Fake] Fake JobPosting Prediction](https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction)

## 4. market_basket_association_rules
- Notebooks can be found at [market_basket_association_rules/](market_basket_association_rules/).
- Dataset from this Kaggle Challenge: [market_basket](https://www.kaggle.com/luckysan/market-basket)
- Association rule mining with the Apriori algorithm.
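The README does not show how the notebook implements Apriori, so the following is only a minimal, library-free sketch of the technique, not the notebook's code. It finds frequent itemsets level by level (join, then prune any candidate with an infrequent subset), then derives rules X -> Y with confidence = support(X union Y) / support(X). The function names `apriori` and `rules`, the thresholds, and the toy baskets are hypothetical and are not taken from the Kaggle market_basket dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {c: support(c) for c in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples for rules X -> Y."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                antecedent = frozenset(subset)
                # confidence(X -> Y) = support(X union Y) / support(X)
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Toy baskets, made up for illustration (not from the Kaggle dataset)
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(transactions, min_support=0.6)
for antecedent, consequent, confidence in rules(frequent, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={confidence:.2f}")
```

Looking up `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent, so the level-wise search has already recorded its support. On a real basket dataset, a library implementation is usually preferable to this quadratic candidate join.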
## 5. mushrrom_classification
- Notebooks can be found at [mushroom_classification/](mushroom_classification/).
- Dataset from this Kaggle Challenge: [Mushroom Classification](https://www.kaggle.com/uciml/mushroom-classification)
- Testing and visualizing the Decision Tree algorithm.

## 6. people_interest_clustering
- Notebooks can be found at [people_interest_clustering/](people_interest_clustering/).
- Dataset from this Kaggle Challenge: [Clustering Categorical Peoples Interests](https://www.kaggle.com/rainbowgirl/clustering-categorical-peoples-interests)
- Clustering algorithms (K-means and DBSCAN) and cluster visualization.

## 7. red_wine_quality
- Notebooks can be found at [red_wine_quality/](red_wine_quality/).
- Dataset from this Kaggle Challenge: [Red Wine Quality](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009)
- Testing and visualizing the Decision Tree algorithm.
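The README does not say which decision-tree implementation the notebooks use. scikit-learn's `DecisionTreeClassifier` is a common choice, but that is an assumption. As a library-free illustration of what such a tree learns, here is a minimal CART-style sketch that greedily picks the numeric split with the lowest weighted Gini impurity and prints the result as indented text. The column names `alcohol` and `volatile acidity` come from the Red Wine Quality dataset, but the rows and the `good`/`bad` labels below are made up. The helper names (`best_split`, `build_tree`, `show`) are hypothetical. The mushroom data is categorical, so it would need one-hot encoding or categorical splits before this sketch applies to it.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) for the best split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree greedily; each node is a dict, and leaves hold the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    # Stop at max depth, at a pure node, or when no threshold separates the rows
    split = best_split(X, y) if depth < max_depth and gini(y) > 0.0 else None
    if split is None:
        return {"leaf": majority}
    f, t, _ = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(node, row):
    # Walk down the tree until a leaf is reached
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def show(node, names, indent=""):
    """Print the tree as indented text, a plain-text stand-in for a plotted tree."""
    if "leaf" in node:
        print(f"{indent}-> {node['leaf']}")
        return
    name, t = names[node["feature"]], node["threshold"]
    print(f"{indent}{name} <= {t:.2f}")
    show(node["left"], names, indent + "  ")
    print(f"{indent}{name} > {t:.2f}")
    show(node["right"], names, indent + "  ")


# Made-up rows using two Red Wine Quality column names; labels are hypothetical
feature_names = ["alcohol", "volatile acidity"]
X = [
    [9.4, 0.70], [9.8, 0.88], [10.0, 0.76], [9.5, 0.60],
    [11.5, 0.35], [12.0, 0.40], [12.5, 0.30], [11.0, 0.50],
]
y = ["bad"] * 4 + ["good"] * 4

tree = build_tree(X, y, max_depth=2)
show(tree, feature_names)
print(predict(tree, [10.2, 0.65]))
```

Both features separate this toy data perfectly, and the sketch keeps the first split it finds on a tie, so the root tests `alcohol`. Limiting `max_depth` is the simplest guard against overfitting on real data.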