https://github.com/dangnh0611/data_mining_projects

Some mini projects for Introduction to Data mining techniques

data-mining kaggle-challenge


Host: GitHub
URL: https://github.com/dangnh0611/data_mining_projects
Owner: dangnh0611
Created: 2020-06-12T05:52:21.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-08-19T18:08:54.000Z (about 5 years ago)
Last Synced: 2025-01-08T01:35:21.927Z (9 months ago)
Topics: data-mining, kaggle-challenge
Language: Jupyter Notebook
Homepage:
Size: 8.45 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# data_mining_projects

Some mini projects for Introduction to Data mining techniques 

> Note that not all notebooks focus on solving the challenge; many are experiments and comparisons of models, algorithms, and engineering techniques.

The following mini projects are included:

  - [1. bbc_text_categorization](#1-bbc_text_categorization)

  - [2. emnist_handwriten_character_digits_recognition](#2-emnist_handwriten_character_digits_recognition)

  - [3. fake_jobs_classification](#3-fake_jobs_classification)

  - [4. market_basket_association_rules](#4-market_basket_association_rules)

  - [5. mushrrom_classification](#5-mushrrom_classification)

  - [6. people_interest_clustering](#6-people_interest_clustering)

  - [7. red_wine_quality](#7-red_wine_quality)

---

## 1. bbc_text_categorization

- Notebooks can be found at [bbc_text_categorization/](bbc_text_categorization/).

- Dataset from this Kaggle Challenge: [BBC articles fulltext and category](https://www.kaggle.com/yufengdev/bbc-fulltext-and-category)

- Tests several Naive Bayes algorithms on document categorization with different feature extraction methods: binary vectorization, count vectorization, and TF-IDF.
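
The comparison described above could look roughly like the following sketch. It assumes scikit-learn is installed; the tiny corpus, the labels, and the pairing of each vectorizer with a particular Naive Bayes variant are illustrative assumptions, not taken from the notebooks.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up corpus standing in for the BBC articles (illustrative only).
texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "shares fell as the market closed",
    "the bank raised interest rates",
    "the coach praised the goal keeper",
    "investors sold shares in the bank",
]
labels = ["sport", "sport", "business", "business", "sport", "business"]

# One pipeline per feature extraction method.
pipelines = {
    # Binary vectorization: 1 if a word occurs in the document, else 0.
    "binary + BernoulliNB": make_pipeline(CountVectorizer(binary=True), BernoulliNB()),
    # Count vectorization: raw term frequencies.
    "count + MultinomialNB": make_pipeline(CountVectorizer(), MultinomialNB()),
    # TF-IDF: term frequencies reweighted by inverse document frequency.
    "tfidf + MultinomialNB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

predictions = {}
for name, pipeline in pipelines.items():
    pipeline.fit(texts, labels)
    predictions[name] = pipeline.predict(["the goal decided the match"])[0]
    print(name, "->", predictions[name])
```

On real data, each pipeline would be scored on a held-out split (for example with `sklearn.model_selection.cross_val_score`) rather than on a single sentence.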

## 2. emnist_handwriten_character_digits_recognition

- Notebooks can be found at [emnist_handwriten_character_recognition/](emnist_handwriten_character_recognition/)

- Data is sampled from a part of the [EMNIST handwritten character digits dataset](https://www.nist.gov/itl/products-and-services/emnist-dataset)

- Experiments with CNNs and other deep learning techniques using the Keras framework.

## 3. fake_jobs_classification

- Notebooks can be found at [fake_jobs_classification/](fake_jobs_classification/).

- Dataset from this Kaggle Challenge: [[Real or Fake] Fake JobPosting Prediction](https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction)

## 4. market_basket_association_rules

- Notebooks can be found at [market_basket_association_rules/](market_basket_association_rules/).

- Dataset from this Kaggle Challenge: [market_basket](https://www.kaggle.com/luckysan/market-basket)

- Association rule mining with the Apriori algorithm.
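
For readers unfamiliar with Apriori, here is a minimal pure-Python sketch of the idea. It is not the notebook's implementation; the function names (`apriori`, `association_rules`), the baskets, and the thresholds are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update({c: support(c) for c in current})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Made-up baskets (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
frequent = apriori(baskets, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.8)
for antecedent, consequent, confidence in rules:
    print(set(antecedent), "->", set(consequent), f"confidence={confidence:.2f}")
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without counting if any of its subsets is already known to be infrequent.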

  

## 5. mushrrom_classification

- Notebooks can be found at [mushroom_classification/](mushroom_classification/).

- Dataset from this Kaggle Challenge: [Mushroom Classification](https://www.kaggle.com/uciml/mushroom-classification)

- Testing and visualizing the Decision Tree algorithm.

## 6. people_interest_clustering

- Notebooks can be found at [people_interest_clustering/](people_interest_clustering/).

- Dataset from this Kaggle Challenge: [Clustering Categorical Peoples Interests](https://www.kaggle.com/rainbowgirl/clustering-categorical-peoples-interests)

- Clustering algorithms: K-means and DBSCAN, with cluster visualization.
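
To show how the two algorithms differ in use, here is a small sketch with scikit-learn and NumPy (both assumed to be installed). The synthetic two-blob data and the parameter values (`n_clusters=2`, `eps=1.0`, `min_samples=5`) are assumptions made for the example; the notebook's dataset and settings may differ.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# K-means needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters from point density;
# the label -1 marks points it treats as noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_dbscan_clusters = len(set(dbscan_labels) - {-1})

print("K-means cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN clusters found:", n_dbscan_clusters)
```

For visualization, a scatter plot colored by label (for example `matplotlib.pyplot.scatter(X[:, 0], X[:, 1], c=kmeans_labels)`) is a common choice for two-dimensional data.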

## 7. red_wine_quality

- Notebooks can be found at [red_wine_quality/](red_wine_quality/).

- Dataset from this Kaggle Challenge: [Red Wine Quality](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009)

- Testing and visualizing the Decision Tree algorithm.
