https://github.com/hiejulia/data-mining

Turn data into meaning by ML, DL, Statistics


Host: GitHub
URL: https://github.com/hiejulia/data-mining
Owner: hiejulia
Created: 2020-09-17T13:23:48.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2025-02-11T16:59:52.000Z (8 months ago)
Last Synced: 2025-04-03T08:18:20.892Z (6 months ago)
Language: Jupyter Notebook
Homepage:
Size: 4.1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# data-mining
Run this project interactively:

https://notebooks.gesis.org/binder/jupyter/user/hiejulia-data-mining-b9al5ice/tree#notebooks

# Feature description in this repo
- Stock market analysis: regression on time-series data

- Predict which content goes viral -> predictive content scoring model (ruzzit.com)
  - images, headline, story content
  - K-means applied in 3D space
  - PCA
  - n-grams
  - Predictive model for content scoring
  - NLTK for headlines
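
The n-gram features listed above can be sketched as follows. This is a minimal standard-library illustration, not the notebook's code: the `ngrams` helper and the sample headline are hypothetical.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up headline, for illustration only.
headline = "ten simple tricks to make content go viral"
tokens = headline.lower().split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Counting how often each n-gram appears across many headlines gives one possible feature set for a content scoring model.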

- Regression model with geo heat map visualisation, using several API data sources to find underpriced housing

- Web scraping: retrieve fare data from the web and identify outlier fares with anomaly detection. Send real-time text alerts with IFTTT.
  (Try density-based spatial clustering of applications with noise (DBSCAN), isolation forest, Grubbs' test) -> use the generalised extreme Studentized deviate (ESD) test
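
Of the detectors named above, density-based clustering is the easiest to sketch without statistical tables, so it is the one shown here (not the generalised ESD test). This is a minimal standard-library sketch on one-dimensional fares; the fare values, `eps` and `min_samples` are made-up assumptions, and `dbscan_1d` is a hypothetical helper, not the repository's implementation.

```python
def dbscan_1d(values, eps, min_samples):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    # Neighbourhood of i: every point within eps of it, including i itself.
    neighbours = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise next to a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


fares = [210, 215, 205, 220, 212, 208, 45, 218]  # hypothetical fares
labels = dbscan_1d(fares, eps=15, min_samples=3)
outliers = [fare for fare, label in zip(fares, labels) if label == -1]
```

Fares labelled `-1` belong to no dense group and are the candidates for an alert.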

- NLP
  - bag of words
  - term-document matrix
  - stop words
  - stemming | lemmatisation
  - term frequency-inverse document frequency (TF-IDF) weighting
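
The bag-of-words, stop-word and TF-IDF steps above can be sketched in a few lines. This is a minimal illustration assuming the plain `tf * log(N / df)` variant with no smoothing; libraries often use a smoothed formula, so their numbers will differ. The stop-word list and documents are made up.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on", "and"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    bags = [Counter(tokenize(d)) for d in docs]  # bag of words per document
    n_docs = len(bags)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in bag.items()})
    return weights


docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog"]
weights = tf_idf(docs)
```

A term that occurs in only one document ("mat") gets a higher weight than one shared by several documents ("cat").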

- SVM on the TF-IDF vector matrix
  - maximum-margin hyperplane
  - soft-margin SVM
  - kernel trick -> higher-dimensional feature space
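
To illustrate the kernel trick, the sketch below checks that a degree-2 polynomial kernel on 2-D inputs equals an ordinary dot product in a 3-D feature space. The choice of kernel and the sample vectors are assumptions made for illustration.

```python
import math


def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y) ** 2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                     # computed in the 2-D input space
explicit = dot(feature_map(x), feature_map(y))   # computed in the 3-D feature space
```

Both values agree up to floating-point error, so an SVM can work in the larger space without ever building `feature_map(x)`.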

- CNN for multi-class classification
  - cross-validation
  - image feature extraction
  - layers, filters
  - flatten
  - max pooling layers
  - model metric: cross-entropy loss
  - Dropout regularization
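
Three of the building blocks above (max pooling, flatten, cross-entropy loss) are small enough to sketch in plain Python. This is an illustration on a made-up 4x4 feature map and made-up logits, not a trainable network; a deep learning framework would normally provide these layers.

```python
import math


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a 2-D list (even height and width)."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]), 2)]
            for r in range(0, len(feature_map), 2)]


def flatten(feature_map):
    """Turn a 2-D feature map into a 1-D vector, row by row."""
    return [v for row in feature_map for v in row]


def cross_entropy(logits, true_class):
    """Softmax followed by the negative log-likelihood of the true class."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(z - shift) for z in logits))
    return -(logits[true_class] - shift - log_sum)


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 1]]
pooled = max_pool_2x2(fmap)
flat = flatten(pooled)
loss = cross_entropy([2.0, 1.0, 0.1], true_class=0)
```

Pooling keeps the largest activation in each 2x2 window, and the loss shrinks as the logit of the true class grows relative to the others.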

- Chatbot
  - NLTK
  - sequence-to-sequence model https://arxiv.org/pdf/1506.05869v1.pdf
  - LSTM encoder-decoder

- Recommendation engine
  - collaborative filtering
  - hybrid system
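
Collaborative filtering can be sketched as a similarity-weighted average of other users' ratings. This is a minimal user-based variant with cosine similarity; the `ratings` table is hypothetical and the approach is one common choice, not necessarily the one used in the notebook.

```python
import math

# Hypothetical user -> {item: rating} data, for illustration only.
ratings = {
    "ann":  {"a": 5, "b": 3, "c": 4},
    "bob":  {"a": 4, "b": 3, "c": 5, "d": 4},
    "cara": {"a": 1, "b": 5, "d": 2},
}


def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    num = sum(u[i] * v[i] for i in shared)
    den = (math.sqrt(sum(u[i] ** 2 for i in shared))
           * math.sqrt(sum(v[i] ** 2 for i in shared)))
    return num / den


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    pairs = [(cosine(ratings[user], r), r[item])
             for name, r in ratings.items() if name != user and item in r]
    total = sum(sim for sim, _ in pairs)
    return sum(sim * rating for sim, rating in pairs) / total if total else None


estimate = predict("ann", "d")
```

Here `ann` has not rated item `d`; the estimate leans towards `bob`'s rating because his tastes are closer to hers.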

- Regression analysis
  - Residual Analysis
  - Normality Test (Q-Q Plot)
  - R-Squared: Goodness of Fit
  - Cross-validation (k-fold)
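
Residuals, R-squared and k-fold cross-validation can be sketched for a one-variable least-squares fit. The data points are made up, and the interleaved fold assignment (`i % k`) is an assumption chosen to keep the example short and deterministic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Goodness of fit: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def k_fold_r2(xs, ys, k):
    """Fit on k-1 folds and score R^2 on the held-out fold, for each fold."""
    scores = []
    for fold in range(k):
        test = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        scores.append(r_squared([xs[i] for i in test], [ys[i] for i in test],
                                slope, intercept))
    return scores


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # made-up, roughly y = 2x
slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
r2 = r_squared(xs, ys, slope, intercept)
cv_scores = k_fold_r2(xs, ys, k=3)
```

The `residuals` list is what a residual plot or Q-Q plot would be drawn from; with an intercept in the model they sum to zero up to rounding.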

- Decision Tree Regression
  - Mean Squared Error (MSE)
  - Mean Absolute Error
  - Variance Reduction
  - Gini Impurity/Index
  - Information Gain
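
The split criteria above can be sketched as small functions. Variance reduction applies to numeric regression targets, while Gini impurity and information gain apply to class labels. The sample targets and labels are made up.

```python
import math
from collections import Counter


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the children."""
    n = len(parent)
    return variance(parent) - (len(left) / n * variance(left)
                               + len(right) / n * variance(right))


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))


targets = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]      # regression targets
vr = variance_reduction(targets, targets[:3], targets[3:])
labels = ["a", "a", "b", "b"]                 # class labels
ig = information_gain(labels, labels[:2], labels[2:])
```

A tree builder would evaluate one of these scores for every candidate split and keep the split with the largest gain.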

- Clustering
  - K-means
  - hierarchical clustering
    - agglomerative and divisive hierarchical clustering
  - Affinity propagation clustering
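
K-means alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below is a minimal version of that loop (Lloyd's algorithm) on 2-D points; the points and the fixed starting centroids are assumptions chosen so the result is deterministic.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c  # keep an empty cluster's centroid where it was
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

In practice the starting centroids are usually chosen at random or with a seeding scheme, and the loop stops once the assignments no longer change.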

- Ref
  - https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
