https://github.com/kush1912/phocket---ml-internship
This repository consists of machine Learning models, deep learning models and some NLP tasks such as Topic Modelling, Sequence generation, Sentiment analysis, Recommendation System
- Host: GitHub
- URL: https://github.com/kush1912/phocket---ml-internship
- Owner: kush1912
- Created: 2019-07-04T03:57:00.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-10-03T20:29:41.000Z (over 6 years ago)
- Last Synced: 2025-03-11T11:35:16.335Z (11 months ago)
- Topics: black-friday, classification-algorithims, decison-trees, keywords-extraction, knn, model-selection, n-grams, natural-language-processing, nlp, nlp-keywords-extraction, pre-processing, random-forest, roc-curve, sentiment-analysis, sequence-to-sequence, svm-classifier, tensorflow, tfidf-vectorizer, topic-modeling, twitter-sentiment-analysis
- Language: Jupyter Notebook
- Size: 679 KB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# **Phocket---ML-Internship**
This repository contains machine learning models, deep learning models, and some NLP tasks such as topic modelling, sequence generation, sentiment analysis, and a recommendation system.
+ Link to Colab files: https://drive.google.com/open?id=1X07MHhVhrY8oWvP2VadjUjrzfJkCN5pW
+ Link to datasets used: https://drive.google.com/open?id=1NC4CmlifjKnT94bNJvSV4_r9xD1UOWrA
**1. Designing the preprocessing template**
+ The template loads the dataset on its own.
+ It fills missing values using `fillna()` with the chosen fill technique.
+ It standardizes the numeric columns using a standard scaler.
+ It one-hot encodes the categorical features so that they can be passed to algorithms that require numerical input.
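
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the median/mode fill strategy, and the toy columns `age` and `city` are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values, standardize numeric columns, one-hot encode the rest."""
    df = df.copy()
    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.columns.difference(num_cols)

    # Assumed fill strategy: median for numeric columns, most frequent value otherwise.
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    for col in cat_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Standardize numeric columns to zero mean and unit variance.
    if len(num_cols) > 0:
        df[num_cols] = StandardScaler().fit_transform(df[num_cols])

    # One-hot encode categorical columns so that models receive numeric input.
    return pd.get_dummies(df, columns=list(cat_cols))


# Toy data standing in for a real dataset (hypothetical values).
raw = pd.DataFrame({
    "age": [20.0, None, 40.0, 30.0],
    "city": ["a", "b", None, "a"],
})
out = preprocess(raw)
print(out)
```

In practice the DataFrame would come from something like `pd.read_csv(...)` on one of the linked datasets, and the fill strategy would be chosen per column.
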
**2. Designing a template that identifies the 3 most important independent features in the dataset.**
+ Used the preprocessing template above to preprocess the data, which also demonstrates its usefulness in practice.
+ The Black Friday dataset was used as reference: a popular dataset that is highly skewed, with categorical independent features as input and a continuous output.
+ Designed a template that splits the data according to a user-supplied ratio and then trains and tests the model. I used 6 different algorithms to train the model and compared the results.
+ I also applied PCA, derived 4 principal components, and trained and tested the model on them.
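
A rough sketch of this workflow is shown below. It assumes a random forest is used to rank the features (the README does not say which of the 6 algorithms produced the ranking), and it uses synthetic data instead of the Black Friday dataset; `top_features` and the column names `f0` to `f5` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def top_features(X, y, test_size=0.2, k=3, seed=0):
    """Split by a user-chosen ratio, fit a model, return the k most important features."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    top = importances.sort_values(ascending=False).head(k).index.tolist()
    return top, model.score(X_test, y_test)  # score is R^2 on the held-out split


# Synthetic data: only f0, f1 and f2 influence the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 6)), columns=[f"f{i}" for i in range(6)])
y = 5 * X["f0"] + 3 * X["f1"] + 2 * X["f2"] + rng.normal(scale=0.1, size=400)

top3, r2 = top_features(X, y, test_size=0.25)
print("top features:", top3, "R^2:", round(r2, 3))

# Reduce the inputs to 4 principal components, as in the last bullet above.
components = PCA(n_components=4).fit_transform(X)
print(components.shape)
```

The same split could then be reused to train the other algorithms and a model on `components`, so that the results are comparable.
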
**3. Evaluation of the classification model.**
+ Analysis of the ROC curve.
+ Finding when the model is overfitting and when it is underfitting.
+ The ROC curve also helps in finding the effect of the different hyperparameters used in the algorithms.
+ The accuracy of the model plays a significant role, but it cannot be the only measure used to analyse the usefulness of the model.
+ A health dataset was used as reference.
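
One way to see these points together is to compare the ROC AUC on the training and test splits while varying a hyperparameter. The sketch below assumes a decision tree with `max_depth` as the hyperparameter and uses synthetic data in place of the health dataset; a large gap between train and test AUC suggests overfitting, while low values on both suggest underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (stand-in for the real dataset).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until it fits the training set
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)

    train_scores = clf.predict_proba(X_train)[:, 1]
    test_scores = clf.predict_proba(X_test)[:, 1]

    # fpr/tpr are the points one would plot to draw the ROC curve.
    fpr, tpr, _ = roc_curve(y_test, test_scores)

    results[depth] = (
        roc_auc_score(y_train, train_scores),
        roc_auc_score(y_test, test_scores),
    )

for depth, (train_auc, test_auc) in results.items():
    print(f"max_depth={depth}: train AUC={train_auc:.3f}, test AUC={test_auc:.3f}")
```
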
**4. Topic Modelling**
+ A Twitter climate dataset was used as reference, to extract the different topics discussed in the tweets.
+ NLP techniques such as tokenization, lemmatization, stop-word removal, and POS tagging were used.
+ A template was built to show how preprocessing is applied to a text-based dataset.
+ Important features such as popular hashtags, popular mentions, and popular tweets were identified.
+ A correlation matrix was built among all three to identify strong and negative relationships between these values.
+ The algorithms used for topic modelling were LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization).
**5. SEQUENCE2SEQUENCE MODELLING.**
+ Prediction of song lyrics and other text based on the data fed into the model.
+ Completed all modules of the Coursera course and its assignments.
+ The mentors gave some extra assignments to test whether we had really understood the concepts.
+ 3D visualization of these models using the TensorFlow library and tools.
+ A sarcasm dataset was used as reference for this task.
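
For next-word prediction such as lyric generation, a common way to prepare training data is to turn each line into growing n-gram prefixes, left-pad them, and use the last token as the label. The sketch below shows only that data-preparation step in plain Python, as an assumption about the general approach rather than the notebook's code; the two lyric lines and the helper names are invented for illustration.

```python
def build_vocab(lines):
    """Map each word to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def ngram_sequences(lines, vocab):
    """Turn every line into its prefixes of length 2, 3, ..., len(line)."""
    sequences = []
    for line in lines:
        ids = [vocab[word] for word in line.lower().split()]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences


def pad_left(sequences):
    """Left-pad all sequences with 0 so they share the same length."""
    width = max(len(seq) for seq in sequences)
    return [[0] * (width - len(seq)) + seq for seq in sequences]


# Invented lyric lines used as toy input.
lines = ["the night is young", "the night is long"]
vocab = build_vocab(lines)
padded = pad_left(ngram_sequences(lines, vocab))

# The model input is everything but the last token; the label is the last token.
inputs = [row[:-1] for row in padded]
labels = [row[-1] for row in padded]

for x, y in zip(inputs, labels):
    print(x, "->", y)
```

A sequence model (for example an embedding layer followed by an LSTM in TensorFlow) would then be trained to predict `labels` from `inputs`, and text is generated by repeatedly feeding the prediction back in.
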
**6. Combining different models in the flask web app:**
+ Learned how to integrate the machine learning models into a Flask app.
+ There were around 3-4 projects in progress in which I combined the different models.
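
One possible layout for serving several models from a single Flask app is a registry keyed by model name behind one route. The sketch below is an assumption about the structure, not the internship code: the route `/predict/<name>`, the `MODELS` registry, and the two stand-in "models" are hypothetical, and real models would be loaded from disk instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-ins for trained models; each takes the JSON payload
# and returns a JSON-serializable prediction.
MODELS = {
    "sentiment": lambda payload: (
        "positive" if "good" in payload["text"].lower() else "negative"
    ),
    "length": lambda payload: len(payload["text"].split()),
}


@app.route("/predict/<name>", methods=["POST"])
def predict(name):
    """Dispatch the request body to the model registered under `name`."""
    model = MODELS.get(name)
    if model is None:
        return jsonify(error=f"unknown model: {name}"), 404
    payload = request.get_json(force=True)
    return jsonify(model=name, prediction=model(payload))


if __name__ == "__main__":
    app.run(debug=True)
```

A request such as `POST /predict/sentiment` with the body `{"text": "A good day"}` would be routed to the `sentiment` entry; adding another model only requires a new key in the registry.
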