{"id":20737743,"url":"https://github.com/kush1912/phocket---ml-internship","last_synced_at":"2026-04-19T22:02:36.519Z","repository":{"id":141187379,"uuid":"195165744","full_name":"kush1912/Phocket---ML-Internship","owner":"kush1912","description":"This repository consists of machine Learning models, deep learning models and some NLP tasks such as Topic Modelling, Sequence generation, Sentiment analysis, Recommendation System   ","archived":false,"fork":false,"pushed_at":"2019-10-03T20:29:41.000Z","size":695,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-03-11T11:35:16.335Z","etag":null,"topics":["black-friday","classification-algorithims","decison-trees","keywords-extraction","knn","model-selection","n-grams","natural-language-processing","nlp","nlp-keywords-extraction","pre-processing","random-forest","roc-curve","sentiment-analysis","sequence-to-sequence","svm-classifier","tensorflow","tfidf-vectorizer","topic-modeling","twitter-sentiment-analysis"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kush1912.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-04T03:57:00.000Z","updated_at":"2019-10-03T20:29:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"53f66803-6320-4baf-b34e-b2a6a82da9ba","html_url":"https://github.com/kush1912/Phocket---ML-Internship","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kush1912/Phocket---ML-Internship","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kush1912%2FPhocket---ML-Internship","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kush1912%2FPhocket---ML-Internship/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kush1912%2FPhocket---ML-Internship/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kush1912%2FPhocket---ML-Internship/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kush1912","download_url":"https://codeload.github.com/kush1912/Phocket---ML-Internship/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kush1912%2FPhocket---ML-Internship/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28010700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"202
5-12-24T02:00:07.193Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["black-friday","classification-algorithims","decison-trees","keywords-extraction","knn","model-selection","n-grams","natural-language-processing","nlp","nlp-keywords-extraction","pre-processing","random-forest","roc-curve","sentiment-analysis","sequence-to-sequence","svm-classifier","tensorflow","tfidf-vectorizer","topic-modeling","twitter-sentiment-analysis"],"created_at":"2024-11-17T06:15:31.548Z","updated_at":"2025-12-24T22:18:34.755Z","avatar_url":"https://github.com/kush1912.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Phocket---ML-Internship**\nThis repository contains machine learning models, deep learning models, and several NLP tasks: topic modelling, sequence generation, sentiment analysis, and a recommendation system.\n\n+ Link to Colab files: https://drive.google.com/open?id=1X07MHhVhrY8oWvP2VadjUjrzfJkCN5pW\n+ Link to datasets used: https://drive.google.com/open?id=1NC4CmlifjKnT94bNJvSV4_r9xD1UOWrA\n\n\n**1. Designing the preprocessing template**\n+ Loads the dataset on its own.\n+ Fills missing values using fillna() methods and the chosen filling techniques.\n+ Standardizes the column attributes using standard scaler functions.\n+ One-hot encodes categorical features so that they can be passed to algorithms that require numerical inputs to build the model.\n\n**2. 
Design a template which identifies the 3 most important independent features in the dataset.**\n+ Used the preprocessing template described above to preprocess the data, which in a way demonstrates its utility in practice.\n+ The Black Friday dataset was used as the reference: a very popular dataset that is highly skewed, with categorical attributes as independent input features and a continuous output.\n+ Designed a template that splits the data according to a user-supplied ratio and then trains and tests the model. I used 6 different algorithms to train the model and compared the results.\n+ I also applied PCA, derived 4 principal components, and trained and tested the model on them.\n\n**3. Evaluation of the classification model.**\n+ Analysis of the ROC curve.\n+ Identifying when the model is overfitting and when it is underfitting.\n+ The ROC curve also helps in finding the effect of the different hyperparameters used in the algorithms.\n+ Accuracy of the model plays a significant role, but it cannot be the only parameter used to analyse the utility of the model.\n+ A health dataset was used as the reference.\n\n**4. Topic Modelling**\n+ A Twitter climate dataset was used as the reference, to extract the different topics that may have been discussed in the tweets.\n+ NLP techniques such as tokenization, lemmatization, stop-word removal, and POS tagging were used.\n+ A proper template was built to understand how the preprocessing of a text-based dataset is done.\n+ Important features such as popular hashtags, popular mentions, and popular tweets were identified.\n+ A correlation matrix was built among all three to identify the strong and negative relationships between these values.\n+ The algorithms used in topic modelling were LDA (Latent Dirichlet Allocation) and NMF.\n\n**5. 
SEQUENCE2SEQUENCE MODELLING.**\n\n+ Prediction of song lyrics and other text based on the data fed into the model.\n+ Completion of all the modules and assignments of the Coursera course.\n+ Some extra assignments were given by the mentors to test whether we had really understood the concepts.\n+ 3D visualization of these models using the TensorFlow library and tools.\n+ A sarcasm dataset was used as the reference for this task.\n\n**6. Combining different models in the Flask web app:**\n+ Learning how to integrate Flask with the machine learning models.\n+ There were around 3-4 projects going on in which I combined the different models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkush1912%2Fphocket---ml-internship","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkush1912%2Fphocket---ml-internship","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkush1912%2Fphocket---ml-internship/lists"}