{"id":15911279,"url":"https://github.com/devamoghs/ml-study-plan","last_synced_at":"2025-04-03T02:14:28.658Z","repository":{"id":97396842,"uuid":"185017596","full_name":"devAmoghS/ML-Study-Plan","owner":"devAmoghS","description":null,"archived":false,"fork":false,"pushed_at":"2022-12-25T17:14:06.000Z","size":27,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-08T16:16:49.637Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devAmoghS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-05T10:40:45.000Z","updated_at":"2024-05-27T20:59:49.000Z","dependencies_parsed_at":"2023-03-13T16:14:33.774Z","dependency_job_id":null,"html_url":"https://github.com/devAmoghS/ML-Study-Plan","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devAmoghS%2FML-Study-Plan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devAmoghS%2FML-Study-Plan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devAmoghS%2FML-Study-Plan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devAmoghS%2FML-Study-Plan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devAmoghS","download_url":"https://codeload.github.com/devAmoghS/ML-Study-Plan/tar.gz/refs/heads/master","host":{"name
":"GitHub","url":"https://github.com","kind":"github","repositories_count":246922246,"owners_count":20855345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T15:40:59.683Z","updated_at":"2025-04-03T02:14:28.639Z","avatar_url":"https://github.com/devAmoghS.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# ML-Study-Plan\n\t\t\t\t\t\t\n## Week 1: Learn Scala:\n\t\t\t\t\t\t\n1. Programming-Scala-Martin-Odersky (coursera)\n2. https://www.coursera.org/specializations/scala\n\t\t\t\t\t\t\n## Learn Spark Scala:\n1. http://spark.apache.org/docs/latest/quick-start.html\n2. https://spark.apache.org/docs/latest/sql-programming-guide.html\n3. http://spark.apache.org/docs/latest/ml-guide.html\n4. Learning-Spark-Lightning-Fast-Data-Analysis\n5. Advanced-Analytics-Spark-Sandy-Ryza77\n6. Hadoop-Definitive-Guide-Tom-White\n\t\t\t\t\t\t\nDo Machine Learning course by Andrew Ng on Coursera :\n1. https://www.coursera.org/learn/machine-learning/home\n\t\t\t\t\t\t\n## Week 2:\n\t\t\t\t\t\t\nLearn one API/Algorithm from Spark Scala ML library and come up with ideas to use it Implement the idea and show the results\n\t\t\t\t\t\t\n## Week 3:\nLearn one algorithm from Data Mining/Machine Learning and come up with ideas to use it\n\t\t\t\t\t\t\n## Week 4:\nRead a recently published paper and come up with ideas to use it\n\t\t\t\t\t\nAt your own pace: Open-source Libraries:\n\t\t\t\t\t\t\n1. https://github.com/showcases/machine-learning\n2. Scikit-learn\n3. Shogun\n4. Mahout\t\t\t\t\t\t\n5. H2O\n6. Oryx\n7. TensorFlow\n8. 
Weka\n\t\t\t\t\t\t\nML/Data Mining Competitions:\n1. http://www.kdnuggets.com/competitions/\n2. https://www.kaggle.com/\n3. http://2017.recsyschallenge.com/\n4. http://www.image-net.org/challenges/LSVRC/\n5. http://www.chalearn.org/challenges.html\n\t\t\t\t\t\t\nResearch Conferences:\n1. http://dblp.uni-trier.de/db/conf/nips/\n2. http://dblp.uni-trier.de/db/conf/icml/\n3. http://dblp.uni-trier.de/db/conf/sigir/index.html\n4. http://dblp.uni-trier.de/db/conf/recsys/index.html\n5. http://dblp.uni-trier.de/db/conf/icdm/index.html\n6. http://dblp.uni-trier.de/db/conf/kdd/\n\t\t\t\t\t\t\nMore from Coursera:\n1. https://www.coursera.org/specializations/deep-learning\n2. https://www.coursera.org/learn/neural-networks/home\n3. https://www.coursera.org/learn/data-patterns\n4. https://www.coursera.org/learn/cluster-analysis\n5. https://www.coursera.org/learn/recommender-systems\n6. https://www.coursera.org/specializations/probabilistic-graphical-models\n7. https://www.coursera.org/specializations/data-mining\n8. https://www.coursera.org/specializations/recommender-systems\n\t\t\t\t\t\t\nBooks:\n1. http://www.deeplearningbook.org/\t\n2. Fundamentals-Machine-Learning-Predictive-Analytics\n3. Pattern-Recognition-Learning-Information-Statistics\n4. Elements-Statistical-Learning-Prediction-Statistics\n5. Reinforcement-Learning-Introduction-Adaptive-Computation\n6. Machine-Learning-Probabilistic-Perspective-Computation\n7. Python-Machine-Learning-Sebastian-Raschka\n8. Data-Science-Scratch-Principles-Python\n9. Applied-Predictive-Modeling-Max-Kuhn\n10. Introduction-Statistical-Learning-Applications-Statistics\n11. Machine-Learning-Second-Brett-Lantz\n12. Data-Mining-Textbook-Charu-Aggarwal\n13. Data-Science-Business-Data-Analytic-Thinking\n14. Predictive-Analytics-Power-Predict-Click\n15. Storytelling-Data-Visualization-Business-Professionals\n16. 
Functional-Programming-SCALA-Manning-Chiusano \n\n\n## Data Science Super Harsh Guide\n\nFirst, read fucking Hastie, Tibshirani, and whoever. Chapters 1–4 and 7–8. If you don’t understand it, keep reading it until you do.\n\nYou can read the rest of the book if you want. You probably should, but I’ll assume you know all of it.\n\nTake Andrew Ng’s Coursera course. Do all the exercises in Python and R. Make sure you get the same answers with all of them.\n\nNow forget all of that and read the deep learning book. Put TensorFlow and PyTorch on a Linux box and run examples until you get it. \n\nDo stuff with CNNs and RNNs and just feed forward NNs.\n\nOnce you do all of that, go on arXiv and read the most recent useful papers. The literature changes every few months, so keep up.\n\nThere. Now you can probably be hired most places. \n\nIf you need resume filler, do some Kaggle competitions. \n\nIf you have debugging questions, use StackOverflow. \n\nIf you have math questions, read more. \n\nIf you have life questions, I have no idea.\n\n\n## New Grads / Current Students\n\nYou have:\n=======================================\nBachelor's in Computer Science, Statistics, or Math \u003cbr/\u003e\nIntermediate-advanced proficiency in programming (relative to the other two groups) \u003cbr/\u003e\nKnowledge of data structures and algorithms i.e. 
leetcode \u003cbr/\u003e\nKnowledge of SQL, Spark, Hadoop, AWS \u003cbr/\u003e\nProjects in machine learning or Kaggle competitions \u003cbr/\u003e\n\nJobs you should target:\n=======================================\nData Analyst \u003cbr/\u003e\nData Engineer \u003cbr/\u003e\nSoftware Engineer in Machine Learning  \u003cbr/\u003e\nSoftware Engineer in Data Science \u003cbr/\u003e\n\nObstacles you may face:\n=======================================\nYou are looking for that rare sweet spot in data science for entry-level/new-grad roles \u003cbr/\u003e\nEvery data science position you find online will have many applicants with a graduate degree, so you will need to stand out somehow or get a referral \u003cbr/\u003e\nYour lack of work experience will usually disqualify you for senior roles at larger companies and early hires at startups \u003cbr/\u003e\nAvoid companies that do not have a proper data infrastructure pipeline set up (they will likely bait and switch you or try to make you into a jack-of-all-trades data person) \u003cbr/\u003e\n\nHow you should be job hunting:\n=======================================\nConsider data science roles at non-tech companies in data-driven industries like healthcare analytics \u003cbr/\u003e\nNetwork with your professors and classmates \u003cbr/\u003e\nAfter a graduate degree, an internship in data science and machine learning is the next best thing \u003cbr/\u003e\nResearch experience is also really good to have and makes for great talking points in interviews \u003cbr/\u003e\nFind out which interview skills are important for you (depending on the company, they range from standard leetcode problems, project-based work, and implementing research papers to SQL, ML theory, and statistics puzzles) \u003cbr/\u003e\nGet into a Big-N as a regular software engineer and try to transition to their data science teams \u003cbr/\u003e\n\n## Ideas\n\n### AWS Rekognition\nMade ppt\nTry out a simple demo by implementing the Deep Expectation Paper\n\n### Articles about:\nWrite 
in both Medium and LinkedIn\n\n1. Locality Sensitive Hashing \u003cbr/\u003e\n2. Huffman Coding \u003cbr/\u003e\n4. Basics of NLP \u003cbr/\u003e\n5. Generative Adversarial Networks \u003cbr/\u003e\n6. Neural Style Transfer -- Done \u003cbr/\u003e\n7. Age-Gender Determination using Deep Learning \u003cbr/\u003e\n8. Tail Recursion in Python / Scala \u003cbr/\u003e\n\n### Skills upgrading\nIn demand tools: R Python SAS Tableau Spark\n\nIn demand skills: Hadoop, Spark, Machine Learning, NoSQL Databases, Data Visualisation\n\nBig Data Unicorn -\u003e Hadoop, Spark, Tableau, Mongo and Cassandra\n\nCities -\u003e Mumbai, Bengaluru, Delhi, Pune, Hyderabad, Kolkata, Chennai\n\n### Thoughts to ponder upon\nusing k means clustering algorithm for file compression (can achieve 1/6th file size of the original)\n\nhow useful is game theory in machine learning\n\ninternal working of spark\n\n## Address Matching (Statistics)\n\nAddress matching\n\n1st Floor, No 141/142, 22nd Cross, 36th Main, Jayanagar 9th Block, Bannerghatta Road, Bangalore - 560 035\t\n\nComponents of an address\n===========================\ndoor number (optional) + \n\"building name\" (optional) + \nstreet number + \nstreet name (cross and main or single road) + \narea name (multiple; sometimes with block/phase/stage/sector) + \nmajor street name (optional) + \ncity name + \npostcode + \nstate name (optional)\n\nBefore adding checks for pincode and Flat no.\n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  395\nTrue Negative:  335\nFalse Positive:  92\nFalse Negative:  32\n\n\n\nAccuracy:  0.8548009367681498\nPrecision:  0.811088295687885\nRecall:  0.9250585480093677\nF1-score:  0.8643326039387309\n\n=============================================\nModifying the pincode flag -\u003e False Positives reduced, all numbers improved\n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  389\nTrue Negative:  391\nFalse Positive:  36\nFalse Negative:  38\n\n\n\nAccuracy:  
0.9133489461358314\nPrecision:  0.9152941176470588\nRecall:  0.9110070257611241\nF1-score:  0.9131455399061031\n\nProcess finished with exit code 0\n==============================================\nAdding the flatno flag -\u003e False Positives are reduced, precision grew\n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  268\nTrue Negative:  424\nFalse Positive:  3\nFalse Negative:  159\n\n\n\nAccuracy:  0.810304449648712\nPrecision:  0.988929889298893\nRecall:  0.6276346604215457\nF1-score:  0.7679083094555874\n\nProcess finished with exit code 0\n======================================================\nCleaned the data \n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  428\nTrue Negative:  421\nFalse Positive:  37\nFalse Negative:  40\n\n\n\nAccuracy:  0.9168466522678186\nPrecision:  0.9204301075268817\nRecall:  0.9145299145299145\nF1-score:  0.917470525187567\n================================================================\nAdded the flat num flag along with cleaned data and pincode\n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  292\nTrue Negative:  455\nFalse Positive:  3\nFalse Negative:  176\n\n\n\nAccuracy:  0.806695464362851\nPrecision:  0.9898305084745763\nRecall:  0.6239316239316239\nF1-score:  0.7653997378768022\n=====================================================================\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  294\nTrue Negative:  465\nFalse Positive:  1\nFalse Negative:  168\n\n\n\nAccuracy:  0.8178879310344828\nPrecision:  0.9966101694915255\nRecall:  0.6363636363636364\nF1-score:  0.7767503302509906\n\n===============================================================\nrows in red:  60\ncaptured red:  37\nnew coverage:  61.666666666666664 %\n\nrows in yellow:  249\ncaptured yellow:  225\nnew coverage:  90.36144578313252 %\n\n====================================================================\nAdding numeric token similarity -\u003e True Positive increased, 
False negative decreased, f1-score and accuracy increased \n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  352\nTrue Negative:  463\nFalse Positive:  3\nFalse Negative:  110\n\n\n\nAccuracy:  0.8782327586206896\nPrecision:  0.9915492957746479\nRecall:  0.7619047619047619\nF1-score:  0.8616891064871481\n\n======================================================================================\nTuning the sim score to mean sim_score\n\nRunning the algorithm with threshold set at: 70 %\nTrue Positive:  398\nTrue Negative:  459\nFalse Positive:  7\nFalse Negative:  64\n\n\n\nAccuracy:  0.9234913793103449\nPrecision:  0.9827160493827161\nRecall:  0.8614718614718615\nF1-score:  0.9181084198385236\n==========================================================================================\nMaking a model with decision tree = (with ans as a feature)\n\nAccuracy is  94.75\nConfusion Matrix is  [[183   9]\n [ 12 196]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.94      0.95      0.95       192\n          1       0.96      0.94      0.95       208\n\navg / total       0.95      0.95      0.95       400\n\n================================================================================================\nMaking a model with decision tree = (without ans as a feature)\n\n\nAccuracy is  94.75\nConfusion Matrix is  [[183   9]\n [ 12 196]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.94      0.95      0.95       192\n          1       0.96      0.94      0.95       208\n\navg / total       0.95      0.95      0.95       400\n\n====================================================================================================\nnew data set v2\n\nTrue Positive:  547\nTrue Negative:  779\nFalse Positive:  32\nFalse Negative:  140\n\nAccuracy:  0.8851802403204272\nPrecision:  0.9447322970639033\nRecall:  0.7962154294032023\nF1-score:  
0.8641390205371248\n\nProcess finished with exit code 0\n===================================================================================================\nnew data set v2\n\nAccuracy is  89.33333333333333\nConfusion Matrix is  [[302  29]\n [ 35 234]]\nClassification Report is               precision    recall  f1-score   support\n\n        0.0       0.90      0.91      0.90       331\n        1.0       0.89      0.87      0.88       269\n\navg / total       0.89      0.89      0.89       600\n\n==========================================================================================================\ntrain = old data set, test = new data set\n\nAccuracy is  90.4539385847797\nConfusion Matrix is  [[713  98]\n [ 45 642]]\nClassification Report is               precision    recall  f1-score   support\n\n        0.0       0.94      0.88      0.91       811\n        1.0       0.87      0.93      0.90       687\n\navg / total       0.91      0.90      0.90      1498\n===========================================================================\nAfter correcting false positives\n\nAccuracy is  92.59012016021362\nConfusion Matrix is  [[713  66]\n [ 45 674]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.94      0.92      0.93       779\n          1       0.91      0.94      0.92       719\n\navg / total       0.93      0.93      0.93      1498\n\n\n\n\n==========================\n28th feb\n==========================\nwithout correcting false positives and false negatives\n\nAccuracy is  90.4539385847797\nConfusion Matrix is  [[713  98]\n [ 45 642]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.94      0.88      0.91       811\n          1       0.87      0.93      0.90       687\n\navg / total       0.91      0.90      0.90      1498\n\n===============================================\nAccuracy is  90.72096128170895\nConfusion Matrix is  [[712  99]\n [ 40 
647]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.95      0.88      0.91       811\n          1       0.87      0.94      0.90       687\n\navg / total       0.91      0.91      0.91      1498\n\n================================================\nTraining Stats\n\nTrue Positive:  398\nTrue Negative:  450\nFalse Positive:  8\nFalse Negative:  70\n=================================================\n\n\n\n\nAccuracy:  0.9157667386609071\nPrecision:  0.9802955665024631\nRecall:  0.8504273504273504\nF1-score:  0.9107551487414187\n============================================\n\nAccuracy is  90.72096128170895\nConfusion Matrix is  [[765  46]\n [ 93 594]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.89      0.94      0.92       811\n          1       0.93      0.86      0.90       687\n\navg / total       0.91      0.91      0.91      1498\n\n=====================================================\nTest stats\n\nTrue Positive:  551\nTrue Negative:  779\nFalse Positive:  32\nFalse Negative:  136\n\n\n\nAccuracy:  0.8878504672897196\nPrecision:  0.9451114922813036\nRecall:  0.8020378457059679\nF1-score:  0.8677165354330709\n\n=========================================================\nadded soundex score to the features\nAccuracy is  90.72096128170895\nConfusion Matrix is  [[765  46]\n [ 93 594]]\nClassification Report is               precision    recall  f1-score   support\n\n          0       0.89      0.94      0.92       811\n          1       0.93      0.86      0.90       687\n\navg / total       0.91      0.91      0.91      1498\n\n\nratio()\nReturn a measure of the sequences’ similarity as a float in the range [0, 1].\n\nWhere T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. 
Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.\n\n\n===========================================================\nneural network results\n\nTest score: 0.20490155638617094\nTest accuracy: 0.9272363152459403\n\n===============================================================\n\nchanged all records to 'fail' where pincode is not matching\n\nTest score: 0.18380965855474943\nTest accuracy: 0.9319092119647282\n\n=====================================================\n\nTest score: 0.192987822881528\nTest accuracy: 0.9359145524186668\n\nProb |  No. of obs. | Prc 1 | Prc 0 | TPR | FPR|\n\n======================================\n\nAdding validation set \nepochs=60, batch_size=10\n\nTest score: 0.13257216911516592\nTest accuracy: 0.9498998004353357\n\n========================\nHyperparameter Optimization\n========================\n\nBest: 0.946742 using {'batch_size': 40, 'epochs': 100}\n\n======================================\nSetting batch size 40\n\nTest score: 0.13517679035185812\nTest accuracy: 0.9519038084513678\n\n=====================================\nSetting epochs 100\n\nTest score: 0.15414797805832\nTest accuracy: 0.957915832499464\n\nBest: 0.944862 using {'optimizer': 'Adam'}\n0.831454 (0.006203) with: {'optimizer': 'SGD'}\n0.942356 (0.004430) with: {'optimizer': 'RMSprop'}\n0.937970 (0.007674) with: {'optimizer': 'Adagrad'}\n0.944236 (0.003195) with: {'optimizer': 'Adadelta'}\n0.944862 (0.008993) with: {'optimizer': 'Adam'}\n0.943609 (0.001535) with: {'optimizer': 'Adamax'}\n0.809524 (0.192290) with: {'optimizer': 'Nadam'}\n\n=====================================\nAdding lecun uniform\n\nTest score: 0.1970766116538363\nTest accuracy: 0.9519038084513678\n\n\n### Songs\nChris Isaak - Wicked Game Chillon Mix\nHans Zimmer - Time (Pen Perry Remix)\nStill Corners - The Trip\nHey Baby feat. 
Debs Daughter\nWorakls - Coeur de la Nuit Unofficial Video\nI'm Shipping Up To Boston - Dropkick Murphys\n\n### Things to Learn\n\nThings To Learn\n===========================\nBasics of R (completing DataCamp tutorial)\n\nAlgorithms\n=============================\nRegression algorithms (Stanford, completed)\nSVM algorithms\nData projection algorithms\nDeep learning algorithms\nTime series forecasting algorithms\nRating system algorithms\nRecommender system algorithms\nFeature selection algorithms\nClass imbalance algorithms\nDecision tree algorithms\n\nDeep Learning\n=============\nRecurrent attention\nSequence masking\n\nAdditional Tools\n==============\nAWS\nnumpy\npandas (decent grip)\nSQL\n\nLibraries\n================\nScikit Learn (decent grip)\nPyTorch\nTensorFlow\n\nLinear Algebra  (MIT OCW) 1/50\n===============================\nPrincipal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization \u0026 Orthonormalization, Matrix Operations, Projections, Eigenvalues \u0026 Eigenvectors, Vector Spaces and Norms\n\nMatrix Algebra (The Matrix Cookbook) \nProbability Theory\n\n\n\nStatistics (Think Stats)\n================================\nCombinatorics, Probability Rules \u0026 Axioms, Bayes’ Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods\n\nSingle Variable Calculus\n\nMultiVariate Calculus\n================================\nIntegral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Hessian, Jacobian, Laplacian and Lagrangian Distribution\n\nOptimization Theory\n\nExcellent understanding of machine learning techniques and algorithms, such as 
\nk-NN, \nNaive Bayes, \nSVM, \nRandom Forest, \nDecision Forests, \nLogistic Regression, \nNeural Networks, \nRecommenders, \nK-means, \nBoosted machines, \nEnsemble Learning, \nClustering, \nClassification\n\nUnderstanding of deep neural networks - \nAutoencoders, \nCNN, \nRNN, \nGAN, \nBoltzmann Machine\n \nGood understanding of Statistical modelling like \nlinear, logistic regression, \nclassification, \nhypothesis testing, \nANOVA, \nPCA, \nSVD\n\n\n===========================\nCore Companies\n===========================\nAMD\nMathWorks\nNVIDIA \nVisa\nMasterCard\nAmazon\nFlipkart\nAdobe\nZS Associates\nIBM\nSymantec\nThoughtWorks\nVMware\nEMC\nCISCO\n\n### How to use Cracking the Coding Interview to pass data science code challenges...\n\nStart with chapter 7 and then work through chapters 1, 2, 3, 4, 10 in order, writing Python code (ideally OO) to solve 50% of the problems. It's a bit repetitive to solve every question in the book, so just complete all the odd or even problems at first (you can complete the rest of the problems later if you need extra practice).\n\n👉 If you’re interested in more software-oriented roles, e.g. 
machine learning engineer, then do the problems from chapters 6 and 8 as well, otherwise those are optional.\n\nChapters 6 and 8 sometimes also help for tech and finance companies and chapter 6 can also be very relevant for analytics roles at companies that like to ask brain teasers.\n\nSide Projects \n=================\n\n[COMPLETED] = Latent Dirichlet Allocation on NewsGroup Dataset using Scikit Learn\n\n[COMPLETED] = Deep Learning Network on MNIST dataset using Keras\n\n[COMPLETED] = Movie Recommendation on MovieLens dataset using Scala, Spark MLlib and Alternating Least Squares\n\n[COMPLETED] = Predicting Breast Cancer on Wisconsin Breast Cancer dataset using Scala, Spark MLlib and Random Forests\n\n[COMPLETED] = Spam Filtering Engine using Naive Bayes Classifier on Spam Assassin Public Corpus\n\n[COMPLETED] = Spelling Corrector\n\n[IN PROGRESS] = Sentdex Regression\n\n[COMPLETED] = License Plate Recognition\n\n[COMPLETED] = Bank Customer Churn using Keras and Scikit-Learn\n\n[IN PROGRESS] = Implementing Random Forest Algorithm from scratch on Sonar Dataset\n\nhttps://medium.freecodecamp.org/the-hitchhikers-guide-to-machine-learning-algorithms-in-python-bfad66adb378\n[In Progress] = Linear Regression\n[In Progress] = Logistic Regression\n[In Progress] = Decision Trees\n[In Progress] = Support Vector Machines\n[In Progress] = K-Nearest Neighbors\n[In Progress] = Random Forests\n[In Progress] = K-Means Clustering\n[In Progress] = Principal Components Analysis\n\n\nCorporate Wisdom\n==========================================================\n First of all, looking at it from the employer’s perspective, \nasking for more money immediately makes you self-centered and selfish. \nPutting the focus on the long term gets you away from this. \nPick some point in the future that we both agree on, and what you want then. 
\nIf you immediately pivot to exactly where you want to be in the strategic future of the company, then the person you’re talking to says, \n“This guy is going to make my future better,” which then puts you in a position to ask for more, because you’re automatically more valuable.\n\nAchievers use a success list, not a to do list. They have a strong sense of priority.\nIf you want extraordinary results, you need to narrow your focus.\nDo your most important work, your “one thing”, when your willpower is strongest. For many people, that’s early in the day.\nAsk the question: what’s the one thing I can do that will make everything else easier?\n\nI understand a business problem,\nthe decisions before the stakeholder, \nthe multiple ways to frame the problem, and \nthe trade-offs between each. \n\nI understand how data can and cannot help, \nthe variety of techniques I can use given a chosen approach. \nI can defend why I chose one particular approach, \nhow the resulting model works, \nits limitations as applied to a problem, \nand problem appropriate metrics. \n\nI translate them into recommendations the business can digest, \nand persuade towards a value creating outcome.\n============================================================\nThe upper tier performers dont focus on the outcome, they focus on the process\n\nThe truly successful people attract success, not chase it\n\nEvery skill you acquire doubles your chances of success\n\nIn 1925, one year before he entered school, Isaac Asimov taught himself to read. Uneducated and thus unable to support his son, his father gave him a library card. Without any direction, the curious boy read everything.\n\nDon’t boast. Those who know more will see you for the fool you really are. Those who know as much as you will resent you for boasting about things they already know. 
Those who know less will kiss your ass and be yes men until they know more than you\n\n\nMovies\n=========\nDog Day Afternoon - Al Pacino\nInside Man - Denzel Washington\nCatch Me If You Can - Leonardo DiCaprio\nChinatown - Jack Nicholson\nThe Postman Always Rings Twice - Jack Nicholson\n\n## The Blunt Guide to Mathematically Rigorous Machine Learning\n\nI won’t be going through the math portions again, you can check out my other article or this excellent post by YC on the topic. My advice: learn enough Linear Algebra, Stats, Probability, and Multivariate Calculus to feel good about yourself, and learn everything else as you have to.\n\n1. Elements of Statistical Learning\nPrioritize Chapters 1–4 and Chapters 7–8. This covers supervised learning, linear regression, classification, Model Assessment and Inference. It’s okay if you don’t understand it at first, absolutely nobody does. Keep reading it and learning whatever math you need to until you get it. If you want, knock the whole book out, you won’t regret it.\n\nIf Elements is really just too hard, you can start with Introduction to Statistical Learning, by the same authors. The book sacrifices some mathematical explanation and focuses on a subset of the problems in Elements, but is a good ramping up point to understanding the material. There is an excellent accompanying course provided by Stanford for free.\n\nBoth books focus on R, which is worth learning.\n\n2. Stanford CS 229\nOnce you’ve finished Elements, you’re in a great position to take Stanford’s ML course, taught by Andrew Ng. You can think about this like the mathematically rigorous version of his popular Coursera course. Going into this course, make sure to refresh your Multivariate Calculus and Linear Algebra skills, as well as some probability. They provide some handy refresher guides on the site page.\n\nDo all the exercises and problem sets, and try doing the programming assignments in both R and Python. 
You’ll thank me later.\n\nYou can again opt to go for a slightly easier route in Andrew Ng’s Coursera course, which is focused more on implementation and less on underlying theory and the math. I would really just do all the programming assignments from there as well. You don’t have to do them in Octave/Matlab, you can do R and Python versions. There are plenty of repos to compare to on GitHub.\n\n3. Deep Learning Book\nAt this point, you’re starting to get formidable. You have a fundamental mathematical understanding of many popular, historic techniques in Machine Learning, and can choose to dive into any vertical you want. Of course, most people want to go into Deep Learning because of its significance in industry.\n\nGo through the DL book. It will refresh you on a lot of math and also fundamentally explain much of modern Deep Learning well. You can start messing around with implementations by spinning up a Linux box and doing cool shit with CNNs, RNNs and regular old feed forward neural networks. Use TensorFlow and PyTorch, and start to get a sense of how awesome some of these libraries are for abstracting a lot of the complexity you learned.\n\nI’ve also heard the DeepLearning.ai courses by Andrew Ng and co are worth it. They are not nearly as comprehensive as the textbook by Goodfellow et al., but seem to be a useful companion.\n\n4. arXiv and Google Scholar\nIf you’ve made it this far, congratulations, you’re probably in an excellent place to make sense of the latest papers in the field. Just go onto arXiv and Google Scholar and look at both seminal papers and recent papers that are popular. Remember that ML is a fast moving field and the literature changes, so keep checking back in every few months.\n\nIf you’re feeling particularly bold or find something cool, try implementing it yourself. The learning process will be invaluable.\n\n5. Padding your resume and getting hired.\nExcellent work. 
You’ve probably reached the point by now that you can get hired at most places and/or get into grad school. If you want to fill out your resume, you can continue to implement new architectures, or even do Kaggle Competitions.\n\nIf you want to do the latter, but feel that your actual implementation skills aren’t totally up to par, take Fast.ai courses 1 and 2. They focus on cohesively applying all the shit you’ve learned over the past few months using popular libraries and tooling.\n\nThere are a lot of AI residency programs popping up at OpenAI, Google, Facebook, Uber, and a few other places. You are probably a pretty good candidate, give them a shot.\n\nIf you get this far, holy shit. Well done. The journey is never over, but you’re in an excellent place and you understand ML as well as many experts. I think.\n\nOh and those of you just starting, I’m right there with you. Race you to the end ;)\n\n### Python Practice\n\n1. r-element subsets from a list of N elements\n2. cartesian product of two sets\n3. produce the sample space of two dice given a number N\n\n4. LCM of two numbers\n5. reverse the binary representation of a number\n6. first N prime numbers\n7. count words in a string\n8. sum of all even numbers and odd numbers\n9. generate a list of 2-element sets given a list of values\n10. print the nth term of the Fibonacci series\n11. next prime number after a given number\n12. given a list of strings, build a dictionary with each letter as key and the list of strings starting with that letter as value\n13. input a tuple of int values and return a dictionary of each int value and its frequency\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevamoghs%2Fml-study-plan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevamoghs%2Fml-study-plan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevamoghs%2Fml-study-plan/lists"}