# ML-Study-Plan
## Week 1: Learn Scala:
1. Programming in Scala by Martin Odersky (Coursera)
2. https://www.coursera.org/specializations/scala
## Learn Spark Scala:
1. http://spark.apache.org/docs/latest/quick-start.html
2. https://spark.apache.org/docs/latest/sql-programming-guide.html
3. http://spark.apache.org/docs/latest/ml-guide.html
4. Learning Spark: Lightning-Fast Data Analysis
5. Advanced Analytics with Spark by Sandy Ryza
6. Hadoop: The Definitive Guide by Tom White
Do the Machine Learning course by Andrew Ng on Coursera:
1. https://www.coursera.org/learn/machine-learning/home
## Week 2:
Learn one API/algorithm from the Spark Scala ML library and come up with ideas to use it. Implement the idea and show the results.
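For instance, a minimal sketch of fitting one MLlib algorithm. This uses PySpark for brevity (the plan targets the Scala API, which mirrors it closely), invents a tiny toy DataFrame, and assumes a local Spark installation.

```python
# Fit a logistic regression with Spark's ML library on a toy DataFrame.
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("week2-demo").getOrCreate()

# Toy (label, features) rows, invented for illustration.
train = spark.createDataFrame([
    (0.0, Vectors.dense(0.0, 1.1, 0.1)),
    (1.0, Vectors.dense(2.0, 1.0, -1.0)),
    (0.0, Vectors.dense(2.0, 1.3, 1.0)),
    (1.0, Vectors.dense(0.0, 1.2, -0.5)),
], ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(train)
model.transform(train).select("label", "prediction").show()
```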
## Week 3:
Learn one algorithm from Data Mining/Machine Learning and come up with ideas to use it
## Week 4:
Read a recently published paper and come up with ideas to use it
At your own pace: Open-source Libraries:
1. https://github.com/showcases/machine-learning
2. Scikit-learn
3. Shogun
4. Mahout
5. H2O
6. Oryx
7. TensorFlow
8. Weka
ML/Data Mining Competitions:
1. http://www.kdnuggets.com/competitions/
2. https://www.kaggle.com/
3. http://2017.recsyschallenge.com/
4. http://www.image-net.org/challenges/LSVRC/
5. http://www.chalearn.org/challenges.html
Research Conferences:
1. http://dblp.uni-trier.de/db/conf/nips/
2. http://dblp.uni-trier.de/db/conf/icml/
3. http://dblp.uni-trier.de/db/conf/sigir/index.html
4. http://dblp.uni-trier.de/db/conf/recsys/index.html
5. http://dblp.uni-trier.de/db/conf/icdm/index.html
6. http://dblp.uni-trier.de/db/conf/kdd/
More from Coursera:
1. https://www.coursera.org/specializations/deep-learning
2. https://www.coursera.org/learn/neural-networks/home
3. https://www.coursera.org/learn/data-patterns
4. https://www.coursera.org/learn/cluster-analysis
5. https://www.coursera.org/learn/recommender-systems
6. https://www.coursera.org/specializations/probabilistic-graphical-models
7. https://www.coursera.org/specializations/data-mining
8. https://www.coursera.org/specializations/recommender-systems
Books:
1. http://www.deeplearningbook.org/
2. Fundamentals-Machine-Learning-Predictive-Analytics
3. Pattern-Recognition-Learning-Information-Statistics
4. Elements-Statistical-Learning-Prediction-Statistics
5. Reinforcement-Learning-Introduction-Adaptive-Computation
6. Machine-Learning-Probabilistic-Perspective-Computation
7. Python-Machine-Learning-Sebastian-Raschka
8. Data-Science-Scratch-Principles-Python
9. Applied-Predictive-Modeling-Max-Kuhn
10. Introduction-Statistical-Learning-Applications-Statistics
11. Machine-Learning-Second-Brett-Lantz
12. Data-Mining-Textbook-Charu-Aggarwal
13. Data-Science-Business-Data-Analytic-Thinking
14. Predictive-Analytics-Power-Predict-Click
15. Storytelling-Data-Visualization-Business-Professionals
16. Functional-Programming-SCALA-Manning-Chiusano
## Data Science Super Harsh Guide
First, read fucking Hastie, Tibshirani, and whoever. Chapters 1–4 and 7–8. If you don’t understand it, keep reading it until you do.
You can read the rest of the book if you want. You probably should, but I’ll assume you know all of it.
Take Andrew Ng’s Coursera. Do all the exercises in python and R. Make sure you get the same answers with all of them.
Now forget all of that and read the deep learning book. Put tensorflow and pytorch on a Linux box and run examples until you get it.
Do stuff with CNNs and RNNs and just feed forward NNs.
Once you do all of that, go on arXiv and read the most recent useful papers. The literature changes every few months, so keep up.
There. Now you can probably be hired most places.
If you need resume filler, do some Kaggle competitions.
If you have debugging questions, use StackOverflow.
If you have math questions, read more.
If you have life questions, I have no idea.
## New Grads / Current Students
You have:
=======================================
Bachelors in Computer Science, Statistics, or Math
Intermediate-advanced proficiency in programming (relative to the other two groups)
Knowledge of data structures and algorithms i.e. leetcode
Knowledge of SQL, Spark, Hadoop, AWS
Projects in machine learning or Kaggle competitions
Jobs you should target:
=======================================
Data Analyst
Data Engineer
Software Engineer in Machine Learning
Software Engineer in Data Science
Obstacles you may face:
=======================================
You are looking for that rare sweet spot: entry-level/new-grad roles in data science
Every data science position you find online will have many applicants with graduate degrees, so you will need to stand out somehow or get a referral
Your lack of work experience will usually disqualify you from senior roles at larger companies and early hires at startups
Avoid companies that do not have a proper data infrastructure pipeline set up (they will likely bait-and-switch you or try to make you a jack-of-all-trades data person)
How you should be job hunting:
=======================================
Consider data science roles at non-tech companies in data-driven industries like healthcare analytics
Network with your professors and classmates
After a graduate degree, an internship in data science or machine learning is the next best thing
Research experience is also really good to have and makes for great talking points in interviews
Find out which interview skills matter for your target companies (they vary by company: standard LeetCode, project-based rounds, implementing research papers, SQL, ML theory, statistics puzzles)
Get into a Big-N as a regular software engineer and try to transition to their data science teams
## Ideas
### AWS Rekognition
Made a PPT
Try out a simple demo by implementing the Deep Expectation paper
### Articles about:
Write on both Medium and LinkedIn
1. Locality Sensitive Hashing
2. Huffman Coding
3. Basics of NLP
4. Generative Adversarial Networks
5. Neural Style Transfer -- Done
6. Age-Gender Determination using Deep Learning
7. Tail Recursion in Python / Scala (see the sketch after this list)
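For the tail-recursion article idea, a minimal sketch of the usual Python workaround: CPython has no tail-call optimization, so deep tail recursion overflows the stack, and a trampoline converts the tail calls into a loop. The function names here are illustrative.

```python
# Python has no tail-call optimization, so deep tail recursion raises
# RecursionError. A trampoline turns tail calls into an iterative loop.
def trampoline(fn, *args):
    result = fn(*args)
    # Keep calling as long as the function returns another thunk.
    while callable(result):
        result = result()
    return result

def factorial(n, acc=1):
    if n <= 1:
        return acc
    # Return a zero-argument thunk instead of recursing directly.
    return lambda: factorial(n - 1, acc * n)

print(trampoline(factorial, 10_000) > 0)  # runs without RecursionError
```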
### Skills upgrading
In-demand tools: R, Python, SAS, Tableau, Spark
In-demand skills: Hadoop, Spark, Machine Learning, NoSQL Databases, Data Visualisation
Big Data Unicorn -> Hadoop, Spark, Tableau, Mongo and Cassandra
Cities -> Mumbai, Bengaluru, Delhi, Pune, Hyderabad, Kolkata, Chennai
### Thoughts to ponder upon
Using k-means clustering for file compression (can achieve roughly 1/6th of the original file size; see the sketch below)
How useful is game theory in machine learning?
The internal workings of Spark
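A minimal sketch of the k-means compression idea, assuming scikit-learn and a random array standing in for image pixels: quantize each 24-bit RGB pixel to one of k palette colors, so only a small palette plus log2(k) bits per pixel need to be stored. With k=16 that is 4 bits per pixel instead of 24, which is where a roughly 6x reduction can come from.

```python
# Color-quantize pixels with k-means: store k palette colors plus a
# per-pixel palette index instead of full 24-bit RGB values.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(100 * 100, 3)).astype(np.float64)  # stand-in image

k = 16
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
indices = km.labels_           # 4 bits per pixel when k = 16
palette = km.cluster_centers_  # k RGB triples

# Reconstruct the quantized image from palette + indices.
quantized = palette[indices].astype(np.uint8)
print(quantized.shape)  # (10000, 3)
# 24 bits/pixel -> 4 bits/pixel: ~6x smaller before the small palette overhead.
```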
## Address Matching (Statistics)
Address matching
1st Floor, No 141/142, 22nd Cross, 36th Main, Jayanagar 9th Block, Bannerghatta Road, Bangalore - 560 035
Components of an address
===========================
door number (optional) +
"building name" (optional) +
street number +
street name (cross and main or single road) +
area name (multiple; sometimes with block/phase/stage/sector) +
major street name (optional) +
city name +
postcode +
state name (optional)
Before adding checks for pincode and flat number:
Running the algorithm with threshold set at: 70 %
True Positive: 395
True Negative: 335
False Positive: 92
False Negative: 32
Accuracy: 0.8548009367681498
Precision: 0.811088295687885
Recall: 0.9250585480093677
F1-score: 0.8643326039387309
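For reference, all four metrics follow directly from the TP/TN/FP/FN counts; a few lines of Python reproduce the numbers above exactly:

```python
# Derive the reported metrics from the raw confusion counts.
tp, tn, fp, fn = 395, 335, 92, 32

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
# 0.8548009367681498 0.811088295687885 0.9250585480093677 0.8643326039387309
```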
=============================================
Modifying the pincode flag -> False Positives reduced, all numbers improved
Running the algorithm with threshold set at: 70 %
True Positive: 389
True Negative: 391
False Positive: 36
False Negative: 38
Accuracy: 0.9133489461358314
Precision: 0.9152941176470588
Recall: 0.9110070257611241
F1-score: 0.9131455399061031
==============================================
Adding the flatno flag -> false positives reduced and precision grew, but recall dropped
Running the algorithm with threshold set at: 70 %
True Positive: 268
True Negative: 424
False Positive: 3
False Negative: 159
Accuracy: 0.810304449648712
Precision: 0.988929889298893
Recall: 0.6276346604215457
F1-score: 0.7679083094555874
======================================================
Cleaned the data
Running the algorithm with threshold set at: 70 %
True Positive: 428
True Negative: 421
False Positive: 37
False Negative: 40
Accuracy: 0.9168466522678186
Precision: 0.9204301075268817
Recall: 0.9145299145299145
F1-score: 0.917470525187567
================================================================
Added the flat num flag along with cleaned data and pincode
Running the algorithm with threshold set at: 70 %
True Positive: 292
True Negative: 455
False Positive: 3
False Negative: 176
Accuracy: 0.806695464362851
Precision: 0.9898305084745763
Recall: 0.6239316239316239
F1-score: 0.7653997378768022
=====================================================================
Running the algorithm with threshold set at: 70 %
True Positive: 294
True Negative: 465
False Positive: 1
False Negative: 168
Accuracy: 0.8178879310344828
Precision: 0.9966101694915255
Recall: 0.6363636363636364
F1-score: 0.7767503302509906
===============================================================
rows in red: 60
captured red: 37
new coverage: 61.666666666666664 %
rows in yellow: 249
captured yellow: 225
new coverage: 90.36144578313252 %
====================================================================
Adding numeric token similarity -> true positives increased, false negatives decreased, F1-score and accuracy increased
Running the algorithm with threshold set at: 70 %
True Positive: 352
True Negative: 463
False Positive: 3
False Negative: 110
Accuracy: 0.8782327586206896
Precision: 0.9915492957746479
Recall: 0.7619047619047619
F1-score: 0.8616891064871481
======================================================================================
Tuning the similarity score to the mean sim_score
Running the algorithm with threshold set at: 70 %
True Positive: 398
True Negative: 459
False Positive: 7
False Negative: 64
Accuracy: 0.9234913793103449
Precision: 0.9827160493827161
Recall: 0.8614718614718615
F1-score: 0.9181084198385236
==========================================================================================
Making a decision tree model (with ans as a feature)
Accuracy is 94.75
Confusion Matrix is [[183 9]
[ 12 196]]
Classification Report is precision recall f1-score support
0 0.94 0.95 0.95 192
1 0.96 0.94 0.95 208
avg / total 0.95 0.95 0.95 400
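This output matches the format of scikit-learn's classification_report; a hedged sketch of how such a model and report are typically produced (the data below is a synthetic placeholder for the address-pair features):

```python
# Train/evaluate a decision tree and print the same style of report.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Placeholder data standing in for the address-pair feature matrix.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

print("Accuracy is", accuracy_score(y_test, pred) * 100)
print("Confusion Matrix is", confusion_matrix(y_test, pred))
print("Classification Report is", classification_report(y_test, pred))
```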
================================================================================================
Making a decision tree model (without ans as a feature)
Accuracy is 94.75
Confusion Matrix is [[183 9]
[ 12 196]]
Classification Report is precision recall f1-score support
0 0.94 0.95 0.95 192
1 0.96 0.94 0.95 208
avg / total 0.95 0.95 0.95 400
====================================================================================================
new data set v2
True Positive: 547
True Negative: 779
False Positive: 32
False Negative: 140
Accuracy: 0.8851802403204272
Precision: 0.9447322970639033
Recall: 0.7962154294032023
F1-score: 0.8641390205371248
===================================================================================================
new data set v2
Accuracy is 89.33333333333333
Confusion Matrix is [[302 29]
[ 35 234]]
Classification Report is precision recall f1-score support
0.0 0.90 0.91 0.90 331
1.0 0.89 0.87 0.88 269
avg / total 0.89 0.89 0.89 600
==========================================================================================================
train = old data set, test = new data set
Accuracy is 90.4539385847797
Confusion Matrix is [[713 98]
[ 45 642]]
Classification Report is precision recall f1-score support
0.0 0.94 0.88 0.91 811
1.0 0.87 0.93 0.90 687
avg / total 0.91 0.90 0.90 1498
===========================================================================
After correcting false positives
Accuracy is 92.59012016021362
Confusion Matrix is [[713 66]
[ 45 674]]
Classification Report is precision recall f1-score support
0 0.94 0.92 0.93 779
1 0.91 0.94 0.92 719
avg / total 0.93 0.93 0.93 1498
==========================
28th Feb
==========================
without correcting false positives and false negatives
Accuracy is 90.4539385847797
Confusion Matrix is [[713 98]
[ 45 642]]
Classification Report is precision recall f1-score support
0 0.94 0.88 0.91 811
1 0.87 0.93 0.90 687
avg / total 0.91 0.90 0.90 1498
===============================================
Accuracy is 90.72096128170895
Confusion Matrix is [[712 99]
[ 40 647]]
Classification Report is precision recall f1-score support
0 0.95 0.88 0.91 811
1 0.87 0.94 0.90 687
avg / total 0.91 0.91 0.91 1498
================================================
Training Stats
True Positive: 398
True Negative: 450
False Positive: 8
False Negative: 70
=================================================
Accuracy: 0.9157667386609071
Precision: 0.9802955665024631
Recall: 0.8504273504273504
F1-score: 0.9107551487414187
============================================
Accuracy is 90.72096128170895
Confusion Matrix is [[765 46]
[ 93 594]]
Classification Report is precision recall f1-score support
0 0.89 0.94 0.92 811
1 0.93 0.86 0.90 687
avg / total 0.91 0.91 0.91 1498
=====================================================
Test stats
True Positive: 551
True Negative: 779
False Positive: 32
False Negative: 136
Accuracy: 0.8878504672897196
Precision: 0.9451114922813036
Recall: 0.8020378457059679
F1-score: 0.8677165354330709
=========================================================
Added Soundex score to the features
Accuracy is 90.72096128170895
Confusion Matrix is [[765 46]
[ 93 594]]
Classification Report is precision recall f1-score support
0 0.89 0.94 0.92 811
1 0.93 0.86 0.90 687
avg / total 0.91 0.91 0.91 1498
difflib.SequenceMatcher.ratio()
Return a measure of the sequences’ similarity as a float in the range [0, 1].
Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
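The quote above is from Python's difflib documentation; using it as a cheap address-similarity score is a one-liner (the strings below are made up):

```python
# SequenceMatcher.ratio() as a quick string-similarity score for addresses.
from difflib import SequenceMatcher

a = "22nd cross jayanagar 9th block bangalore 560035"
b = "22 cross jaya nagar 9 block bengaluru 560035"
print(SequenceMatcher(None, a, b).ratio())  # float in [0, 1]
```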
===========================================================
neural network results
Test score: 0.20490155638617094
Test accuracy: 0.9272363152459403
===============================================================
Changed all records to 'fail' where the pincode does not match
Test score: 0.18380965855474943
Test accuracy: 0.9319092119647282
=====================================================
Test score: 0.192987822881528
Test accuracy: 0.9359145524186668
Prob | No. of obs. | Prc 1 | Prc 0 | TPR | FPR |
======================================
Adding validation set
epochs=60, batch_size=10
Test score: 0.13257216911516592
Test accuracy: 0.9498998004353357
========================
Hyperparameter Optimization
========================
Best: 0.946742 using {'batch_size': 40, 'epochs': 100}
======================================
Setting batch size 40
Test score: 0.13517679035185812
Test accuracy: 0.9519038084513678
=====================================
Setting epochs 100
Test score: 0.15414797805832
Test accuracy: 0.957915832499464
Best: 0.944862 using {'optimizer': 'Adam'}
0.831454 (0.006203) with: {'optimizer': 'SGD'}
0.942356 (0.004430) with: {'optimizer': 'RMSprop'}
0.937970 (0.007674) with: {'optimizer': 'Adagrad'}
0.944236 (0.003195) with: {'optimizer': 'Adadelta'}
0.944862 (0.008993) with: {'optimizer': 'Adam'}
0.943609 (0.001535) with: {'optimizer': 'Adamax'}
0.809524 (0.192290) with: {'optimizer': 'Nadam'}
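The "Best: ... using {...}" lines above match the output format of scikit-learn's GridSearchCV. A hedged sketch of how batch size, epochs, and optimizer can be tuned for a Keras model, assuming the scikeras wrapper; the network and data here are placeholders, not the actual address-matching model:

```python
# Grid-search Keras hyperparameters with scikit-learn.
import numpy as np
from tensorflow import keras
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

X = np.random.rand(500, 8)
y = (X.sum(axis=1) > 4).astype(int)  # placeholder labels

clf = KerasClassifier(model=build_model, verbose=0)
grid = {"batch_size": [10, 20, 40], "epochs": [50, 100]}
search = GridSearchCV(clf, grid, cv=3).fit(X, y)

print("Best: %f using %s" % (search.best_score_, search.best_params_))
for mean, std, params in zip(search.cv_results_["mean_test_score"],
                             search.cv_results_["std_test_score"],
                             search.cv_results_["params"]):
    print("%f (%f) with: %r" % (mean, std, params))
```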
=====================================
Adding lecun_uniform initialization
Test score: 0.1970766116538363
Test accuracy: 0.9519038084513678
### Songs
Chris Isaak - Wicked Game (Chillon Mix)
Hans Zimmer - Time (Pen Perry Remix)
Still Corners - The Trip
Hey Baby feat. Debs Daughter
Worakls - Coeur de la Nuit Unofficial Video
I'm Shipping Up To Boston - Dropkick Murphys
### Things to Learn
Basics of R (completing the DataCamp tutorial)
Algorithms
=============================
Regression algorithms (Stanford, completed)
SVM algorithms
Data projection algorithms
Deep learning algorithms
Time series forecasting algorithms
Rating system algorithms
Recommender system algorithms
Feature selection algorithms
Class imbalance algorithms
Decision tree algorithms
Deep Learning
=============
Recurrent attention
Sequence masking
Additional Tools
==============
AWS
numpy
pandas (decent grip)
SQL
Libraries
================
Scikit Learn (decent grip)
PyTorch
TensorFlow
Linear Algebra (MIT OCW) 1/50
===============================
Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues & Eigenvectors, Vector Spaces and Norms
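Several of these topics connect directly; for example, PCA falls out of the SVD of the centered data matrix. A minimal numpy sketch on random data:

```python
# PCA via SVD: the principal directions are the right singular vectors
# of the centered data matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # toy data
Xc = X - X.mean(axis=0)         # center each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                          # rows = principal directions
explained_variance = S**2 / (len(X) - 1)

scores = Xc @ components[:2].T           # project onto the top-2 components
print(explained_variance[:2], scores.shape)
```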
Matrix Algebra (The Matrix Cookbook)
Probability Theory
Statistics (Think Stats)
================================
Combinatorics, Probability Rules & Axioms, Bayes’ Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods
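As a tiny concrete instance of MLE from this list: for i.i.d. Gaussian data the maximum-likelihood estimates are just the sample mean and the biased sample variance. A quick numpy check:

```python
# MLE for a Gaussian: mu_hat = sample mean, sigma2_hat = biased variance.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()  # divide by n, not n - 1
print(mu_hat, sigma2_hat)                # approx 3.0 and 4.0
```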
Single Variable Calculus
Multivariate Calculus
================================
Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Hessian, Jacobian, Laplacian and Lagrangian
Optimization Theory
Excellent understanding of machine learning techniques and algorithms, such as
k-NN,
Naive Bayes,
SVM,
Random Forest,
Decision Forests,
Logistic Regression,
Neural Networks,
Recommenders,
K-means,
Boosted machines,
Ensemble Learning,
Clustering,
Classification
Understanding of deep neural networks -
Autoencoders,
CNN,
RNN,
GAN,
Boltzmann Machine
Good understanding of Statistical modelling like
linear, logistic regression,
classification,
hypothesis testing,
ANOVA,
PCA,
SVD
===========================
Core Companies
===========================
AMD
MathWorks
NVIDIA
Visa
MasterCard
Amazon
Flipkart
Adobe
ZS Associates
IBM
Symantec
ThoughtWorks
VMware
EMC
CISCO
### How to use Cracking the Coding Interview to pass data science code challenges...
Start with chapter 7 and then work through chapters 1, 2, 3, 4, 10 in order, writing Python code (ideally OO) to solve 50% of the problems. It's a bit repetitive to solve every question in the book, so just complete all the odd or even problems at first (you can complete the rest of the problems later if you need extra practice).
👉 If you’re interested in more software-oriented roles, e.g. machine learning engineer, then do the problems from chapters 6 and 8 as well, otherwise those are optional.
Chapters 6 and 8 sometimes also help for tech and finance companies and chapter 6 can also be very relevant for analytics roles at companies that like to ask brain teasers.
Side Projects
=================
[COMPLETED] = Latent Dirichlet Allocation on NewsGroup Dataset using Scikit Learn
[COMPLETED] = Deep Learning Network on MNIST dataset using Keras
[COMPLETED] = Movie Recommendation on MovieLens dataset using Scala, Spark MLlib and Alternating Least Squares
[COMPLETED] = Predicting Breast Cancer on Wisconsin Breast Cancer dataset using Scala, Spark MLlib and Random Forests
[COMPLETED] = Spam Filtering Engine using Naive Bayes Classifier on Spam Assassin Public Corpus
[COMPLETED] = Spelling Corrector
[IN PROGRESS] = Sentdex Regression
[COMPLETED] = License Plate Recognition
[COMPLETED] = Bank Customer Churn using Keras and Scikit-Learn
[IN PROGRESS] = Implementing Random Forest Algorithm from scratch on Sonar Dataset
https://medium.freecodecamp.org/the-hitchhikers-guide-to-machine-learning-algorithms-in-python-bfad66adb378
[IN PROGRESS] = Linear Regression
[IN PROGRESS] = Logistic Regression
[IN PROGRESS] = Decision Trees
[IN PROGRESS] = Support Vector Machines
[IN PROGRESS] = K-Nearest Neighbors
[IN PROGRESS] = Random Forests
[IN PROGRESS] = K-Means Clustering
[IN PROGRESS] = Principal Components Analysis
Corporate Wisdom
==========================================================
First of all, looking at it from the employer’s perspective,
asking for more money immediately makes you look self-centered and selfish.
Putting the focus on the long term gets you away from this.
Pick some point in the future that you both agree on, and say what you want to be earning then.
If you immediately pivot to exactly where you want to be in the strategic future of the company, then the person you’re talking to thinks,
“This guy is going to make my future better,” which puts you in a position to ask for more, because you’re automatically more valuable.
Achievers use a success list, not a to-do list. They have a strong sense of priority.
If you want extraordinary results, you need to narrow your focus.
Do your most important work, your “one thing”, when your willpower is strongest. For many people, that’s early in the day.
Ask the question: what’s the one thing I can do that will make everything else easier?
I understand a business problem,
the decisions before the stakeholder,
the multiple ways to frame the problem, and
the trade-offs between each.
I understand how data can and cannot help,
and the variety of techniques I can use given a chosen approach.
I can defend why I chose one particular approach,
how the resulting model works,
its limitations as applied to a problem,
and problem-appropriate metrics.
I translate the results into recommendations the business can digest,
and persuade towards a value-creating outcome.
============================================================
The upper-tier performers don't focus on the outcome; they focus on the process
The truly successful people attract success, not chase it
Every skill you acquire doubles your chances of success
In 1925, one year before he entered school, Isaac Asimov taught himself to read. Uneducated himself and thus unable to teach his son, his father gave him a library card. Without any direction, the curious boy read everything.
Don’t boast. Those who know more will see you for the fool you really are. Those who know as much as you will resent you for boasting about things they already know. Those who know less will kiss your ass and be yes men until they know more than you
Movies
=========
Dog Day Afternoon - Al Pacino
Inside Man - Denzel Washington
Catch Me If You Can - Leonardo DiCaprio
Chinatown - Jack Nicholson
The Postman Always Rings Twice - Jack Nicholson
## The Blunt Guide to Mathematically Rigorous Machine Learning
I won’t be going through the math portions again; you can check out my other article or this excellent post by YC on the topic. My advice: learn enough Linear Algebra, Stats, Probability, and Multivariate Calculus to feel good about yourself, and learn everything else as you have to.
1. Elements of Statistical Learning
Prioritize Chapters 1–4 and Chapters 7–8. This covers supervised learning, linear regression, classification, model assessment and inference. It’s okay if you don’t understand it at first; absolutely nobody does. Keep reading it and learning whatever math you need until you get it. If you want, knock the whole book out; you won’t regret it.
If Elements is really just too hard, you can start with Introduction to Statistical Learning, by the same authors. The book sacrifices some mathematical explanation and focuses on a subset of the problems in Elements, but is a good ramping up point to understanding the material. There is an excellent accompanying course provided by Stanford for free.
Both books focus on R, which is worth learning.
2. Stanford CS 229
Once you’ve finished Elements, you’re in a great position to take Stanford’s ML course, taught by Andrew Ng. You can think about this like the mathematically rigorous version of his popular Coursera course. Going into this course, make sure to refresh your Multivariate Calculus and Linear Algebra skills, as well as some probability. They provide some handy refresher guides on the site page.
Do all the exercises and problem sets, and try doing the programming assignments in both R and Python. You’ll thank me later.
You can again opt for a slightly easier route in Andrew Ng’s Coursera course, which is focused more on implementation and less on the underlying theory and math. I would really just do all the programming assignments from there as well. You don’t have to do them in Octave/Matlab; you can do R and Python versions, and there are plenty of repos to compare against on GitHub.
3. Deep Learning Book
At this point, you’re starting to get formidable. You have a fundamental mathematical understanding of many popular, historic techniques in Machine Learning, and can choose to dive into any vertical you want. Of course, most people want to go into Deep Learning because of its significance in industry.
Go through the DL book. It will refresh you on a lot of math and also fundamentally explain much of modern Deep Learning well. You can start messing around with implementations by spinning up a Linux box and doing cool shit with CNNs, RNNs and regular old feed forward neural networks. Use Tensorflow and Pytorch, and start to get a sense of how awesome some of these libraries are for abstracting a lot of the complexity you learned.
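As a starting point for that messing around, a minimal "regular old" feed-forward net in PyTorch; the data is random and the shapes are placeholders:

```python
# A plain feed-forward net trained on random data, to get a feel for PyTorch.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

X = torch.rand(256, 8)
y = (X.sum(dim=1, keepdim=True) > 4).float()  # placeholder labels

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(loss.item())
```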
I’ve also heard the DeepLearning.ai courses by Andrew Ng and co. are worth it. They are not nearly as comprehensive as the textbook by Goodfellow et al., but seem to be a useful companion.
4. arXiv and Google Scholar
If you’ve made it this far, congratulations, you’re probably in an excellent place to make sense of the latest papers in the field. Just go onto arXiv and Google Scholar and look at both seminal papers and recent papers that are popular. Remember that ML is a fast-moving field and the literature changes, so keep checking back in every few months.
If you’re feeling particularly bold or find something cool, try implementing it yourself. The learning process will be invaluable.
5. Padding your resume and getting hired.
Excellent work. You’ve probably reached the point by now that you can get hired at most places and/or get into grad school. If you want to fill out your resume, you can continue to implement new architectures, or even do Kaggle Competitions.
If you want to do the latter, but feel that your actual implementation skills aren’t totally up to par, take Fast.ai courses 1 and 2. They focus on cohesively applying all the shit you’ve learned over the past few months using popular libraries and tooling.
There are a lot of AI residency programs popping up at OpenAI, Google, Facebook, Uber, and a few other places. You are probably a pretty good candidate, give them a shot.
If you get this far, holy shit. Well done. The journey is never over, but you’re in an excellent place and you understand ML as well as many experts. I think.
Oh and those of you just starting, I’m right there with you. Race you to the end ;)
### Python Practice
1. r-subsets from a list of N elements (see the sketch after this list)
2. Cartesian product of two sets
3. Produce the sample space of two dice given a number N
4. LCM of two numbers
5. Reverse the binary representation of a number
6. First N prime numbers
7. Count the words in a string
8. Sum of all even numbers and all odd numbers
9. Generate a list of 2-element sets given a list of values
10. Print the nth term of the Fibonacci series
11. Next prime number after a given number
12. Given a list of strings, build a dictionary with each letter as key and the list of strings starting with that letter as value
13. Given a tuple of int values, produce a dictionary of each int value and its frequency
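Several of these are near one-liners with the Python standard library; for instance, items 1, 2, and 13:

```python
# Item 1: all r-element subsets; item 2: cartesian product; item 13: frequencies.
from itertools import combinations, product
from collections import Counter

print(list(combinations([1, 2, 3, 4], 2)))  # r-subsets with r = 2
print(list(product({'a', 'b'}, {1, 2})))    # cartesian product of two sets
print(dict(Counter((1, 2, 2, 3, 3, 3))))    # value -> frequency, e.g. {1: 1, 2: 2, 3: 3}
```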