# ML-Study-Plan
## Week 1: Learn Scala:
1. Programming in Scala by Martin Odersky (Coursera)
2. https://www.coursera.org/specializations/scala
## Learn Spark Scala:
1. http://spark.apache.org/docs/latest/quick-start.html
2. https://spark.apache.org/docs/latest/sql-programming-guide.html
3. http://spark.apache.org/docs/latest/ml-guide.html
4. Learning Spark: Lightning-Fast Data Analysis
5. Advanced Analytics with Spark by Sandy Ryza
6. Hadoop: The Definitive Guide by Tom White
Do the Machine Learning course by Andrew Ng on Coursera:
1. https://www.coursera.org/learn/machine-learning/home
## Week 2:
Learn one API/algorithm from the Spark Scala ML library and come up with ideas to use it. Implement the idea and show the results.
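For instance, a minimal sketch of fitting one MLlib algorithm. This uses PySpark for brevity (the plan targets the Scala API, which mirrors it closely), invents a tiny toy DataFrame, and assumes a local Spark installation.

```python
# Fit a logistic regression with Spark's ML library on a toy DataFrame.
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("week2-demo").getOrCreate()

# Toy (label, features) rows, invented for illustration.
train = spark.createDataFrame([
    (0.0, Vectors.dense(0.0, 1.1, 0.1)),
    (1.0, Vectors.dense(2.0, 1.0, -1.0)),
    (0.0, Vectors.dense(2.0, 1.3, 1.0)),
    (1.0, Vectors.dense(0.0, 1.2, -0.5)),
], ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(train)
model.transform(train).select("label", "prediction").show()
```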
## Week 3:
Learn one algorithm from Data Mining/Machine Learning and come up with ideas to use it
## Week 4:
Read a recently published paper and come up with ideas to use it
At your own pace: Open-source Libraries:
1. https://github.com/showcases/machine-learning
2. Scikit-learn
3. Shogun
4. Mahout
5. H2O
6. Oryx
7. TensorFlow
8. Weka
ML/Data Mining Competitions:
1. http://www.kdnuggets.com/competitions/
2. https://www.kaggle.com/
3. http://2017.recsyschallenge.com/
4. http://www.image-net.org/challenges/LSVRC/
5. http://www.chalearn.org/challenges.html
Research Conferences:
1. http://dblp.uni-trier.de/db/conf/nips/
2. http://dblp.uni-trier.de/db/conf/icml/
3. http://dblp.uni-trier.de/db/conf/sigir/index.html
4. http://dblp.uni-trier.de/db/conf/recsys/index.html
5. http://dblp.uni-trier.de/db/conf/icdm/index.html
6. http://dblp.uni-trier.de/db/conf/kdd/
More from Coursera:
1. https://www.coursera.org/specializations/deep-learning
2. https://www.coursera.org/learn/neural-networks/home
3. https://www.coursera.org/learn/data-patterns
4. https://www.coursera.org/learn/cluster-analysis
5. https://www.coursera.org/learn/recommender-systems
6. https://www.coursera.org/specializations/probabilistic-graphical-models
7. https://www.coursera.org/specializations/data-mining
8. https://www.coursera.org/specializations/recommender-systems
Books:
1. http://www.deeplearningbook.org/
2. Fundamentals-Machine-Learning-Predictive-Analytics
3. Pattern-Recognition-Learning-Information-Statistics
4. Elements-Statistical-Learning-Prediction-Statistics
5. Reinforcement-Learning-Introduction-Adaptive-Computation
6. Machine-Learning-Probabilistic-Perspective-Computation
7. Python-Machine-Learning-Sebastian-Raschka
8. Data-Science-Scratch-Principles-Python
9. Applied-Predictive-Modeling-Max-Kuhn
10. Introduction-Statistical-Learning-Applications-Statistics
11. Machine-Learning-Second-Brett-Lantz
12. Data-Mining-Textbook-Charu-Aggarwal
13. Data-Science-Business-Data-Analytic-Thinking
14. Predictive-Analytics-Power-Predict-Click
15. Storytelling-Data-Visualization-Business-Professionals
16. Functional-Programming-SCALA-Manning-Chiusano
## Data Science Super Harsh Guide
First, read fucking Hastie, Tibshirani, and whoever. Chapters 1–4 and 7–8. If you don’t understand it, keep reading it until you do.
You can read the rest of the book if you want. You probably should, but I’ll assume you know all of it.
Take Andrew Ng’s Coursera. Do all the exercises in python and R. Make sure you get the same answers with all of them.
Now forget all of that and read the deep learning book. Put tensorflow and pytorch on a Linux box and run examples until you get it.
Do stuff with CNNs and RNNs and just feed forward NNs.
Once you do all of that, go on arXiv and read the most recent useful papers. The literature changes every few months, so keep up.
There. Now you can probably be hired most places.
If you need resume filler, do some Kaggle competitions.
If you have debugging questions, use StackOverflow.
If you have math questions, read more.
If you have life questions, I have no idea.
## New Grads / Current Students
You have:
=======================================
Bachelors in Computer Science, Statistics, or Math
Intermediate-advanced proficiency in programming (relative to the other two groups)
Knowledge of data structures and algorithms i.e. leetcode
Knowledge of SQL, Spark, Hadoop, AWS
Projects in machine learning or Kaggle competitions
Jobs you should target:
=======================================
Data Analyst
Data Engineer
Software Engineer in Machine Learning
Software Engineer in Data Science
Obstacles you may face:
=======================================
You are looking for that rare sweet spot: entry-level/new-grad roles in data science
Every data science position you find online will have many applicants with graduate degrees, so you will need to stand out somehow or get a referral
Your lack of work experience will usually disqualify you from senior roles at larger companies and early hires at startups
Avoid companies that do not have a proper data infrastructure pipeline set up (they will likely bait-and-switch you or try to make you a jack-of-all-trades data person)
How you should be job hunting:
=======================================
Consider data science roles at non-tech companies in data-driven industries like healthcare analytics
Network with your professors and classmates
After a graduate degree, an internship in data science or machine learning is the next best thing
Research experience is also really good to have and makes for great talking points in interviews
Find out which interview skills matter for your target companies (they vary by company: standard LeetCode, project-based rounds, implementing research papers, SQL, ML theory, statistics puzzles)
Get into a Big-N as a regular software engineer and try to transition to their data science teams
## Ideas
### AWS Rekognition
Made a PPT
Try out a simple demo by implementing the Deep Expectation paper
### Articles about:
Write on both Medium and LinkedIn
1. Locality Sensitive Hashing
2. Huffman Coding
3. Basics of NLP
4. Generative Adversarial Networks
5. Neural Style Transfer -- Done
6. Age-Gender Determination using Deep Learning
7. Tail Recursion in Python / Scala (see the sketch after this list)
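For the tail-recursion article idea, a minimal sketch of the usual Python workaround: CPython has no tail-call optimization, so deep tail recursion overflows the stack, and a trampoline converts the tail calls into a loop. The function names here are illustrative.

```python
# Python has no tail-call optimization, so deep tail recursion raises
# RecursionError. A trampoline turns tail calls into an iterative loop.
def trampoline(fn, *args):
    result = fn(*args)
    # Keep calling as long as the function returns another thunk.
    while callable(result):
        result = result()
    return result

def factorial(n, acc=1):
    if n <= 1:
        return acc
    # Return a zero-argument thunk instead of recursing directly.
    return lambda: factorial(n - 1, acc * n)

print(trampoline(factorial, 10_000) > 0)  # runs without RecursionError
```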
### Skills upgrading
In-demand tools: R, Python, SAS, Tableau, Spark
In-demand skills: Hadoop, Spark, Machine Learning, NoSQL Databases, Data Visualisation
Big Data Unicorn -> Hadoop, Spark, Tableau, Mongo and Cassandra
Cities -> Mumbai, Bengaluru, Delhi, Pune, Hyderabad, Kolkata, Chennai
### Thoughts to ponder upon
Using k-means clustering for file compression (can achieve roughly 1/6th of the original file size; see the sketch below)
How useful is game theory in machine learning?
The internal workings of Spark
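A minimal sketch of the k-means compression idea, assuming scikit-learn and a random array standing in for image pixels: quantize each 24-bit RGB pixel to one of k palette colors, so only a small palette plus log2(k) bits per pixel need to be stored. With k=16 that is 4 bits per pixel instead of 24, which is where a roughly 6x reduction can come from.

```python
# Color-quantize pixels with k-means: store k palette colors plus a
# per-pixel palette index instead of full 24-bit RGB values.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(100 * 100, 3)).astype(np.float64)  # stand-in image

k = 16
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
indices = km.labels_           # 4 bits per pixel when k = 16
palette = km.cluster_centers_  # k RGB triples

# Reconstruct the quantized image from palette + indices.
quantized = palette[indices].astype(np.uint8)
print(quantized.shape)  # (10000, 3)
# 24 bits/pixel -> 4 bits/pixel: ~6x smaller before the small palette overhead.
```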
## Address Matching (Statistics)
Address matching
1st Floor, No 141/142, 22nd Cross, 36th Main, Jayanagar 9th Block, Bannerghatta Road, Bangalore - 560 035
Components of an address
===========================
door number (optional) +
"building name" (optional) +
street number +
street name (cross and main or single road) +
area name (multiple; sometimes with block/phase/stage/sector) +
major street name (optional) +
city name +
postcode +
state name (optional)
Before adding checks for pincode and flat number:
Running the algorithm with threshold set at: 70 %
True Positive: 395
True Negative: 335
False Positive: 92
False Negative: 32
Accuracy: 0.8548009367681498
Precision: 0.811088295687885
Recall: 0.9250585480093677
F1-score: 0.8643326039387309
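For reference, all four metrics follow directly from the TP/TN/FP/FN counts; a few lines of Python reproduce the numbers above exactly:

```python
# Derive the reported metrics from the raw confusion counts.
tp, tn, fp, fn = 395, 335, 92, 32

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
# 0.8548009367681498 0.811088295687885 0.9250585480093677 0.8643326039387309
```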
=============================================
Modifying the pincode flag -> False Positives reduced, all numbers improved
Running the algorithm with threshold set at: 70 %
True Positive: 389
True Negative: 391
False Positive: 36
False Negative: 38
Accuracy: 0.9133489461358314
Precision: 0.9152941176470588
Recall: 0.9110070257611241
F1-score: 0.9131455399061031
==============================================
Adding the flatno flag -> false positives reduced and precision grew, but recall dropped
Running the algorithm with threshold set at: 70 %
True Positive: 268
True Negative: 424
False Positive: 3
False Negative: 159
Accuracy: 0.810304449648712
Precision: 0.988929889298893
Recall: 0.6276346604215457
F1-score: 0.7679083094555874
======================================================
Cleaned the data
Running the algorithm with threshold set at: 70 %
True Positive: 428
True Negative: 421
False Positive: 37
False Negative: 40
Accuracy: 0.9168466522678186
Precision: 0.9204301075268817
Recall: 0.9145299145299145
F1-score: 0.917470525187567
================================================================
Added the flat num flag along with cleaned data and pincode
Running the algorithm with threshold set at: 70 %
True Positive: 292
True Negative: 455
False Positive: 3
False Negative: 176
Accuracy: 0.806695464362851
Precision: 0.9898305084745763
Recall: 0.6239316239316239
F1-score: 0.7653997378768022
=====================================================================
Running the algorithm with threshold set at: 70 %
True Positive: 294
True Negative: 465
False Positive: 1
False Negative: 168
Accuracy: 0.8178879310344828
Precision: 0.9966101694915255
Recall: 0.6363636363636364
F1-score: 0.7767503302509906
===============================================================
rows in red: 60
captured red: 37
new coverage: 61.666666666666664 %
rows in yellow: 249
captured yellow: 225
new coverage: 90.36144578313252 %
====================================================================
Adding numeric token similarity -> true positives increased, false negatives decreased, F1-score and accuracy increased
Running the algorithm with threshold set at: 70 %
True Positive: 352
True Negative: 463
False Positive: 3
False Negative: 110
Accuracy: 0.8782327586206896
Precision: 0.9915492957746479
Recall: 0.7619047619047619
F1-score: 0.8616891064871481
======================================================================================
Tuning the similarity score to the mean sim_score
Running the algorithm with threshold set at: 70 %
True Positive: 398
True Negative: 459
False Positive: 7
False Negative: 64
Accuracy: 0.9234913793103449
Precision: 0.9827160493827161
Recall: 0.8614718614718615
F1-score: 0.9181084198385236
==========================================================================================
Making a decision tree model (with ans as a feature)
Accuracy is 94.75
Confusion Matrix is [[183 9]
[ 12 196]]
Classification Report is precision recall f1-score support
0 0.94 0.95 0.95 192
1 0.96 0.94 0.95 208
avg / total 0.95 0.95 0.95 400
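This output matches the format of scikit-learn's classification_report; a hedged sketch of how such a model and report are typically produced (the data below is a synthetic placeholder for the address-pair features):

```python
# Train/evaluate a decision tree and print the same style of report.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Placeholder data standing in for the address-pair feature matrix.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

print("Accuracy is", accuracy_score(y_test, pred) * 100)
print("Confusion Matrix is", confusion_matrix(y_test, pred))
print("Classification Report is", classification_report(y_test, pred))
```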
================================================================================================
Making a decision tree model (without ans as a feature)
Accuracy is 94.75
Confusion Matrix is [[183 9]
[ 12 196]]
Classification Report is precision recall f1-score support
0 0.94 0.95 0.95 192
1 0.96 0.94 0.95 208
avg / total 0.95 0.95 0.95 400
====================================================================================================
new data set v2
True Positive: 547
True Negative: 779
False Positive: 32
False Negative: 140
Accuracy: 0.8851802403204272
Precision: 0.9447322970639033
Recall: 0.7962154294032023
F1-score: 0.8641390205371248
===================================================================================================
new data set v2
Accuracy is 89.33333333333333
Confusion Matrix is [[302 29]
[ 35 234]]
Classification Report is precision recall f1-score support
0.0 0.90 0.91 0.90 331
1.0 0.89 0.87 0.88 269
avg / total 0.89 0.89 0.89 600
==========================================================================================================
train = old data set, test = new data set
Accuracy is 90.4539385847797
Confusion Matrix is [[713 98]
[ 45 642]]
Classification Report is precision recall f1-score support
0.0 0.94 0.88 0.91 811
1.0 0.87 0.93 0.90 687
avg / total 0.91 0.90 0.90 1498
===========================================================================
After correcting false positives
Accuracy is 92.59012016021362
Confusion Matrix is [[713 66]
[ 45 674]]
Classification Report is precision recall f1-score support
0 0.94 0.92 0.93 779
1 0.91 0.94 0.92 719
avg / total 0.93 0.93 0.93 1498
==========================
28th Feb
==========================
without correcting false positives and false negatives
Accuracy is 90.4539385847797
Confusion Matrix is [[713 98]
[ 45 642]]
Classification Report is precision recall f1-score support
0 0.94 0.88 0.91 811
1 0.87 0.93 0.90 687
avg / total 0.91 0.90 0.90 1498
===============================================
Accuracy is 90.72096128170895
Confusion Matrix is [[712 99]
[ 40 647]]
Classification Report is precision recall f1-score support
0 0.95 0.88 0.91 811
1 0.87 0.94 0.90 687
avg / total 0.91 0.91 0.91 1498
================================================
Training Stats
True Positive: 398
True Negative: 450
False Positive: 8
False Negative: 70
=================================================
Accuracy: 0.9157667386609071
Precision: 0.9802955665024631
Recall: 0.8504273504273504
F1-score: 0.9107551487414187
============================================
Accuracy is 90.72096128170895
Confusion Matrix is [[765 46]
[ 93 594]]
Classification Report is precision recall f1-score support
0 0.89 0.94 0.92 811
1 0.93 0.86 0.90 687
avg / total 0.91 0.91 0.91 1498
=====================================================
Test stats
True Positive: 551
True Negative: 779
False Positive: 32
False Negative: 136
Accuracy: 0.8878504672897196
Precision: 0.9451114922813036
Recall: 0.8020378457059679
F1-score: 0.8677165354330709
=========================================================
Added Soundex score to the features
Accuracy is 90.72096128170895
Confusion Matrix is [[765 46]
[ 93 594]]
Classification Report is precision recall f1-score support
0 0.89 0.94 0.92 811
1 0.93 0.86 0.90 687
avg / total 0.91 0.91 0.91 1498
difflib.SequenceMatcher.ratio()
Return a measure of the sequences’ similarity as a float in the range [0, 1].
Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
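The quote above is from Python's difflib documentation; using it as a cheap address-similarity score is a one-liner (the strings below are made up):

```python
# SequenceMatcher.ratio() as a quick string-similarity score for addresses.
from difflib import SequenceMatcher

a = "22nd cross jayanagar 9th block bangalore 560035"
b = "22 cross jaya nagar 9 block bengaluru 560035"
print(SequenceMatcher(None, a, b).ratio())  # float in [0, 1]
```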
===========================================================
neural network results
Test score: 0.20490155638617094
Test accuracy: 0.9272363152459403
===============================================================
Changed all records to 'fail' where the pincode does not match
Test score: 0.18380965855474943
Test accuracy: 0.9319092119647282
=====================================================
Test score: 0.192987822881528
Test accuracy: 0.9359145524186668
Prob | No. of obs. | Prc 1 | Prc 0 | TPR | FPR |
======================================
Adding validation set
epochs=60, batch_size=10
Test score: 0.13257216911516592
Test accuracy: 0.9498998004353357
========================
Hyperparameter Optimization
========================
Best: 0.946742 using {'batch_size': 40, 'epochs': 100}
======================================
Setting batch size 40
Test score: 0.13517679035185812
Test accuracy: 0.9519038084513678
=====================================
Setting epochs 100
Test score: 0.15414797805832
Test accuracy: 0.957915832499464
Best: 0.944862 using {'optimizer': 'Adam'}
0.831454 (0.006203) with: {'optimizer': 'SGD'}
0.942356 (0.004430) with: {'optimizer': 'RMSprop'}
0.937970 (0.007674) with: {'optimizer': 'Adagrad'}
0.944236 (0.003195) with: {'optimizer': 'Adadelta'}
0.944862 (0.008993) with: {'optimizer': 'Adam'}
0.943609 (0.001535) with: {'optimizer': 'Adamax'}
0.809524 (0.192290) with: {'optimizer': 'Nadam'}
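The "Best: ... using {...}" lines above match the output format of scikit-learn's GridSearchCV. A hedged sketch of how batch size, epochs, and optimizer can be tuned for a Keras model, assuming the scikeras wrapper; the network and data here are placeholders, not the actual address-matching model:

```python
# Grid-search Keras hyperparameters with scikit-learn.
import numpy as np
from tensorflow import keras
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

X = np.random.rand(500, 8)
y = (X.sum(axis=1) > 4).astype(int)  # placeholder labels

clf = KerasClassifier(model=build_model, verbose=0)
grid = {"batch_size": [10, 20, 40], "epochs": [50, 100]}
search = GridSearchCV(clf, grid, cv=3).fit(X, y)

print("Best: %f using %s" % (search.best_score_, search.best_params_))
for mean, std, params in zip(search.cv_results_["mean_test_score"],
                             search.cv_results_["std_test_score"],
                             search.cv_results_["params"]):
    print("%f (%f) with: %r" % (mean, std, params))
```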
=====================================
Adding lecun_uniform initialization
Test score: 0.1970766116538363
Test accuracy: 0.9519038084513678
### Songs
Chris Isaak - Wicked Game (Chillon Mix)
Hans Zimmer - Time (Pen Perry Remix)
Still Corners - The Trip
Hey Baby feat. Debs Daughter
Worakls - Coeur de la Nuit Unofficial Video
I'm Shipping Up To Boston - Dropkick Murphys
### Things to Learn
Basics of R (completing the DataCamp tutorial)
Algorithms
=============================
Regression algorithms (Stanford, completed)
SVM algorithms
Data projection algorithms
Deep learning algorithms
Time series forecasting algorithms
Rating system algorithms
Recommender system algorithms
Feature selection algorithms
Class imbalance algorithms
Decision tree algorithms
Deep Learning
=============
Recurrent attention
Sequence masking
Additional Tools
==============
AWS
numpy
pandas (decent grip)
SQL
Libraries
================
Scikit Learn (decent grip)
PyTorch
TensorFlow
Linear Algebra (MIT OCW) 1/50
===============================
Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues & Eigenvectors, Vector Spaces and Norms
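Several of these topics connect directly; for example, PCA falls out of the SVD of the centered data matrix. A minimal numpy sketch on random data:

```python
# PCA via SVD: the principal directions are the right singular vectors
# of the centered data matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # toy data
Xc = X - X.mean(axis=0)         # center each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                          # rows = principal directions
explained_variance = S**2 / (len(X) - 1)

scores = Xc @ components[:2].T           # project onto the top-2 components
print(explained_variance[:2], scores.shape)
```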
Matrix Algebra (The Matrix Cookbook)
Probability Theory
Statistics (Think Stats)
================================
Combinatorics, Probability Rules & Axioms, Bayes’ Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods
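As a tiny concrete instance of MLE from this list: for i.i.d. Gaussian data the maximum-likelihood estimates are just the sample mean and the biased sample variance. A quick numpy check:

```python
# MLE for a Gaussian: mu_hat = sample mean, sigma2_hat = biased variance.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()  # divide by n, not n - 1
print(mu_hat, sigma2_hat)                # approx 3.0 and 4.0
```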
Single Variable Calculus
Multivariate Calculus
================================
Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Hessian, Jacobian, Laplacian and Lagrangian
Optimization Theory
Excellent understanding of machine learning techniques and algorithms, such as
k-NN,
Naive Bayes,
SVM,
Random Forest,
Decision Forests,
Logistic Regression,
Neural Networks,
Recommenders,
K-means,
Boosted machines,
Ensemble Learning,
Clustering,
Classification
Understanding of deep neural networks -
Autoencoders,
CNN,
RNN,
GAN,
Boltzmann Machine
Good understanding of Statistical modelling like
linear, logistic regression,
classification,
hypothesis testing,
ANOVA,
PCA,
SVD
===========================
Core Companies
===========================
AMD
MathWorks
NVIDIA
Visa
MasterCard
Amazon
Flipkart
Adobe
ZS Associates
IBM
Symantec
ThoughtWorks
VMware
EMC
CISCO
### How to use Cracking the Coding Interview to pass data science code challenges...
Start with chapter 7 and then work through chapters 1, 2, 3, 4, 10 in order, writing Python code (ideally OO) to solve 50% of the problems. It's a bit repetitive to solve every question in the book, so just complete all the odd or even problems at first (you can complete the rest of the problems later if you need extra practice).
👉 If you’re interested in more software-oriented roles, e.g. machine learning engineer, then do the problems from chapters 6 and 8 as well, otherwise those are optional.
Chapters 6 and 8 sometimes also help for tech and finance companies and chapter 6 can also be very relevant for analytics roles at companies that like to ask brain teasers.
Side Projects
=================
[COMPLETED] = Latent Dirichlet Allocation on NewsGroup Dataset using Scikit Learn
[COMPLETED] = Deep Learning Network on MNIST dataset using Keras
[COMPLETED] = Movie Recommendation on MovieLens dataset using Scala, Spark MLlib and Alternating Least Squares
[COMPLETED] = Predicting Breast Cancer on Wisconsin Breast Cancer dataset using Scala, Spark MLlib and Random Forests
[COMPLETED] = Spam Filtering Engine using Naive Bayes Classifier on Spam Assassin Public Corpus
[COMPLETED] = Spelling Corrector
[IN PROGRESS] = Sentdex Regression
[COMPLETED] = License Plate Recognition
[COMPLETED] = Bank Customer Churn using Keras and Scikit-Learn
[IN PROGRESS] = Implementing Random Forest Algorithm from scratch on Sonar Dataset
https://medium.freecodecamp.org/the-hitchhikers-guide-to-machine-learning-algorithms-in-python-bfad66adb378
[IN PROGRESS] = Linear Regression
[IN PROGRESS] = Logistic Regression
[IN PROGRESS] = Decision Trees
[IN PROGRESS] = Support Vector Machines
[IN PROGRESS] = K-Nearest Neighbors
[IN PROGRESS] = Random Forests
[IN PROGRESS] = K-Means Clustering
[IN PROGRESS] = Principal Components Analysis
Corporate Wisdom
==========================================================
First of all, looking at it from the employer’s perspective,
asking for more money immediately makes you look self-centered and selfish.
Putting the focus on the long term gets you away from this.
Pick some point in the future that you both agree on, and say what you want to be earning then.
If you immediately pivot to exactly where you want to be in the strategic future of the company, then the person you’re talking to thinks,
“This guy is going to make my future better,” which puts you in a position to ask for more, because you’re automatically more valuable.
Achievers use a success list, not a to-do list. They have a strong sense of priority.
If you want extraordinary results, you need to narrow your focus.
Do your most important work, your “one thing”, when your willpower is strongest. For many people, that’s early in the day.
Ask the question: what’s the one thing I can do that will make everything else easier?
I understand a business problem,
the decisions before the stakeholder,
the multiple ways to frame the problem, and
the trade-offs between each.
I understand how data can and cannot help,
and the variety of techniques I can use given a chosen approach.
I can defend why I chose one particular approach,
how the resulting model works,
its limitations as applied to a problem,
and problem-appropriate metrics.
I translate the results into recommendations the business can digest,
and persuade towards a value-creating outcome.
============================================================
The upper-tier performers don't focus on the outcome; they focus on the process
The truly successful people attract success, not chase it
Every skill you acquire doubles your chances of success
In 1925, one year before he entered school, Isaac Asimov taught himself to read. Uneducated himself and thus unable to teach his son, his father gave him a library card. Without any direction, the curious boy read everything.
Don’t boast. Those who know more will see you for the fool you really are. Those who know as much as you will resent you for boasting about things they already know. Those who know less will kiss your ass and be yes men until they know more than you
Movies
=========
Dog Day Afternoon - Al Pacino
Inside Man - Denzel Washington
Catch Me If You Can - Leonardo DiCaprio
Chinatown - Jack Nicholson
The Postman Always Rings Twice - Jack Nicholson
## The Blunt Guide to Mathematically Rigorous Machine Learning
I won’t be going through the math portions again; you can check out my other article or this excellent post by YC on the topic. My advice: learn enough Linear Algebra, Stats, Probability, and Multivariate Calculus to feel good about yourself, and learn everything else as you have to.
1. Elements of Statistical Learning
Prioritize Chapters 1–4 and Chapters 7–8. This covers supervised learning, linear regression, classification, model assessment and inference. It’s okay if you don’t understand it at first; absolutely nobody does. Keep reading it and learning whatever math you need until you get it. If you want, knock the whole book out; you won’t regret it.
If Elements is really just too hard, you can start with Introduction to Statistical Learning, by the same authors. The book sacrifices some mathematical explanation and focuses on a subset of the problems in Elements, but is a good ramping up point to understanding the material. There is an excellent accompanying course provided by Stanford for free.
Both books focus on R, which is worth learning.
2. Stanford CS 229
Once you’ve finished Elements, you’re in a great position to take Stanford’s ML course, taught by Andrew Ng. You can think about this like the mathematically rigorous version of his popular Coursera course. Going into this course, make sure to refresh your Multivariate Calculus and Linear Algebra skills, as well as some probability. They provide some handy refresher guides on the site page.
Do all the exercises and problem sets, and try doing the programming assignments in both R and Python. You’ll thank me later.
You can again opt for a slightly easier route in Andrew Ng’s Coursera course, which is focused more on implementation and less on the underlying theory and math. I would really just do all the programming assignments from there as well. You don’t have to do them in Octave/Matlab; you can do R and Python versions, and there are plenty of repos to compare against on GitHub.
3. Deep Learning Book
At this point, you’re starting to get formidable. You have a fundamental mathematical understanding of many popular, historic techniques in Machine Learning, and can choose to dive into any vertical you want. Of course, most people want to go into Deep Learning because of its significance in industry.
Go through the DL book. It will refresh you on a lot of math and also fundamentally explain much of modern Deep Learning well. You can start messing around with implementations by spinning up a Linux box and doing cool shit with CNNs, RNNs and regular old feed forward neural networks. Use Tensorflow and Pytorch, and start to get a sense of how awesome some of these libraries are for abstracting a lot of the complexity you learned.
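As a starting point for that messing around, a minimal "regular old" feed-forward net in PyTorch; the data is random and the shapes are placeholders:

```python
# A plain feed-forward net trained on random data, to get a feel for PyTorch.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

X = torch.rand(256, 8)
y = (X.sum(dim=1, keepdim=True) > 4).float()  # placeholder labels

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(loss.item())
```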
I’ve also heard the DeepLearning.ai courses by Andrew Ng and co. are worth it. They are not nearly as comprehensive as the textbook by Goodfellow et al., but seem to be a useful companion.
4. arXiv and Google Scholar
If you’ve made it this far, congratulations, you’re probably in an excellent place to make sense of the latest papers in the field. Just go onto arXiv and Google Scholar and look at both seminal papers and recent papers that are popular. Remember that ML is a fast-moving field and the literature changes, so keep checking back in every few months.
If you’re feeling particularly bold or find something cool, try implementing it yourself. The learning process will be invaluable.
5. Padding your resume and getting hired.
Excellent work. You’ve probably reached the point by now that you can get hired at most places and/or get into grad school. If you want to fill out your resume, you can continue to implement new architectures, or even do Kaggle Competitions.
If you want to do the latter, but feel that your actual implementation skills aren’t totally up to par, take Fast.ai courses 1 and 2. They focus on cohesively applying all the shit you’ve learned over the past few months using popular libraries and tooling.
There are a lot of AI residency programs popping up at OpenAI, Google, Facebook, Uber, and a few other places. You are probably a pretty good candidate, give them a shot.
If you get this far, holy shit. Well done. The journey is never over, but you’re in an excellent place and you understand ML as well as many experts. I think.
Oh and those of you just starting, I’m right there with you. Race you to the end ;)
### Python Practice
1. r-subsets from a list of N elements (see the sketch after this list)
2. Cartesian product of two sets
3. Produce the sample space of two dice given a number N
4. LCM of two numbers
5. Reverse the binary representation of a number
6. First N prime numbers
7. Count the words in a string
8. Sum of all even numbers and all odd numbers
9. Generate a list of 2-element sets given a list of values
10. Print the nth term of the Fibonacci series
11. Next prime number after a given number
12. Given a list of strings, build a dictionary with each letter as key and the list of strings starting with that letter as value
13. Given a tuple of int values, produce a dictionary of each int value and its frequency
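Several of these are near one-liners with the Python standard library; for instance, items 1, 2, and 13:

```python
# Item 1: all r-element subsets; item 2: cartesian product; item 13: frequencies.
from itertools import combinations, product
from collections import Counter

print(list(combinations([1, 2, 3, 4], 2)))  # r-subsets with r = 2
print(list(product({'a', 'b'}, {1, 2})))    # cartesian product of two sets
print(dict(Counter((1, 2, 2, 3, 3, 3))))    # value -> frequency, e.g. {1: 1, 2: 2, 3: 3}
```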