https://github.com/aershov24/machine-learning-ds-interview-questions
🔴 1704 Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Kill Your Next ML & DS Interview. Get All Answers + PDFs on MLStack.Cafe. Post your ML Jobs
algorithms-and-data-structures data-analysis data-science interview-practice interview-preparation interview-questions machine-learning machine-learning-algorithms machinelearning
- Host: GitHub
- URL: https://github.com/aershov24/machine-learning-ds-interview-questions
- Owner: aershov24
- Created: 2021-08-26T09:45:13.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2023-01-11T00:13:22.000Z (over 3 years ago)
- Last Synced: 2025-03-12T16:16:52.551Z (about 1 year ago)
- Topics: algorithms-and-data-structures, data-analysis, data-science, interview-practice, interview-preparation, interview-questions, machine-learning, machine-learning-algorithms, machinelearning
- Homepage: https://www.mlstack.cafe
- Size: 297 KB
- Stars: 105
- Watchers: 2
- Forks: 32
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
# 1704 Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Land Your Next Six-Figure Job Offer from [MLStack.Cafe](https://www.mlstack.cafe)
[MLStack.Cafe](https://www.mlstack.cafe) is the biggest hand-picked collection of top Machine Learning, Data Science, Python and Coding interview questions for junior and experienced data analysts, machine learning engineers/developers and data scientists, with more than 1704 ML & DS interview questions and answers. Prepare for your next ML, DS & Python interview and land a 6-figure job offer in no time.
🔴 Get All 1704 Answers + PDFs + Latex Math on [MLStack.Cafe - Kill Your ML, DS & Python Interview](https://www.mlstack.cafe/?utm_source=github&utm_medium=mlsciq)
👨‍💻 Hiring Data Analysts, Machine Learning Engineers or Developers? [Post your Job on MLStack.Cafe](https://www.mlstack.cafe/?utm_source=github&utm_medium=mlsc-job-posting) and reach thousands of motivated engineers who are looking for an ML job right now!
---
## Table of Contents
* [Anomaly Detection](#AnomalyDetection)
* [Autoencoders](#Autoencoders)
* [Bias & Variance](#Bias&Variance)
* [Big Data](#BigData)
* [Big-O Notation](#Big-ONotation)
* [Classification](#Classification)
* [Clustering](#Clustering)
* [Cost Function](#CostFunction)
* [Data Structures](#DataStructures)
* [Databases](#Databases)
* [Datasets](#Datasets)
* [Decision Trees](#DecisionTrees)
* [Deep Learning](#DeepLearning)
* [Dimensionality Reduction](#DimensionalityReduction)
* [Ensemble Learning](#EnsembleLearning)
* [Genetic Algorithms](#GeneticAlgorithms)
* [Gradient Descent](#GradientDescent)
* [K-Means Clustering](#K-MeansClustering)
* [K-Nearest Neighbors](#K-NearestNeighbors)
* [Linear Algebra](#LinearAlgebra)
* [Linear Regression](#LinearRegression)
* [Logistic Regression](#LogisticRegression)
* [Machine Learning](#MachineLearning)
* [Model Evaluation](#ModelEvaluation)
* [Natural Language Processing](#NaturalLanguageProcessing)
* [Naïve Bayes](#NaïveBayes)
* [Neural Networks](#NeuralNetworks)
* [NumPy](#NumPy)
* [Optimization](#Optimization)
* [Pandas](#Pandas)
* [Probability](#Probability)
* [Python](#Python)
* [Random Forests](#RandomForests)
* [SQL](#SQL)
* [SVM](#SVM)
* [Scikit-Learn](#Scikit-Learn)
* [Searching](#Searching)
* [Sorting](#Sorting)
* [Statistics](#Statistics)
* [Supervised Learning](#SupervisedLearning)
* [TensorFlow](#TensorFlow)
* [Unsupervised Learning](#UnsupervisedLearning)
## [[⬆]](#toc) Anomaly Detection Interview Questions
#### Q1: Explain what is Anomaly Detection? ⭐
##### Answer:
**Anomaly detection** (or outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.

**Source:** _towardsdatascience.com_
#### Q2: Why do we care about Anomalies? ⭐⭐
##### Answer:
* The goal of anomaly detection is to identify cases that are unusual within data that is seemingly comparable. Anomaly detection can therefore be used effectively as a tool for risk mitigation and fraud detection.
* When preparing datasets for machine learning models, it is important to detect the outliers and either remove them or analyze them to understand why they are there in the first place.
**Source:** _towardsdatascience.com_
#### Q3: What's the difference between _Normalisation_ and _Standardisation_? ⭐⭐
##### Answer:
**Normalization** rescales the values into a range of \[0,1\]. This might be useful in some cases where all parameters need to have the same positive scale. However, the result is sensitive to _outliers_: a single extreme value sets $X_{min}$ or $X_{max}$ and squeezes the remaining values into a narrow part of the range.
$$
X_{changed} = \frac{X - X_{min}}{X_{max}-X_{min}}
$$
**Standardization** rescales data to have a mean ($\mu$) of 0 and standard deviation ($\sigma$) of 1 (unit variance).
$$
X_{changed} = \frac{X - \mu}{\sigma}
$$
For most applications standardization is recommended.

**Source:** _stats.stackexchange.com_
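The two formulas can be tried side by side. This is a minimal sketch in plain Python; the function names and the sample data are made up for illustration, and it assumes the values are not all identical (otherwise the denominators are zero).

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [2.0, 4.0, 6.0, 8.0, 100.0]  # hypothetical data; 100.0 acts as an outlier
normalized = min_max_normalize(data)
standardized = standardize(data)

print(normalized)    # the four small values end up close to 0
print(standardized)
```

Printing `normalized` shows how one large value pushes the other values toward 0 after min-max scaling.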
#### Q4: Why would you use the _Median_ as a measure of central tendency? ⭐⭐
##### Answer:
The **Median** is the most suitable measure of _central tendency_ for **skewed distributions** or distributions with **outliers**. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.
Because the median depends only on the one or two middle values, it's largely unaffected by extreme _outliers_ or _non-symmetric distributions_ of scores. In contrast, the **_mean_** and **_mode_** can vary in skewed distributions.

**Source:** _en.wikipedia.org_
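A quick way to see this is to add one extreme value to a small sample and compare how far the mean and the median move. The income figures below are invented for illustration only.

```python
from statistics import mean, median

incomes = [30_000, 32_000, 35_000, 38_000, 40_000]   # hypothetical incomes
with_outlier = incomes + [1_000_000]                  # add one extreme value

mean_shift = mean(with_outlier) - mean(incomes)       # how far the mean moves
median_shift = median(with_outlier) - median(incomes) # how far the median moves

print(f"mean moved by   {mean_shift:,.0f}")
print(f"median moved by {median_shift:,.0f}")
```

The mean is dragged far toward the outlier, while the median only steps to the midpoint of the two new middle values.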
#### Q5: Explain how to use _Standard Deviation_ for Anomaly Detection? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q6: What are some _types_ of Anomalies? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q7: What are some _categories_ of outlier detection approaches? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q8: How to use _one-class SVM_ for Anomaly Detection? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q9: Explain the difference between _Outlier Detection_ vs _Novelty Detection_ ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q10: Compare *SVM* and *Logistic Regression* in handling outliers ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q11: How to use _Isolation Forest_ for Anomaly Detection? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q12: What are some _advantages_ of using the _Isolation Forest_ algorithm for outlier detection? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q13: How would you deal with _Outliers_ in your dataset? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q14: Imagine that you know there are _outliers_ in your data, would you use _Logistic Regression_? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q15: How is *PCA* used for *Anomaly Detection*? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q16: How does *Dictionary Learning* perform *Anomaly Detection*? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q17: What types of _Robust Regression Algorithms_ do you know? ⭐⭐⭐⭐⭐
Read answer on MLStack.Cafe
## [[⬆]](#toc) Autoencoders Interview Questions
#### Q1: Describe the approach used in *Denoising Autoencoders* ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q2: How can *Neural Networks* be used to create *Autoencoders*? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q3: Can you use *Batch Normalisation* in *Sparse Auto-encoders*? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q4: What are the main differences between *Sparse Autoencoders* and *Convolution Autoencoders*? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q5: What are some differences between the *Undercomplete Autoencoder* and the *Sparse Autoencoder*? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q6: How can *Neural Networks* be _Unsupervised_?
Read answer on MLStack.Cafe
## [[⬆]](#toc) Bias & Variance Interview Questions
#### Q1: What is _Bias_ in Machine Learning? ⭐⭐
##### Answer:
In supervised machine learning an algorithm learns a model from training data.
The goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X). The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.
**Bias** refers to **the simplifying assumptions** made by a model to make the target function easier to learn.
Generally, linear algorithms have a high bias, making them fast to learn and easier to understand but generally less flexible.
* Examples of **low-bias** machine learning algorithms include: Decision Trees, k-Nearest Neighbors and [Support Vector Machines](https://machinelearningmastery.com/support-vector-machines-for-machine-learning/).
* Examples of **high-bias** machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
**Source:** _machinelearningmastery.com_
#### Q2: What is the *Bias-Variance* tradeoff? ⭐⭐
##### Answer:
* **High Bias** can cause an algorithm to miss the relevant relations between features and target outputs (*underfitting*).
* **High Variance** may result from an algorithm modeling random noise in the training data (*overfitting*).

* The **Bias-Variance tradeoff** is a central problem in _supervised learning_. Ideally, a model should be able to accurately capture the regularities in its training data, but also generalize well to unseen data.
* It is called a *tradeoff* because it is typically impossible to do both simultaneously:
  * Algorithms with _high variance_ will be prone to _overfitting_ the dataset, but
  * Algorithms with *high bias* will _underfit_ the dataset.

**Source:** _en.wikipedia.org_
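The two extremes can be illustrated with a small experiment. This is a sketch on synthetic data: the relationship `y = 2x` plus Gaussian noise, the sample sizes and both toy models are assumptions chosen for illustration, not taken from the source.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy samples of the assumed true relationship y = 2x."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 2)) for x in xs]

def mse(model, data):
    """Mean squared error of a model (a function of x) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train, test = make_data(30), make_data(30)

# High bias: ignore x entirely and always predict the training mean.
mean_y = sum(y for _, y in train) / len(train)

def constant_model(x):
    return mean_y

# High variance: 1-nearest-neighbour memorises every training point, noise included.
def one_nn_model(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

for name, model in [("constant", constant_model), ("1-NN", one_nn_model)]:
    print(f"{name:8s} train MSE={mse(model, train):7.2f}  test MSE={mse(model, test):7.2f}")
```

The constant model has a large error on both sets (underfitting), while the 1-NN model reproduces the training set perfectly but still makes errors on unseen data, because it has also memorised the noise.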
#### Q3: Provide an intuitive explanation of the _Bias-Variance Tradeoff_ ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q4: Name some types of _Data Biases_ in Machine Learning? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q5: What to do if you have _High Variance Problem_? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q6: What to do if you have _High Bias Problem_? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q7: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q8: How can you relate the _KNN Algorithm_ to the _Bias-Variance tradeoff_? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q9: What is the *Bias Error*? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q10: What is the *Variance Error*? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q11: When you sample, what potential _Sampling Biases_ could you be inflicting? ⭐⭐⭐
Read answer on MLStack.Cafe
## [[⬆]](#toc) Big Data Interview Questions
## [[⬆]](#toc) Big-O Notation Interview Questions
#### Q1: What is _Big O_ notation? ⭐
##### Answer:
**Big-O** notation (also called "asymptotic growth" notation) is a relative representation of the complexity of an algorithm. It shows how an algorithm *scales* with input size, so we use it to talk about how things _scale_. Big O complexity can be visualized with this graph:

**Source:** _stackoverflow.com_
#### Q2: Provide an example of an O(1) algorithm ⭐
##### Answer:
Say we have an array of `n` elements:
```cs
int array[n];
```
If we wanted to access the first (or any) element of the array this would be O(1) since it doesn't matter how big the array is, it always takes the same constant time to get the first item:
```cs
x = array[0];
```
**Source:** _stackoverflow.com_
#### Q3: What is Worst Case? ⭐⭐
##### Answer:
Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm. **Worst case** analysis gives the maximum number of basic operations that have to be performed during execution of the algorithm. It assumes that the input is in the _worst possible state_ and maximum work has to be done to put things right.
**Source:** _stackoverflow.com_
#### Q4: What the heck does it mean if an operation is O(log n)? ⭐⭐
##### Answer:
**O(log n)** means the operation only needs to look at about **log n** of the `n` elements. This is usually because you know something about the elements that lets you make an _efficient choice_ (for example to reduce a _search space_).
The most common attributes of a logarithmic running-time function are that:
* the choice of the next element on which to perform some action is one of several possibilities, and
* only one will need to be chosen

or

* the elements on which the action is performed are digits of `n`

Divide-and-conquer algorithms such as **binary search** are `O(log n)`: each step halves the remaining search space. Most efficient sorts, such as **merge sort**, build on the same idea but do `O(n)` work at each of the `O(log n)` levels of division. **Quick sort** is similar: each time we divide the array into two parts, and each time it takes `O(n)` time to partition around a pivot element. Hence these sorts are `O(n log n)` (for quick sort, on average).
Plotting `log(n)` on a plain piece of paper will result in a graph where the rise of the curve decelerates as `n` increases:

**Source:** _stackoverflow.com_
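Binary search is the usual concrete example of logarithmic running time. The sketch below counts loop iterations so the growth can be observed; the function name and the step counter are illustrative additions.

```python
def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 when target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1, steps

items = list(range(1_000_000))
index, steps = binary_search(items, 999_999)
print(index, steps)  # a million elements need at most about 20 steps
```

Doubling the list length adds only about one more iteration, which is what logarithmic growth means in practice.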
#### Q5: Why do we use Big O notation to compare algorithms? ⭐⭐
##### Answer:
The fact is it's difficult to determine the exact runtime of an algorithm. It depends on the speed of the computer processor, among other things. So instead of talking about the runtime directly, we use Big O Notation to talk about _how quickly the runtime grows_ depending on input size.
With Big O Notation, we use the size of the input, which we call `n`. So we can say things like the runtime grows "on the order of the size of the input" (O(n)) or "on the order of the square of the size of the input" (O(n²)). Our algorithm may have steps that seem expensive when `n` is small but are eclipsed eventually by other steps as `n` gets larger. For Big O Notation analysis, we care more about the stuff that grows fastest as the input grows, because everything else is quickly eclipsed as `n` gets very large.
**Source:** _medium.com_
#### Q6: What exactly would an O(n²) operation do? ⭐⭐
##### Answer:
**O(n²)** means for every element, you're doing something with _every_ other element, such as comparing them. Bubble sort is an example of this.
**Source:** _stackoverflow.com_
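To make the quadratic growth visible, here is a plain bubble sort that also counts comparisons. It is a minimal sketch without the common early-exit optimisation, so the count is always `n(n-1)/2`.

```python
def bubble_sort(items):
    """Return (sorted copy, number of comparisons) using plain bubble sort."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        # After pass i, the largest i + 1 values are already in place at the end.
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons

result, comparisons = bubble_sort([5, 1, 4, 2, 8])
print(result, comparisons)
```

Going from 5 to 100 elements raises the comparison count from 10 to 4950, so a 20x larger input costs roughly 500x more work.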
#### Q7: What is the complexity of this code snippet? ⭐⭐
##### Details:
Let's say we wanted to find a number in the list:
```c
// Linear search: return the index of numToFind (fragment of a larger function)
for (int i = 0; i < n; i++) {
    if (array[i] == numToFind) { return i; }
}
```
What will be the time complexity (Big O) of that code snippet?
##### Answer:
This would be O(n) since at most we would have to look through the entire list to find our number. The Big-O is still O(n) even though we might find our number the first try and run through the loop once because Big-O describes the upper bound for an algorithm.
**Source:** _stackoverflow.com_
#### Q8: What is the complexity of `push` and `pop` for a Stack implemented using a LinkedList? ⭐⭐
##### Answer:
O(1). Note, you don't have to insert at the end of the list. If you insert at the front of a (singly-linked) list, they are both `O(1)`.
Stack contains 1,2,3:
```py
[1]->[2]->[3]
```
Push 5:
```js
[5]->[1]->[2]->[3]
```
Pop:
```js
[1]->[2]->[3] // returning 5
```
**Source:** _stackoverflow.com_
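The diagrams above map directly onto a singly-linked list whose head is the top of the stack. The class names below are illustrative; the point is that neither operation ever walks the list.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedStack:
    """Stack backed by a singly-linked list; the head is the top of the stack."""

    def __init__(self):
        self.head = None

    def push(self, value):
        # O(1): the new node simply points at the old head.
        self.head = Node(value, self.head)

    def pop(self):
        # O(1): unlink the head and return its value.
        if self.head is None:
            raise IndexError("pop from empty stack")
        value = self.head.value
        self.head = self.head.next
        return value

stack = LinkedStack()
for v in (3, 2, 1):
    stack.push(v)   # stack is now [1]->[2]->[3]
stack.push(5)       # [5]->[1]->[2]->[3]
top = stack.pop()   # removes and returns 5
```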
#### Q9: Explain the difference between _`O(1)`_ vs _`O(n)`_ space complexities ⭐⭐
##### Answer:
Let's consider a traversal algorithm for traversing a list.
* O(1) denotes _constant_ space use: the algorithm allocates the same number of pointers irrespective of the list size. That will happen if we move (reuse) our pointer along the list.
* In contrast, O(n) denotes _linear_ space use: the algorithm's space use grows with the input size `n`. That will happen if, say, the algorithm needs to allocate `n` pointers (or other variables) when traversing a list.
**Source:** _stackoverflow.com_
#### Q10: What is the big O notation of this function? ⭐⭐
##### Details:
Consider:
```js
f(n) = log n + 3n
```
What is the big O notation of this function?
##### Answer:
It is simply O(n).
When you have a composite of multiple parts in Big O notation which are added, you have to choose the biggest one. In this case it is _`O(3n)`_, but there is no need to include constants inside parentheses, so we are left with _`O(n)`_.
**Source:** _stackoverflow.com_
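The same conclusion follows from the formal definition of Big O. As a short worked bound (the constants $c = 4$ and $n_0 = 1$ are one convenient choice, not the only one): since $\log n \le n$ for all $n \ge 1$,

$$
\log n + 3n \le n + 3n = 4n \quad \text{for all } n \ge 1
$$

so the function is bounded above by a constant multiple of $n$, which is exactly what O(n) states.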
#### Q11: What is an algorithm? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q12: What is the complexity of this code snippet? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q13: What is the time complexity for "Hello, World" function? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q14: What is meant by "Constant Amortized Time" when talking about time complexity of an algorithm? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q15: Why do we use Big O instead of Big Theta (Θ)? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q16: Name some types of Big O complexity and corresponding algorithms ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q17: What is complexity of "Reading a Book"? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q18: Explain your understanding of "Space Complexity" with examples ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q19: What is the difference between Lower bound and Tight bound? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q20: What does it mean if an operation is O(n!)? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q21: Provide an example of algorithm with time complexity of O(ck)? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q22: What are some algorithms which we use daily that have _`O(1)`_, _`O(n log n)`_ and _`O(log n)`_ complexities? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
## [[⬆]](#toc) Classification Interview Questions
#### Q1: Why Naive Bayes is called _Naive_? ⭐⭐
##### Answer:
We call it **naive** because its assumptions (it assumes that all of the features in the dataset are equally important and independent) are really optimistic and rarely true in most real-world applications:
- we consider that these _predictors_ are _independent_
- we consider that all the predictors have an _equal effect_ on the outcome (like the day being windy does not have more importance in deciding to play golf or not)
**Source:** _towardsdatascience.com_
#### Q2: What is a *Perceptron*? ⭐⭐
##### Answer:
* A **Perceptron** is a fundamental unit of a Neural Network that is also a single-layer Neural Network.
* Perceptron is a linear _classifier_. Since it uses already labeled data points, it is a *supervised learning algorithm*.
* The _activation function_ applies a step rule (convert the numerical output into +1 or -1) to check if the output of the weighting function is greater than zero or not.
A **Perceptron** is shown in the figure below:

**Source:** _towardsdatascience.com_
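The step rule and the supervised training loop can be sketched in a few lines. This is an illustrative implementation of the classic perceptron learning rule; the AND-gate data, the learning rate and the number of epochs are assumptions picked so the example converges quickly.

```python
def step(z):
    """Step activation: +1 if the weighted sum is greater than zero, otherwise -1."""
    return 1 if z > 0 else -1

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule on (inputs, label) pairs with labels in {+1, -1}."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            if step(z) != label:
                # Misclassified: nudge the boundary toward the correct side.
                weights = [w + lr * label * x for w, x in zip(weights, inputs)]
                bias += lr * label
    return weights, bias

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Logical AND is linearly separable, so a single perceptron can learn it.
and_gate = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])
```

A dataset that is not linearly separable (XOR, for example) would never reach zero errors with this rule, which is the main limitation of a single perceptron.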
#### Q3: What is a _Decision Boundary_? ⭐⭐
##### Answer:
A **decision boundary** is a line or a hyperplane that separates the classes. This is what we expect to obtain from _logistic regression_, as with any other classifier. With this, we can figure out some way to split the data to allow for an accurate prediction of a given observation's class using the available information.
In the case of a generic two-dimensional example, the split might look something like this:

**Source:** _medium.com_
#### Q4: What types of _Classification Algorithms_ do you know? ⭐⭐
##### Answer:
- **Logistic regression**: ideally used for classification of _binary_ variables. Implements the _sigmoid function_ to calculate the probability that a data point belongs to a certain class.
- **K-Nearest Neighbours (kNN)**: calculates the distance from a data point to every other data point and then takes a majority vote among its _k-nearest neighbors_ to classify it.
- **Decision trees**: use multiple _if-else statements_ in the form of a tree structure that includes _nodes_ and _leaves_. The nodes break down the one major structure into smaller structures and eventually provide the final outcome.
- **Random Forest**: uses multiple _decision trees_ to predict the outcome of the target variable. Each decision tree provides its own outcome, and the forest takes the majority vote to classify the final outcome.
- **Support Vector Machines**: create an _n-dimensional space_ for the _n features_ in the dataset and then try to find the hyperplane that divides and classifies the data points with the maximum margin possible.
**Source:** _www.upgrad.com_
#### Q5: What is the difference between _KNN_ and _K-means Clustering_? ⭐⭐
##### Answer:
- **_K-nearest neighbors_** or _KNN_ is a _supervised classification algorithm_. This means that we need labeled data to classify an unlabeled data point. It attempts to classify a data point based on its proximity to the `K` nearest data points in the feature space.
- **_K-means Clustering_** is an _unsupervised clustering algorithm_. It requires only a set of unlabeled points and a number of clusters `K`, and it groups the data into `K` clusters.
**Source:** _www.quora.com_
#### Q6: How do you choose the optimal _k_ in _k-NN_? ⭐⭐
##### Answer:
There is no rule of thumb for choosing a standard optimal **_k_**. This value varies from dataset to dataset, but as a general rule, the main goal is to keep it:
- small enough to exclude the samples of the other classes but
- large enough to minimize any noise in the data.
One way to look for this optimal parameter, commonly called the _Elbow method_, consists of creating a _for loop_ that trains various **_KNN_** models with different **_k values_**, keeping track of the error for each of these models, and using the model with the **_k value_** that achieves the best accuracy.

**Source:** _medium.com_
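The loop over candidate values of `k` could look like the sketch below. It uses a hand-written k-NN on synthetic two-class data so that it runs without extra libraries; the data generator, the class centers and the range of `k` are all assumptions made for illustration.

```python
import random
from collections import Counter

random.seed(42)

def make_points(n, center, label):
    """Sample n noisy 2-D points around a class center (synthetic data)."""
    return [((random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)), label)
            for _ in range(n)]

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    by_distance = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, validation, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

train = make_points(40, (0, 0), "a") + make_points(40, (3, 3), "b")
validation = make_points(20, (0, 0), "a") + make_points(20, (3, 3), "b")

# Try odd k values only, which avoids tied votes between the two classes.
scores = {k: accuracy(train, validation, k) for k in range(1, 22, 2)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Scoring on a held-out validation set rather than on the training data matters here: on the training data `k = 1` would always look perfect.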
#### Q7: How would you make a prediction using a _Logistic Regression_ model? ⭐⭐
##### Answer:
In **Logistic regression** models, we are modeling the _probability_ that an input `(X)` belongs to the default class `(Y=1)`, that is to say:
$$
P(X) = P(Y=1|X)
$$
where the `P(X)` values are given by the **_logistic function_**,
$$
P(X) = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}}
$$
The `β0` and `β1` values are estimated during the training stage using _maximum-likelihood_ estimation or _gradient descent_. Once we have them, we can make predictions by simply putting numbers into the _logistic regression equation_ and calculating a result.
For example, let's consider a model that predicts whether a person is male or female based on their height, such that if `P(X) ≥ 0.5` the person is predicted to be male, and if `P(X) < 0.5` female.
During the training stage we obtain `β0 = -100` and `β1 = 0.6`, and we want to evaluate the probability that a person with a height of `150cm` is male, so we compute:
$$
P(X) = \frac{e^{-100 + 0.6\cdot 150}}{1 + e^{-100 + 0.6\cdot 150}} = 0.00004539 \cdots
$$
Since this value is far below the `0.5` threshold, we predict that the person is female.
**Source:** _machinelearningmastery.com_
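The worked example can be checked numerically. The sketch below plugs the coefficients from the example into the logistic function; the function and variable names are illustrative.

```python
import math

def logistic(x, beta0, beta1):
    """Logistic function P(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1 + math.exp(z))

# Coefficients and height taken from the example above.
p_male = logistic(150, beta0=-100, beta1=0.6)
prediction = "male" if p_male >= 0.5 else "female"

print(f"P(male | height=150cm) = {p_male:.8f} -> {prediction}")
```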
#### Q8: Why would you use the _Kernel Trick_? ⭐⭐
##### Answer:
When it comes to **classification** problems, the goal is to establish a decision boundary that maximizes the margin between the classes. However, in the real world, this task can become difficult when we have to deal with **non-linearly separable data**. One approach to solve this problem is to perform a data transformation process, in which we map all the data points to a **higher dimension**, find the boundary there, and make the classification.
That sounds alright, however, when there are more and more dimensions, computations within that space become more and more expensive. In such cases, the **kernel trick allows us to operate in the original feature space without computing the coordinates of the data** in a higher-dimensional space and therefore offers a more efficient and less expensive way to transform data into higher dimensions.
There exist different kernel functions, such as:
- _linear_,
- _nonlinear_,
- _polynomial_,
- _radial basis function (RBF)_, and
- _sigmoid_.
Each one of them can be suitable for a particular problem depending on the data.
**Source:** _medium.com_
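A small numeric check makes the idea concrete. For the degree-2 polynomial kernel `k(a, b) = (a · b)²` on 2-D inputs, one explicit feature map is `φ(x) = (x1², √2·x1·x2, x2²)`. The sketch below (function names are illustrative) shows that the kernel returns the same inner product without ever building `φ`.

```python
import math

def phi(v):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(a, b):
    """k(a, b) = (a . b)^2, computed entirely in the original 2-D space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(a), phi(b))   # inner product in the 3-D feature space
implicit = poly_kernel(a, b)     # same value without ever building phi

print(explicit, implicit)
```

For higher degrees or the RBF kernel the explicit feature space becomes very large or infinite-dimensional, while the kernel evaluation stays a cheap computation on the original vectors.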
#### Q9: What is the *Hinge Loss* in SVM? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q10: Name some _classification metrics_ and when would you use each one ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q11: What is the difference between a _Weak Learner_ vs a _Strong Learner_ and why they could be useful? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q12: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q13: Provide an intuitive explanation of _Linear Support Vector Machines (SVMs)_ ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q14: Could you _convert_ Regression into Classification and vice versa? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q15: What's the difference between _One-vs-Rest_ and _One-vs-One_? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q16: Can you choose a _classifier_ based on the _size of the training set_? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q17: How would you use _Naive Bayes_ classifier for categorical features? What if some features are numerical? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q18: What's the difference between _Generative Classifiers_ and _Discriminative Classifiers_? Name some examples of each one ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q19: How does the _Naive Bayes_ classifier work? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q20: How does the _AdaBoost_ algorithm work? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q21: What's the difference between _Softmax_ and _Sigmoid_ functions? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q22: How do you use a supervised *Logistic Regression* for Classification? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q23: What is a *Confusion Matrix*? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q24: How does *ROC* curve and *AUC* value help measure how good a model is? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q25: What are some advantages and disadvantages of using *AUC* to measure the _performance_ of the model? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q26: What is the *F-Score*? ⭐⭐⭐
Read answer on MLStack.Cafe
#### Q27: How is _AUC - ROC_ curve used in classification problems? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q28: Name some advantages of using _Support Vector Machines_ vs _Logistic Regression_ for classification ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q29: When would you use _SVM_ vs _Logistic regression_? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q30: Are there any problems using _Naive Bayes_ for Classification? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q31: What's the difference between _Random Oversampling_ and _Random Undersampling_ and when they can be used? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q32: How would you use a _Confusion Matrix_ for determining a model performance? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q33: How would you deal with classification on _Non-linearly Separable_ data? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q34: What are the trade-offs between the different types of _Classification Algorithms_? How would you choose the best one? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q35: Compare _Naive Bayes_ with _Logistic Regression_ to solve classification problems ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q36: How would you _Calibrate Probabilities_ for a classification model? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q37: How would you choose an evaluation metric for an _Imbalanced classification_? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q38: What is *AIC*? ⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q39: Can _Logistic Regression_ be used for an _Imbalanced Classification_ problem? ⭐⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q40: Why would you use _Probability Calibration_? ⭐⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q41: What's the difference between _ROC_ and _Precision-Recall_ Curves? ⭐⭐⭐⭐⭐
Read answer on MLStack.Cafe
#### Q42: How to interpret _F-measure_ values? ⭐⭐⭐⭐⭐
Read answer on MLStack.Cafe
## [[⬆]](#toc) Clustering Interview Questions
#### Q1: Define what is *Clustering*? ⭐
##### Answer:
* **Cluster analysis** is also called **clustering**.
* It is the task of grouping a set of objects in such a way that *objects* in the same *cluster* are *more similar* to each other than to those in other clusters.
* Cluster analysis itself is *not* one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them.

**Source:** _Handbook of Cluster Analysis from Chapman and Hall/CRC_
#### Q2: What is *Similarity-based Clustering*? ⭐⭐
##### Answer:
* Clustering in which the data are similarities between pairs of points is called **similarity-based clustering**.
* A typical example of similarity-based clustering is community detection in social networks, where the observations are individual links between people, which may be due to friendship, shared interests, and work relationships. The *strength* of a link can be the frequency of interactions, for example, communications by e-mail, phone, or other social media, co-authorships, or citations.
* In this clustering paradigm, the points to be clustered are not assumed to be part of a vector space. Their attributes (or features) are incorporated into a single dimension, the *link strength*, or *similarity*, which takes a numerical value $$S_{ij}$$ for each pair of points `i`, `j`. Hence, the natural representation for this problem is by means of the similarity matrix given below:
$$
S=[S_{ij}]_{i,j=1}^n
$$
The similarities are symmetric $$S_{ij} = S_{ji}$$ and nonnegative $$S_{ij} \geq 0$$.
**Source:** _Handbook of Cluster Analysis from Chapman and Hall/CRC_
#### Q3: Give examples of using *Clustering* to solve real-life problems ⭐⭐
##### Answer:
* **Identifying cancerous data:** Initially we take known samples of a cancerous and non-cancerous dataset, and label both the samples dataset. Then both the samples are mixed and different clustering algorithms are applied to the mixed samples dataset. It has been found through experiments that a cancerous dataset gives the best results with unsupervised non-linear clustering algorithms.
* **Search engines:** Search engines try to group similar objects in one cluster and the dissimilar objects far from each other. It provides results for the searched data according to the nearest similar object which is clustered around the data to be searched.
* **Wireless sensor network-based applications:** Clustering algorithms can be used effectively in *wireless sensor network-based applications*. One application where they can be used is *landmine detection*. The clustering algorithm plays the role of finding the cluster heads (or cluster centers), which collect all the data in their respective clusters.
**Source:** _sites.google.com_
#### Q4: What is *Mean-Shift Clustering*? ββ
##### Answer:
* **Mean Shift** is a non-parametric feature-space analysis technique for locating the maxima of a *density function*. What we're trying to achieve here is, to keep shifting the window to a region of _higher density_.

* We can understand this algorithm by thinking of our data points to be represented as a probability density function. Naturally, in a probability function, higher density regions will correspond to the regions with more points, and lower density regions will correspond to the regions with less points.
In clustering, we need to find clusters of points, i.e the regions with a lot of points together. More points together mean higher density. Hence, we observe that clusters of points are more like the higher density regions in our probability density function.
So, we must iteratively go from lower density to higher density regions, in order to find our clusters.
* The mean shift method is an iterative method, and we start with an initial estimate `x`. Let a *kernel function* $$K(x_i - x)$$ be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a *Gaussian kernel* on the distance to the current estimate is used,
$$
K(x_i-x)= e^{-c|x_i-x|^2}
$$
The weighted mean of the density in the window determined by `K` is
$$
m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x) x_i}{\sum_{x_i \in N(x)} K(x_i - x)}
$$
where `N(x)` is the neighborhood of `x`, a set of points for which $$K(x_i - x) \neq 0$$.
* The difference `m(x) - x` is called the *mean shift*. The *mean-shift algorithm* now sets $$x \leftarrow m(x)$$ and repeats the estimation until `m(x)` converges. This means that, after a sufficient number of steps, the centroid of all the points in the window and the current location of the window coincide. This is when we reach convergence, as no new points are added to our window in this step.
**Source:** _en.wikipedia.org_
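The update rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes 1-D data, and the function name `mean_shift_point`, the kernel constant `c = 1.0`, the tolerance, and the sample points are all chosen here for the example. Because a Gaussian kernel is never exactly zero, the neighborhood `N(x)` is simply the whole dataset in this sketch.

```py
import math

def mean_shift_point(x, points, c=1.0, tol=1e-6, max_iter=500):
    """Shift one 1-D estimate x towards a nearby maximum of the density."""
    for _ in range(max_iter):
        # Gaussian kernel weights K(x_i - x) = exp(-c * |x_i - x|^2)
        weights = [math.exp(-c * (xi - x) ** 2) for xi in points]
        # Weighted mean m(x) of the points
        m = sum(w * xi for w, xi in zip(weights, points)) / sum(weights)
        # Stop once the mean shift m(x) - x is negligible
        if abs(m - x) < tol:
            return m
        x = m  # x <- m(x)
    return x

# Two well separated groups of points (assumed sample data)
points = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1]
modes = [round(mean_shift_point(p, points), 2) for p in points]
print(modes)
```

Points that converge to the same mode can then be treated as members of the same cluster.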
#### Q5: What are *Self-Organizing Maps*? ββ
##### Answer:
* **Self-Organizing Maps** (**SOMs**) are a class of *self-organizing* clustering techniques.
* It is an _unsupervised form of artificial neural networks_. A self-organizing map consists of a set of neurons that are arranged in a rectangular or hexagonal grid. Each neuronal unit in the grid is associated with a numerical vector of fixed dimensionality. The learning process of a self-organizing map involves the adjustment of these vectors to provide a suitable representation of the input data.
* Self-organizing maps can be used for clustering numerical data in vector format.

**Source:** _medium.com_
#### Q6: Why do you need to perform *Significance Testing* in *Clustering*? ββ
##### Answer:
* **Significance testing** addresses an important aspect of cluster validation. Many cluster analysis methods will deliver clusterings even for homogeneous data. They assume implicitly that clustering has to be found, regardless of whether this is meaningful or not.
>A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation.
* **Significance testing** is performed to distinguish between a clustering that reflects meaningful _heterogeneity_ in the data and an artificial clustering of _homogeneous_ data.
* Significance testing is also used for more specific tasks in cluster analysis, such as estimating the number of clusters and interpreting some or all of the individual clusters to show their significance.
**Source:** _www.ncbi.nlm.nih.gov_
#### Q7: What is the difference between a _Multiclass problem_ and a _Multilabel problem_? ββ
##### Answer:
**Multiclass classification** means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is _assigned to one and only one label_: a fruit can be either an apple or a pear but not both at the same time.
**Multilabel classification** assigns to each sample a set of target labels. This can be thought of as predicting properties of a data-point that are _not mutually exclusive_, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time or none of these.

**Source:** _stats.stackexchange.com_
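One way to see the difference is in how the targets are encoded. The sketch below is only an illustration: the class lists and the helper names `one_hot` and `multi_hot` are assumptions made for this example, not part of any library.

```py
# Hypothetical label vocabularies, chosen only for this illustration
FRUITS = ["apple", "orange", "pear"]
TOPICS = ["religion", "politics", "finance", "education"]

def one_hot(label, classes):
    """Multiclass target: exactly one position is 1."""
    return [1 if c == label else 0 for c in classes]

def multi_hot(labels, classes):
    """Multilabel target: any number of positions (including none) may be 1."""
    return [1 if c in labels else 0 for c in classes]

print(one_hot("pear", FRUITS))                    # one class per sample
print(multi_hot({"politics", "finance"}, TOPICS)) # several labels at once
print(multi_hot(set(), TOPICS))                   # no label applies
```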
#### Q8: What is the _Jaccard Index_? ββ
##### Answer:
The **Jaccard index**, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. The Jaccard coefficient measures **similarity** between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:



**Source:** _en.wikipedia.org_
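The definition (size of the intersection divided by size of the union) translates directly into Python sets. This is a small sketch; the function name `jaccard_index` is chosen here, and returning `1.0` for two empty sets is a convention assumed for the example.

```py
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty; treating them as identical is an assumed convention
        return 1.0
    return len(a & b) / len(union)

print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 2 shared items out of 4 distinct ones
```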
#### Q9: What is the difference between the two types of *Hierarchical Clustering*? βββ
Read answer on π MLStack.Cafe
#### Q10: While performing *K-Means* Clustering, how do you determine the value of *K*? βββ
Read answer on π MLStack.Cafe
#### Q11: What are some different types of *Clustering Structures* that are used in *Clustering Algorithms*? βββ
Read answer on π MLStack.Cafe
#### Q12: When would you use *Hierarchical Clustering* over *Spectral Clustering*? βββ
Read answer on π MLStack.Cafe
#### Q13: Compare *Hierarchical Clustering* and *k-Means Clustering* βββ
Read answer on π MLStack.Cafe
#### Q14: Where do the *Similarities* come from in *Similarity-based Clustering*? βββ
Read answer on π MLStack.Cafe
#### Q15: What is a *Mixture Model*? βββ
Read answer on π MLStack.Cafe
#### Q16: What is the *Mixture* in *Gaussian Mixture Model*? βββ
Read answer on π MLStack.Cafe
#### Q17: What is *Latent Class Model*? βββ
Read answer on π MLStack.Cafe
#### Q18: How would you perform an *Observation-Based Clustering* for *Time-Series Data*? βββ
Read answer on π MLStack.Cafe
#### Q19: Name some pros and cons of _Mean Shift Clustering_ βββ
Read answer on π MLStack.Cafe
#### Q20: How can *Evolutionary Algorithms* be used for *Clustering*? βββ
Read answer on π MLStack.Cafe
#### Q21: What is _Silhouette Analysis_? βββ
Read answer on π MLStack.Cafe
#### Q22: Why does *K-Means* have a higher *bias* when compared to *Gaussian Mixture Model*? ββββ
Read answer on π MLStack.Cafe
#### Q23: Explain how a cluster is formed in the *DBSCAN* Clustering Algorithm ββββ
Read answer on π MLStack.Cafe
#### Q24: What makes the distance measurement of *k-Medoids* better than *k-Means*? ββββ
Read answer on π MLStack.Cafe
#### Q25: When using various Clustering Algorithms, why is *Euclidean Distance* not a good metric in _High Dimensions_? ββββ
Read answer on π MLStack.Cafe
#### Q26: When would you use *Hierarchical Clustering* over *k-Means Clustering*? ββββ
Read answer on π MLStack.Cafe
#### Q27: How would you choose the number of *Clusters* when designing a *K-Medoid Clustering Algorithm*? ββββ
Read answer on π MLStack.Cafe
#### Q28: Explain the *Dirichlet Process Gaussian Mixture Model* ββββ
Read answer on π MLStack.Cafe
#### Q29: Why is *Euclidean Distance* not good for *Sparse Data*? ββββ
Read answer on π MLStack.Cafe
#### Q30: When would you use *Segmentation* over *Clustering*? ββββ
Read answer on π MLStack.Cafe
#### Q31: How to tell if data is _clustered_ enough for clustering algorithms to produce meaningful results? ββββ
Read answer on π MLStack.Cafe
#### Q32: How to choose among the various clustering _Distance Measures_? ββββ
Read answer on π MLStack.Cafe
#### Q33: Explain the different frameworks used for *k-Means Clustering* βββββ
Read answer on π MLStack.Cafe
#### Q34: What is the motivation behind the *Expectation-Maximization Algorithm*? βββββ
Read answer on π MLStack.Cafe
#### Q35: What is the relationship between *k-Means Clustering* and *PCA*? βββββ
Read answer on π MLStack.Cafe
## [[β¬]](#toc) Cost Function Interview Questions
#### Q1: Provide an analogy for a _Cost Function_ in real life βββ
Read answer on π MLStack.Cafe
#### Q2: Explain what is _Cost (Loss) Function_ in Machine Learning? βββ
Read answer on π MLStack.Cafe
#### Q3: What is the difference between _Cost Function_ vs _Gradient Descent_? βββ
Read answer on π MLStack.Cafe
#### Q4: What is the difference between _Objective function_, _Cost function_ and _Loss function_ βββ
Read answer on π MLStack.Cafe
#### Q5: Why donβt we use _Mean Squared Error_ as a cost function in Logistic Regression? βββ
Read answer on π MLStack.Cafe
#### Q6: How would you fix Logistic Regression _Overfitting_ problem? βββ
Read answer on π MLStack.Cafe
#### Q7: What is the *Hinge Loss* in SVM? βββ
Read answer on π MLStack.Cafe
#### Q8: What type of *Cost Functions* do *Greedy Splitting* use? βββ
Read answer on π MLStack.Cafe
#### Q9: How would you choose the *Loss Function* for a Deep Learning model? βββ
Read answer on π MLStack.Cafe
## [[β¬]](#toc) Data Structures Interview Questions
#### Q1: Define Stack β
##### Answer:
A **Stack** is a container of objects that are inserted and removed according to the last-in first-out (**LIFO**) principle. A pushdown stack allows only two modifying operations: push an item onto the stack, and pop an item off the stack.
Together with inspecting the top item, there are three basic operations that can be performed on stacks:
1. inserting an item into a stack (**push**).
2. deleting an item from the stack (**pop**).
3. reading the item at the top of the stack without removing it (**peek** or **top**).
A stack is a limited access data structure - elements can be added and removed from the stack only at the top. `push` adds an item to the top of the stack, `pop` removes the item from the top. A helpful analogy is to think of a stack of books; you can remove only the top book, and you can add a new book only on the top.

**Source:** _www.cs.cmu.edu_
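The three operations described above can be sketched with a Python list, whose end serves as the top of the stack. The class and method names are chosen for this illustration.

```py
class Stack:
    """LIFO container backed by a Python list (top = end of the list)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)   # add to the top

    def pop(self):
        return self._items.pop()   # remove and return the top item

    def peek(self):
        return self._items[-1]     # read the top item without removing it

    def is_empty(self):
        return not self._items

books = Stack()
for title in ["A", "B", "C"]:
    books.push(title)
print(books.pop())   # the last book pushed comes off first
```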
#### Q2: Explain why Stack is a recursive data structure β
##### Answer:
A **stack** is a **recursive** data structure because it can be defined in terms of itself:
* a stack is either empty, or
* it consists of a top and the rest, which is itself a stack.
**Source:** _www.cs.cmu.edu_
#### Q3: Define Linked List β
##### Answer:
A **linked list** is a linear data structure where each element is a separate object. Each element (we will call it a **node**) of a list comprises two items - the **data** and a **reference (pointer)** to the next node. The last node has a reference to **null**. The entry point into a linked list is called the **head** of the list. It should be noted that _head is not a separate node,_ but the reference to the first node. If the list is empty then the head is a null reference.
**Source:** _www.cs.cmu.edu_
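A minimal sketch of this structure in Python follows. The names `Node` and `to_list` are chosen for the example; `None` plays the role of the null reference.

```py
class Node:
    """One element of a singly linked list: data plus a reference to the next node."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next   # None marks the end of the list

def to_list(head):
    """Walk the list from the head and collect the data of every node."""
    values = []
    node = head
    while node is not None:
        values.append(node.data)
        node = node.next
    return values

head = Node(1, Node(2, Node(3)))   # head is just a reference to the first node
print(to_list(head))
empty_head = None                  # an empty list has a null head
print(to_list(empty_head))
```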
#### Q4: Name some characteristics of Array Data Structure β
##### Answer:
Arrays are:
* **Finite (fixed-size)** - An array is finite because it contains only a limited number of elements.
* **Ordered** - All the elements are stored one by one, in contiguous memory locations, in linear order.
* **Homogeneous** - All the elements of an array are of the same data type, hence an array is termed a homogeneous collection.
**Source:** _codelack.com_
#### Q5: What is Queue? β
##### Answer:
A **queue** is a container of objects (a _linear_ collection) that are inserted and removed according to the first-in first-out (FIFO) principle. The process to add an element into queue is called **Enqueue** and the process of removal of an element from queue is called **Dequeue**.

**Source:** _www.cs.cmu.edu_
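As a rough sketch, a FIFO queue can be built on `collections.deque` from the Python standard library, which supports efficient appends at one end and pops at the other. The wrapper class and its method names are chosen for this illustration.

```py
from collections import deque

class Queue:
    """FIFO container: enqueue at the back, dequeue from the front."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)        # add at the back

    def dequeue(self):
        return self._items.popleft()    # remove from the front

    def __len__(self):
        return len(self._items)

line = Queue()
for customer in ["first", "second", "third"]:
    line.enqueue(customer)
print(line.dequeue())   # whoever arrived first is served first
```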
#### Q6: What is Heap? β
##### Answer:
A **Heap** is a special Tree-based data structure which is an almost complete tree that satisfies the heap property:
* in a **max heap**, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C.
* In a **min heap**, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.
A common implementation of a heap is the binary heap, in which the tree is a **binary tree.**

**Source:** _www.geeksforgeeks.org_
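Python's standard `heapq` module maintains a binary min heap inside a plain list, which gives a quick way to try out the heap property. The sample data is made up for this sketch, and negating the keys to obtain max-heap behaviour is a common workaround rather than a dedicated API.

```py
import heapq

data = [5, 3, 8, 1, 9, 2]

# Min heap: the smallest key is always at index 0 (the root)
min_heap = list(data)
heapq.heapify(min_heap)
smallest = heapq.heappop(min_heap)

# Max heap behaviour, emulated here by storing negated keys
max_heap = [-x for x in data]
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)

print(smallest, largest)
```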
#### Q7: What is Hash Table? β
##### Answer:
A **hash table** (hash map) is a data structure that implements an **associative array** abstract data type, a **structure** that can **map keys to values** and is indexed by arbitrary objects (keys). A hash table uses a **hash function** to compute an **index**, also called a **hash value**, into an **array of buckets** or slots, from which the desired **value** can be found.

**Source:** _en.wikipedia.org_
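The mechanism described above can be sketched as follows. This is a simplified illustration that assumes separate chaining for collisions and a fixed number of buckets; the class name, method names, and bucket count are chosen for the example, and a production hash table would also resize as it fills.

```py
class HashTable:
    """Toy hash table: an array of buckets, each bucket a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self._buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # The hash function maps the key to a bucket index
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

phone_book = HashTable()
phone_book.put("alice", "555-0100")
print(phone_book.get("alice"))
```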
#### Q8: What is Priority Queue? β
##### Answer:
A **priority queue** is a data structure that stores **priorities** (comparable values) and perhaps associated information. A **priority queue** is different from a "normal" queue, because instead of being a "first-in-first-out" data structure, values come out in order by **priority**. Think of a priority queue as a kind of bag that holds priorities. You can put one in, and you can take out the current highest priority.

**Source:** _pages.cs.wisc.edu_
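A priority queue along these lines can be sketched on top of the standard `heapq` module. The class below is an assumption-laden illustration: larger numbers are treated as higher priority (implemented by negating them, since `heapq` is a min heap), and a counter breaks ties in insertion order.

```py
import heapq
import itertools

class PriorityQueue:
    """Items come out in order of priority, highest number first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker that preserves insertion order

    def put(self, priority, item):
        # Negate the priority so the largest value sits at the root of the min heap
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

tasks = PriorityQueue()
tasks.put(1, "low")
tasks.put(5, "high")
tasks.put(3, "mid")
print(tasks.pop())   # the highest priority item comes out first
```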
#### Q9: Define Tree Data Structure β
##### Answer:
**Trees** are well-known as a _non-linear_ data structure. They don't store data in a linear way. They organize data _hierarchically_.
A **tree** is a collection of entities called **nodes**. Nodes are connected by **edges**. Each node contains a **value** or **data** or **key**, and it may or may not have a **child** node. The first node of the tree is called the **root**. **Leaves** are the last nodes on a tree. They are nodes without children.

**Source:** _www.freecodecamp.org_
#### Q10: What is a Graph? β
##### Answer:
A **graph** is a common data structure that consists of a finite set of **nodes** (or **vertices**) and a set of **edges** connecting them. A pair `(x,y)` is referred to as an edge, which communicates that the **x vertex** connects to the **y vertex**.
Graphs are used to solve real-life problems that involve representation of the problem space as a **network**. Examples of networks include telephone networks, circuit networks, social networks (like LinkedIn, Facebook etc.).

**Source:** _www.educative.io_
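One common way to store the vertices and edges described above is an adjacency list. The sketch below assumes that representation; the class name, its methods, and the sample network are made up for the illustration.

```py
from collections import defaultdict

class Graph:
    """Graph stored as an adjacency list: vertex -> set of neighbouring vertices."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = defaultdict(set)

    def add_edge(self, x, y):
        self.adj[x].add(y)       # edge (x, y): x connects to y
        self.adj[y]              # touching the key registers y as a vertex
        if not self.directed:
            self.adj[y].add(x)   # undirected edges work in both directions

    def neighbors(self, x):
        return self.adj.get(x, set())

    def vertices(self):
        return set(self.adj)

# A tiny social network (hypothetical data)
network = Graph()
network.add_edge("ann", "bob")
network.add_edge("bob", "cid")
print(sorted(network.neighbors("bob")))
```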
#### Q11: What is String in Data Structures? β
##### Answer:
A **string** is generally considered as a **data type** and is often implemented as an **array data structure** of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
**Source:** _dev.to_
#### Q12: What is Trie? β
##### Answer:
**Trie** (also called **digital tree** or **prefix tree**) is a _tree-based data structure_, which is used for efficient _retrieval_ of a key in a large data-set of strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated; i.e., **the value of the key is distributed across the structure**. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Each complete English word has an arbitrary integer value associated with it (see image).

**Source:** _medium.com_
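A compact sketch of the idea follows: keys are never stored in a node, only implied by the path of characters from the root. The class names, method names, and sample words are assumptions made for this example.

```py
class TrieNode:
    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.value = None    # set only on nodes that end a stored key

class Trie:
    def __init__(self):
        self.root = TrieNode()   # the root represents the empty string

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def _walk(self, text):
        """Follow text character by character; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key):
        node = self._walk(key)
        return node.value if node is not None else None

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None

words = Trie()
words.insert("tea", 3)
words.insert("ten", 12)
print(words.get("tea"), words.has_prefix("te"))
```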
#### Q13: Define Binary Tree β
##### Answer:
A normal tree has no restrictions on the number of children each node can have. A **binary tree** is made of nodes, where each node contains a "left" pointer, a "right" pointer, and a data element.
There are three different types of binary trees:
* **Full binary tree**: Every node other than leaf nodes has 2 child nodes.
* **Complete binary tree**: All levels are filled except possibly the last one, and all nodes are filled in as far left as possible.
* **Perfect binary tree**: All nodes have two children and all leaves are at the same level.

**Source:** _study.com_
#### Q14: Why and when should I use Stack or Queue data structures instead of Arrays/Lists? ββ
##### Answer:
Because they help you manage your data in a more _particular_ way than arrays and lists. It means that when you're debugging a problem, you won't have to wonder if someone randomly inserted an element into the middle of your list, messing up some invariants.
Arrays and lists are random access. They are very flexible and also easily *corruptible*. If you want to manage your data as FIFO or LIFO, it's best to use those already implemented collections.
More practically you should:
* Use a queue when you want to get things out in the order that you put them in (FIFO)
* Use a stack when you want to get things out in the reverse of the order in which you put them in (LIFO)
* Use a list when you want to get anything out, regardless of when you put them in (and when you don't want them to automatically be removed).
**Source:** _stackoverflow.com_
#### Q15: What is Complexity Analysis of Queue operations? ββ
##### Answer:
* Queues do not offer direct random access to their contents. To reach an arbitrary element you have to repeatedly shift the first element off the front of the queue until you get to it. Therefore, **access** is O(n).
* Searching for a given value in the queue requires iterating until you find it. So **search** is O(n).
* Inserting into a queue, by definition, can only happen at the back of the queue, similar to someone getting in line for a delicious Double-Double burger at In 'n Out. Assuming an efficient queue implementation, queue **insertion** is O(1).
* Deleting from a queue happens at the front of the queue. Assuming an efficient queue implementation, queue **deletion** is O(1).
**Source:** _github.com_
#### Q16: What are some types of Queue? ββ
##### Answer:
Queue can be classified into following types:
* **Simple Queue** - is a linear data structure in which removal of elements is done in the same order they were inserted i.e., the element will be removed first which is inserted first.

* **Circular Queue** - is a linear data structure in which the operations are performed based on FIFO (First In First Out) principle and the last position is connected back to the first position to make a circle. It is also called **Ring Buffer**. Circular queue avoids the wastage of space in a regular queue implementation using arrays.

* **Priority Queue** - is a type of queue where each element has a priority value and the deletion of elements depends on that priority value
![]()
* In a **max-priority queue**, the element with the largest priority value is deleted first.
* In a **min-priority queue**, the element with the smallest priority value is deleted first.
* **Deque (Double ended queue)** - allows insertion and deletion at both ends, i.e. elements can be added or removed from the rear as well as the front end.

* **Input restricted deque** - In input restricted double ended queue, the insertion operation is performed at only one end and deletion operation is performed at both the ends.

* **Output restricted deque** - In output restricted double ended queue, the deletion operation is performed at only one end and insertion operation is performed at both the ends.

**Source:** _www.ques10.com_
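To illustrate how a circular queue reuses freed slots, here is a small ring buffer sketch. The fixed capacity, the class name, and the choice of exceptions for the full and empty cases are assumptions made for this example.

```py
class CircularQueue:
    """Fixed-capacity FIFO queue whose indices wrap around the end of the array."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0    # index of the oldest element
        self._size = 0    # number of stored elements

    def enqueue(self, item):
        if self._size == len(self._data):
            raise OverflowError("queue is full")
        # The tail position wraps back to index 0 after the last slot
        tail = (self._head + self._size) % len(self._data)
        self._data[tail] = item
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("queue is empty")
        item = self._data[self._head]
        self._data[self._head] = None
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return item

ring = CircularQueue(3)
for value in (1, 2, 3):
    ring.enqueue(value)
ring.dequeue()     # frees slot 0
ring.enqueue(4)    # the new item reuses slot 0 instead of wasting it
```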
#### Q17: What are some types of Linked List? ββ
##### Answer:
* A **singly linked list** is a list in which each node has a single reference, to the next node.

* A **doubly linked list** is a list that has two references, one to the next node and another to previous node.

* A **multiply linked list** - each node contains two or more link fields, each field being used to connect the same set of data records in a different order (e.g., by name, by department, by date of birth, etc.).
* A **circular linked list** - where last node of the list points back to the first node (or the head) of the list.

**Source:** _www.cs.cmu.edu_
#### Q18: What are Dynamic Arrays? ββ
##### Answer:
A **dynamic array** is an array with a big improvement: _automatic resizing_.
One limitation of arrays is that they're _fixed_ size, meaning you need to specify the number of elements your array will hold ahead of time. A dynamic array expands as you add more elements. So you don't need to determine the size ahead of time.
**Source:** _www.interviewcake.com_
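Automatic resizing can be sketched as follows. Python lists are already dynamic arrays, so this class only mimics the mechanism over a fixed-size backing list; the doubling growth factor and all names are assumptions made for the illustration.

```py
class DynamicArray:
    """Array that grows its fixed-size backing storage when it runs out of room."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity

    def append(self, item):
        if self._size == self._capacity:
            self._resize(2 * self._capacity)   # double the capacity when full
        self._data[self._size] = item
        self._size += 1

    def _resize(self, new_capacity):
        # Allocate a bigger backing array and copy the existing elements over
        new_data = [None] * new_capacity
        new_data[:self._size] = self._data[:self._size]
        self._data = new_data
        self._capacity = new_capacity

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._data[index]

    def __len__(self):
        return self._size

numbers = DynamicArray()
for n in range(5):
    numbers.append(n)
print(len(numbers), numbers._capacity)
```

Doubling keeps the number of reallocations low, so appending stays cheap on average.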
#### Q19: Return the N-th value of the Fibonacci sequence. Solve in _`O(n)`_ time ββ
##### Answer:
The easiest solution that comes to mind here is iteration:
```js
function fib(n) {
  let arr = [0, 1];
  for (let i = 2; i < n + 1; i++) {
    arr.push(arr[i - 2] + arr[i - 1]);
  }
  return arr[n];
}
```
And output:
```
fib(4)
=> 3
```
Notice that the first two numbers cannot be generated by the loop, because each iteration adds the two previous numbers together. So instead of starting with an empty array, we initialise `arr` to `[0, 1]`, the two values that are always present. The loop then starts at `i = 2` and appends numbers until the length of the array equals `n + 1`. Finally, we return the number at index `n` of the array.
**Source:** _medium.com_
##### Complexity Analysis:
**Time Complexity**: O(n)
**Space Complexity**: O(n)
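Our iterative solution takes linear time to complete the task. The loop runs `n - 1` times (for `i` from 2 to `n`), so Big O (the notation used to describe our worst case scenario) is simply `O(n)` in this case. Because the array stores every value up to `n`, the space complexity is also `O(n)`. The Java and Python versions below keep only the last two values, so they need just `O(1)` extra space.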
##### Implementation:
##### _JS_
```js
function fib(n) {
  let arr = [0, 1];
  for (let i = 2; i < n + 1; i++) {
    arr.push(arr[i - 2] + arr[i - 1]);
  }
  return arr[n];
}
```
##### _Java_
```java
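long fibonacci(int n) {
    // prev holds F(i), next holds F(i + 1)
    long prev = 0, next = 1;
    for (int i = 0; i < n; i++) {
        long sum = prev + next;
        prev = next;
        next = sum;
    }
    // After n steps prev is F(n), so fibonacci(0) = 0 and fibonacci(2) = 1
    return prev;
}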
```
##### _PY_
```py
def fib_iterative(n):
    a, b = 0, 1
    while n > 0:
        a, b = b, a + b
        n -= 1
    return a
```
#### Q20: Name some disadvantages of Linked Lists? ββ
##### Answer:
A few disadvantages of linked lists are:
* They use more memory than arrays because of the storage used by their pointers.
* Difficulties arise in linked lists when it comes to reverse traversing. For instance, singly linked lists are cumbersome to navigate backwards, and while doubly linked lists are somewhat easier to read, memory is wasted in allocating space for a back-pointer.
* Nodes in a linked list must be read in order from the beginning, as linked lists are inherently sequential access.
* Random access has linear time.
* Nodes are stored non-contiguously (no or poor cache locality), which greatly increases the time required to access individual elements within the list, because the CPU cache cannot be used effectively.
* If the link to a node is accidentally destroyed, all data after the destruction point is likely to be lost. Data recovery is not possible.
* Search is linear versus logarithmic for sorted arrays and binary search trees.
* A different amount of time is required to access each element.
* It is not easy to sort the elements stored in a linear linked list.
**Source:** _www.quora.com_
#### Q21: Return the N-th value of the Fibonacci sequence Recursively ββ
##### Answer:
Recursive solution looks pretty simple (see code).
Let's look at a diagram that will help you understand what's going on here. Function fib is called with argument 5:
![]()
Basically our **fib** function will continue to recursively call itself, creating more and more branches of the tree until it hits the base case, from which it will start summing up each branch's return values bottom up, until it finally sums them all up and returns an integer equal to 5.
**Source:** _medium.com_
##### Complexity Analysis:
**Time Complexity**: O(2^n)
With recursion the solution takes **exponential** time, which can be explained by the fact that the size of the call tree grows exponentially as n increases: each call to `fib(n)` makes two further recursive calls. So for every additional element in the Fibonacci sequence we get an increase in function calls. Big O in this case is O(2^n). Exponential time complexity denotes an algorithm whose growth doubles with each addition to the input data set.
##### Implementation:
##### _JS_
```js
function fib(n) {
  if (n < 2) {
    return n;
  }
  return fib(n - 1) + fib(n - 2);
}
```
##### _Java_
```java
public int fibonacci(int n) {
    if (n < 2) return n;
    return fibonacci(n - 1) + fibonacci(n - 2);
}
```
##### _PY_
```py
def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1)+F(n-2)
```
#### Q22: What is the space complexity of a Hash Table? ββ
##### Answer:
The space complexity of a data structure indicates how much space it occupies in relation to the number of elements it holds. For example, a space complexity of `O(1)` would mean that the data structure always consumes constant space no matter how many elements you put in there. `O(n)` would mean that the space consumption grows linearly with the number of elements in it.
A **hashtable** typically has a space complexity of `O(n)`.
**Source:** _stackoverflow.com_
#### Q23: What is Binary Heap? ββ
##### Answer:
A **Binary Heap** is a _Binary Tree_ with following properties:
* It's a _complete_ tree (all levels are completely filled except possibly the last level, and the last level has all keys as far left as possible). This property of a Binary Heap makes it suitable to be stored in an array.
* A Binary Heap is either a **Min Heap** or a **Max Heap**. In a Min Binary Heap, the key at the root must be the minimum among all keys present in the Binary Heap. The same property must be recursively true for all nodes in the Binary Tree. A Max Binary Heap is similar to a Min Heap, with the maximum key at the root.
```
        10                      10
      /    \                  /    \
    20      100             15      30
   /                       /  \    /  \
 30                      40   50 100   40
```
**Source:** _www.geeksforgeeks.org_
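Because the tree is complete, it can be stored level by level in an array, with the children of the node at index `i` at indices `2i + 1` and `2i + 2`. The checker below is a sketch under that assumed zero-based layout; the function name is made up, and the sample arrays are the level-order form of the two example trees above.

```py
def is_min_heap(keys):
    """Check the min-heap property for a complete binary tree stored in level order."""
    n = len(keys)
    for i in range(n):
        # Children of index i live at 2i + 1 and 2i + 2 (when they exist)
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and keys[i] > keys[child]:
                return False
    return True

print(is_min_heap([10, 20, 100, 30]))
print(is_min_heap([10, 15, 30, 40, 50, 100, 40]))
print(is_min_heap([10, 5, 30]))   # 5 is smaller than its parent 10
```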
#### Q24: What is Binary Search Tree? ββ
##### Answer:
A **binary search tree** (BST) is a data structure that lets you maintain a _sorted_ collection of numbers efficiently.
* It is called a _binary tree_ because each tree node has a maximum of two children.
* It is called a _search tree_ because it can be used to search for the presence of a number in `O(log n)` time when the tree is balanced (a degenerate, list-like tree takes `O(n)`).
The properties that separate a binary search tree from a regular binary tree are:
* All nodes of the left subtree are less than the root node
* All nodes of the right subtree are greater than the root node
* Both subtrees of each node are also BSTs i.e. they have the above two properties

**Source:** _www.programiz.com_
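The ordering properties above are what make search fast: at each node, one comparison decides which subtree to continue in. The following is a minimal, unbalanced sketch; the names are chosen for this example and duplicate keys are simply ignored, which is an assumption rather than a rule.

```py
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root   # equal keys are ignored in this sketch

def contains(root, key):
    """Walk down from the root, going left or right at each node."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def in_order(root):
    """In-order traversal visits the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for key in [8, 3, 10, 1, 6, 14]:
    root = insert(root, key)
print(in_order(root), contains(root, 6), contains(root, 7))
```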
#### Q25: What is the difference between Strings vs. Char arrays? ββ
##### Answer:
**Char arrays**:
* Static-sized
* Fast access
* Few built-in methods to manipulate strings
* A char array doesn't define a data type
**Strings**:
* Slower access
* Define a data type
* Dynamic allocation
* More built-in functions to support string manipulations
**Source:** _dev.to_
#### Q26: How to implement a _Tree_ data-structure? Provide some code. ββ
##### Answer:
The implementation below is a basic (generic) tree structure that can be used for `String` or any other object:
**Source:** _stackoverflow.com_
##### Implementation:
##### _Java_
```java
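import java.util.ArrayList;
import java.util.List;

// Generic tree: T is the type of data stored in each node
public class Tree<T> {
    private Node<T> root;

    public Tree(T rootData) {
        root = new Node<T>();
        root.data = rootData;
        root.children = new ArrayList<Node<T>>();
    }

    public static class Node<T> {
        private T data;
        private Node<T> parent;
        private List<Node<T>> children;
    }
}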
```
##### _PY_
Generic Tree:
```py
class Tree(object):
    "Generic tree node."
    def __init__(self, name='root', children=None):
        self.name = name
        self.children = []
        if children is not None:
            for child in children:
                self.add_child(child)

    def __repr__(self):
        return self.name

    def add_child(self, node):
        assert isinstance(node, Tree)
        self.children.append(node)

#    *
#   /|\
#  1 2 +
#     / \
#    3   4
t = Tree('*', [Tree('1'),
               Tree('2'),
               Tree('+', [Tree('3'),
                          Tree('4')])])
```
Binary tree:
```py
class Tree:
    def __init__(self):
        self.left = None
        self.right = None
        self.data = None
```
#### Q27: Convert a _Singly Linked List_ to _Circular Linked List_ ββ
##### Answer:
To convert a singly linked list to a circular linked list, we will set the next pointer of the tail node to the head pointer.
* Create a copy of the head pointer, let's say `temp`.
* Using a loop, traverse linked list till tail node (last node) using temp pointer.
* Now set the next pointer of the tail node to head node. `temp->next = head`
**Source:** _www.techcrashcourse.com_
##### Implementation:
##### _PY_
```py
def convertTocircular(head):
    # An empty list has no tail to link back
    if head is None:
        return None
    # Keep a reference to the first node
    start = head
    # Walk forward while head.next is not None;
    # when the loop ends, head is the tail node
    while head.next is not None:
        head = head.next
    # head.next is None here, so point the tail's
    # next reference back to the first node
    head.next = start
    return start
```
#### Q28: What's the difference between the data structure Tree and Graph? ββ
##### Answer:
**Graph:**
* Consists of a set of vertices (or nodes) and a set of edges connecting some or all of them
* Any edge can connect any two vertices that aren't already connected by an identical edge (in the same direction, in the case of a directed graph)
* Doesn't have to be connected (the edges don't have to connect all vertices together): a single graph can consist of a few disconnected sets of vertices
* Could be directed or undirected (which would apply to all edges in the graph)
**Tree:**
* A type of graph (fits within the category of Directed Acyclic Graphs (or DAGs))
* Vertices are more commonly called "nodes"
* Edges are directed and represent an "is child of" (or "is parent of") relationship
* Each node (except the root node) has exactly one parent (and zero or more children)
* Has exactly one "root" node (if the tree has at least one node), which is a node without a parent
* Has to be connected
* Is acyclic, meaning it has no cycles: "a cycle is a path [AKA sequence] of edges and vertices wherein a vertex is reachable from itself"
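* Can be defined recursively: every child of a node is the root of its own subtree (non-recursive representations, such as a map from each node to its parent, are also possible)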

**Source:** _stackoverflow.com_
#### Q29: Under what circumstances are Linked Lists useful? ββ
##### Answer:
Linked lists are very useful when you need:
* to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile-time) length.
* to split and join lists; with bidirectionally-linked lists this is very efficient.
* to combine linked lists, e.g. tree structures can be implemented as "vertical" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).
Using an array based list for these purposes has severe limitations:
* Adding a new item means the array must be reallocated (or you must allocate more space than you need to allow for future growth and reduce the number of reallocations)
* Removing items leaves wasted space or requires a reallocation
* Inserting items anywhere except the end involves (possibly reallocating and) copying lots of the data up one position
**Source:** _stackoverflow.com_
#### Q30: Implement _Pre-order Traversal_ of _Binary Tree_ using _Recursion_ ββ
##### Answer:
For traversing a (non-empty) binary tree in pre-order fashion, we must do these three things for every node `N` starting from root node of the tree:
* (N) Process `N` itself.
* (L) Recursively traverse its _left_ subtree. When this step is finished we are back at N again.
* (R) Recursively traverse its _right_ subtree. When this step is finished we are back at N again.

**Source:** _github.com_
##### Complexity Analysis:
**Time Complexity**: O(n)
**Space Complexity**: O(n)
##### Implementation:
##### _Java_
```java
// Recursive function to perform pre-order traversal of the tree
public static void preorder(TreeNode root)
{
    // return if the current node is empty
    if (root == null) {
        return;
    }
    // Display the data part of the root (or current node)
    System.out.print(root.data + " ");
    // Traverse the left subtree
    preorder(root.left);
    // Traverse the right subtree
    preorder(root.right);
}
```
##### _PY_
```py
# Recursive function to perform pre-order traversal of the tree
def preorder(root):
    # return if the current node is empty
    if root is None:
        return
    # Display the data part of the root (or current node)
    print(root.data, end=' ')
    # Traverse the left subtree
    preorder(root.left)
    # Traverse the right subtree
    preorder(root.right)
```
#### Q31: What is an Associative Array? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q32: What does Sparse Array mean? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q33: How to merge two sorted _Arrays_ into a _Sorted Array_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q34: Explain how _Heap Sort_ works ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q35: What is complexity of Hash Table? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q36: LIS: Find length of the _longest increasing subsequence (LIS)_ in the array. Solve using DP. ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q37: Compare Heaps vs Arrays to implement Priority Queue ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q38: How to check if two Strings (words) are _Anagrams_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q39: Name some applications of Trie data structure ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q40: Find all the _Permutations_ of a String ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q41: What is AVL Tree? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q42: What is Balanced Tree and why is that important? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q43: Name some common types and categories of Graphs ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q44: Convert a Binary Tree to a Doubly Linked List ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q45: Can you do _Iterative Pre-order Traversal_ of a _Binary Tree_ without _Recursion_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q46: Explain how _QuickSort_ works ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q47: Binet's formula: How to calculate Fibonacci numbers without Recursion or Iteration? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q48: What are some main advantages of Tries over Hash Tables? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q49: How would you traverse a Linked List in O(√n)? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q50: Explain what is _Fibonacci Search_ technique? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q51: What are Pascal Strings? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q52: When is doubly linked list more efficient than singly linked list? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q53: What is Red-Black tree? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q54: How To Choose Between a Hash Table and a Trie (Prefix Tree)? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q55: How to implement 3 _Stacks_ with one _Array_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q56: Find the _length_ of a Linked List which contains _Cycle (Loop)_ ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q57: What is Rope Data Structure used for? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q58: Explain what is B-Tree? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q59: What is Bipartite Graph? How to detect one? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q60: Compare lookup operation in Trie vs Hash Table ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q61: How are B-Trees used in practice? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
## [[⬆]](#toc) Databases Interview Questions
#### Q1: What is _Normalisation_? ⭐⭐
##### Answer:
**Normalization** is the process of designing a database schema so that **duplicate and redundant data is avoided**. If the same information is repeated in multiple places in the database, there is a risk that it is updated in one place but not the other, leaving the data inconsistent.
There are a number of normal forms, from first normal form (1NF) through fifth normal form (5NF). Each normal form describes how to get rid of one specific kind of problem.
A database with normalization errors is at risk of accumulating invalid or corrupt data. Since data "lives forever", it is very hard to get rid of corrupt data once it has entered the database.
**Source:** _stackoverflow.com_
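The update risk described above can be reproduced in a few lines. This is only a sketch using Python's built-in `sqlite3` module; the table and column names (`orders_flat`, `customers`, `orders`) are made up for illustration and are not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the customer's city is repeated on every order row.
conn.execute("CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "alice", "Paris"), (2, "alice", "Paris")],
)
# An update that touches only one of the copies leaves the data inconsistent.
conn.execute("UPDATE orders_flat SET city = 'Berlin' WHERE order_id = 1")
flat_cities = {row[0] for row in conn.execute(
    "SELECT city FROM orders_flat WHERE customer = 'alice'")}

# Normalized: the city is stored once and each order references the customer.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
             "customer_id INTEGER REFERENCES customers(customer_id))")
conn.execute("INSERT INTO customers VALUES (1, 'alice', 'Paris')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1)])
conn.execute("UPDATE customers SET city = 'Berlin' WHERE customer_id = 1")
normalized_cities = {row[0] for row in conn.execute(
    "SELECT c.city FROM orders o JOIN customers c ON c.customer_id = o.customer_id")}

print(flat_cities)        # two conflicting cities for the same customer
print(normalized_cities)  # a single city
```

In the normalized layout there is only one place to update, so the two orders cannot disagree about the customer's city.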
#### Q2: What is the difference between _Data Definition Language (DDL)_ and _Data Manipulation Language (DML)_? ⭐⭐
##### Answer:
* **Data definition language (DDL)** commands define and change the structure of the database (its schema). **CREATE**, **ALTER**, **DROP** and **TRUNCATE** are some common DDL commands.
* **Data manipulation language (DML)** commands manipulate or modify the data stored in the database. **INSERT**, **UPDATE** and **DELETE** are some common DML commands.
**Source:** _en.wikibooks.org_
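To make the split concrete, here is a minimal sketch that runs both kinds of statements through Python's built-in `sqlite3` module. The `users` table is a made-up example, and SQLite is used only because it ships with Python; it has no `TRUNCATE` statement, so that command is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or change the structure of the database.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# DML: statements that change the rows stored in that structure.
conn.execute("INSERT INTO users (name, email) VALUES ('ann', 'ann@example.com')")
conn.execute("INSERT INTO users (name, email) VALUES ('bob', 'bob@example.com')")
conn.execute("UPDATE users SET name = 'anna' WHERE name = 'ann'")
conn.execute("DELETE FROM users WHERE name = 'bob'")

rows = conn.execute("SELECT id, name, email FROM users").fetchall()
print(rows)  # [(1, 'anna', 'ann@example.com')]

# DDL again: remove the table definition together with its data.
conn.execute("DROP TABLE users")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'users'"
).fetchall()
print(tables)  # []
```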
#### Q3: What are the advantages of NoSQL over traditional RDBMS? ⭐⭐
##### Answer:
**NoSQL can be a better fit** than an RDBMS because of the following properties:
* It supports semi-structured data and volatile data
* It does not enforce a fixed schema
* Read/write throughput is very high
* Horizontal **scalability** can be achieved easily
* It supports big data volumes in the terabyte to petabyte range
* It provides good support for analytic tools on top of big data
* It can be hosted on cheaper hardware
* An in-memory caching option is available to increase the performance of queries
* Faster development life cycles for developers
Still, an **RDBMS is a better fit** than NoSQL when you need the following properties:
* Transactions with **ACID** properties - Atomicity, Consistency, Isolation & Durability
* Adherence to a **strong schema** for data being written/read
* Real-time query management (for data sizes < 10 terabytes)
* Execution of complex queries involving **join** & **group by** clauses
**Source:** _stackoverflow.com_
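As an illustration of the atomicity part of ACID mentioned above, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical names chosen for this example; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


transfer(conn, "a", "b", 30)
try:
    transfer(conn, "a", "b", 500)  # fails halfway; the debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'a': 70, 'b': 30}
```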
#### Q4: Define ACID Properties ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q5: How can a database index help performance? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q6: What is Denormalization? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q7: What is the difference between a _Clustered_ and a _Non-clustered_ index? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q8: What's the difference between a _Primary Key_ and a _Unique Key_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q9: When would you use NoSQL? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q10: When should I use a NoSQL database instead of a relational database? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q11: What is Optimistic locking? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q12: What Is ACID Property Of A System? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q13: What is the _cost_ of having a database _index_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q14: Explain the difference between _Exclusive Lock_ and _Update Lock_ ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q15: How does _B-trees Index_ work? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q16: Explain eventual consistency in context of NoSQL ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q17: How do you track record relations in NoSQL? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q18: What Is Sharding? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q19: What Is BASE Property Of A System? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q20: How do you offload work from the Database? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q21: What are some _other_ types of Indexes (vs B-Trees)? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q22: Name some disadvantages of a _Hash index_ ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q23: What is _Optimistic Locking_ and _Pessimistic Locking_? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q24: How does database _Indexing_ work? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q25: What is the difference between _B-Tree_, _R-Tree_ and _Hash_ indexing? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q26: Explain the differences in conceptual data design with NoSQL databases? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q27: What Does Eventually Consistent Mean? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q28: Is the C in ACID not the C in CAP? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q29: How do you make schema changes to a live database without downtime? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q30: Why should you never use GUIDs as part of a clustered index? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
## [[⬆]](#toc) Datasets Interview Questions
#### Q1: What's the difference between _Covariance_ and _Correlation_? ⭐⭐
##### Answer:
- **Covariance** measures how two variables **vary together**: whether a variation in one _variable_ is accompanied by a variation in _another variable_. It deals with the linear relationship between exactly `2` variables in the dataset. Its value can range from `-∞` to `+∞` and depends on the units of the variables. Simply speaking, **Covariance** indicates the direction of the linear relationship between variables.

- **Correlation** measures how strongly two variables are **related** to each other. Its values lie between `-1` and `1`. **Correlation** measures both the strength and direction of the linear relationship between two variables. Correlation is a function of the covariance: it is the covariance divided by the product of the two standard deviations.
**Source:** _careerfoundry.com_
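The difference in scale is easy to see numerically. The sketch below uses only the standard library; the `covariance` and `correlation` helpers and the height/weight numbers are made up for illustration. Changing the unit of one variable rescales the covariance but leaves the (Pearson) correlation unchanged.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weights_kg = [50, 58, 65, 72, 80]
heights_cm = [h * 100 for h in heights_m]  # same data, different unit

# The covariance grows by a factor of 100 when metres become centimetres...
print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
# ...while the correlation stays the same and always lies in [-1, 1].
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```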
#### Q2: Would you use _K-NN_ for large datasets? ⭐⭐
##### Answer:
It's not recommended to perform **K-NN** on large datasets, because its computational and memory costs grow with the size of the training set. To understand why, recall how the **K-NN** algorithm works:
1. It starts by calculating the distances from the query to all vectors in the training set and storing them.
2. Then, it sorts the calculated distances.
3. Then, it keeps the K nearest vectors.
4. Finally, it returns the most frequent class among the K nearest vectors.
So implementing **K-NN** on a large dataset means not only storing a large amount of data, but also repeating the costly distance calculation and sorting for every prediction. For that reason, **K-NN** is not recommended and another classification algorithm like _**Naive Bayes**_ or _**SVM**_ is often preferred in such cases.
**Source:** _towardsdatascience.com_
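The four steps above can be sketched as a brute-force classifier. This is a minimal illustration using only the standard library, not an optimized implementation; the function name `knn_predict` and the toy points are assumptions made for the example. Every call touches the whole training set, which is where the cost on large datasets comes from.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Distance from the query to every training vector (all of them are kept in memory).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # 2. Sort all distances.
    distances.sort(key=lambda pair: pair[0])
    # 3. Keep the k nearest neighbours.
    nearest_labels = [label for _, label in distances[:k]]
    # 4. Return the most frequent class among them.
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))  # a
print(knn_predict(train_points, train_labels, (9.0, 9.0)))  # b
```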
#### Q3: What is *Cross-Validation* and why is it important in supervised learning? ⭐⭐
##### Answer:
* ***Cross-validation*** is a method of assessing _how the results of a statistical analysis will generalize to an independent dataset_,
* It can be used in machine learning tasks to _evaluate the predictive capability of the model_,
* It also helps us to _detect overfitting and underfitting_,
* A common way to cross-validate is to divide the dataset into *training*, *validation*, and *testing* sets, where:
    * **Training dataset** is a dataset of known data on which the training is run.
    * **Validation dataset** is a dataset that is *unseen* during training and against which the model is tested. The validation dataset is used after each epoch of learning to gauge the improvement of the model.
    * **Testing dataset** is also an unseen dataset that is used to test the model. The testing dataset is used to measure the performance of the model after it has finished learning.

**Source:** _en.wikipedia.org_
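A minimal sketch of the three-way split described above, using only the standard library. The helper name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions made for illustration.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle `rows` and cut them into three disjoint parts."""
    rows = list(rows)                   # copy so the caller's data is not reordered
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split reproducible
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * val_fraction)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


data = list(range(100))
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 60 20 20
```

The model is fitted on `train`, tuned against `val`, and scored once on `test` at the very end, so the final score comes from data that played no part in training or tuning.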
#### Q4: How does _K-fold Cross Validation_ work? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q5: What is the difference between _Test Set_ and _Validation Set_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q6: What are the assumptions before applying the _OLS estimator_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q7: What is the difference between _Type I_ and _Type II_ errors? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q8: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q9: What's the difference between _One-vs-Rest_ and _One-vs-One_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q10: What are some _disadvantages_ of using Decision Trees and how would you solve them? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q11: Name some best practices for working with Datasets ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q12: When you sample, what potential _Sampling Biases_ could you be inflicting? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q13: How would you determine the needed _Sample Size_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q14: What are some variations of _Cross-Validation_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q15: Explain what is an _Unrepresentative Dataset_ and how would you diagnose it? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q16: How would you detect _Heteroskedasticity_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q17: How would you address the problem of _Heteroskedasticity_ caused by a _Measurement error_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q18: How would you deal with _Outliers_ in your dataset? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q19: How would you deal with an _Imbalanced Dataset_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q20: What's the difference between _Random Oversampling_ and _Random Undersampling_ and when can they be used? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q21: How would you use a _Confusion Matrix_ for determining a model performance? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q22: What is *Multidimensional Scaling*? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q23: Is _mean imputation_ of missing data acceptable practice? Why or why not? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q24: When would you use *_chi-Square_* or an *_ANOVA_* test? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q25: How would you handle _Missing Data_ and perform _Data Imputation_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q26: Compare _Causation_ vs _Correlation_ ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q27: Which measures of _Variability_ would you use on your data? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q28: How does an ANOVA test work? ⭐⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
## [[⬆]](#toc) Decision Trees Interview Questions
#### Q1: What are *Decision Trees*? ⭐
##### Answer:
* A ***decision tree*** is a tool that uses a *tree-like model* of decisions and their possible consequences. If an algorithm only contains *conditional control statements*, a decision tree can model that algorithm really well.
* *Decision trees* are a *non-parametric*, _supervised_ learning method.
* *Decision trees* are used for *classification* and *regression* tasks.
* The diagram below shows an example of a decision tree (the dataset used is the Titanic dataset to predict whether a passenger survived or not):

**Source:** _towardsdatascience.com_
#### Q2: Explain the _structure_ of a Decision Tree ⭐⭐
##### Answer:
A ***decision tree*** is a ***flowchart-like*** structure in which:
* Each *internal node* represents the ***test*** on an attribute (e.g. outcome of a coin flip).
* Each *branch* represents the **_outcome_** of the test.
* Each *leaf node* represents a ***class label***.
* The _paths_ from the root to leaf represent the ***classification rules***.

**Source:** _en.wikipedia.org_
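The structure described above maps naturally onto nested dictionaries. The following is a hand-built toy tree, not one learned from data; the attribute names (`outlook`, `humidity`) and the labels are invented for illustration. Internal nodes hold the attribute being tested, branches are keyed by the test outcome, and leaves hold a class label.

```python
# Hand-built toy tree: internal nodes test an attribute, leaves carry a class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "overcast": {"label": "play"},
        "sunny": {
            "attribute": "humidity",
            "branches": {
                "high": {"label": "stay home"},
                "normal": {"label": "play"},
            },
        },
    },
}


def classify(node, sample, path=()):
    """Follow the branch matching `sample` at each test until a leaf is reached."""
    if "label" in node:                      # leaf node: return the class label
        return node["label"], list(path)
    value = sample[node["attribute"]]        # outcome of the test at this node
    step = f'{node["attribute"]}={value}'
    return classify(node["branches"][value], sample, path + (step,))


label, rule = classify(tree, {"outlook": "sunny", "humidity": "normal"})
print(label, rule)  # play ['outlook=sunny', 'humidity=normal']
```

The returned `rule` is the root-to-leaf path, which is exactly the classification rule that produced the label.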
#### Q3: How are the different nodes of decision trees _represented_? ⭐⭐
##### Answer:
A **decision tree** consists of three **types** of nodes:
* **Decision nodes:** Represented by **squares.** It is a node where a flow branches into several optional branches.
* **Chance nodes:** Represented by **circles.** It represents the probability of certain results.
* **End nodes:** Represented by **triangles.** It shows the final outcome of the decision path.

**Source:** _en.wikipedia.org_
#### Q4: What are some _advantages_ of using Decision Trees? ⭐⭐
##### Answer:
* It is **simple to understand** and interpret. It can be **visualized** easily.
* It **does not require as much data preprocessing** as other methods.
* It can handle both **numerical** and **categorical** data.
* It can handle **multiple output** problems.
**Source:** _scikit-learn.org_
#### Q5: What type of node is considered *Pure*? ⭐⭐
##### Answer:
* If the *Gini Index* of the data at a node is `0`, it means that all the elements **belong to a single class**. When this happens, the node is said to be *pure*.
* When all of the data belongs to a single class (*pure*), a *leaf node* is reached in the tree.
* The leaf node represents the *class label* in the tree (which means that it gives the final output).

**Source:** _medium.com_
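For reference, the Gini index of a node with class proportions $p_c$ is commonly computed as $1 - \sum_c p_c^2$. The sketch below, with a hypothetical helper name `gini_impurity`, shows that it is `0` exactly when every label in the node is the same.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c ** 2) over the class proportions p_c."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity(["yes", "yes", "yes", "yes"]))  # 0.0 -> pure node
print(gini_impurity(["yes", "yes", "no", "no"]))    # 0.5 -> evenly mixed, two classes
```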
#### Q6: How is a _Random Forest_ related to _Decision Trees_? ⭐⭐
##### Answer:
* ***Random forest*** is an ***ensemble learning*** method that works by constructing a multitude of ***decision trees***. A random forest can be constructed for both classification and regression tasks.
* A random forest generally **outperforms** a single decision tree, and it is much less prone to *overfitting* the data than individual decision trees are.
* A decision tree grown without constraints on a specific dataset can become very deep and overfit. To create a random forest, decision trees are trained on different subsets of the training dataset, and then the predictions of the different trees are averaged with the goal of _decreasing the variance_.
**Source:** _en.wikipedia.org_
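The "train on different subsets, then combine" idea can be sketched with the standard library alone. This is a deliberately simplified illustration, not a full random forest: each "tree" is a one-split stump on 1-D data, and the random feature selection that real random forests also perform is omitted. All function names and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split tree on 1-D data: predict one label if x <= threshold, another otherwise."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict(forest, x):
    """Combine the trees by majority vote (averaging would be used for regression)."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]


xs = list(range(10))
ys = [0 if x < 5 else 1 for x in xs]
forest = fit_forest(xs, ys)
print(predict(forest, -10), predict(forest, 100))
```

Each stump sees a slightly different sample and so picks a slightly different threshold; voting across them smooths out the differences between individual trees.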
#### Q7: What is the difference between *OOB* score and *validation* score? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q8: How would you deal with an _Overfitted Decision Tree_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q9: What are some _disadvantages_ of using Decision Trees and how would you solve them? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q10: What is *Greedy Splitting*? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q11: What type of *Cost Functions* does *Greedy Splitting* use? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q12: How would you define the *Stopping Criteria* for decision trees? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q13: Why do you need to *Prune* the decision tree? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q14: What is *Entropy*? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q15: How do we _measure_ the Information? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q16: What is *Gini Index* and how is it used in Decision Trees? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q17: What is the *Chi-squared test*? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q18: How does the *CART* algorithm produce *Classification Trees*? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q19: How does the *CART* algorithm produce *Regression Trees*? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q20: What is the difference between *Post-pruning* and *Pre-pruning*? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q21: Compare *Linear Regression* and *Decision Trees* ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q22: What is _Tree Bagging_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q23: What is _Tree Boosting_? ⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q24: How to use _Isolation Forest_ for Anomaly detection? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q25: Imagine that you know there are _outliers_ in your data, would you use _Logistic Regression_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q26: What is the use of *Entropy* pertaining to Decision Trees? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q27: While building Decision Tree how do you choose which attribute to _split_ at each node? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q28: What is the difference between _Gini Impurity_ and _Entropy_ in Decision Tree? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q29: When should I use _Gini Impurity_ as opposed to _Information Gain (Entropy)_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q30: Explain the *CHAID* algorithm ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q31: What are some disadvantages of the *CHAID* algorithm? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q32: Explain how the *CART* algorithm performs _Pruning_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q33: Explain how *ID3* produces *classification trees*? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q34: How would you compare different _Algorithms_ to build _Decision Trees_? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q35: Compare *ID3* and *C4.5* algorithms ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q36: Compare *C4.5* and *C5.0* algorithms ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q37: What is the relationship between *Information Gain* and *Information Gain Ratio*? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q38: How do you _Gradient Boost_ decision trees? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q39: Compare *Decision Trees* and *Logistic Regression* ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q40: What are the differences between *Decision Trees* and *Neural Networks*? ⭐⭐⭐⭐
Read answer on 👉 MLStack.Cafe
#### Q41: Compare *Decision Trees* and *k-Nearest Neighbors* ⭐⭐⭐⭐
Read answer on 👉