An open API service indexing awesome lists of open source software.

https://github.com/aershov24/machine-learning-ds-interview-questions

πŸ”΄ 1704 Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Kill Your Next ML & DS Interview. Get All Answers + PDFs on MLStack.Cafe. Post your ML Jobs πŸ‘‰
https://github.com/aershov24/machine-learning-ds-interview-questions

algorithms-and-data-structures data-analysis data-science interview-practice interview-preparation interview-questions machine-learning machine-learning-algorithms machinelearning

Last synced: 10 months ago
JSON representation

πŸ”΄ 1704 Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Kill Your Next ML & DS Interview. Get All Answers + PDFs on MLStack.Cafe. Post your ML Jobs πŸ‘‰

Awesome Lists containing this project

README

          

# 1704 πŸ€– Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Land Your Next Six-Figure Job Offer from [MLStack.Cafe](https://www.mlstack.cafe)

[MLStack.Cafe](https://www.mlstack.cafe) is the biggest hand-picked collection of top Machine Learning, Data Science, Python and Coding interview questions for Junior and Experienced data analyst, machine learning engineers/developers and data scientists with more that 1704 ML & DS interview questions and answers. Prepare for your next ML, DS & Python interview and land 6-figure job offer in no time.

πŸ”΄ Get All 1704 Answers + PDFs + Latex Math on [MLStack.Cafe - Kill Your ML, DS & Python Interview](https://www.mlstack.cafe/?utm_source=github&utm_medium=mlsciq)

πŸ‘¨β€πŸ’» Hiring Data Analysts, Machine Learning Engineers or Developers? [Post your Job on MLStack.Cafe](https://www.mlstack.cafe/?utm_source=github&utm_medium=mlsc-job-posting) and reach thousands of motivated engineers who is looking for a ML Job right now!

---

## Table of Contents
* [Anomaly Detection](#AnomalyDetection)
* [Autoencoders](#Autoencoders)
* [Bias & Variance](#Bias&Variance)
* [Big Data](#BigData)
* [Big-O Notation](#Big-ONotation)
* [Classification](#Classification)
* [Clustering](#Clustering)
* [Cost Function](#CostFunction)
* [Data Structures](#DataStructures)
* [Databases](#Databases)
* [Datasets](#Datasets)
* [Decision Trees](#DecisionTrees)
* [Deep Learning](#DeepLearning)
* [Dimensionality Reduction](#DimensionalityReduction)
* [Ensemble Learning](#EnsembleLearning)
* [Genetic Algorithms](#GeneticAlgorithms)
* [Gradient Descent](#GradientDescent)
* [K-Means Clustering](#K-MeansClustering)
* [K-Nearest Neighbors](#K-NearestNeighbors)
* [Linear Algebra](#LinearAlgebra)
* [Linear Regression](#LinearRegression)
* [Logistic Regression](#LogisticRegression)
* [Machine Learning](#MachineLearning)
* [Model Evaluation](#ModelEvaluation)
* [Natural Language Processing](#NaturalLanguageProcessing)
* [NaΓ―ve Bayes](#NaΓ―veBayes)
* [Neural Networks](#NeuralNetworks)
* [NumPy](#NumPy)
* [Optimization](#Optimization)
* [Pandas](#Pandas)
* [Probability](#Probability)
* [Python](#Python)
* [Random Forests](#RandomForests)
* [SQL](#SQL)
* [SVM](#SVM)
* [Scikit-Learn](#Scikit-Learn)
* [Searching](#Searching)
* [Sorting](#Sorting)
* [Statistics](#Statistics)
* [Supervised Learning](#SupervisedLearning)
* [TensorFlow](#TensorFlow)
* [Unsupervised Learning](#UnsupervisedLearning)
## [[⬆]](#toc) Anomaly Detection Interview Questions
#### Q1: Explain what is Anomaly Detection? ⭐
##### Answer:
**Anomaly detection** (or outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.

![](https://miro.medium.com/max/732/1*J-ds9KYQBheiBaIJn78seg.gif)

**Source:** _towardsdatascience.com_

#### Q2: Why do we care about Anomalies? ⭐⭐
##### Answer:
* The goal of anomaly detection is to identify cases that are unusual within data that is seemingly comparable hence anomaly detection can be used effectively as a tool for risk mitigation and fraud detection.
* When preparing datasets for machine learning models, it is really important to detect all the outliers and either get rid of them or analyze them to know why you had them there in the first place.

**Source:** _towardsdatascience.com_

#### Q3: What's the difference between _Normalisation_ and _Standardisation_? ⭐⭐
##### Answer:
**Normalization** rescales the values into a range of \[0,1\]. This might be useful in some cases where all parameters need to have the same positive scale. However, the _outliers_ from the data set _are lost._

$$
X_{changed} = \frac{X - X_{min}}{X_{max}-X_{min}}
$$

**Standardization** rescales data to have a mean ($\mu$) of 0 and standard deviation ($\sigma$) of 1 (unit variance).

$$
X_{changed} = \frac{X - \mu}{\sigma}
$$

For most applications standardization is recommended.

![](https://i.stack.imgur.com/WqU1U.png)

**Source:** _stats.stackexchange.com_

#### Q4: Why would you use the _Median_ as a measure of central tendency? ⭐⭐
##### Answer:
The **Median** is the most suitable measure of _central tendency_ for **skewed distributions** or distributions with **outliers**. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.

Because the median only uses one or two values, it’s unaffected by extreme _outliers_ or _non-symmetric distributions_ of scores. In contrast, the **_mean_** and **_mode_** can vary in skewed distributions.

![https://miro.medium.com/max/754/0*wHMvuwRa_YF9SFwY.png](https://miro.medium.com/max/754/0*wHMvuwRa_YF9SFwY.png)

**Source:** _en.wikipedia.org_

#### Q5: Explain how to use _Standard Deviation_ for Anomalies Detection? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q6: What Are some _types_ of Anomalies? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q7: What are some _categories_ of outlier detection approaches? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q8: How to use _one-class SVM_ for Anomalies Detections? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q9: Explain the difference between _Outlier Detection_ vs _Novelty Detection_ ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: Compare *SVM* and *Logistic Regression* in handling outliers ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: How to use _Isolation Forest_ for Anomalies detection? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: What are some _advantages_ of using _Isolation Forest_ algorithm for outliers detection? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: How would you deal with _Outliers_ in your dataset? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: Imagine that you know there are _outliers_ in your data, would you use _Logistic Regression_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: How is *PCA* used for *Anomaly Detection*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: How does *Dictionary Learning* perform *Anomaly Detection*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: What types of _Robust Regression Algorithms_ do you know? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Autoencoders Interview Questions
#### Q1: Describe the approach used in *Denoising Autoencoders* ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q2: How can *Neural Networks* be used to create *Autoencoders*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q3: Can you use *Batch Normalisation* in *Sparse Auto-encoders*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q4: What are the main differences between *Sparse Autoencoders* and *Convolution Autoencoders*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q5: What are some differences between the *Undercomplete Autoencoder* and the *Sparse Autoencoder*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q6: How can *Neural Networks* be _Unsupervised_?
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Bias & Variance Interview Questions
#### Q1: What is _Bias_ in Machine Learning? ⭐⭐
##### Answer:
In supervised machine learning an algorithm learns a model from training data.

The goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X). The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.

**Bias** are **the simplifying assumptions** made by a model to make the target function easier to learn.

Generally, linear algorithms have a high bias making them fast to learn and easier to understand but generally less flexible.

* Examples of **low\-bias** machine learning algorithmsΒ include:Β Decision Trees,Β k\-Nearest Neighbors andΒ [Support Vector Machines](https://machinelearningmastery.com/support-vector-machines-for-machine-learning/).

* Examples of **high\-bias** machine learning algorithms include:Β Linear Regression,Β Linear Discriminant Analysis andΒ Logistic Regression.

**Source:** _machinelearningmastery.com_

#### Q2: What is the *Bias-Variance* tradeoff? ⭐⭐
##### Answer:
* **High Bias** can cause an algorithm to miss the relevant relations between features and target outputs (*underfitting*).
* **High Variance** may result from an algorithm modeling random noise in the training data (*overfitting*).
![https://community.alteryx.com/t5/image/serverpage/image-id/52874iE986B6E19F3248CF?v=v2](https://community.alteryx.com/t5/image/serverpage/image-id/52874iE986B6E19F3248CF?v=v2)

* The **Bias-Variance tradeoff** is a central problem in _supervised learning_. Ideally, a model should be able to accurately capture the regularities in its training data, but also generalize well to unseen data.
* It is called a *tradeoff* because it is typically impossible to do both simultaneously:
* Algorithms with _high variance_ will be prone to _overfitting_ the dataset, but
* Algorithms with *high bias* will _underfit_ the dataset.

![bias_variance_tradeoff](https://miro.medium.com/max/883/1*8sV6Sr9uc0Ef39YBivLzrw.jpeg)

**Source:** _en.wikipedia.org_

#### Q3: Provide an intuitive explanation of the _Bias-Variance Tradeoff_ ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q4: Name some types of _Data Biases_ in Machine Learning? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q5: What to do if you have _High Variance Problem_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q6: What to do if you have _High Bias Problem_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q7: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q8: How can you relate the _KNN Algorithm_ to the _Bias-Variance tradeoff_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q9: What is the *Bias Error*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: What is the *Variance Error*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: When you sample, what potential _Sampling Biases_ could you be inflicting? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Big Data Interview Questions
## [[⬆]](#toc) Big-O Notation Interview Questions
#### Q1: What is _Big O_ notation? ⭐
##### Answer:
**Big-O** notation (also called "asymptotic growth" notation) is a relative representation of the complexity of an algorithm. It shows how an algorithm *scales* based on input size. We use it to talk about how thing _scale_. Big O complexity can be visualized with this graph:

![](https://i.stack.imgur.com/WcBRI.png)

**Source:** _stackoverflow.com_

#### Q2: Provide an example of O(1) algorithm ⭐
##### Answer:
Say we have an array of `n` elements:

```cs
int array[n];
```

If we wanted to access the first (or any) element of the array this would be O(1) since it doesn't matter how big the array is, it always takes the same constant time to get the first item:
```cs
x = array[0];
```

**Source:** _stackoverflow.com_

#### Q3: What is Worst Case? ⭐⭐
##### Answer:
Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm. **Worst case** analysis gives the maximum number of basic operations that have to be performed during execution of the algorithm. It assumes that the input is in the _worst possible state_ and maximum work has to be done to put things right.

**Source:** _stackoverflow.com_

#### Q4: What the heck does it mean if an operation is O(log n)? ⭐⭐
##### Answer:
**O(log n)** means for every element, you're doing something that only needs to look at **log N** of the elements. This is usually because you know something about the elements that let you make an _efficient choice_ (for example to reduce a _search space_).
The most common attributes of logarithmic running\-time function are that:
* the choice of the next element on which to perform some action is one of several possibilities, and
* only one will need to be chosen

or

* the elements on which the action is performed are digits of `n`

Most efficient sorts are an example of this, such as **merge sort**. ​It is `O(log n)` when we do divide and conquer type of algorithms e.g binary search. Another example is **quick sort** where each time we divide the array into two parts and each time it takes `O(N)` time to find a pivot element. Hence it `N O(log N)`

Plotting `log(n)` on a plain piece of paper, will result in a graph where the rise of the curve decelerates as `n` increases:
![](https://i.stack.imgur.com/qPNNp.png)

**Source:** _stackoverflow.com_

#### Q5: Why do we use Big O notation to compare algorithms? ⭐⭐
##### Answer:
The fact is it's difficult to determine the exact runtime of an algorithm. It depends on the speed of the computer processor. So instead of talking about the runtime directly, we use Big O Notation to talk about _how quickly the runtime grows_ depending on input size.

With Big O Notation, we use the size of the input, which we call `n`. So we can say things like the runtime grows β€œon the order of the size of the input” (O(n)) or β€œon the order of the square of the size of the input” (O(n2)). Our algorithm may have steps that seem expensive when `n` is small but are eclipsed eventually by other steps as `n` gets larger. For Big O Notation analysis, we care more about the stuff that grows fastest as the input grows, because everything else is quickly eclipsed as `n` gets very large.

**Source:** _medium.com_

#### Q6: What exactly would an O(n2) operation do? ⭐⭐
##### Answer:
**O(n2)** means for every element, you're doing something with _every_ other element, such as comparing them. Bubble sort is an example of this.

**Source:** _stackoverflow.com_

#### Q7: What is complexity of this code snippet? ⭐⭐
##### Details:
Let's say we wanted to find a number in the list:
```js
for (int i = 0; i < n; i++){
if(array[i] == numToFind){ return i; }
}
```
What will be the time complexity (Big O) of that code snippet?

##### Answer:
This would be O(n) since at most we would have to look through the entire list to find our number. The Big-O is still O(n) even though we might find our number the first try and run through the loop once because Big-O describes the upper bound for an algorithm.

**Source:** _stackoverflow.com_

#### Q8: What is complexity of `push` and `pop` for a Stack implemented using a LinkedList? ⭐⭐
##### Answer:
O(1). Note, you don't have to insert at the end of the list. If you insert at the front of a (singly-linked) list, they are both `O(1)`.

Stack contains 1,2,3:

```py
[1]->[2]->[3]
```

Push 5:

```js
[5]->[1]->[2]->[3]
```

Pop:

```js
[1]->[2]->[3] // returning 5
```

**Source:** _stackoverflow.com_

#### Q9: Explain the difference between _`O(1)`_ vs _`O(n)`_ space complexities ⭐⭐
##### Answer:
Let's consider a traversal algorithm for traversing a list.

* O(1) denotes _constant_ space use: the algorithm allocates the same number of pointers irrespective to the list size. That will happen if we move (reuse) our pointer along the list.
* In contrast, O(n) denotes _linear_ space use: the algorithm space use grows together with respect to the input size `n`. That will happen if let's say for some reason the algorithm needs to allocate 'N' pointers (or other variables) when traversing a list.

**Source:** _stackoverflow.com_

#### Q10: What is the big O notation of this function? ⭐⭐
##### Details:
Consider:
```js
f(x) = log n + 3n
```
What is the big O notation of this function?

##### Answer:
It is simply O(n).

When you have a composite of multiple parts in Big O notation which are added, you have to choose the biggest one. In this case it is _`O(3n)`_, but there is no need to include constants inside parentheses, so we are left with _`O(n)`_.

**Source:** _stackoverflow.com_

#### Q11: What is an algorithm? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: What is complexity of this code snippet? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: What is the time complexity for "Hello, World" function? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: What is meant by "Constant Amortized Time" when talking about time complexity of an algorithm? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: Why do we use Big O instead of Big Theta (Θ)? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: Name some types of Big O complexity and corresponding algorithms ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: What is complexity of "Reading a Book"? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q18: Explain your understanding of "Space Complexity" with examples ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q19: What is the difference between Lower bound and Tight bound? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q20: What does it mean if an operation is O(n!)? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q21: Provide an example of algorithm with time complexity of O(ck)? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q22: What are some algorithms which we use daily that has _`O(1)`_, _`O(n log n)`_ and _`O(log n)`_ complexities? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Classification Interview Questions
#### Q1: Why Naive Bayes is called _Naive_? ⭐⭐
##### Answer:
We call it **naive** because its assumptions (it assumes that all of the features in the dataset are equally important and independent) are really optimistic and rarely true in most real-world applications:
- we consider that these _predictors_ are _independent_
- we consider that all the predictors have an _equal effect_ on the outcome (like the day being windy does not have more importance in deciding to play golf or not)

**Source:** _towardsdatascience.com_

#### Q2: What is a *Perceptron*? ⭐⭐
##### Answer:
* A **Perceptron** is a fundamental unit of a Neural Network that is also a single-layer Neural Network.
* Perceptron is a linear _classifier_. Since it uses already labeled data points, it is a *supervised learning algorithm*.
* The _activation function_ applies a step rule (convert the numerical output into +1 or -1) to check if the output of the weighting function is greater than zero or not.

A **Perceptron** is shown in the figure below:

![perception](https://www.programmersought.com/images/258/633dca25508a646a6df343339c3d4eaa.png)

**Source:** _towardsdatascience.com_

#### Q3: What is a _Decision Boundary_? ⭐⭐
##### Answer:
A **decision boundary** is a line or a hyperplane that separates the classes. This is what we expect to obtain from _logistic regression_, as with any other classifier. With this, we can figure out some way to split the data to allow for an accurate prediction of a given observation’s class using the available information.

In the case of a generic two-dimensional example, the split might look something like this:

![](https://miro.medium.com/max/340/0*hyR0M6QP5OvedMkd.png)

**Source:** _medium.com_

#### Q4: What types of _Classification Algorithms_ do you know? ⭐⭐
##### Answer:
- **Logistic regression**: ideally used for classification of _binary_ variables. Implements the _sigmoid function_ to calculate the probability that a data point belongs to a certain class.

- **K-Nearest Neighbours (kNN)**: calculate the distance of one data point from every other data point and then takes a majority vote from _k-nearest neighbors_ of each data points to classify the output.

- **Decision trees**: use multiple _if-else statements_ in the form of a tree structure that includes _nodes_ and _leaves_. The nodes breaking down the one major structure into smaller structures and eventually providing the final outcome.

- **Random Forest**: uses multiple _decision trees_ to predict the outcome of the target variable. Each decision tree provides its own outcome and then it takes the majority vote to classify the final outcome.

- **Support Vector Machines**: it creates an _n-dimensional space_ for the _n number of features_ in the dataset and then tries to create the hyperplanes such that it divides and classifies the data points with the maximum margin possible.

**Source:** _www.upgrad.com_

#### Q5: What is the difference between _KNN_ and _K-means Clustering_? ⭐⭐
##### Answer:
- **_K-nearest neighbors_** or _KNN_ is a _supervised classification algorithm_. This means that we need labeled data to classify an unlabeled data point. It attempts to classify a data point based on its proximity to other `K`-data points in the feature space.

- **_K-means Clustering_** is an _unsupervised classification algorithm_. It requires only a set of unlabeled points and a threshold `K`, so it gathers and groups data into `K` number of clusters.

**Source:** _www.quora.com_

#### Q6: How do you choose the optimal _k_ in _k-NN_? ⭐⭐
##### Answer:
There is not a rule of thumb to choose a standard optimal **_k_**. This value depends and varies from dataset to dataset, but as a general rule, the main goal is to keep it:
- small enough to exclude the samples of the other classes but
- large enough to minimize any noise in the data.

A way to looking for this optimal parameter, commonly called the _Elbow method_, consist in creating a _for loop_ that trains various **_KNN_** models with different **_k values_**, keeping track of the error for each of these models, and use the model with the **_k value_** that achieves the best accuracy.

![https://i.stack.imgur.com/ct2ie.jpg](https://i.stack.imgur.com/ct2ie.jpg)

**Source:** _medium.com_

#### Q7: How would you make a prediction using a _Logistic Regression_ model? ⭐⭐
##### Answer:
In **Logistic regression** models, we are modeling the _probability_ that an input `(X)` belongs to the default class `(Y=1)`, that is to say:

$$
P(X) = P(Y=1|X)
$$

where the `P(X)` values are given by the **_logistic function_**,

$$
P(X) = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}}
$$

The `Ξ²0` and `Ξ²1` values are estimated during the training stage using _maximum-likelihood_ estimation or _gradient descent_. Once we have it, we can make predictions by simply putting numbers into the _logistic regression equation_ and calculating a result.

For example, let's consider that we have a model that can predict whether a person is male or female based on their height, such as if `P(X) β‰₯ 0.5` the person is male, and if `P(X) < 0.5` then is female.

During the training stage we obtain `Ξ²0 = -100` and `Ξ²1 = 0.6`, and we want to evaluate what's the probability that a person with a height of `150cm` is male, so with that intention we compute:

$$
y = \frac{e^{-100 + 0.6\cdot 150}}{1 + e^{-100 + 0.6\cdot 150}} = 0.00004539 \cdots
$$

Given that logistic regression solves a _classification_ task, we can use directly this value to predict that the person is a female.

**Source:** _machinelearningmastery.com_

#### Q8: Why would you use the _Kernel Trick_? ⭐⭐
##### Answer:
When it comes to **classification** problems, the goal is to establish a decision boundary that maximizes the margin between the classes. However, in the real world, this task can become difficult when we have to treat with **non-linearly separable data**. One approach to solve this problem is to perform a data transformation process, in which we map all the data points to a **higher dimension** find the boundary and make the classification.

That sounds alright, however, when there are more and more dimensions, computations within that space become more and more expensive. In such cases, the **kernel trick allows us to operate in the original feature space without computing the coordinates of the data** in a higher-dimensional space and therefore offers a more efficient and less expensive way to transform data into higher dimensions.

There exist different kernel functions, such as:
- _linear_,
- _nonlinear_,
- _polynomial_,
- _radial basis function (RBF)_, and
- _sigmoid_.

Each one of them can be suitable for a particular problem depending on the data.

**Source:** _medium.com_

#### Q9: What is the *Hinge Loss* in SVM? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: Name some _classification metrics_ and when would you use each one ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: What is the difference between a _Weak Learner_ vs a _Strong Learner_ and why they could be usefu? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: Provide an intuitive explanation of _Linear Support Vector Machines (SVMs)_ ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: Could you _convert_ Regression into Classification and vice versa? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: What's the difference between _One-vs-Rest_ and _One-vs-One_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: Can you choose a _classifier_ based on the _size of the training set_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: How would you use _Naive Bayes_ classifier for categorical features? What if some features are numerical? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q18: What's the difference between _Generative Classifiers_ and _Discriminative Classifiers_? Name some examples of each one ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q19: How does the _Naive Bayes_ classifier work? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q20: How does the _AdaBoost_ algorithm work? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q21: What's the difference between _Softmax_ and _Sigmoid_ functions? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q22: How do you use a supervised *Logistic Regression* for Classification? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q23: What is a *Confusion Matrix*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q24: How does *ROC* curve and *AUC* value help measure how good a model is? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q25: What are some advantages and disadvantages of using *AUC* to measure the _performance_ of the model? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q26: What is the *F-Score*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q27: How is _AUC - ROC_ curve used in classification problems? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q28: Name some advantages of using _Support Vector Machines_ vs _Logistic Regression_ for classification ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q29: When would you use _SVM_ vs _Logistic regression_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q30: Are there any problems using _Naive Bayes_ for Classification? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q31: What's the difference between _Random Oversampling_ and _Random Undersampling_ and when they can be used? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q32: How would you use a _Confusion Matrix_ for determining a model performance? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q33: How would you deal with classification on _Non-linearly Separable_ data? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q34: What are the trade-offs between the different types of _Classification Algorithms_? How would do you choose the best one? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q35: Compare _Naive Bayes_ vs with _Logistic Regression_ to solve classification problems ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q36: How would you _Calibrate Probabilities_ for a classification model? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q37: How would you choose an evaluation metric for an _Imbalanced classification_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q38: What is *AIC*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q39: Can _Logistic Regression_ be used for an _Imbalanced Classification_ problem? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q40: Why would you use _Probability Calibration_? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q41: What's the difference between _ROC_ and _Precision-Recall_ Curves? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q42: How to interpret _F-measure_ values? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Clustering Interview Questions
#### Q1: Define what is *Clustering*? ⭐
##### Answer:
* **Cluster analysis** is also called **clustering**.
* It is the task of grouping a set of objects in such a way that *objects* in the same *cluster* are *more similar* to each other than to those in other clusters.
* Cluster analysis itself is *not* one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them.

![clustering](https://media.geeksforgeeks.org/wp-content/uploads/merge3cluster.jpg)

**Source:** _Handbook of Cluster Analysis from Chapman and Hall/CRC_

#### Q2: What is *Similarity-based Clustering*? ⭐⭐
##### Answer:
* Clustering, when the data are similar pairs of points is called **similarity-based clustering**.
* A typical example of similarity-based clustering is community detection in social networks, where the observations are individual links between people, which may be due to friendship, shared interests, and work relationships. The *strength* of a link can be the frequency of interactions, for example, communications by e-mail, phone, or other social media, co-authorships, or citations.
* In this clustering paradigm, the points to be clustered are not assumed to be part of a vector space. Their attributes (or features) are incorporated into a single dimension, the *link strength*, or *similarity*, which takes a numerical value $$S_{ij}$$ for each pair of points `i`, `j`. Hence, the natural representation for this problem is by means of the similarity matrix given below:
$$
S=[S_{ij}]_{i,j=1}^n
$$
The similarities are symmetric $$S_{ij} = S_{ji}$$ and nonnegative $$S_{ij} \geq 0$$.

**Source:** _Handbook of Cluster Analysis from Chapman and Hall/CRC_

#### Q3: Give examples of using *Clustering* to solve real-life problems ⭐⭐
##### Answer:
* **Identifying cancerous data:** Initially we take known samples of a cancerous and non-cancerous dataset, and label both the samples dataset. Then both the samples are mixed and different clustering algorithms are applied to the mixed samples dataset. It has been found through experiments that a cancerous dataset gives the best results with unsupervised non-linear clustering algorithms.
* **Search engines:** Search engines try to group similar objects in one cluster and the dissimilar objects far from each other. It provides results for the searched data according to the nearest similar object which is clustered around the data to be searched.
* **Wireless sensor network's based application:** Clustering algorithm can be used effectively in *Wireless Sensor Network's based application*. One application where it can be used is in *Landmine detection*. The clustering algorithm plays the role of finding the Cluster heads (or cluster center) which collects all the data in its respective cluster.

**Source:** _sites.google.com_

#### Q4: What is *Mean-Shift Clustering*? ⭐⭐
##### Answer:
* **Mean Shift** is a non-parametric feature-space analysis technique for locating the maxima of a *density function*. What we're trying to achieve here is, to keep shifting the window to a region of _higher density_.

![https://iq.opengenus.org/content/images/2019/02/pdf.png](https://iq.opengenus.org/content/images/2019/02/pdf.png)

* We can understand this algorithm by thinking of our data points to be represented as a probability density function. Naturally, in a probability function, higher density regions will correspond to the regions with more points, and lower density regions will correspond to the regions with less points.
In clustering, we need to find clusters of points, i.e the regions with a lot of points together. More points together mean higher density. Hence, we observe that clusters of points are more like the higher density regions in our probability density function.

So, we must iteratively go from lower density to higher density regions, in order to find our clusters.

* The mean shift method is an iterative method, and we start with an initial estimate `x`. Let a *kernel function* $$K(x_i - x)$$ be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a *Gaussian kernel* on the distance to the current estimate is used,
$$
K(x_i-x)= e^{-c|x_i-x|^2}
$$
The weighted mean of the density in the window determined by `K` is
$$
m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x) x_i}{\sum_{x_i \in N(x) K(x_i - x)}}
$$
where `N(x)` is the neighborhood of `x`, a set of points for which $$K(x_i) \neq 0$$.

* The difference `m(x) - x` is called *mean shift*. The *mean-shift algorithm* now sets $$m(x) \to x$$, and repeats the estimation until `m(x)` converges. It means, after a sufficient number of steps, the position of the centroid of all the points, and the current location of the window will coincide. This is when we reach convergence, as no new points are added to our window in this step.

**Source:** _en.wikipedia.org_

#### Q5: What are *Self-Organizing Maps*? ⭐⭐
##### Answer:
* **Self-Organizing Maps** (**SOMs**) are a class of *self-organizing* clustering techniques.
* It is an _unsupervised form of artificial neural networks_. A self-organizing map consists of a set of neurons that are arranged in a rectangular or hexagonal grid. Each neuronal unit in the grid is associated with a numerical vector of fixed dimensionality. The learning process of a self-organizing map involves the adjustment of these vectors to provide a suitable representation of the input data.
* Self-organizing maps can be used for clustering numerical data in vector format.

![som](https://www.researchgate.net/profile/Mohamed-Zair/publication/329337931/figure/fig1/AS:915377227849728@1595254346086/Structural-model-of-self-organizing-map-neural-network-Figure-2-Experimental-benchmark.png)

**Source:** _medium.com_

#### Q6: Why do you need to perform *Significance Testing* in *Clustering*? ⭐⭐
##### Answer:
* **Significance testing** addresses an important aspect of cluster validation. Many cluster analysis methods will deliver clusterings even for homogeneous data. They assume implicitly that clustering has to be found, regardless of whether this is meaningful or not.

>A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation.

* **Significance testing** is performed to distinguish between a clustering that reflects meaningful _heterogeneity_ in the data and an artificial clustering of _homogeneous_ data.
* Significance testing is also used for more specific tasks in cluster analysis, such as; estimating the number of clusters, and for interpreting some or all of the individual clusters, to show the significance of the individual clusters.

**Source:** _www.ncbi.nlm.nih.gov_

#### Q7: What is the difference between a _Multiclass problem_ and a _Multilabel problem_? ⭐⭐
##### Answer:
**Multiclass classification** means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is _assigned to one and only one label_: a fruit can be either an apple or a pear but not both at the same time.

**Multilabel classification** assigns to each sample a set of target labels. This can be thought of as predicting properties of a data-point that are _not mutually exclusive_, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time or none of these.

![https://i.stack.imgur.com/XghaO.png](https://i.stack.imgur.com/XghaO.png)

**Source:** _stats.stackexchange.com_

#### Q8: What is the _Jaccard Index_? ⭐⭐
##### Answer:
The **Jaccard index**, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. The Jaccard coefficient measures **similarity** between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

![https://wikimedia.org/api/rest_v1/media/math/render/svg/eaef5aa86949f49e7dc6b9c8c3dd8b233332c9e7](https://wikimedia.org/api/rest_v1/media/math/render/svg/eaef5aa86949f49e7dc6b9c8c3dd8b233332c9e7)

![https://upload.wikimedia.org/wikipedia/commons/c/c7/Intersection_over_Union_-_visual_equation.png](https://upload.wikimedia.org/wikipedia/commons/c/c7/Intersection_over_Union_-_visual_equation.png)

![](https://upload.wikimedia.org/wikipedia/commons/e/e6/Intersection_over_Union_-_poor%2C_good_and_excellent_score.png)

**Source:** _en.wikipedia.org_

#### Q9: What is the difference between the two types of *Hierarchical Clustering*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: While performing *K-Means* Clustering, how do you determine the value of *K*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: What are some different types of *Clustering Structures* that are used in *Clustering Algorithms*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: When would you use *Hierarchical Clustering* over *Spectral Clustering*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: Compare *Hierarchical Clustering* and *k-Means Clustering* ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: Where do the *Similarities* come from in *Similarity-based Clustering*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: What is a *Mixture Model*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: What is the *Mixture* in *Gaussian Mixture Model*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: What is *Latent Class Model*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q18: How would you perform an *Observation-Based Clustering* for *Time-Series Data*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q19: Name some pros and cons of _Mean Shift Clustering_ ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q20: How can *Evolutionary Algorithms* be used for *Clustering*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q21: What is _Silhouette Analysis_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q22: Why does *K-Means* have a higher *bias* when compared to *Gaussian Mixture Model*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q23: Explain how a cluster is formed in the *DBSCAN* Clustering Algorithm ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q24: What makes the distance measurement of *k-Medoids* better than *k-Means*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q25: When using various Clustering Algorithms, why is *Euclidean Distance* not a good metric in _High Dimensions_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q26: When would you use *Hierarchical Clustering* over *k-Means Clustering*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q27: How would you choose the number of *Clusters* when designing a *K-Medoid Clustering Algorithm*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q28: Explain the *Dirichlet Process Gaussian Mixture Model* ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q29: Why is *Euclidean Distance* not good for *Sparse Data*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q30: When would you use *Segmentation* over *Clustering*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q31: How to tell if data is _clustered_ enough for clustering algorithms to produce meaningful results? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q32: How to choose among the various clustering _Distance Measures_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q33: Explain the different frameworks used for *k-Means Clustering* ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q34: What is the motivation behind the *Expectation-Maximization Algorithm*? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q35: What is the relationship between *k-Means Clustering* and *PCA*? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Cost Function Interview Questions
#### Q1: Provide an analogy for a _Cost Function_ in real life ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q2: Explain what is _Cost (Loss) Function_ in Machine Learning? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q3: What is the difference between _Cost Function_ vs _Gradient Descent_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q4: What is the difference between _Objective function_, _Cost function_ and _Loss function_ ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q5: Why don’t we use _Mean Squared Error_ as a cost function in Logistic Regression? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q6: How would you fix Logistic Regression _Overfitting_ problem? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q7: What is the *Hinge Loss* in SVM? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q8: What type of *Cost Functions* do *Greedy Splitting* use? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q9: How would you choose the *Loss Function* for a Deep Learning model? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Data Structures Interview Questions
#### Q1: Define Stack ⭐
##### Answer:
A **Stack** is a container of objects that are inserted and removed according to the last-in first-out (**LIFO**) principle. In the pushdown stacks only two operations are allowed: push the item into the stack, and pop the item out of the stack.

There are basically three operations that can be performed on stacks. They are:

1. inserting an item into a stack (**push**).
2. deleting an item from the stack (**pop**).
3. displaying the contents of the stack (**peek** or **top**).

A stack is a limited access data structure - elements can be added and removed from the stack only at the top. push adds an item to the top of the stack, pop removes the item from the top. A helpful analogy is to think of a stack of books; you can remove only the top book, also you can add a new book on the top.

![](https://user-images.githubusercontent.com/13550565/85218531-fea33d80-b3cd-11ea-8ba4-77c37d446d07.png)

**Source:** _www.cs.cmu.edu_

#### Q2: Explain why Stack is a recursive data structure ⭐
##### Answer:
A **stack** is a **recursive** data structure, so it's:

* a stack is either empty or
* it consists of a top and the rest which is a stack by itself;

**Source:** _www.cs.cmu.edu_

#### Q3: Define Linked List ⭐
##### Answer:
A **linked list** is a linear data structure where each element is a separate object. Each element (we will call it a **node**) of a list is comprising of two items - the **data** and a **reference (pointer)** to the next node. The last node has a reference to **null**. The entry point into a linked list is called the **head** of the list. It should be noted that _head is not a separate node,_ but the reference to the first node. If the list is empty then the head is a null reference.

**Source:** _www.cs.cmu.edu_

#### Q4: Name some characteristics of Array Data Structure ⭐
##### Answer:
Arrays are:
* **Finite (fixed-size)** - An array is finite because it contains only limited number of elements.
* **Order** -All the elements are stored one by one , in contiguous location of computer memory in a linear order and fashion
* **Homogenous** - All the elements of an array are of same data types only and hence it is termed as collection of homogenous

**Source:** _codelack.com_

#### Q5: What is Queue? ⭐
##### Answer:
A **queue** is a container of objects (a _linear_ collection) that are inserted and removed according to the first-in first-out (FIFO) principle. The process to add an element into queue is called **Enqueue** and the process of removal of an element from queue is called **Dequeue**.

![](https://user-images.githubusercontent.com/13550565/85218641-9d2f9e80-b3ce-11ea-8c1b-a9c058057a70.png)

**Source:** _www.cs.cmu.edu_

#### Q6: What is Heap? ⭐
##### Answer:
A **Heap** is a special Tree-based data structure which is an almost complete tree that satisfies the heap property:

* in a **max heap**, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C.
* In a **min heap**, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.

A common implementation of a heap is the binary heap, in which the tree is a **binary tree.**

![](https://www.techiedelight.com/wp-content/uploads/2016/11/Min-Max-Heap.png)

**Source:** _www.geeksforgeeks.org_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q7: What is Hash Table? ⭐
##### Answer:
A **hash table** (hash map) is a data structure that implements an **associative** array abstract data type, a **structure** that can **map keys to values**. Hash tables implement an associative array, which is indexed by arbitrary objects (keys). A hash table uses a **hash function** to compute an **index**, also called a **hash value**, into an **array of buckets** or slots, from which the desired **value** can be found.

![](https://i.stack.imgur.com/0yjYd.png)

**Source:** _en.wikipedia.org_

#### Q8: What is Priority Queue? ⭐
##### Answer:
A **priority queue** is a data structure that stores **priorities** (comparable values) and perhaps associated information. A **priority queue** is different from a "normal" queue, because instead of being a "first-in-first-out" data structure, values come out in order by **priority**. Think of a priority queue as a kind of bag that holds priorities. You can put one in, and you can take out the current highest priority.

![](https://cdn.programiz.com/sites/tutorial2program/files/Introduction.png)

**Source:** _pages.cs.wisc.edu_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q9: Define Tree Data Structure ⭐
##### Answer:
**Trees** are well-known as a _non-linear_ data structure. They don’t store data in a linear way. They organize data _hierarchically_.

A **tree** is a collection of entities called **nodes**. Nodes are connected by **edges**. Each node contains a **value** or **data** or **key**, and it may or may not have a **child** node. The first node of the tree is called the **root**. **Leaves** are the last nodes on a tree. They are nodes without children.

![](https://miro.medium.com/max/975/1*PWJiwTxRdQy8A_Y0hAv5Eg.png)

**Source:** _www.freecodecamp.org_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q10: What is a Graph? ⭐
##### Answer:
A **graph** is a common data structure that consists of a finite set of **nodes** (or **vertices**) and a set of **edges** connecting them. A pair `(x,y)` is referred to as an edge, which communicates that the **x vertex** connects to the **y vertex**.

Graphs are used to solve real-life problems that involve representation of the problem space as a **network**. Examples of networks include telephone networks, circuit networks, social networks (like LinkedIn, Facebook etc.).

![](https://miro.medium.com/max/1640/1*4s5Z7gVwVqmKcslgiamRyw.png)

**Source:** _www.educative.io_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q11: What is String in Data Structures? ⭐
##### Answer:
A **string** is generally considered as a **data type** and is often implemented as an **array data structure** of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

**Source:** _dev.to_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q12: What is Trie? ⭐
##### Answer:
**Trie** (also called **digital tree **or **prefix tree**) is a _tree-based data structure_, which is used for efficient _retrieval_ of a key in a large data-set of strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated; i.e., **the value of the key is distributed across the structure**. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Each complete English word has an arbitrary integer value associated with it (see image).


![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/ae/Patricia_trie.svg/1200px-Patricia_trie.svg.png)

**Source:** _medium.com_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q13: Define Binary Tree ⭐
##### Answer:
A normal tree has no restrictions on the number of children each node can have. A **binary tree** is made of nodes, where each node contains a "left" pointer, a "right" pointer, and a data element.

There are three different types of binary trees:

* **Full binary tree**: Every node other than leaf nodes has 2 child nodes.
* **Complete binary tree**: All levels are filled except possibly the last one, and all nodes are filled in as far left as possible.
* **Perfect binary tree**: All nodes have two children and all leaves are at the same level.

![](https://study.com/cimages/multimages/16/0e0646ba-30e5-40d9-b45c-a138f038f05b_full_complete_perfect.png)

**Source:** _study.com_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q14: Why and when should I use Stack or Queue data structures instead of Arrays/Lists? ⭐⭐
##### Answer:
Because they help manage your data in more a _particular_ way than arrays and lists. It means that when you're debugging a problem, you won't have to wonder if someone randomly inserted an element into the middle of your list, messing up some invariants.

Arrays and lists are random access. They are very flexible and also easily *corruptible*. If you want to manage your data as FIFO or LIFO it's best to use those, already implemented, collections.

More practically you should:
* Use a queue when you want to get things out in the order that you put them in (FIFO)
* Use a stack when you want to get things out in the reverse order than you put them in (LIFO)
* Use a list when you want to get anything out, regardless of when you put them in (and when you don't want them to automatically be removed).

**Source:** _stackoverflow.com_

#### Q15: What is Complexity Analysis of Queue operations? ⭐⭐
##### Answer:
* Queues offer random access to their contents by shifting the first element off the front of the queue. You have to do this repeatedly to access an arbitrary element somewhere in the queue. Therefore, **access** is O(n).
* Searching for a given value in the queue requires iterating until you find it. So **search** is O(n).
* Inserting into a queue, by definition, can only happen at the back of the queue, similar to someone getting in line for a delicious Double-Double burger at In 'n Out. Assuming an efficient queue implementation, queue **insertion** is O(1).
* Deleting from a queue happens at the front of the queue. Assuming an efficient queue implementation, queue **deletion** is `O(1).

**Source:** _github.com_

#### Q16: What are some types of Queue? ⭐⭐
##### Answer:
Queue can be classified into following types:

* **Simple Queue** - is a linear data structure in which removal of elements is done in the same order they were inserted i.e., the element will be removed first which is inserted first.

![](https://scanftree.com/Data_Structure/queues.png)

* **Circular Queue** - is a linear data structure in which the operations are performed based on FIFO (First In First Out) principle and the last position is connected back to the first position to make a circle. It is also called **Ring Buffer**. Circular queue avoids the wastage of space in a regular queue implementation using arrays.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Ring_buffer.svg/794px-Ring_buffer.svg.png)

* **Priority Queue** - is a type of queue where each element has a priority value and the deletion of the elements is depended upon the priority value

![](https://www.aoprogrammer.com/wp-content/uploads/2018/07/enqueue-priority-queue.png)

* In case of **max-priority queue**, the element will be deleted first which has the largest priority value
* In case of **min-priority queue** the element will be deleted first which has the minimum priority value.
* **De-queue (Double ended queue)** - allows insertion and deletion from both the ends i.e. elements can be added or removed from rear as well as front end.

![](https://i.imgur.com/AF3RVpP.png)

* **Input restricted deque** - In input restricted double ended queue, the insertion operation is performed at only one end and deletion operation is performed at both the ends.

![](https://i.imgur.com/dSJUHFE.png)

* **Output restricted deque** - In output restricted double ended queue, the deletion operation is performed at only one end and insertion operation is performed at both the ends.

![](https://i.imgur.com/vyHyffe.png)

**Source:** _www.ques10.com_

#### Q17: What are some types of Linked List? ⭐⭐
##### Answer:
* A **singly linked list**

![](https://www.andrew.cmu.edu/course/15-121/lectures/Linked%20Lists/pix/linkedlist.bmp)
* A **doubly linked list** is a list that has two references, one to the next node and another to previous node.

![](https://www.andrew.cmu.edu/course/15-121/lectures/Linked%20Lists/pix/doubly.bmp)
* A **multiply linked list** - each node contains two or more link fields, each field being used to connect the same set of data records in a different order of same set(e.g., by name, by department, by date of birth, etc.).
* A **circular linked list** - where last node of the list points back to the first node (or the head) of the list.

![](https://i2.wp.com/algorithms.tutorialhorizon.com/files/2016/03/Circular-Linked-List.png)

**Source:** _www.cs.cmu.edu_

#### Q18: What are Dynamic Arrays? ⭐⭐
##### Answer:
A **dynamic array** is an array with a big improvement: _automatic resizing_.

One limitation of arrays is that they're _fixed_ size, meaning you need to specify the number of elements your array will hold ahead of time. A dynamic array expands as you add more elements. So you don't need to determine the size ahead of time.

**Source:** _www.interviewcake.com_

#### Q19: Return the N-th value of the Fibonacci sequence. Solve in _`O(n)`_ time ⭐⭐
##### Answer:
The easiest solution that comes to mind here is iteration:

```js
function fib(n){
let arr = [0, 1];
for (let i = 2; i < n + 1; i++){
arr.push(arr[i - 2] + arr[i -1])
}
return arr[n]
}
```
And output:
```
fib(4)
=> 3
```

Notice that two first numbers can not really be effectively generated by a for loop, because our loop will involve adding two numbers together, so instead of creating an empty array we assign our arr variable to `[0, 1]` that we know for a fact will always be there. After that we create a loop that starts iterating from i = 2 and adds numbers to the array until the length of the array is equal to `n + 1`. Finally, we return the number at n index of array.

**Source:** _medium.com_

##### Complexity Analysis:
**Time Complexity**: O(n)
**Space Complexity**: O(n)

An algorithm in our iterative solution takes linear time to complete the task. Basically we iterate through the loop n-2 times, so Big O (notation used to describe our worst case scenario) would be simply equal to O`(n)` in this case. The space complexity is `O(n)`.
##### Implementation:
##### _JS_

```js
function fib(n){
let arr = [0, 1]
for (let i = 2; i < n + 1; i++){
arr.push(arr[i - 2] + arr[i -1])
}
return arr[n]
}
```

##### _Java_

```java
double fibbonaci(int n){
double prev=0d, next=1d, result=0d;
for (int i = 0; i < n; i++) {
result=prev+next;
prev=next;
next=result;
}
return result;
}
```

##### _PY_

```py
def fib_iterative(n):
a, b = 0, 1
while n > 0:
a, b = b, a + b
n -= 1
return a
```

#### Q20: Name some disadvantages of Linked Lists? ⭐⭐
##### Answer:
Few disadvantages of linked lists are :

* They use more memory than arrays because of the storage used by their pointers.
* Difficulties arise in linked lists when it comes to reverse traversing. For instance, singly linked lists are cumbersome to navigate backwards and while doubly linked lists are somewhat easier to read, memory is wasted in allocating space for a back-pointer.
* Nodes in a linked list must be read in order from the beginning as linked lists are inherently sequential access.
* Random access has linear time.
* Nodes are stored incontiguously (no or poor cache locality), greatly increasing the time required to access individual elements within the list, especially with a CPU cache.
* If the link to list's node is accidentally destroyed then the chances of data loss after the destruction point is huge. Data recovery is not possible.
* Search is linear versus logarithmic for sorted arrays and binary search trees.
* Different amount of time is required to access each element.
* Not easy to sort the elements stored in the linear linked list.

**Source:** _www.quora.com_

#### Q21: Return the N-th value of the Fibonacci sequence Recursively ⭐⭐
##### Answer:
Recursive solution looks pretty simple (see code).

Let’s look at the diagram that will help you understand what’s going on here with the rest of our code. Function fib is called with argument 5:

![](https://miro.medium.com/max/1400/1*LNBBacuaBFOVZXUV6VgEEg.png)

Basically our **fib** function will continue to recursively call itself creating more and more branches of the tree until it hits the base case, from which it will start summing up each branch’s return values bottom up, until it finally sums them all up and returns an integer equal to 5.

**Source:** _medium.com_

##### Complexity Analysis:
**Time Complexity**: O(2^n)

In case of recursion the solution take **exponential** time, that can be explained by the fact that the size of the tree exponentially grows when n increases. So for every additional element in the Fibonacci sequence we get an increase in function calls. Big O in this case is equal to O(2n). Exponential Time complexity denotes an algorithm whose growth doubles with each addition to the input data set.
##### Implementation:
##### _JS_

```js
function fib(n) {
if (n < 2){
return n
}
return fib(n - 1) + fib (n - 2)
}
```

##### _Java_

```java
public int fibonacci(int n) {
if (n < 2) return n;

return fibonacci(n - 1) + fibonacci(n - 2);
}
```

##### _PY_

```py
def F(n):
if n == 0: return 0
elif n == 1: return 1
else: return F(n-1)+F(n-2)
```

#### Q22: What is the space complexity of a Hash Table? ⭐⭐
##### Answer:
The space complexity of a datastructure indicates how much space it occupies in relation to the amount of elements it holds. For example a space complexity of `O(1)` would mean that the datastructure alway consumes constant space no matter how many elements you put in there. `O(n)` would mean that the space consumption grows linearly with the amount of elements in it.

A **hashtable** typically has a space complexity of `O(n)`.

**Source:** _stackoverflow.com_

#### Q23: What is Binary Heap? ⭐⭐
##### Answer:
A **Binary Heap** is a _Binary Tree_ with following properties:

* It’s a _complete_ tree (all levels are completely filled except possibly the last level and the last level has all keys as left as possible). This property of Binary Heap makes them suitable to be stored in an array.
* A Binary Heap is either **Min Heap** or **Max Heap**. In a Min Binary Heap, the key at root must be minimum among all keys present in Binary Heap. The same property must be recursively true for all nodes in Binary Tree. Max Binary Heap is similar to MinHeap.

```js
10 10
/ \ / \
20 100 15 30
/ / \ / \
30 40 50 100 40
```

**Source:** _www.geeksforgeeks.org_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q24: What is Binary Search Tree? ⭐⭐
##### Answer:
**Binary search tree** is a data structure that quickly allows to maintain a _sorted list_ of numbers.

* It is called a _binary tree_ because each tree node has maximum of two children.
* It is called a _search tree_ because it can be used to search for the presence of a number in `O(log n)` time.

The properties that separates a binary search tree from a regular binary tree are:

* All nodes of left subtree are less than root node
* All nodes of right subtree are more than root node
* Both subtrees of each node are also BSTs i.e. they have the above two properties

![](https://cdn.programiz.com/sites/tutorial2program/files/bst-vs-not-bst.jpg)

**Source:** _www.programiz.com_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q25: What is the difference between Strings vs. Char arrays? ⭐⭐
##### Answer:
**Char arrays**:
* Static-sized
* Fast access
* Few built-in methods to manipulate strings
* A char array doesn’t define a data type

**Strings**:
* Slower access
* Define a data type
* Dynamic allocation
* More built-in functions to support string manipulations

**Source:** _dev.to_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q26: How to implement a _Tree_ data-structure? Provide some code. ⭐⭐
##### Answer:
That is a basic (generic) tree structure that can be used for `String` or any other object:

**Source:** _stackoverflow.com_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
##### Implementation:
##### _Java_

```java
public class Tree {
private Node root;

public Tree(T rootData) {
root = new Node();
root.data = rootData;
root.children = new ArrayList>();
}

public static class Node {
private T data;
private Node parent;
private List> children;
}
}
```

##### _PY_

Generic Tree:

```py
class Tree(object):
"Generic tree node."
def __init__(self, name='root', children=None):
self.name = name
self.children = []
if children is not None:
for child in children:
self.add_child(child)
def __repr__(self):
return self.name
def add_child(self, node):
assert isinstance(node, Tree)
self.children.append(node)
# *
# /|\
# 1 2 +
# / \
# 3 4
t = Tree('*', [Tree('1'),
Tree('2'),
Tree('+', [Tree('3'),
Tree('4')])])
```
Binary tree:

```py
class Tree:
def __init__(self):
self.left = None
self.right = None
self.data = None
```

#### Q27: Convert a _Singly Linked List_ to _Circular Linked List_ ⭐⭐
##### Answer:
To convert a singly linked list to a circular linked list, we will set the next pointer of the tail node to the head pointer.

* Create a copy of the head pointer, let's say `temp`.
* Using a loop, traverse linked list till tail node (last node) using temp pointer.
* Now set the next pointer of the tail node to head node. `temp->next = head`

**Source:** _www.techcrashcourse.com_

##### Implementation:
##### _PY_

```py
def convertTocircular(head):
# declare a node variable
# start and assign head
# node into start node.
start = head

# check that
while head.next
# not equal to null then head
# points to next node.
while(head.next is not None):
head = head.next

#
if head.next points to null
# then start assign to the
# head.next node.
head.next = start
return start
```

#### Q28: What's the difference between the data structure Tree and Graph? ⭐⭐
##### Answer:
**Graph:**
* Consists of a set of vertices (or nodes) and a set of edges connecting some or all of them
* Any edge can connect any two vertices that aren't already connected by an identical edge (in the same direction, in the case of a directed graph)
* Doesn't have to be connected (the edges don't have to connect all vertices together): a single graph can consist of a few disconnected sets of vertices
* Could be directed or undirected (which would apply to all edges in the graph)

**Tree:**
* A type of graph (fit with in the category of Directed Acyclic Graphs (or a DAG))
* Vertices are more commonly called "nodes"
* Edges are directed and represent an "is child of" (or "is parent of") relationship
* Each node (except the root node) has exactly one parent (and zero or more children)
* Has exactly one "root" node (if the tree has at least one node), which is a node without a parent
* Has to be connected
* Is acyclic, meaning it has no cycles: "a cycle is a path [AKA sequence] of edges and vertices wherein a vertex is reachable from itself"
* Trees aren't a recursive data structure

![](https://miro.medium.com/max/2262/1*-yHATwTlY2hwceJ93-D-cw.jpeg)

**Source:** _stackoverflow.com_

##### Complexity Analysis:
**Time Complexity**: None
**Space Complexity**: None
#### Q29: Under what circumstances are Linked Lists useful? ⭐⭐
##### Answer:
Linked lists are very useful when you need :
* to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile\-time) length.
* splitting and joining (bidirectionally\-linked) lists is very efficient.
* You can also combine linked lists \- e.g. tree structures can be implemented as "vertical" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).

Using an array based list for these purposes has severe limitations:

* Adding a new item means the array must be reallocated (or you must allocate more space than you need to allow for future growth and reduce the number of reallocations)
* Removing items leaves wasted space or requires a reallocation
* inserting items anywhere except the end involves (possibly reallocating and) copying lots of the data up one position

**Source:** _stackoverflow.com_

#### Q30: Implement _Pre-order Traversal_ of _Binary Tree_ using _Recursion_ ⭐⭐
##### Answer:
For traversing a (non-empty) binary tree in pre-order fashion, we must do these three things for every node `N` starting from root node of the tree:

* (N) Process `N` itself.
* (L) Recursively traverse its _left_ subtree. When this step is finished we are back at N again.
* (R) Recursively traverse its _right_ subtree. When this step is finished we are back at N again.

![](https://www.techiedelight.com/wp-content/uploads/Preorder-Traversal.png)

**Source:** _github.com_

##### Complexity Analysis:
**Time Complexity**: O(n)
**Space Complexity**: O(n)
##### Implementation:
##### _Java_

```java
// Recursive function to perform pre-order traversal of the tree
public static void preorder(TreeNode root)
{
// return if the current node is empty
if (root == null) {
return;
}

// Display the data part of the root (or current node)
System.out.print(root.data + " ");

// Traverse the left subtree
preorder(root.left);

// Traverse the right subtree
preorder(root.right);
}
```

##### _PY_

```py
# Recursive function to perform pre-order traversal of the tree
def preorder(root):

# return if the current node is empty
if root is None:
return

# Display the data part of the root (or current node)
print(root.data, end=' ')

# Traverse the left subtree
preorder(root.left)

# Traverse the right subtree
preorder(root.right)
```

#### Q31: What is an Associative Array? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q32: What does Sparse Array mean? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q33: How to merge two sorted _Arrays_ into a _Sorted Array_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q34: Explain how _Heap Sort_ works ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q35: What is complexity of Hash Table? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q36: LIS: Find length of the _longest increasing subsequence (LIS)_ in the array. Solve using DP. ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q37: Compare Heaps vs Arrays to implement Priority Queue ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q38: How to check if two Strings (words) are _Anagrams_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q39: Name some application of Trie data structure ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q40: Find all the _Permutations_ of a String ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q41: What is AVL Tree? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q42: What is Balanced Tree and why is that important? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q43: Name some common types and categories of Graphs ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q44: Convert a Binary Tree to a Doubly Linked List ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q45: Can you do _Iterative Pre-order Traversal_ of a _Binary Tree_ without _Recursion_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q46: Explain how _QuickSort_ works ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q47: Binet's formula: How to calculate Fibonacci numbers without Recursion or Iteration? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q48: What are some main advantages of Tries over Hash Tables ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q49: How would you traverse a Linked List in O(n1/2)? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q50: Explain what is _Fibonacci Search_ technique? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q51: What are Pascal Strings? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q52: When is doubly linked list more efficient than singly linked list? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q53: What is Red-Black tree? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q54: How To Choose Between a Hash Table and a Trie (Prefix Tree)? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q55: How to implement 3 _Stacks_ with one _Array_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q56: Find the _length_ of a Linked List which contains _Cycle (Loop)_ ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q57: What is Rope Data Structure is used for? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q58: Explain what is B-Tree? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q59: What is Bipartite Graph? How to detect one? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q60: Compare lookup operation in Trie vs Hash Table ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q61: How are B-Trees used in practice? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Databases Interview Questions
#### Q1: What is _Normalisation_? ⭐⭐
##### Answer:
**Normalization** is basically to design a database schema such that **duplicate and redundant data is avoided**. If the same information is repeated in multiple places in the database, there is the risk that it is updated in one place but not the other, leading to data corruption.

There is a number of normalization levels from 1. normal form through 5. normal form. Each normal form describes how to get rid of some specific problem.

By having a database with normalization errors, you open the risk of getting invalid or corrupt data into the database. Since data "lives forever" it is very hard to get rid of corrupt data when first it has entered the database.

**Source:** _stackoverflow.com_

#### Q2: What is the difference between _Data Definition Language (DDL)_ and _Data Manipulation Language (DML)_? ⭐⭐
##### Answer:
* **Data definition language (DDL)** commands are the commands which are used to define the database. **CREATE**, **ALTER**, **DROP** and **TRUNCATE** are some common DDL commands.

* **Data manipulation language (DML)** commands are commands which are used for manipulation or modification of data. **INSERT**, **UPDATE** and **DELETE** are some common DML commands.

**Source:** _en.wikibooks.org_

#### Q3: What are the advantages of NoSQL over traditional RDBMS? ⭐⭐
##### Answer:
**NoSQL is better** than RDBMS because of the following reasons/properities of NoSQL:

* It supports semi-structured data and volatile data
* It does not have schema
* Read/Write throughput is very high
* Horizontal **scalability** can be achieved easily
* Will support Bigdata in volumes of Terra Bytes & Peta Bytes
* Provides good support for Analytic tools on top of Bigdata
* Can be hosted in cheaper hardware machines
* In-memory caching option is available to increase the performance of queries
* Faster development life cycles for developers

Still, **RDBMS is better** than NoSQL for the following reasons/properties of RDBMS:
* Transactions with **ACID** properties - Atomicity, Consistency, Isolation & Durability
* Adherence to **Strong Schema** of data being written/read
* Real time query management ( in case of data size < 10 Tera bytes )
* Execution of complex queries involving **join** & **group by** clauses

**Source:** _stackoverflow.com_

#### Q4: Define ACID Properties ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q5: How a database index can help performance? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q6: What is Denormalization? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q7: What are the difference between _Clustered_ and a _Non-clustered_ index? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q8: What's the difference between a _Primary Key_ and a _Unique Key_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q9: When would you use NoSQL? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: When should I use a NoSQL database instead of a relational database? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: What is Optimistic locking? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: What Is ACID Property Of A System? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: What is the _cost_ of having a database _index_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: Explain the difference between _Exclusive Lock_ and _Update Lock_ ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: How does _B-trees Index_ work? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: Explain eventual consistency in context of NoSQL ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: How do you track record relations in NoSQL? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q18: What Is Sharding? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q19: What Is BASE Property Of A System? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q20: How do you off load work from the Database? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q21: What are some _other_ types of Indexes (vs B-Trees)? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q22: Name some disadvantages of a _Hash index_ ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q23: What is _Optimistic Locking_ and _Pessimistic Locking_? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q24: How does database _Indexing_ work? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q25: What is the difference between _B-Tree_, _R-Tree_ and _Hash_ indexing? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q26: Explain the differences in conceptual data design with NoSQL databases? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q27: What Does Eventually Consistent Mean? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q28: Is the C in ACID is not the C in CAP? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q29: How do you make schema changes to a live database without downtime? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q30: Why you should never use GUIDs as part of clustered index? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Datasets Interview Questions
#### Q1: What's the difference between _Covariance_ and _Correlation_? ⭐⭐
##### Answer:
- **Covariance** measures whether a **variation** in one _variable_ results in a variation in _another variable_, and deals with the linear relationship of only `2` variables in the dataset. Its value can take range from `-∞` to `+∞`. Simply speaking **Covariance** indicates the direction of the linear relationship between variables.

![](https://www.simplilearn.com/ice9/free_resources_article_thumb/covx-y.jpg)

- **Correlation** measures how strongly two or more variables are **related** to each other. Its values are between `-1` to `1`. **Correlation** measures both the strength and direction of the linear relationship between two variables. Correlation is a function of the covariance.

**Source:** _careerfoundry.com_

#### Q2: Would you use _K-NN_ for large datasets? ⭐⭐
##### Answer:
It's not recommended to perform **K-NN** on large datasets, given that the computational and memory cost can increase. To understand the reason why we should remember how the **K-NN** algorithm works:

1. Starts by calculating the distances to all vectors in a training set and store them.
2. Then, it sorts the calculated distances.
3. Then, we store the K nearest vectors.
4. And finally, calculate the most frequent class displayed by K nearest vectors.

So implement **K-NN** on a large dataset it is not only a bad decision to store a large amount of data but it is also computationally costly to keep calculating and sorting all the values. For that reason, **K-NN** is not recommended and another classification algorithm like _**Naive Bayes**_ or _**SVM**_ is preferred in such cases.

**Source:** _towardsdatascience.com_

#### Q3: What is *Cross-Validation* and why is it important in supervised learning? ⭐⭐
##### Answer:
* ***Cross-validation*** is a method of assessing _how the results of a statistical analysis will generalize on an independent dataset_,
* It can be used in machine learning tasks to _evaluate the predictive capability of the model_,
* It also helps us to _avoid overfitting and underfitting_,
* A common way to cross-validate is to divide the dataset into *training*, *validation*, and *testing* where:

* **Training dataset** is a dataset of known data on which the training is run.
* **Validation dataset** is the dataset that is *unknown* against which the model is tested. The validation dataset is used after each epoch of learning to gauge the improvement of the model.
* **Testing dataset** is also an unknown dataset that is used to test the model. The testing dataset is used to measure the performance of the model after it has finished learning.

![cross_validation](https://s3.ap-south-1.amazonaws.com/myinterviewtrainer-domestic/public_assets/assets/000/000/094/original/Cross-Validation.png?1614946006)

**Source:** _en.wikipedia.org_

#### Q4: How does _K-fold Cross Validation_ work? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q5: What is the difference between _Test Set_ and _Validation Set_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q6: What are the assumptions before applying the _OLS estimator_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q7: What are the difference between _Type I_ and _Type II_ errors? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q8: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q9: What's the difference between _One-vs-Rest_ and _One-vs-One_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: What are some _disadvantages_ of using Decision Trees and how would you solve them? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: Name some best practices for working with Datasets ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: When you sample, what potential _Sampling Biases_ could you be inflicting? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: How would you determine the needed _Sample Size_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: What are some variations of _Cross-Validation_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: Explain what is an _Unrepresentative Dataset_ and how would you diagnose it? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: How would you detect _Heteroskedasticity_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: How would you address the problem of _Heteroskedasticity_ caused for a _Measurement error_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q18: How would you deal with _Outliers_ in your dataset? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q19: How would you deal with an _Imbalanced Dataset_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q20: What's the difference between _Random Oversampling_ and _Random Undersampling_ and when they can be used? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q21: How would you use a _Confusion Matrix_ for determining a model performance? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q22: What is *Multidimensional Scaling*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q23: Is _mean imputation_ of missing data acceptable practice? Why or why not? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q24: When would you use *_chi-Square_* or an *_ANOVA_* test? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q25: How would you handle _Missing Data_ and perform _Data Imputation_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q26: Compare _ Causation_ vs _Correlation_ ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q27: Which measures of _Variability_ would you use on your data? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q28: How does an ANOVA test work? ⭐⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

## [[⬆]](#toc) Decision Trees Interview Questions
#### Q1: What are *Decision Trees*? ⭐
##### Answer:
* ***Decision trees*** is a tool that uses a *tree-like model* of decisions and their possible consequences. If an algorithm only contains *conditional control statements*, decision trees can model that algorithm really well.
* *Decision trees* are a *non-parametric*, _supervised_ learning method.
* *Decision trees* are used for *classification* and *regression* tasks.
* The diagram below shows an example of a decision tree (the dataset used is the Titanic dataset to predict whether a passenger survived or not):

![decision](https://miro.medium.com/max/540/1*XMId5sJqPtm8-RIwVVz2tg.png)

**Source:** _towardsdatascience.com_

#### Q2: Explain the _structure_ of a Decision Tree ⭐⭐
##### Answer:
A ***decision tree*** is a ***flowchart-like*** structure in which:
* Each *internal node* represents the ***test*** on an attribute (e.g. outcome of a coin flip).
* Each *branch* represents the **_outcome_** of the test.
* Each *leaf node* represents a ***class label***.
* The _paths_ from the root to leaf represent the ***classification rules***.

![https://aiaspirant.com/wp-content/uploads/2020/02/dt_struct.png](https://aiaspirant.com/wp-content/uploads/2020/02/dt_struct.png)

**Source:** _en.wikipedia.org_

#### Q3: How are the different nodes of decision trees _represented_? ⭐⭐
##### Answer:
A **decision tree** consists of three **types** of nodes:
* **Decision nodes:** Represented by **squares.** It is a node where a flow branches into several optional branches.
* **Chance nodes:** Represented by **circles.** It represents the probability of certain results.
* **End nodes:** Represented by **triangles.** It shows the final outcome of the decision path.

![decision_nodes](https://upload.wikimedia.org/wikipedia/commons/a/ad/Decision-Tree-Elements.png)

**Source:** _en.wikipedia.org_

#### Q4: What are some _advantages_ of using Decision Trees? ⭐⭐
##### Answer:
* It is **simple to understand** and interpret. It can be **visualized** easily.
* It **does not require as much data preprocessing** as other methods.
* It can handle both **numerical** and **categorical** data.
* It can handle **multiple output** problems.

**Source:** _scikit-learn.org_

#### Q5: What type of node is considered *Pure*? ⭐⭐
##### Answer:
* If the *Gini Index* of the data is `0` then it means that all the elements **belong to a specific class**. When this happens it is said to be *pure*.
* When all of the data belongs to a single class (*pure*) then the *leaf node* is reached in the tree.
* The leaf node represents the *class label* in the tree (which means that it gives the final output).

![pure_node](https://miro.medium.com/max/2000/1*k4qcPhr04dHjciI5nFdQrw.png)

**Source:** _medium.com_

#### Q6: How is a _Random Forest_ related to _Decision Trees_? ⭐⭐
##### Answer:
* ***Random forest*** is an ***ensemble learning*** method that works by constructing a multitude of ***decision trees***. A random forest can be constructed for both classification and regression tasks.
* Random forest **outperforms** decision trees, and it also does not have the habit of *overfitting* the data as decision trees do.
* A decision tree trained on a specific dataset will become very deep and cause overfitting. To create a random forest, decision trees can be trained on different subsets of the training dataset, and then the different decision trees can be averaged with the goal of _decreasing the variance_.

**Source:** _en.wikipedia.org_

#### Q7: What is the difference between *OOB* score and *validation* score? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q8: How would you deal with an _Overfitted Decision Tree_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q9: What are some _disadvantages_ of using Decision Trees and how would you solve them? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q10: What is *Greedy Splitting*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q11: What type of *Cost Functions* do *Greedy Splitting* use? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q12: How would you define the *Stopping Criteria* for decision trees? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q13: Why do you need to *Prune* the decision tree? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q14: What is *Entropy*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q15: How do we _measure_ the Information? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q16: What is *Gini Index* and how is it used in Decision Trees? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q17: What is the *Chi-squared test*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q18: How does the *CART* algorithm produce *Classification Trees*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q19: How does the *CART* algorithm produce *Regression Trees*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q20: What is the difference between *Post-pruning* and *Pre-pruning*? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q21: Compare *Linear Regression* and *Decision Trees* ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q22: What is _Tree Bagging_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q23: What is _Tree Boosting_? ⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q24: How to use _Isolation Forest_ for Anomalies detection? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q25: Imagine that you know there are _outliers_ in your data, would you use _Logistic Regression_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q26: What is the use of *Entropy* pertaining to Decision Trees? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q27: While building Decision Tree how do you choose which attribute to _split_ at each node? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q28: What is difference between _Gini Impurity_ and _Entropy_ in Decision Tree? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q29: When should I use _Gini Impurity_ as opposed to _Information Gain (Entropy)_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q30: Explain the *CHAID* algorithm ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q31: What are some disadvantages of the *CHAID* algorithm? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q32: Explain how can *CART* algorithm performs _Pruning_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q33: Explain how *ID3* produces *classification trees*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q34: How would you compare different _Algorithms_ to build _Decision Trees_? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q35: Compare *ID3* and *C4.5* algorithms ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q36: Compare *C4.5* and *C5.0* algorithms ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q37: What is the relationship between *Information Gain* and *Information Gain Ratio*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q38: How do you _Gradient Boost_ decision trees? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q39: Compare *Decision Trees* and *Logistic Regression* ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q40: What are the differences between *Decision Trees* and *Neural Networks*? ⭐⭐⭐⭐
Read answer on πŸ‘‰ MLStack.Cafe

#### Q41: Compare *Decision Trees* and *k-Nearest Neighbors* ⭐⭐⭐⭐
Read answer on πŸ‘‰