{"id":20875354,"url":"https://github.com/aershov24/machine-learning-ds-interview-questions","last_synced_at":"2025-08-17T01:33:50.887Z","repository":{"id":37376789,"uuid":"400117593","full_name":"aershov24/machine-learning-ds-interview-questions","owner":"aershov24","description":"🔴 1704 Machine Learning, Data Science \u0026 Python Interview Questions (ANSWERED) To Kill Your Next ML \u0026 DS Interview. Get All Answers + PDFs on MLStack.Cafe. Post your ML Jobs 👉","archived":false,"fork":false,"pushed_at":"2023-01-11T00:13:22.000Z","size":304,"stargazers_count":105,"open_issues_count":2,"forks_count":32,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-12T16:16:52.551Z","etag":null,"topics":["algorithms-and-data-structures","data-analysis","data-science","interview-practice","interview-preparation","interview-questions","machine-learning","machine-learning-algorithms","machinelearning"],"latest_commit_sha":null,"homepage":"https://www.mlstack.cafe","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aershov24.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"custom":["https://www.mlstack.cafe","https://www.fullstack.cafe"]}},"created_at":"2021-08-26T09:45:13.000Z","updated_at":"2025-01-19T18:39:18.000Z","dependencies_parsed_at":"2023-02-08T21:01:07.020Z","dependency_job_id":null,"html_url":"https://github.com/aershov24/machine-learning-ds-interview-questions","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aershov24/machine-learning-ds-interview-questions","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/aershov24%2Fmachine-learning-ds-interview-questions","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aershov24%2Fmachine-learning-ds-interview-questions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aershov24%2Fmachine-learning-ds-interview-questions/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aershov24%2Fmachine-learning-ds-interview-questions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aershov24","download_url":"https://codeload.github.com/aershov24/machine-learning-ds-interview-questions/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aershov24%2Fmachine-learning-ds-interview-questions/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796216,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms-and-data-structures","data-analysis","data-science","interview-practice","interview-preparation","interview-questions","machine-learning","machine-learning-algorithms","machinelearning"],"created_at":"2024-11-18T06:44:11.035Z","updated_at":"2025-08-17T01:33:50.850Z","avatar_url":"https://github.com/aershov24.png","language":null,"fundi
ng_links":["https://www.mlstack.cafe","https://www.fullstack.cafe"],"categories":[],"sub_categories":[],"readme":"# 1704 🤖 Machine Learning, Data Science \u0026 Python Interview Questions (ANSWERED) To Land Your Next Six-Figure Job Offer from [MLStack.Cafe](https://www.mlstack.cafe)\n\n[MLStack.Cafe](https://www.mlstack.cafe) is the biggest hand-picked collection of top Machine Learning, Data Science, Python and Coding interview questions for Junior and Experienced data analysts, machine learning engineers/developers and data scientists, with more than 1704 ML \u0026 DS interview questions and answers. Prepare for your next ML, DS \u0026 Python interview and land a 6-figure job offer in no time.\n\n🔴 Get All 1704 Answers + PDFs + LaTeX Math on [MLStack.Cafe - Kill Your ML, DS \u0026 Python Interview](https://www.mlstack.cafe/?utm_source=github\u0026utm_medium=mlsciq)\n\n👨‍💻 Hiring Data Analysts, Machine Learning Engineers or Developers? [Post your Job on MLStack.Cafe](https://www.mlstack.cafe/?utm_source=github\u0026utm_medium=mlsc-job-posting) and reach thousands of motivated engineers who are looking for an ML job right now!\n\n---\n\n## \u003ca name='toc'\u003eTable of Contents\u003c/a\u003e\n * [Anomaly Detection](#AnomalyDetection)\n * [Autoencoders](#Autoencoders)\n * [Bias \u0026 Variance](#Bias\u0026Variance)\n * [Big Data](#BigData)\n * [Big-O Notation](#Big-ONotation)\n * [Classification](#Classification)\n * [Clustering](#Clustering)\n * [Cost Function](#CostFunction)\n * [Data Structures](#DataStructures)\n * [Databases](#Databases)\n * [Datasets](#Datasets)\n * [Decision Trees](#DecisionTrees)\n * [Deep Learning](#DeepLearning)\n * [Dimensionality Reduction](#DimensionalityReduction)\n * [Ensemble Learning](#EnsembleLearning)\n * [Genetic Algorithms](#GeneticAlgorithms)\n * [Gradient Descent](#GradientDescent)\n * [K-Means Clustering](#K-MeansClustering)\n * [K-Nearest Neighbors](#K-NearestNeighbors)\n * [Linear Algebra](#LinearAlgebra)\n * [Linear 
Regression](#LinearRegression)\n * [Logistic Regression](#LogisticRegression)\n * [Machine Learning](#MachineLearning)\n * [Model Evaluation](#ModelEvaluation)\n * [Natural Language Processing](#NaturalLanguageProcessing)\n * [Naïve Bayes](#NaïveBayes)\n * [Neural Networks](#NeuralNetworks)\n * [NumPy](#NumPy)\n * [Optimization](#Optimization)\n * [Pandas](#Pandas)\n * [Probability](#Probability)\n * [Python](#Python)\n * [Random Forests](#RandomForests)\n * [SQL](#SQL)\n * [SVM](#SVM)\n * [Scikit-Learn](#Scikit-Learn)\n * [Searching](#Searching)\n * [Sorting](#Sorting)\n * [Statistics](#Statistics)\n * [Supervised Learning](#SupervisedLearning)\n * [TensorFlow](#TensorFlow)\n * [Unsupervised Learning](#UnsupervisedLearning)\n## [[⬆]](#toc) \u003ca name=AnomalyDetection\u003eAnomaly Detection\u003c/a\u003e Interview Questions\n#### Q1: Explain what is Anomaly Detection? ⭐\n##### Answer:\n**Anomaly detection** (or outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.\n\n![](https://miro.medium.com/max/732/1*J-ds9KYQBheiBaIJn78seg.gif)\n\n**Source:** _towardsdatascience.com_\n\n#### Q2: Why do we care about Anomalies? ⭐⭐\n##### Answer:\n* The goal of anomaly detection is to identify cases that are unusual within data that is seemingly comparable hence anomaly detection can be used effectively as a tool for risk mitigation and fraud detection. \n* When preparing datasets for machine learning models, it is really important to detect all the outliers and either get rid of them or analyze them to know why you had them there in the first place.\n\n**Source:** _towardsdatascience.com_\n\n#### Q3: What's the difference between _Normalisation_ and _Standardisation_? ⭐⭐\n##### Answer:\n**Normalization** rescales the values into a range of \\[0,1\\].  This might be useful in some cases where all parameters need to have the same positive scale. 
However, the _outliers_ from the data set _are lost._\n\n$$\nX_{changed} = \\frac{X - X_{min}}{X_{max}-X_{min}}\n$$ \n\n**Standardization** rescales data to have a mean ($\\mu$) of 0 and standard deviation ($\\sigma$) of 1 (unit variance).\n\n$$\nX_{changed} = \\frac{X - \\mu}{\\sigma}\n$$        \n\nFor most applications standardization is recommended.\n\n![](https://i.stack.imgur.com/WqU1U.png)\n\n**Source:** _stats.stackexchange.com_\n\n#### Q4: Why would you use the _Median_ as a measure of central tendency? ⭐⭐\n##### Answer:\nThe **Median** is the most suitable measure of _central tendency_ for **skewed distributions** or distributions with **outliers**. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.\n\nBecause the median only uses one or two values, it’s unaffected by extreme _outliers_ or _non-symmetric distributions_ of scores. In contrast, the **_mean_** and **_mode_** can vary in skewed distributions.\n\n![https://miro.medium.com/max/754/0*wHMvuwRa_YF9SFwY.png](https://miro.medium.com/max/754/0*wHMvuwRa_YF9SFwY.png)\n\n**Source:** _en.wikipedia.org_\n\n#### Q5: Explain how to use _Standard Deviation_ for Anomalies Detection? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q6: What Are some _types_ of Anomalies? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q7: What are some _categories_ of outlier detection approaches? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q8: How to use _one-class SVM_ for Anomalies Detections? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q9: Explain the difference between _Outlier Detection_ vs _Novelty Detection_ ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: Compare *SVM* and *Logistic Regression* in handling outliers ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: How to use _Isolation Forest_ for Anomalies detection? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: What are some _advantages_ of using _Isolation Forest_ algorithm for outliers detection?  ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: How would you deal with _Outliers_ in your dataset? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: Imagine that you know there are _outliers_ in your data, would you use _Logistic Regression_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: How is *PCA* used for *Anomaly Detection*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: How does *Dictionary Learning* perform *Anomaly Detection*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: What types of _Robust Regression Algorithms_ do you know? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=Autoencoders\u003eAutoencoders\u003c/a\u003e Interview Questions\n#### Q1: Describe the approach used in *Denoising Autoencoders* ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q2: How can *Neural Networks* be used to create *Autoencoders*? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q3: Can you use *Batch Normalisation* in *Sparse Auto-encoders*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q4: What are the main differences between *Sparse Autoencoders* and *Convolution Autoencoders*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q5: What are some differences between the *Undercomplete Autoencoder* and the *Sparse Autoencoder*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q6: How can *Neural Networks* be _Unsupervised_? \nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=Bias\u0026Variance\u003eBias \u0026 Variance\u003c/a\u003e Interview Questions\n#### Q1: What is _Bias_ in Machine Learning? ⭐⭐\n##### Answer:\nIn supervised machine learning an algorithm learns a model from training data.\n\nThe goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X). 
The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.\n\n**Bias** are **the simplifying assumptions** made by a model to make the target function easier to learn.\n\nGenerally, linear algorithms have a high bias making them fast to learn and easier to understand but generally less flexible.\n\n* Examples of **low\\-bias** machine learning algorithms include: Decision Trees, k\\-Nearest Neighbors and [Support Vector Machines](https://machinelearningmastery.com/support-vector-machines-for-machine-learning/).\n\n* Examples of **high\\-bias** machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.\n\n**Source:** _machinelearningmastery.com_\n\n#### Q2: What is the *Bias-Variance* tradeoff? ⭐⭐\n##### Answer:\n* **High Bias** can cause an algorithm to miss the relevant relations between features and target outputs (*underfitting*).\n* **High Variance** may result from an algorithm modeling random noise in the training data (*overfitting*).\n![https://community.alteryx.com/t5/image/serverpage/image-id/52874iE986B6E19F3248CF?v=v2](https://community.alteryx.com/t5/image/serverpage/image-id/52874iE986B6E19F3248CF?v=v2)\n\n\n* The **Bias-Variance tradeoff** is a central problem in _supervised learning_. 
Ideally, a model should be able to accurately capture the regularities in its training data, but also generalize well to unseen data.\n* It is called a *tradeoff* because it is typically impossible to do both simultaneously:  \n  * Algorithms with _high variance_ will be prone to _overfitting_ the dataset, but \n  * Algorithms with *high bias* will _underfit_ the dataset.\n\n![bias_variance_tradeoff](https://miro.medium.com/max/883/1*8sV6Sr9uc0Ef39YBivLzrw.jpeg)\n\n**Source:** _en.wikipedia.org_\n\n#### Q3: Provide an intuitive explanation of the _Bias-Variance Tradeoff_ ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q4: Name some types of _Data Biases_ in Machine Learning? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q5: What to do if you have _High Variance Problem_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q6: What to do if you have _High Bias Problem_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q7: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q8: How can you relate the _KNN Algorithm_ to the _Bias-Variance tradeoff_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q9: What is the *Bias Error*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: What is the *Variance Error*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: When you sample, what potential _Sampling Biases_ could you be inflicting? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=BigData\u003eBig Data\u003c/a\u003e Interview Questions\n## [[⬆]](#toc) \u003ca name=Big-ONotation\u003eBig-O Notation\u003c/a\u003e Interview Questions\n#### Q1: What is _Big O_ notation? ⭐\n##### Answer:\n**Big-O** notation (also called \"asymptotic growth\" notation) is a relative representation of the complexity of an algorithm. It describes how an algorithm's running time or memory use *scales* as the input size grows. Big O complexity can be visualized with this graph:\n\n\n![](https://i.stack.imgur.com/WcBRI.png)\n\n\n**Source:** _stackoverflow.com_\n\n#### Q2: Provide an example of \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e1\u003c/i\u003e)\u003c/code\u003e algorithm ⭐\n##### Answer:\nSay we have an array of `n` elements:\n\n```cs\nint array[n];\n```\n\nIf we wanted to access the first (or any) element of the array, this would be \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e1\u003c/i\u003e)\u003c/code\u003e: no matter how big the array is, it always takes the same constant time to get an item:\n```cs\nx = array[0];\n```\n\n**Source:** _stackoverflow.com_\n\n#### Q3: What is Worst Case? ⭐⭐\n##### Answer:\nBig-O is often used to make statements about functions that measure the worst-case behavior of an algorithm. **Worst case** analysis gives the maximum number of basic operations that have to be performed during execution of the algorithm. It assumes that the input is in the _worst possible state_ and maximum work has to be done to put things right.\n\n**Source:** _stackoverflow.com_\n\n#### Q4: What the heck does it mean if an operation is \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003elog n\u003c/i\u003e)\u003c/code\u003e? 
⭐⭐\n##### Answer:\n**\u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003elog n\u003c/i\u003e)\u003c/code\u003e** means the operation only needs to look at about **log N** of the elements rather than all of them. This is usually because you know something about the elements that lets you make an _efficient choice_ (for example, to halve a _search space_ at each step). \nThe most common attributes of a logarithmic running\\-time function are that:\n*   the choice of the next element on which to perform some action is one of several possibilities, and\n*   only one will need to be chosen\n\nor\n\n*   the elements on which the action is performed are digits of `n`\n\nDivide-and-conquer algorithms such as **binary search** are `O(log n)`. Most efficient sorts build on the same idea but do linear work at each of the `O(log N)` levels of division: **merge sort** splits the array in half and merges the halves in `O(N)` time, and **quick sort** partitions the array around a pivot in `O(N)` time at each level. Hence they are `O(N log N)` overall (on average, in the case of quick sort).\n\nPlotting `log(n)` results in a graph where the rise of the curve decelerates as `n` increases:\n![](https://i.stack.imgur.com/qPNNp.png)\n\n\n**Source:** _stackoverflow.com_\n\n#### Q5: Why do we use Big O notation to compare algorithms?  ⭐⭐\n##### Answer:\nIt is difficult to determine the exact runtime of an algorithm, because it depends on factors such as the speed of the computer processor. So instead of talking about the runtime directly, we use Big O Notation to talk about _how quickly the runtime grows_ depending on input size.\n\nWith Big O Notation, we use the size of the input, which we call `n`. So we can say things like the runtime grows “on the order of the size of the input” (\u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e) or “on the order of the square of the size of the input” (\u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e\u003csup\u003e2\u003c/sup\u003e)\u003c/code\u003e). 
Our algorithm may have steps that seem expensive when `n` is small but are eclipsed eventually by other steps as `n` gets larger. For Big O Notation analysis, we care more about the stuff that grows fastest as the input grows, because everything else is quickly eclipsed as `n` gets very large.\n\n**Source:** _medium.com_\n\n#### Q6: What exactly would an \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e\u003csup\u003e2\u003c/sup\u003e)\u003c/code\u003e operation do? ⭐⭐\n##### Answer:\n**\u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e\u003csup\u003e2\u003c/sup\u003e)\u003c/code\u003e** means for every element, you're doing something with _every_ other element, such as comparing them. Bubble sort is an example of this.\n\n**Source:** _stackoverflow.com_\n\n#### Q7: What is complexity of this code snippet? ⭐⭐\n##### Details:\nLet's say we wanted to find a number in the list:\n```js\nfor (int i = 0; i \u003c n; i++){\n    if(array[i] == numToFind){ return i; }\n}\n```\nWhat will be the time complexity (Big O) of that code snippet?\n\n##### Answer:\nThis would be \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e since at most we would have to look through the entire list to find our number. The Big-O is still \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e even though we might find our number the first try and run through the loop once because Big-O describes the upper bound for an algorithm.\n\n**Source:** _stackoverflow.com_\n\n#### Q8: What is complexity of `push` and `pop` for a Stack implemented using a LinkedList? ⭐⭐\n##### Answer:\n\u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e1\u003c/i\u003e)\u003c/code\u003e. Note, you don't have to insert at the end of the list. 
If you insert at the front of a (singly-linked) list, they are both `O(1)`.\n\nStack contains 1,2,3:\n\n```py\n[1]-\u003e[2]-\u003e[3]\n```\n\nPush 5:\n\n```js\n[5]-\u003e[1]-\u003e[2]-\u003e[3]\n```\n\nPop:\n\n```js\n[1]-\u003e[2]-\u003e[3] // returning 5\n```\n\n\n**Source:** _stackoverflow.com_\n\n#### Q9: Explain the difference between _`O(1)`_ vs _`O(n)`_ space complexities  ⭐⭐\n##### Answer:\nLet's consider a traversal algorithm for traversing a list.\n\n* \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e1\u003c/i\u003e)\u003c/code\u003e denotes _constant_ space use: the algorithm allocates the same number of pointers irrespective to the list size. That will happen if we move (reuse) our pointer along the list.\n* In contrast, \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e denotes _linear_ space use: the algorithm space use grows together with respect to the input size `n`. That will happen if let's say for some reason the algorithm needs to allocate 'N' pointers (or other variables) when traversing a list.\n\n**Source:** _stackoverflow.com_\n\n#### Q10: What is the big O notation of this function? ⭐⭐\n##### Details:\nConsider:\n```js\nf(x) = log n + 3n\n```\nWhat is the big O notation of this function?\n\n##### Answer:\nIt is simply \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e.\n\nWhen you have a composite of multiple parts in Big O notation which are added, you have to choose the biggest one. In this case it is _`O(3n)`_, but there is no need to include constants inside parentheses, so we are left with _`O(n)`_.\n\n**Source:** _stackoverflow.com_\n\n#### Q11: What is an algorithm? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: What is complexity of this code snippet? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: What is the time complexity for \"Hello, World\" function? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: What is meant by \"Constant Amortized Time\" when talking about time complexity of an algorithm? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: Why do we use Big O instead of Big Theta (Θ)? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: Name some types of Big O complexity and corresponding algorithms ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: What is complexity of \"Reading a Book\"? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q18: Explain your understanding of \"Space Complexity\" with examples ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q19: What is the difference between Lower bound and Tight bound? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q20: What does it mean if an operation is \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en!\u003c/i\u003e)\u003c/code\u003e? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q21: Provide an example of algorithm with time complexity of \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003ec\u003c/i\u003e\u003csup\u003ek\u003c/sup\u003e)\u003c/code\u003e? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q22: What are some algorithms which we use daily that has _`O(1)`_, _`O(n log n)`_ and _`O(log n)`_ complexities? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=Classification\u003eClassification\u003c/a\u003e Interview Questions\n#### Q1: Why Naive Bayes is called _Naive_? ⭐⭐\n##### Answer:\nWe call it **naive** because its assumptions (it assumes that all of the features in the dataset are equally important and independent) are really optimistic and rarely true in most real-world applications:\n- we consider that these _predictors_ are _independent_\n- we consider that all the predictors have an _equal effect_ on the outcome (like the day being windy does not have more importance in deciding to play golf or not)\n\n**Source:** _towardsdatascience.com_\n\n#### Q2: What is a *Perceptron*? ⭐⭐\n##### Answer:\n* A **Perceptron** is a fundamental unit of a Neural Network that is also a single-layer Neural Network.\n* Perceptron is a linear _classifier_. Since it uses already labeled data points, it is a *supervised learning algorithm*.  \n* The _activation function_ applies a step rule (convert the numerical output into +1 or -1) to check if the output of the weighting function is greater than zero or not.\n\nA **Perceptron** is shown in the figure below:\n\n![perception](https://www.programmersought.com/images/258/633dca25508a646a6df343339c3d4eaa.png)\n\n\n**Source:** _towardsdatascience.com_\n\n#### Q3: What is a _Decision Boundary_? ⭐⭐\n##### Answer:\nA **decision boundary** is a line or a hyperplane that separates the classes. This is what we expect to obtain from _logistic regression_, as with any other classifier. 
With this, we can figure out some way to split the data to allow for an accurate prediction of a given observation’s class using the available information.\n\nIn the case of a generic two-dimensional example, the split might look something like this: \n\n![](https://miro.medium.com/max/340/0*hyR0M6QP5OvedMkd.png)\n\n**Source:** _medium.com_\n\n#### Q4: What types of _Classification Algorithms_ do you know? ⭐⭐\n##### Answer:\n- **Logistic regression**: ideally used for classification of _binary_ variables. Implements the _sigmoid function_ to calculate the probability that a data point belongs to a certain class. \n\n- **K-Nearest Neighbours (kNN)**: calculates the distance from a data point to every other data point and then takes a majority vote among its _k nearest neighbors_ to classify it.\n\n- **Decision trees**: use multiple _if-else statements_ in the form of a tree structure that includes _nodes_ and _leaves_. The nodes break the data down into smaller and smaller subsets, and the leaves provide the final outcome.\n\n- **Random Forest**: uses multiple _decision trees_ to predict the outcome of the target variable. Each decision tree provides its own outcome, and the forest takes the majority vote to classify the final outcome. \n\n- **Support Vector Machines**: place the data in an _n-dimensional space_ (one dimension for each of the _n features_ in the dataset) and then look for the hyperplane that separates the classes with the maximum possible margin.\n\n**Source:** _www.upgrad.com_\n\n#### Q5: What is the difference between _KNN_ and _K-means Clustering_? ⭐⭐\n##### Answer:\n- **_K-nearest neighbors_** or _KNN_ is a _supervised classification algorithm_. This means that we need labeled data to classify an unlabeled data point. It attempts to classify a data point based on its proximity to the `K` nearest data points in the feature space.\n\n- **_K-means Clustering_** is an _unsupervised clustering algorithm_. 
It requires only a set of unlabeled points and a number of clusters `K`, and it groups the data into `K` clusters.\n\n**Source:** _www.quora.com_\n\n#### Q6: How do you choose the optimal _k_ in _k-NN_? ⭐⭐\n##### Answer:\nThere is no universal rule of thumb for choosing an optimal **_k_**. The value varies from dataset to dataset, but as a general rule, the main goal is to keep it:\n- small enough to exclude the samples of the other classes but \n- large enough to minimize any noise in the data.\n\nOne way to look for this optimal parameter, commonly called the _Elbow method_, consists of creating a _for loop_ that trains various **_KNN_** models with different **_k values_**, keeps track of the error for each of these models, and selects the model with the **_k value_** that achieves the best accuracy.\n\n![https://i.stack.imgur.com/ct2ie.jpg](https://i.stack.imgur.com/ct2ie.jpg)\n\n**Source:** _medium.com_\n\n#### Q7: How would you make a prediction using a _Logistic Regression_ model? ⭐⭐\n##### Answer:\nIn **Logistic regression** models, we are modeling the _probability_ that an input `(X)` belongs to the default class `(Y=1)`, that is to say:\n\n$$\nP(X) = P(Y=1|X)\n$$\n\nwhere the `P(X)` values are given by the **_logistic function_**,\n\n$$\nP(X) = \\frac{e^{\\beta_0 + \\beta_1X}}{1 + e^{\\beta_0 + \\beta_1X}}\n$$\n\nThe `β0` and `β1` values are estimated during the training stage using _maximum-likelihood_ estimation or _gradient descent_. Once we have them, we can make predictions by simply putting numbers into the _logistic regression equation_ and calculating a result.\n\nFor example, let's consider a model that predicts whether a person is male or female based on their height, such that if `P(X) ≥ 0.5` the person is predicted to be male, and if `P(X) \u003c 0.5` the person is predicted to be female.  
\n\nDuring the training stage we obtain `β0 = -100` and `β1 = 0.6`, and we want to evaluate the probability that a person with a height of `150cm` is male, so we compute: \n\n$$\nP(X) = \\frac{e^{-100 + 0.6\\cdot 150}}{1 + e^{-100 + 0.6\\cdot 150}} = 0.00004539 \\cdots\n$$\n\nGiven that logistic regression solves a _classification_ task, we compare this value with the `0.5` threshold: it is far below it, so we predict that the person is female. \n\n**Source:** _machinelearningmastery.com_\n\n#### Q8: Why would you use the _Kernel Trick_? ⭐⭐\n##### Answer:\nWhen it comes to **classification** problems, the goal is to establish a decision boundary that maximizes the margin between the classes. However, in the real world, this task can become difficult when we have to deal with **non-linearly separable data**. One approach to solve this problem is to perform a data transformation process, in which we map all the data points to a **higher dimension**, find the boundary there, and make the classification.\n\nThat sounds fine; however, as the number of dimensions grows, computations within that space become more and more expensive. In such cases, the **kernel trick allows us to operate in the original feature space without computing the coordinates of the data** in a higher-dimensional space and therefore offers a more efficient and less expensive way to work with data in higher dimensions.\n\nThere exist different kernel functions, such as:\n- _linear_, \n- _nonlinear_, \n- _polynomial_, \n- _radial basis function (RBF)_, and \n- _sigmoid_. \n\nEach one of them can be suitable for a particular problem depending on the data.  \n\n\n**Source:** _medium.com_\n\n#### Q9: What is the *Hinge Loss* in SVM? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: Name some _classification metrics_ and when you would use each one ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: What is the difference between a _Weak Learner_ and a _Strong Learner_ and why could they be useful? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: Provide an intuitive explanation of _Linear Support Vector Machines (SVMs)_ ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: Could you _convert_ Regression into Classification and vice versa? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: What's the difference between _One-vs-Rest_ and _One-vs-One_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: Can you choose a _classifier_ based on the _size of the training set_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: How would you use _Naive Bayes_ classifier for categorical features? What if some features are numerical? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q18: What's the difference between _Generative Classifiers_ and _Discriminative Classifiers_? Name some examples of each one ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q19: How does the _Naive Bayes_ classifier work? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q20: How does the _AdaBoost_ algorithm work? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q21: What's the difference between _Softmax_ and _Sigmoid_ functions? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q22: How do you use a supervised *Logistic Regression* for Classification? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q23: What is a *Confusion Matrix*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q24: How do the *ROC* curve and *AUC* value help measure how good a model is? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q25: What are some advantages and disadvantages of using *AUC* to measure the _performance_ of the model? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q26: What is the *F-Score*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q27: How is the _AUC - ROC_ curve used in classification problems?  ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q28: Name some advantages of using _Support Vector Machines_ vs _Logistic Regression_ for classification ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q29: When would you use _SVM_ vs _Logistic regression_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q30: Are there any problems using _Naive Bayes_ for Classification? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q31: What's the difference between _Random Oversampling_ and _Random Undersampling_ and when can they be used? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q32: How would you use a _Confusion Matrix_ for determining a model's performance? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q33: How would you deal with classification on _Non-linearly Separable_ data? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q34: What are the trade-offs between the different types of _Classification Algorithms_? How would you choose the best one? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q35: Compare _Naive Bayes_ with _Logistic Regression_ to solve classification problems ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q36: How would you _Calibrate Probabilities_ for a classification model? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q37: How would you choose an evaluation metric for an _Imbalanced classification_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q38: What is *AIC*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q39: Can _Logistic Regression_ be used for an _Imbalanced Classification_ problem? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q40: Why would you use _Probability Calibration_? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q41: What's the difference between _ROC_ and _Precision-Recall_ Curves? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q42: How to interpret _F-measure_ values? 
⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=Clustering\u003eClustering\u003c/a\u003e Interview Questions\n#### Q1: Define what is *Clustering*? ⭐\n##### Answer:\n* **Cluster analysis** is also called **clustering**.\n* It is the task of grouping a set of objects in such a way that *objects* in the same *cluster* are *more similar* to each other than to those in other clusters.\n* Cluster analysis itself is *not* one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them.\n\n![clustering](https://media.geeksforgeeks.org/wp-content/uploads/merge3cluster.jpg)\n\n**Source:** _Handbook of Cluster Analysis from Chapman and Hall/CRC_\n\n#### Q2: What is *Similarity-based Clustering*? ⭐⭐\n##### Answer:\n* Clustering in which the data are similarities between pairs of points is called **similarity-based clustering**.\n* A typical example of similarity-based clustering is community detection in social networks, where the observations are individual links between people, which may be due to friendship, shared interests, or work relationships. The *strength* of a link can be the frequency of interactions, for example, communications by e-mail, phone, or other social media, co-authorships, or citations.\n* In this clustering paradigm, the points to be clustered are not assumed to be part of a vector space. Their attributes (or features) are incorporated into a single dimension, the *link strength*, or *similarity*, which takes a numerical value $$S_{ij}$$ for each pair of points `i`, `j`. 
Hence, the natural representation for this problem is by means of the similarity matrix given below:\n$$\nS=[S_{ij}]_{i,j=1}^n\n$$\nThe similarities are symmetric $$S_{ij} = S_{ji}$$ and nonnegative $$S_{ij} \\geq 0$$.\n\n**Source:** _Handbook of Cluster Analysis from Chapman and Hall/CRC_\n\n#### Q3: Give examples of using *Clustering* to solve real-life problems ⭐⭐\n##### Answer:\n* **Identifying cancerous data:** Initially we take known samples of cancerous and non-cancerous data and label both sets. The samples are then mixed, and different clustering algorithms are applied to the mixed dataset. It has been found through experiments that a cancerous dataset gives the best results with unsupervised non-linear clustering algorithms.\n* **Search engines:** Search engines try to group similar objects into one cluster and keep dissimilar objects far from each other. They return results for a query according to the nearest similar objects clustered around the data being searched.\n* **Wireless sensor network-based applications:** Clustering algorithms can be used effectively in *wireless sensor network-based applications*. One such application is *landmine detection*. The clustering algorithm finds the cluster heads (or cluster centers), each of which collects all the data in its respective cluster.\n\n**Source:** _sites.google.com_\n\n#### Q4: What is *Mean-Shift Clustering*? ⭐⭐\n##### Answer:\n* **Mean Shift** is a non-parametric feature-space analysis technique for locating the maxima of a *density function*. What we're trying to achieve here is to keep shifting the window to a region of _higher density_.\n\n![https://iq.opengenus.org/content/images/2019/02/pdf.png](https://iq.opengenus.org/content/images/2019/02/pdf.png)\n\n* We can understand this algorithm by thinking of our data points as being represented by a probability density function. 
Naturally, in a probability density function, higher density regions correspond to the regions with more points, and lower density regions correspond to the regions with fewer points.\nIn clustering, we need to find clusters of points, i.e., the regions with a lot of points together. More points together mean higher density. Hence, we observe that clusters of points correspond to the higher density regions in our probability density function.\n\n So, we must iteratively go from lower density to higher density regions, in order to find our clusters.\n\n* The mean shift method is an iterative method, and we start with an initial estimate `x`. Let a *kernel function* $$K(x_i - x)$$ be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a *Gaussian kernel* on the distance to the current estimate is used,\n$$\nK(x_i-x)= e^{-c|x_i-x|^2}\n$$\nThe weighted mean of the density in the window determined by `K` is\n$$\nm(x) = \\frac{\\sum_{x_i \\in N(x)} K(x_i - x) x_i}{\\sum_{x_i \\in N(x)} K(x_i - x)}\n$$\nwhere `N(x)` is the neighborhood of `x`, a set of points for which $$K(x_i - x) \\neq 0$$.\n\n* The difference `m(x) - x` is called *mean shift*. The *mean-shift algorithm* now sets $$x \\leftarrow m(x)$$, and repeats the estimation until `m(x)` converges. This means that, after a sufficient number of steps, the centroid of all the points in the window and the current location of the window coincide. This is when we reach convergence, as no new points are added to our window in this step.\n\n\n**Source:** _en.wikipedia.org_\n\n#### Q5: What are *Self-Organizing Maps*? ⭐⭐\n##### Answer:\n* **Self-Organizing Maps** (**SOMs**) are a class of *self-organizing* clustering techniques.\n* It is an _unsupervised form of artificial neural networks_. A self-organizing map consists of a set of neurons that are arranged in a rectangular or hexagonal grid. Each neuronal unit in the grid is associated with a numerical vector of fixed dimensionality. 
The learning process of a self-organizing map involves the adjustment of these vectors to provide a suitable representation of the input data.\n* Self-organizing maps can be used for clustering numerical data in vector format.\n\n![som](https://www.researchgate.net/profile/Mohamed-Zair/publication/329337931/figure/fig1/AS:915377227849728@1595254346086/Structural-model-of-self-organizing-map-neural-network-Figure-2-Experimental-benchmark.png)\n\n**Source:** _medium.com_\n\n#### Q6: Why do you need to perform *Significance Testing* in *Clustering*? ⭐⭐\n##### Answer:\n* **Significance testing** addresses an important aspect of cluster validation. Many cluster analysis methods will deliver clusterings even for homogeneous data. They assume implicitly that a clustering has to be found, regardless of whether this is meaningful or not. \n\n\u003eA critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation.\n\n* **Significance testing** is performed to distinguish between a clustering that reflects meaningful _heterogeneity_ in the data and an artificial clustering of _homogeneous_ data.\n* Significance testing is also used for more specific tasks in cluster analysis, such as estimating the number of clusters and interpreting some or all of the individual clusters to show their significance.\n\n**Source:** _www.ncbi.nlm.nih.gov_\n\n#### Q7: What is the difference between a _Multiclass problem_ and a _Multilabel problem_? ⭐⭐\n##### Answer:\n**Multiclass classification** means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. 
Multiclass classification makes the assumption that each sample is _assigned to one and only one label_: a fruit can be either an apple or a pear but not both at the same time.\n\n**Multilabel classification** assigns to each sample a set of target labels. This can be thought of as predicting properties of a data-point that are _not mutually exclusive_, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time or none of these.\n\n![https://i.stack.imgur.com/XghaO.png](https://i.stack.imgur.com/XghaO.png)\n\n**Source:** _stats.stackexchange.com_\n\n#### Q8: What is the _Jaccard Index_? ⭐⭐\n##### Answer:\nThe **Jaccard index**, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. The Jaccard coefficient measures **similarity** between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:\n\n![https://wikimedia.org/api/rest_v1/media/math/render/svg/eaef5aa86949f49e7dc6b9c8c3dd8b233332c9e7](https://wikimedia.org/api/rest_v1/media/math/render/svg/eaef5aa86949f49e7dc6b9c8c3dd8b233332c9e7)\n\n![https://upload.wikimedia.org/wikipedia/commons/c/c7/Intersection_over_Union_-_visual_equation.png](https://upload.wikimedia.org/wikipedia/commons/c/c7/Intersection_over_Union_-_visual_equation.png)\n\n![](https://upload.wikimedia.org/wikipedia/commons/e/e6/Intersection_over_Union_-_poor%2C_good_and_excellent_score.png)\n\n**Source:** _en.wikipedia.org_\n\n#### Q9: What is the difference between the two types of *Hierarchical Clustering*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: While performing *K-Means* Clustering, how do you determine the value of *K*? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: What are some different types of *Clustering Structures* that are used in *Clustering Algorithms*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: When would you use *Hierarchical Clustering* over *Spectral Clustering*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: Compare *Hierarchical Clustering* and *k-Means Clustering* ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: Where do the *Similarities* come from in *Similarity-based Clustering*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: What is a *Mixture Model*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: What is the *Mixture* in *Gaussian Mixture Model*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: What is *Latent Class Model*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q18: How would you perform an *Observation-Based Clustering* for *Time-Series Data*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q19: Name some pros and cons of _Mean Shift Clustering_ ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q20: How can *Evolutionary Algorithms* be used for *Clustering*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q21: What is _Silhouette Analysis_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q22: Why does *K-Means* have a higher *bias* when compared to *Gaussian Mixture Model*? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q23: Explain how a cluster is formed in the *DBSCAN* Clustering Algorithm ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q24: What makes the distance measurement of *k-Medoids* better than *k-Means*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q25: When using various Clustering Algorithms, why is *Euclidean Distance* not a good metric in _High Dimensions_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q26: When would you use *Hierarchical Clustering* over *k-Means Clustering*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q27: How would you choose the number of *Clusters* when designing a *K-Medoid Clustering Algorithm*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q28: Explain the *Dirichlet Process Gaussian Mixture Model* ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q29: Why is *Euclidean Distance* not good for *Sparse Data*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q30: When would you use *Segmentation* over *Clustering*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q31: How to tell if data is _clustered_ enough for clustering algorithms to produce meaningful results? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q32: How to choose among the various clustering _Distance Measures_? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q33: Explain the different frameworks used for *k-Means Clustering* ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q34: What is the motivation behind the *Expectation-Maximization Algorithm*? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q35: What is the relationship between *k-Means Clustering* and *PCA*? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=CostFunction\u003eCost Function\u003c/a\u003e Interview Questions\n#### Q1: Provide an analogy for a _Cost Function_ in real life ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q2: Explain what is _Cost (Loss) Function_ in Machine Learning? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q3: What is the difference between _Cost Function_ vs _Gradient Descent_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q4: What is the difference between _Objective function_, _Cost function_ and _Loss function_ ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q5: Why don’t we use _Mean Squared Error_ as a cost function in Logistic Regression? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q6: How would you fix Logistic Regression _Overfitting_ problem? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q7: What is the *Hinge Loss* in SVM? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q8: What type of *Cost Functions* do *Greedy Splitting* use? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q9: How would you choose the *Loss Function* for a Deep Learning model? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=DataStructures\u003eData Structures\u003c/a\u003e Interview Questions\n#### Q1: Define Stack ⭐\n##### Answer:\nA **Stack** is a container of objects that are inserted and removed according to the last-in first-out (**LIFO**) principle. A pushdown stack allows only two modifying operations: push an item onto the stack, and pop an item off the stack.\n\nIn practice, three basic operations are performed on stacks. They are:\n \n1. inserting an item into a stack (**push**). \n2. deleting an item from the stack (**pop**). \n3. reading the item at the top of the stack without removing it (**peek** or **top**).\n\nA stack is a limited access data structure - elements can be added and removed from the stack only at the top. **push** adds an item to the top of the stack, and **pop** removes the item from the top. A helpful analogy is to think of a stack of books; you can remove only the top book, and you can add a new book only on the top.\n\n![](https://user-images.githubusercontent.com/13550565/85218531-fea33d80-b3cd-11ea-8ba4-77c37d446d07.png)\n\n\n**Source:** _www.cs.cmu.edu_\n\n#### Q2: Explain why Stack is a recursive data structure ⭐\n##### Answer:\nA **stack** is a **recursive** data structure because it can be defined in terms of itself:\n\n* a stack is either empty, or\n* it consists of a top element and the rest, which is itself a stack.\n\n**Source:** _www.cs.cmu.edu_\n\n#### Q3: Define Linked List ⭐\n##### Answer:\nA **linked list** is a linear data structure where each element is a separate object. Each element (we will call it a **node**) of a list comprises two items - the **data** and a **reference (pointer)** to the next node. The last node has a reference to **null**. 
The entry point into a linked list is called the **head** of the list. It should be noted that _head is not a separate node,_ but the reference to the first node. If the list is empty then the head is a null reference.\n\n**Source:** _www.cs.cmu.edu_\n\n#### Q4: Name some characteristics of Array Data Structure ⭐\n##### Answer:\nArrays are: \n* **Finite (fixed-size)** - An array contains only a limited number of elements.\n* **Ordered** - All the elements are stored one after another in contiguous memory locations, in linear order.\n* **Homogeneous** - All the elements of an array are of the same data type.\n\n**Source:** _codelack.com_\n\n#### Q5: What is Queue? ⭐\n##### Answer:\nA **queue** is a container of objects (a _linear_ collection) that are inserted and removed according to the first-in first-out (FIFO) principle. The process of adding an element to a queue is called **Enqueue** and the process of removing an element from a queue is called **Dequeue**.\n\n![](https://user-images.githubusercontent.com/13550565/85218641-9d2f9e80-b3ce-11ea-8c1b-a9c058057a70.png)\n\n\n**Source:** _www.cs.cmu.edu_\n\n#### Q6: What is Heap? ⭐\n##### Answer:\nA **Heap** is a special tree-based data structure: an almost complete tree that satisfies the heap property:\n\n* In a **max heap**, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. \n* In a **min heap**, the key of P is less than or equal to the key of C. 
The node at the \"top\" of the heap (with no parents) is called the root node.\n\n\nA common implementation of a heap is the binary heap, in which the tree is a **binary tree.**\n\n\n![](https://www.techiedelight.com/wp-content/uploads/2016/11/Min-Max-Heap.png)\n\n\n**Source:** _www.geeksforgeeks.org_\n\n#### Q7: What is Hash Table? ⭐\n##### Answer:\nA **hash table** (hash map) is a data structure that implements an **associative** array abstract data type, a **structure** that can **map keys to values** and is indexed by arbitrary objects (keys). A hash table uses a **hash function** to compute an **index**, also called a **hash value**, into an **array of buckets** or slots, from which the desired **value** can be found.\n\n\n![](https://i.stack.imgur.com/0yjYd.png)\n\n\n**Source:** _en.wikipedia.org_\n\n#### Q8: What is Priority Queue? ⭐\n##### Answer:\nA **priority queue** is a data structure that stores **priorities** (comparable values) and perhaps associated information.  A **priority queue** is different from a \"normal\" queue, because instead of being a \"first-in-first-out\" data structure, values come out in order by **priority**. Think of a priority queue as a kind of bag that holds priorities. You can put one in, and you can take out the current highest priority.\n\n![](https://cdn.programiz.com/sites/tutorial2program/files/Introduction.png)\n\n\n**Source:** _pages.cs.wisc.edu_\n\n#### Q9: Define Tree Data Structure ⭐\n##### Answer:\n**Trees** are well-known as a _non-linear_ data structure. They don’t store data in a linear way. They organize data _hierarchically_.\n\nA **tree** is a collection of entities called **nodes**. Nodes are connected by **edges**. 
Each node contains a **value** or **data** or **key**, and it may or may not have a **child** node. The first node of the tree is called the **root**. **Leaves** are the last nodes on a tree. They are nodes without children.\n\n\n\n![](https://miro.medium.com/max/975/1*PWJiwTxRdQy8A_Y0hAv5Eg.png)\n\n\n**Source:** _www.freecodecamp.org_\n\n#### Q10: What is a Graph? ⭐\n##### Answer:\nA **graph** is a common data structure that consists of a finite set of **nodes** (or **vertices**) and a set of **edges** connecting them. A pair `(x,y)` is referred to as an edge, which communicates that the **x vertex** connects to the **y vertex**.\n\nGraphs are used to solve real-life problems that involve representation of the problem space as a **network**. Examples of networks include telephone networks, circuit networks, social networks (like LinkedIn, Facebook etc.).\n\n\n\n![](https://miro.medium.com/max/1640/1*4s5Z7gVwVqmKcslgiamRyw.png)\n\n\n**Source:** _www.educative.io_\n\n#### Q11: What is String in Data Structures? ⭐\n##### Answer:\nA **string** is generally considered a **data type** and is often implemented as an **array data structure** of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.\n\n**Source:** _dev.to_\n\n#### Q12: What is Trie? ⭐\n##### Answer:\n**Trie** (also called **digital tree** or **prefix tree**) is a _tree-based data structure_, which is used for efficient _retrieval_ of a key in a large data-set of strings. 
Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated; i.e., **the value of the key is distributed across the structure**. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Each complete English word has an arbitrary integer value associated with it (see image).\n\n\u003cbr/\u003e\n\n![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/ae/Patricia_trie.svg/1200px-Patricia_trie.svg.png)\n\u003cbr/\u003e\n\n**Source:** _medium.com_\n\n#### Q13: Define Binary Tree ⭐\n##### Answer:\nA normal tree has no restrictions on the number of children each node can have. A **binary tree** is made of nodes, where each node contains a \"left\" pointer, a \"right\" pointer, and a data element, so each node has at most two children. \n\nThere are three different types of binary trees:\n\n* **Full binary tree**: Every node other than leaf nodes has 2 child nodes.\n* **Complete binary tree**: All levels are filled except possibly the last one, and all nodes are filled in as far left as possible.\n* **Perfect binary tree**: All nodes have two children and all leaves are at the same level.\n\n\n\n![](https://study.com/cimages/multimages/16/0e0646ba-30e5-40d9-b45c-a138f038f05b_full_complete_perfect.png)\n\n\n\n**Source:** _study.com_\n\n#### Q14: Why and when should I use Stack or Queue data structures instead of Arrays/Lists? ⭐⭐\n##### Answer:\nBecause they help manage your data in a more _particular_ way than arrays and lists. It means that when you're debugging a problem, you won't have to wonder if someone randomly inserted an element into the middle of your list, messing up some invariants.\n\nArrays and lists are random access. 
They are very flexible and also easily *corruptible*. If you want to manage your data as FIFO or LIFO, it's best to use those dedicated, already implemented collections.\n\nMore practically you should:\n* Use a queue when you want to get things out in the order that you put them in (FIFO)\n* Use a stack when you want to get things out in the reverse of the order in which you put them in (LIFO)\n* Use a list when you want to get anything out, regardless of when you put it in (and when you don't want items to be removed automatically).\n\n**Source:** _stackoverflow.com_\n\n#### Q15: What is Complexity Analysis of Queue operations?  ⭐⭐\n##### Answer:\n* Queues do not offer random access to their contents. To reach an arbitrary element somewhere in the queue, you have to repeatedly shift the first element off the front of the queue. Therefore, **access** is \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e.\n* Searching for a given value in the queue requires iterating until you find it. So **search** is \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003en\u003c/i\u003e)\u003c/code\u003e.\n* Inserting into a queue, by definition, can only happen at the back of the queue, similar to someone getting in line for a delicious Double-Double burger at In 'n Out. Assuming an efficient queue implementation, queue **insertion** is \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e1\u003c/i\u003e)\u003c/code\u003e.\n* Deleting from a queue happens at the front of the queue. Assuming an efficient queue implementation, queue **deletion** is \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e1\u003c/i\u003e)\u003c/code\u003e.\n\n**Source:** _github.com_\n\n#### Q16: What are some types of Queue? 
⭐⭐\n##### Answer:\nQueues can be classified into the following types:\n\n* **Simple Queue** - is a linear data structure in which elements are removed in the same order they were inserted, i.e., the element that was inserted first is removed first.\n\n![](https://scanftree.com/Data_Structure/queues.png)\n\n* **Circular Queue** - is a linear data structure in which the operations are performed based on the FIFO (First In First Out) principle and the last position is connected back to the first position to make a circle. It is also called a **Ring Buffer**. A circular queue avoids the wasted space of a regular array-based queue implementation.\n\n\n![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Ring_buffer.svg/794px-Ring_buffer.svg.png)\n\n\n* **Priority Queue** - is a type of queue where each element has a priority value and the order in which elements are deleted depends on that priority value.\n\n![](https://www.aoprogrammer.com/wp-content/uploads/2018/07/enqueue-priority-queue.png)\n\n  * In a **max-priority queue**, the element with the largest priority value is deleted first.\n  * In a **min-priority queue**, the element with the smallest priority value is deleted first.\n* **Deque (Double-ended queue)** - allows insertion and deletion at both ends, i.e. elements can be added or removed from the rear as well as the front.\n\n![](https://i.imgur.com/AF3RVpP.png)\n\n  * **Input restricted deque** - In an input restricted double-ended queue, insertion is performed at only one end and deletion is performed at both ends.\n\n![](https://i.imgur.com/dSJUHFE.png)\n\n  * **Output restricted deque** - In an output restricted double-ended queue, deletion is performed at only one end and insertion is performed at both ends.\n\n\n![](https://i.imgur.com/vyHyffe.png)\n\n \n\n\n**Source:** _www.ques10.com_\n\n#### Q17: What are some types of Linked List? 
⭐⭐\n##### Answer:\n* A **singly linked list**\n\n![](https://www.andrew.cmu.edu/course/15-121/lectures/Linked%20Lists/pix/linkedlist.bmp)\n* A **doubly linked list** is a list that has two references, one to the next node and another to previous node.\n\n![](https://www.andrew.cmu.edu/course/15-121/lectures/Linked%20Lists/pix/doubly.bmp)\n* A **multiply linked list** - each node contains two or more link fields, each field being used to connect the same set of data records in a different order of same set(e.g., by name, by department, by date of birth, etc.).\n* A **circular linked list** - where last node of the list points back to the first node (or the head) of the list.\n\n![](https://i2.wp.com/algorithms.tutorialhorizon.com/files/2016/03/Circular-Linked-List.png)\n\n\n**Source:** _www.cs.cmu.edu_\n\n#### Q18: What are Dynamic Arrays? ⭐⭐\n##### Answer:\nA **dynamic array** is an array with a big improvement: _automatic resizing_.\n\nOne limitation of arrays is that they're _fixed_ size, meaning you need to specify the number of elements your array will hold ahead of time. A dynamic array expands as you add more elements. So you don't need to determine the size ahead of time.\n\n**Source:** _www.interviewcake.com_\n\n#### Q19: Return the N-th value of the Fibonacci sequence. Solve in _`O(n)`_ time ⭐⭐\n##### Answer:\nThe easiest solution that comes to mind here is iteration: \n\n```js\nfunction fib(n){\n  let arr = [0, 1];\n  for (let i = 2; i \u003c n + 1; i++){\n    arr.push(arr[i - 2] + arr[i -1])\n  }\n return arr[n]\n}\n```\nAnd output:\n```\nfib(4)\n=\u003e 3\n```\n\n\nNotice that two first numbers can not really be effectively generated by a for loop, because our loop will involve adding two numbers together, so instead of creating an empty array we assign our arr variable to `[0, 1]` that we know for a fact will always be there. 
After that we create a loop that starts iterating from i = 2 and adds numbers to the array until the length of the array is equal to `n + 1`. Finally, we return the number at index `n` of the array.\n\n**Source:** _medium.com_\n\n##### Complexity Analysis:\n**Time Complexity**: O(n)\n**Space Complexity**: O(n)\n\nOur iterative solution takes linear time to complete the task. The loop runs n - 1 times (for i from 2 to n), so the Big O (the notation used to describe the worst-case scenario) is simply `O(n)` in this case. The space complexity is also `O(n)`, because the array stores every value up to index n.\n##### Implementation:\n##### _JS_\n\n```js\nfunction fib(n){\n  let arr = [0, 1]\n  for (let i = 2; i \u003c n + 1; i++){\n    arr.push(arr[i - 2] + arr[i -1])\n  }\n return arr[n]\n}\n```\n\n##### _Java_\n\n```java\ndouble fibonacci(int n){\n    // after i iterations, prev holds F(i) and next holds F(i + 1)\n    double prev=0d, next=1d;\n    for (int i = 0; i \u003c n; i++) {\n        double sum=prev+next;\n        prev=next;\n        next=sum;\n    }\n    // after n iterations prev is F(n), e.g. F(4) = 3\n    return prev;\n}\n```\n\n##### _PY_\n\n```py\ndef fib_iterative(n):\n    a, b = 0, 1\n    while n \u003e 0:\n        a, b = b, a + b\n        n -= 1\n    return a\n```\n\n#### Q20: Name some disadvantages of Linked Lists? ⭐⭐\n##### Answer:\nA few disadvantages of linked lists are:\n\n* They use more memory than arrays because of the storage used by their pointers.\n* Difficulties arise in linked lists when it comes to reverse traversing. 
For instance, singly linked lists are cumbersome to navigate backwards, and while doubly linked lists are somewhat easier to read, memory is wasted in allocating space for a back-pointer.\n* Nodes in a linked list must be read in order from the beginning as linked lists are inherently sequential access.\n* Random access takes linear time.\n* Nodes are stored non-contiguously (poor or no cache locality), greatly increasing the time required to access individual elements within the list, especially compared with the CPU-cache-friendly layout of arrays.\n* If the link to a node is accidentally destroyed, everything after that point in the list is very likely to be lost. Data recovery is not possible.\n* Search is linear, versus logarithmic for sorted arrays and balanced binary search trees.\n* A different amount of time is required to access each element.\n* It is not easy to sort the elements stored in a linear linked list.\n\n**Source:** _www.quora.com_\n\n#### Q21: Return the N-th value of the Fibonacci sequence Recursively ⭐⭐\n##### Answer:\nThe recursive solution looks pretty simple (see code).\n\nLet’s look at a diagram that will help you understand what’s going on in the code. Function fib is called with argument 5:\n\n![](https://miro.medium.com/max/1400/1*LNBBacuaBFOVZXUV6VgEEg.png)\n\nBasically, our **fib** function will continue to recursively call itself, creating more and more branches of the tree until it hits the base case, from which it will start summing up each branch’s return values bottom up, until it finally sums them all up and returns an integer equal to 5.\n\n**Source:** _medium.com_\n\n##### Complexity Analysis:\n**Time Complexity**: O(2^n)\n\nThe recursive solution takes **exponential** time, which can be explained by the fact that the size of the call tree grows exponentially as n increases. So for every additional element in the Fibonacci sequence we get an increase in function calls. 
Big O in this case is equal to \u003ccode\u003e\u003ci\u003eO\u003c/i\u003e(\u003ci\u003e2\u003c/i\u003e\u003csup\u003en\u003c/sup\u003e)\u003c/code\u003e. Exponential time complexity denotes an algorithm whose growth doubles with each addition to the input data set. \n##### Implementation:\n##### _JS_\n\n```js\nfunction fib(n) {\n  if (n \u003c 2){\n    return n\n  }\n  return fib(n - 1) + fib (n - 2)\n}\n```\n\n##### _Java_\n\n```java\npublic int fibonacci(int n)  {\n    if (n \u003c 2) return n;\n\n    return fibonacci(n - 1) + fibonacci(n - 2);\n}\n```\n\n##### _PY_\n\n```py\ndef F(n):\n    if n == 0: return 0\n    elif n == 1: return 1\n    else: return F(n-1)+F(n-2)\n```\n\n#### Q22: What is the space complexity of a Hash Table? ⭐⭐\n##### Answer:\nThe space complexity of a data structure indicates how much space it occupies in relation to the number of elements it holds. For example, a space complexity of `O(1)` would mean that the data structure always consumes constant space no matter how many elements you put in it. `O(n)` would mean that the space consumption grows linearly with the number of elements in it.\n\nA **hash table** typically has a space complexity of `O(n)`.\n\n**Source:** _stackoverflow.com_\n\n#### Q23: What is Binary Heap? ⭐⭐\n##### Answer:\nA **Binary Heap** is a _Binary Tree_ with the following properties:\n\n* It’s a _complete_ tree (all levels are completely filled except possibly the last level, and the last level has all keys as far left as possible). This property makes Binary Heaps suitable for storage in an array.\n* A Binary Heap is either a **Min Heap** or a **Max Heap**. In a Min Binary Heap, the key at the root must be the minimum among all keys present in the Binary Heap. The same property must be recursively true for all nodes in the Binary Tree. 
A Max Binary Heap is similar, except that the key at the root must be the maximum. The two trees below are both Min Heaps.\n\n```js\n            10                      10\n         /      \\               /       \\  \n       20        100          15         30  \n      /                      /  \\        /  \\\n    30                     40    50    100   40\n```\n\n**Source:** _www.geeksforgeeks.org_\n\n##### Complexity Analysis:\n**Time Complexity**: None\n**Space Complexity**: None\n#### Q24: What is Binary Search Tree? ⭐⭐\n##### Answer:\nA **binary search tree** is a data structure that allows us to quickly maintain a _sorted list_ of numbers.\n\n* It is called a _binary tree_ because each tree node has a maximum of two children.\n* It is called a _search tree_ because it can be used to search for the presence of a number in `O(log n)` time when the tree is balanced (a degenerate, unbalanced tree can take `O(n)`).\n\nThe properties that separate a binary search tree from a regular binary tree are:\n\n* All nodes of the left subtree are less than the root node\n* All nodes of the right subtree are more than the root node\n* Both subtrees of each node are also BSTs, i.e. they have the above two properties\n\n\n![](https://cdn.programiz.com/sites/tutorial2program/files/bst-vs-not-bst.jpg)\n\n\n**Source:** _www.programiz.com_\n\n##### Complexity Analysis:\n**Time Complexity**: None\n**Space Complexity**: None\n#### Q25: What is the difference between Strings vs. Char arrays? ⭐⭐\n##### Answer:\n**Char arrays**: \n* Static-sized\n* Fast access\n* Few built-in methods to manipulate strings\n* A char array doesn’t define a data type\n\n**Strings**:\n* Slower access\n* Define a data type\n* Dynamic allocation\n* More built-in functions to support string manipulations\n\n**Source:** _dev.to_\n\n##### Complexity Analysis:\n**Time Complexity**: None\n**Space Complexity**: None\n#### Q26: How to implement a _Tree_ data-structure? Provide some code. 
⭐⭐\n##### Answer:\nThe implementation below is a basic (generic) tree structure that can be used for `String` or any other object.\n\n**Source:** _stackoverflow.com_\n\n##### Complexity Analysis:\n**Time Complexity**: None\n**Space Complexity**: None\n##### Implementation:\n##### _Java_\n\n```java\nimport java.util.ArrayList;\nimport java.util.List;\n\npublic class Tree\u003cT\u003e {\n    private Node\u003cT\u003e root;\n\n    // Create a tree whose root holds rootData and has no children yet\n    public Tree(T rootData) {\n        root = new Node\u003cT\u003e();\n        root.data = rootData;\n        root.children = new ArrayList\u003cNode\u003cT\u003e\u003e();\n    }\n\n    public static class Node\u003cT\u003e {\n        private T data;\n        private Node\u003cT\u003e parent;\n        private List\u003cNode\u003cT\u003e\u003e children;\n    }\n}\n```\n\n##### _PY_\n\n\nGeneric Tree:\n\n```py\nclass Tree(object):\n    \"Generic tree node.\"\n    def __init__(self, name='root', children=None):\n        self.name = name\n        self.children = []\n        if children is not None:\n            for child in children:\n                self.add_child(child)\n    def __repr__(self):\n        return self.name\n    def add_child(self, node):\n        assert isinstance(node, Tree)\n        self.children.append(node)\n#    *\n#   /|\\\n#  1 2 +\n#     / \\\n#    3   4\nt = Tree('*', [Tree('1'),\n               Tree('2'),\n               Tree('+', [Tree('3'),\n                          Tree('4')])])\n```\nBinary tree:\n\n```py\nclass Tree:\n    def __init__(self):\n        self.left = None\n        self.right = None\n        self.data = None\n```\n\n#### Q27: Convert a _Singly Linked List_ to _Circular Linked List_ ⭐⭐\n##### Answer:\nTo convert a singly linked list to a circular linked list, we will set the next pointer of the tail node to the head pointer.\n\n*   Create a copy of the head pointer, let's say `temp`.\n*   Using a loop, traverse the linked list up to the tail node (last node) using the `temp` pointer.\n*   Now set the next pointer of the tail node to the head node. 
`temp-\u003enext = head`\n\n**Source:** _www.techcrashcourse.com_\n\n##### Implementation:\n##### _PY_\n\n```py\ndef convertTocircular(head):\n    # An empty list has no tail to link, so return it unchanged.\n    if head is None:\n        return None\n\n    # Keep a reference to the first node so the tail can point back to it.\n    start = head\n\n    # Advance until head refers to the tail node,\n    # i.e. the node whose next pointer is None.\n    while head.next is not None:\n        head = head.next\n\n    # Link the tail back to the first node to close the circle.\n    head.next = start\n    return start\n```\n\n\n#### Q28: What's the difference between the data structure Tree and Graph? ⭐⭐\n##### Answer:\n**Graph:**\n* Consists of a set of vertices (or nodes) and a set of edges connecting some or all of them\n* Any edge can connect any two vertices that aren't already connected by an identical edge (in the same direction, in the case of a directed graph)\n* Doesn't have to be connected (the edges don't have to connect all vertices together): a single graph can consist of a few disconnected sets of vertices\n* Could be directed or undirected (which would apply to all edges in the graph)\n\n**Tree:**\n* A type of graph (fits within the category of Directed Acyclic Graphs, or DAGs)\n* Vertices are more commonly called \"nodes\"\n* Edges are directed and represent an \"is child of\" (or \"is parent of\") relationship\n* Each node (except the root node) has exactly one parent (and zero or more children)\n* Has exactly one \"root\" node (if the tree has at least one node), which is a node without a parent\n* Has to be connected\n* Is acyclic, meaning it has no cycles: \"a cycle is a path [AKA sequence] of edges and vertices wherein a vertex is reachable from itself\"\n* Is a recursive data structure: every subtree of a tree is itself a tree\n\n\n\n![](https://miro.medium.com/max/2262/1*-yHATwTlY2hwceJ93-D-cw.jpeg)\n\n\n**Source:** _stackoverflow.com_\n\n##### Complexity Analysis:\n**Time Complexity**: None\n**Space Complexity**: None\n#### Q29: Under 
what circumstances are Linked Lists useful? ⭐⭐\n##### Answer:\nLinked lists are very useful when you need :\n* to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile\\-time) length.\n* splitting and joining (bidirectionally\\-linked) lists is very efficient.\n* You can also combine linked lists \\- e.g. tree structures can be implemented as \"vertical\" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).\n\nUsing an array based list for these purposes has severe limitations:\n\n*   Adding a new item means the array must be reallocated (or you must allocate more space than you need to allow for future growth and reduce the number of reallocations)\n*   Removing items leaves wasted space or requires a reallocation\n*   inserting items anywhere except the end involves (possibly reallocating and) copying lots of the data up one position\n\n**Source:** _stackoverflow.com_\n\n#### Q30: Implement _Pre-order Traversal_ of _Binary Tree_ using _Recursion_ ⭐⭐\n##### Answer:\nFor traversing a (non-empty) binary tree in pre-order fashion, we must do these three things for every node `N` starting from root node of the tree:\n\n* (N) Process `N` itself.\n* (L) Recursively traverse its _left_ subtree. When this step is finished we are back at N again.\n* (R) Recursively traverse its _right_ subtree. 
When this step is finished we are back at N again.\n\n![](https://www.techiedelight.com/wp-content/uploads/Preorder-Traversal.png)\n\n**Source:** _github.com_\n\n##### Complexity Analysis:\n**Time Complexity**: O(n)\n**Space Complexity**: O(n)\n##### Implementation:\n##### _Java_\n\n```java\n// Recursive function to perform pre-order traversal of the tree\npublic static void preorder(TreeNode root)\n{\n    // return if the current node is empty\n    if (root == null) {\n        return;\n    }\n \n    // Display the data part of the root (or current node)\n    System.out.print(root.data + \" \");\n \n    // Traverse the left subtree\n    preorder(root.left);\n \n    // Traverse the right subtree\n    preorder(root.right);\n}\n```\n\n\n##### _PY_\n\n```py\n# Recursive function to perform pre-order traversal of the tree\ndef preorder(root):\n \n    # return if the current node is empty\n    if root is None:\n        return\n \n    # Display the data part of the root (or current node)\n    print(root.data, end=' ')\n \n    # Traverse the left subtree\n    preorder(root.left)\n \n    # Traverse the right subtree\n    preorder(root.right)\n```\n\n\n#### Q31: What is an Associative Array? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q32: What does Sparse Array mean? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q33: How to merge two sorted _Arrays_ into a _Sorted Array_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q34: Explain how _Heap Sort_ works ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q35: What is complexity of Hash Table? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q36: LIS: Find length of the _longest increasing subsequence (LIS)_ in the array. Solve using DP. 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q37: Compare Heaps vs Arrays to implement Priority Queue ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q38: How to check if two Strings (words) are _Anagrams_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q39: Name some application of Trie data structure ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q40: Find all the _Permutations_ of a String ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q41: What is AVL Tree? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q42: What is Balanced Tree and why is that important? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q43: Name some common types and categories of Graphs ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q44: Convert a Binary Tree to a Doubly Linked List ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q45: Can you do _Iterative Pre-order Traversal_ of a _Binary Tree_ without _Recursion_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q46: Explain how _QuickSort_ works ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q47: Binet's formula: How to calculate Fibonacci numbers without Recursion or Iteration?  
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q48: What are some main advantages of Tries over Hash Tables ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q49: How would you traverse a Linked List in \u003ci\u003e\u003ccode\u003eO(n\u003csup\u003e1/2\u003c/sup\u003e)\u003c/code\u003e\u003c/i\u003e? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q50: Explain what is _Fibonacci Search_ technique? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q51: What are Pascal Strings? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q52: When is doubly linked list more efficient than singly linked list? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q53: What is Red-Black tree? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q54: How To Choose Between a Hash Table and a Trie (Prefix Tree)? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q55: How to implement 3 _Stacks_ with one _Array_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q56: Find the _length_ of a Linked List which contains _Cycle (Loop)_ ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q57: What is Rope Data Structure is used for? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q58: Explain what is B-Tree? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q59: What is Bipartite Graph? How to detect one? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q60: Compare lookup operation in Trie vs Hash Table ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q61: How are B-Trees used in practice? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=Databases\u003eDatabases\u003c/a\u003e Interview Questions\n#### Q1: What is _Normalisation_? ⭐⭐\n##### Answer:\n**Normalization** is the practice of designing a database schema so that **duplicate and redundant data is avoided**. If the same information is repeated in multiple places in the database, there is a risk that it is updated in one place but not the other, leading to data corruption.\n\nThere are several normalization levels, from first normal form (1NF) through fifth normal form (5NF). Each normal form describes how to get rid of a specific problem.\n\nBy having a database with normalization errors, you run the risk of getting invalid or corrupt data into the database. Since data \"lives forever\", it is very hard to get rid of corrupt data once it has entered the database.\n\n**Source:** _stackoverflow.com_\n\n#### Q2: What is the difference between _Data Definition Language (DDL)_ and _Data Manipulation Language (DML)_? ⭐⭐\n##### Answer:\n* **Data definition language (DDL)** commands are used to define the structure of the database. **CREATE**, **ALTER**, **DROP** and **TRUNCATE** are some common DDL commands.\n\n* **Data manipulation language (DML)** commands are used to manipulate or modify the data itself. **INSERT**, **UPDATE** and **DELETE** are some common DML commands.\n\n**Source:** _en.wikibooks.org_\n\n#### Q3: What are the advantages of NoSQL over traditional RDBMS? 
⭐⭐\n##### Answer:\n**NoSQL is better** than RDBMS because of the following properties of NoSQL:\n\n* It supports semi-structured data and volatile data\n* It does not require a fixed schema\n* Read/write throughput is very high\n* Horizontal **scalability** can be achieved easily\n* It supports big data in volumes of terabytes \u0026 petabytes\n* It provides good support for analytic tools on top of big data\n* It can be hosted on cheaper hardware\n* An in-memory caching option is available to increase the performance of queries\n* Faster development life cycles for developers\n\nStill, **RDBMS is better** than NoSQL because of the following properties of RDBMS:\n* Transactions with **ACID** properties - Atomicity, Consistency, Isolation \u0026 Durability\n* Adherence to a **Strong Schema** for data being written/read\n* Real-time query management (for data sizes \u003c 10 terabytes)\n* Execution of complex queries involving **join** \u0026 **group by** clauses\n\n**Source:** _stackoverflow.com_\n\n#### Q4: Define ACID Properties ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q5: How a database index can help performance? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q6: What is Denormalization? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q7: What are the difference between _Clustered_ and a _Non-clustered_ index? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q8: What's the difference between a _Primary Key_ and a _Unique Key_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q9: When would you use NoSQL? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: When should I use a NoSQL database instead of a relational database? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: What is Optimistic locking? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: What Is ACID Property Of A System? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: What is the _cost_ of having a database _index_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: Explain the difference between _Exclusive Lock_ and _Update Lock_ ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: How does _B-trees Index_ work? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: Explain eventual consistency in context of NoSQL ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: How do you track record relations in NoSQL? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q18: What Is Sharding? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q19: What Is BASE Property Of A System? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q20: How do you off load work from the Database? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q21: What are some _other_ types of Indexes (vs B-Trees)? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q22: Name some disadvantages of a _Hash index_ ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q23: What is _Optimistic Locking_ and _Pessimistic Locking_? 
⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q24: How does database _Indexing_ work? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q25: What is the difference between _B-Tree_, _R-Tree_ and _Hash_ indexing? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q26: Explain the differences in conceptual data design with NoSQL databases? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q27: What Does Eventually Consistent Mean? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q28: Is the C in ACID is not the C in CAP? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q29: How do you make schema changes to a live database without downtime? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q30: Why you should never use GUIDs as part of clustered index? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=Datasets\u003eDatasets\u003c/a\u003e Interview Questions\n#### Q1: What's the difference between _Covariance_ and _Correlation_? ⭐⭐\n##### Answer:\n- **Covariance** measures whether a **variation** in one _variable_ results in a variation in _another variable_, and deals with the linear relationship of only `2` variables in the dataset. Its value can take range from  `-∞` to `+∞`. Simply speaking **Covariance** indicates the direction of the linear relationship between variables. \n\n![](https://www.simplilearn.com/ice9/free_resources_article_thumb/covx-y.jpg)\n\n- **Correlation** measures how strongly two or more variables are **related** to each other. Its values are between `-1` to `1`. 
**Correlation** measures both the strength and direction of the linear relationship between two variables. Correlation is a function of the covariance.\n\n\n**Source:** _careerfoundry.com_\n\n#### Q2: Would you use _K-NN_ for large datasets? ⭐⭐\n##### Answer:\nIt's not recommended to perform **K-NN** on large datasets, because its computational and memory costs grow with the size of the training set. To understand why, recall how the **K-NN** algorithm works:\n\n1. It starts by calculating the distances to all vectors in the training set and storing them.\n2. Then, it sorts the calculated distances.\n3. Then, it keeps the K nearest vectors.\n4. Finally, it returns the most frequent class among the K nearest vectors.\n\nSo implementing **K-NN** on a large dataset not only requires storing a large amount of data, but is also computationally costly, because all the distances have to be calculated and sorted for every prediction. For that reason, **K-NN** is not recommended and another classification algorithm like _**Naive Bayes**_ or _**SVM**_ is preferred in such cases.\n\n\n**Source:** _towardsdatascience.com_\n\n#### Q3: What is *Cross-Validation* and why is it important in supervised learning? ⭐⭐\n##### Answer:\n* ***Cross-validation*** is a method of assessing _how the results of a statistical analysis will generalize to an independent dataset_.\n* It can be used in machine learning tasks to _evaluate the predictive capability of the model_.\n* It also helps us to _detect and avoid overfitting and underfitting_.\n* A common way to cross-validate is to divide the dataset into *training*, *validation*, and *testing* sets, where:\n\n  * **Training dataset** is a dataset of known data on which the training is run.\n  * **Validation dataset** is a dataset that is *unknown* to the model and against which the model is tested. The validation dataset is used after each epoch of learning to gauge the improvement of the model.\n  * **Testing dataset** is also an unknown dataset that is used to test the model. 
The testing dataset is used to measure the performance of the model after it has finished learning.\n\n![cross_validation](https://s3.ap-south-1.amazonaws.com/myinterviewtrainer-domestic/public_assets/assets/000/000/094/original/Cross-Validation.png?1614946006)\n\n**Source:** _en.wikipedia.org_\n\n#### Q4: How does _K-fold Cross Validation_ work? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q5: What is the difference between _Test Set_ and _Validation Set_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q6: What are the assumptions before applying the _OLS estimator_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q7: What are the difference between _Type I_ and _Type II_ errors? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q8: What's the difference between _Bagging_ and _Boosting_ algorithms? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q9: What's the difference between _One-vs-Rest_ and _One-vs-One_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: What are some _disadvantages_ of using Decision Trees and how would you solve them? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: Name some best practices for working with Datasets ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: When you sample, what potential _Sampling Biases_ could you be inflicting? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: How would you determine the needed _Sample Size_? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: What are some variations of _Cross-Validation_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: Explain what an _Unrepresentative Dataset_ is and how you would diagnose it ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: How would you detect _Heteroskedasticity_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: How would you address the problem of _Heteroskedasticity_ caused by a _Measurement error_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q18: How would you deal with _Outliers_ in your dataset? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q19: How would you deal with an _Imbalanced Dataset_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q20: What's the difference between _Random Oversampling_ and _Random Undersampling_ and when can they be used? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q21: How would you use a _Confusion Matrix_ to determine a model's performance? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q22: What is *Multidimensional Scaling*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q23: Is _mean imputation_ of missing data acceptable practice? Why or why not? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q24: When would you use *_chi-Square_* or an *_ANOVA_* test? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q25: How would you handle _Missing Data_ and perform _Data Imputation_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q26: Compare _Causation_ vs _Correlation_ ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q27: Which measures of _Variability_ would you use on your data? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q28: How does an ANOVA test work? ⭐⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n## [[⬆]](#toc) \u003ca name=DecisionTrees\u003eDecision Trees\u003c/a\u003e Interview Questions\n#### Q1: What are *Decision Trees*? ⭐\n##### Answer:\n* A ***decision tree*** is a tool that uses a *tree-like model* of decisions and their possible consequences. If an algorithm contains only *conditional control statements*, a decision tree can model that algorithm well.\n* *Decision trees* are a *non-parametric*, _supervised_ learning method.\n* *Decision trees* are used for *classification* and *regression* tasks.\n* The diagram below shows an example of a decision tree (it uses the Titanic dataset to predict whether a passenger survived or not):\n\n![decision](https://miro.medium.com/max/540/1*XMId5sJqPtm8-RIwVVz2tg.png)\n\n**Source:** _towardsdatascience.com_\n\n#### Q2: Explain the _structure_ of a Decision Tree ⭐⭐\n##### Answer:\nA ***decision tree*** is a ***flowchart-like*** structure in which:\n* Each *internal node* represents the ***test*** on an attribute (e.g. 
outcome of a coin flip).\n* Each *branch* represents the **_outcome_** of the test.\n* Each *leaf node* represents a ***class label***.\n* The _paths_ from the root to leaf represent the ***classification rules***.\n\n![https://aiaspirant.com/wp-content/uploads/2020/02/dt_struct.png](https://aiaspirant.com/wp-content/uploads/2020/02/dt_struct.png)\n\n**Source:** _en.wikipedia.org_\n\n#### Q3: How are the different nodes of decision trees _represented_? ⭐⭐\n##### Answer:\nA **decision tree** consists of three **types** of nodes:\n* **Decision nodes:** Represented by **squares.** It is a node where a flow branches into several optional branches.\n* **Chance nodes:** Represented by **circles.** It represents the probability of certain results.\n* **End nodes:** Represented by **triangles.** It shows the final outcome of the decision path.\n\n![decision_nodes](https://upload.wikimedia.org/wikipedia/commons/a/ad/Decision-Tree-Elements.png)\n\n**Source:** _en.wikipedia.org_\n\n#### Q4: What are some _advantages_ of using Decision Trees? ⭐⭐\n##### Answer:\n* It is **simple to understand** and interpret. It can be **visualized** easily.\n* It **does not require as much data preprocessing** as other methods.\n* It can handle both **numerical** and **categorical** data.\n* It can handle **multiple output** problems.\n\n**Source:** _scikit-learn.org_\n\n#### Q5: What type of node is considered *Pure*? ⭐⭐\n##### Answer:\n* If the *Gini Index* of the data is `0` then it means that all the elements **belong to a specific class**. When this happens it is said to be *pure*.\n* When all of the data belongs to a single class (*pure*) then the *leaf node* is reached in the tree.\n* The leaf node represents the *class label* in the tree (which means that it gives the final output).\n\n![pure_node](https://miro.medium.com/max/2000/1*k4qcPhr04dHjciI5nFdQrw.png)\n\n**Source:** _medium.com_\n\n#### Q6: How is a _Random Forest_ related to _Decision Trees_? 
⭐⭐\n##### Answer:\n* ***Random forest*** is an ***ensemble learning*** method that works by constructing a multitude of ***decision trees***. A random forest can be constructed for both classification and regression tasks.\n* A random forest generally **outperforms** a single decision tree, and it is less prone to *overfitting* the data than a single decision tree is.\n* A decision tree grown without constraints on a specific dataset tends to become very deep and to overfit. To create a random forest, decision trees are trained on different subsets of the training dataset, and then the predictions of the different decision trees are averaged (or combined by majority vote for classification) with the goal of _decreasing the variance_.\n\n**Source:** _en.wikipedia.org_\n\n#### Q7: What is the difference between *OOB* score and *validation* score? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q8: How would you deal with an _Overfitted Decision Tree_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q9: What are some _disadvantages_ of using Decision Trees and how would you solve them? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q10: What is *Greedy Splitting*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q11: What type of *Cost Functions* does *Greedy Splitting* use? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q12: How would you define the *Stopping Criteria* for decision trees? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q13: Why do you need to *Prune* the decision tree? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q14: What is *Entropy*? 
⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q15: How do we _measure_ the Information? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q16: What is *Gini Index* and how is it used in Decision Trees? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q17: What is the *Chi-squared test*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q18: How does the *CART* algorithm produce *Classification Trees*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q19: How does the *CART* algorithm produce *Regression Trees*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q20: What is the difference between *Post-pruning* and *Pre-pruning*? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q21: Compare *Linear Regression* and *Decision Trees* ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q22: What is _Tree Bagging_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q23: What is _Tree Boosting_? ⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q24: How would you use _Isolation Forest_ for Anomaly Detection? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q25: Imagine that you know there are _outliers_ in your data, would you use _Logistic Regression_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q26: What is the use of *Entropy* pertaining to Decision Trees? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q27: While building a Decision Tree, how do you choose which attribute to _split_ on at each node? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q28: What is the difference between _Gini Impurity_ and _Entropy_ in a Decision Tree? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q29: When should I use _Gini Impurity_ as opposed to _Information Gain (Entropy)_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q30: Explain the *CHAID* algorithm ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q31: What are some disadvantages of the *CHAID* algorithm? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q32: Explain how the *CART* algorithm performs _Pruning_ ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q33: Explain how *ID3* produces *classification trees* ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q34: How would you compare different _Algorithms_ to build _Decision Trees_? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q35: Compare *ID3* and *C4.5* algorithms ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q36: Compare *C4.5* and *C5.0* algorithms ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q37: What is the relationship between *Information Gain* and *Information Gain Ratio*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q38: How do you _Gradient Boost_ decision trees? 
⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q39: Compare *Decision Trees* and *Logistic Regression* ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q40: What are the differences between *Decision Trees* and *Neural Networks*? ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.mlstack.cafe'\u003eMLStack.Cafe\u003c/a\u003e\n\n#### Q41: Compare *Decision Trees* and *k-Nearest Neighbors* ⭐⭐⭐⭐\nRead answer on 👉 \u003ca href='https://www.m","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faershov24%2Fmachine-learning-ds-interview-questions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faershov24%2Fmachine-learning-ds-interview-questions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faershov24%2Fmachine-learning-ds-interview-questions/lists"}