{"id":17672246,"url":"https://github.com/felixfaisal/ml-notes","last_synced_at":"2026-01-08T02:52:19.122Z","repository":{"id":134541923,"uuid":"370663625","full_name":"felixfaisal/ML-Notes","owner":"felixfaisal","description":"This Repository will contain notes of my current machine learning course ","archived":false,"fork":false,"pushed_at":"2021-05-25T12:33:49.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-05T17:11:40.079Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/felixfaisal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-25T11:08:24.000Z","updated_at":"2021-05-25T12:33:51.000Z","dependencies_parsed_at":"2023-06-17T18:32:37.423Z","dependency_job_id":null,"html_url":"https://github.com/felixfaisal/ML-Notes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felixfaisal%2FML-Notes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felixfaisal%2FML-Notes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felixfaisal%2FML-Notes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felixfaisal%2FML-Notes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/felixfaisal","download_url":"https://codeload.github.com/felixfai
sal/ML-Notes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246336267,"owners_count":20761016,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-24T04:07:29.641Z","updated_at":"2026-01-08T02:52:19.096Z","avatar_url":"https://github.com/felixfaisal.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Supervised Learning \n- In supervised learning, the labelled training data provide the basis for learning.\n- The process of learning from the training data by a machine can be related to an expert supervising the learning process of a student.\n- Here the expert is the training data.\n- Training data is the past information with known value of class field or ‘label’.\n- Unsupervised learning uses no labelled data.\n- Semi-supervised learning uses a small amount of labelled data.\n\n## Supervised vs Unsupervised Learning \n![SupvsUnsup](https://i.imgur.com/P6YxakM.png)\n\n## Classification Model \n- When we try to predict a categorical or nominal variable, the problem is known as a classification problem.\n- Here, the problem centres around assigning a label or category or class to the test data on the basis of the label or category or class information imparted by training data.\n- Classification is a type of supervised learning where a target feature, i.e. 
a categorical type, is predicted for test data on the basis of information obtained from training data.\n- This categorical feature is known as class.\n\n## Classification with learning steps \n![Steps](https://i.imgur.com/nsTz7yl.png)\n\n## Common Classification Algorithms \n1. **k-Nearest Neighbour (kNN)**\n2. **Decision tree**\n3. **Random forest**\n4. **Support Vector Machine (SVM)**\n5. **Naive Bayes classifier**\n\n## Origins of KNN \n- Nearest-neighbor methods were already in use in statistical estimation and pattern recognition at the beginning of the 1970s (as non-parametric techniques).\n\n- The method prevailed in several disciplines and is still one of the top 10 data mining algorithms.\n- It mirrors how people judge by observing their peers.\n- We tend to move with people of similar attributes, and so does data.\n\n### Definition \n- **K-Nearest Neighbor** is considered a lazy learning algorithm that classifies data items based on their similarity with neighbors.\n\n- “K” stands for the number of data set items that are considered for the classification.\n\n- Ex: The image shows classification for different k-values.\n![KNN](https://i.imgur.com/JM2XRP0.png)\n\n- For the given attributes `A={X1, X2….. 
XD}`, where **D** is the dimension of the data, we need to predict the corresponding classification group `G={Y1,Y2…Yn}` using the proximity metric over **K** items in **D** dimensions that defines the closeness of association, such that `X ∈ R^D` and `Yp ∈ G`.\n### That is \n![KNNExampe](https://i.imgur.com/bXXJrjv.png)\n- Attribute A={Color, Outline, Dot}\n- Classification Group, G={triangle, square}\n- D=3, and we are free to choose the K value.\n\n## Proximity Metric\n- Definition: Also termed a “Similarity Measure”, it quantifies the association among different items.\n- Following is a table of measures for different data items:\n\n![Data Measure](https://i.imgur.com/ALT9ixI.png)\n\n\n## Voronoi Diagram \n- A Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane.\n- Here, k=1.\n\n![VDiagram](https://i.imgur.com/zSq1cDy.png)\n\n## KNN Example \n![KNNExample](https://i.imgur.com/DnuwSvF.png)\n\n### Proximity Metric \nFor numeric data, let us consider some distance measures:\n\n\n- Manhattan Distance \n\n![MtDistance](https://i.imgur.com/TcudVDh.png)\n\nEx: Given X = {1,2} \u0026 Y = {2,5}\nManhattan Distance = dist(X,Y) = |1-2|+|2-5|\n= 1+3\n= 4\n\n- Euclidean Distance \n\n![EuclideanDistance](https://i.imgur.com/NZ2nFre.png)\n\n## KNN in Action \n\n- Consider the following data: A={weight,color} G={Apple(A), Banana(B)}\n\n- We need to predict the type of a fruit with: weight = 378, color = red\n\n![Dataset](https://i.imgur.com/X0sRMT1.png)\n\n- Assign color codes to convert the colors into numerical data\n![ColorCode](https://i.imgur.com/Fzf7Ops.png)\n\n- Let’s label Apple as “A” and Banana as “B”\n\n- Using K=3, our result will be:\n\n![PlotKNN](https://i.imgur.com/AnHHFYq.png)\n\n## KNN Properties \n- K-NN is a lazy algorithm\n\n- The processing differs with respect to the K value.\n\n- The result is generated after analysis of stored data.\n\n- It neglects any intermediate values.\n\n### Advantages\n- Can be 
applied to data from any distribution; for example, the data does not have to be separable with a linear boundary\n- Very simple and intuitive\n- Good classification if the number of samples is large enough\n\n### Disadvantages\n- Dependent on K Value\n- The classification stage is computationally expensive\n- Needs a large number of samples for accuracy\n\n## Decision Tree\n- This is one of the most adopted algorithms for classification.\n- It builds a model in the form of a tree structure.\n- A decision tree is used for multi-dimensional analysis with multiple classes and is characterized by ease of interpretation of rules and fast execution.\n- The goal of decision tree learning is to create a model that predicts the value of the output variable based on the input variables in the feature vector.\n- It contains decision nodes and leaf nodes.\n- Each decision node corresponds to one feature of the feature vector.\n- From every decision node, there are edges to children, wherein there is an edge for each of the possible values of the feature associated with the node.\n- The output variable is determined by following a path that starts at the root and is guided by the values of the input variables.\n- Decision trees can be used for both classification and regression.\n\n## Decision Tree to Play Tennis \n![DTree](https://i.imgur.com/VSciYUA.png)\n\n## Example: Will a Person Buy a computer? \n![CompTree](https://i.imgur.com/K5ZkTXl.png)\n\n## Example: Is a Person Fit? 
\n![image](https://user-images.githubusercontent.com/42486737/119488205-e7a46280-bd77-11eb-9e88-3abe740781ea.png)\n\n## Example: Should a Loan Be Sanctioned? \n![image](https://user-images.githubusercontent.com/42486737/119488250-f559e800-bd77-11eb-983d-91fa500d64a8.png)\n\n## Training Data for GTS Recruitment \n![image](https://user-images.githubusercontent.com/42486737/119488323-06a2f480-bd78-11eb-9ff0-30923ad15832.png)\n\n\n## Entropy of a decision tree \n- Entropy, as it relates to machine learning, is a measure of the randomness in the information being processed. \n- The higher the entropy, the harder it is to draw any conclusions from that information.\n![image](https://user-images.githubusercontent.com/42486737/119488388-1c181e80-bd78-11eb-8142-40957844c2f7.png)\n\n- Ex: For class ‘Job Offered?’ we have two values: Yes and No.\n- Pi values: Yes = 8/18 = 0.44 \u0026 No = 10/18 = 0.56\n- Entropy(S) = -0.44 log2(0.44) - 0.56 log2(0.56) = 0.99\n\n## Information Gain on a Decision Tree \n\n- Information gain is calculated on the basis of the decrease in entropy(S) after a data set is split according to a particular attribute(A).\n- Constructing a decision tree is all about finding an attribute that returns the highest information gain.\n- If information gain is 0, it means that there is no reduction in entropy due to the split of the data set according to that particular feature.\n- The maximum possible information gain is the entropy of the data set before the split.\n- Information gain for a particular feature A is calculated as the difference between the entropy before the split (Sbs) and the entropy after the split (Sas).\n- Information gain(S, A) = Entropy(Sbs) - Entropy(Sas)\n- For the weighted summation, the proportion of examples falling into each partition is used as the weight.\n- Entropy(Sas) = ∑ (i=1 to n) wi Entropy(pi)\n\n![image](https://user-images.githubusercontent.com/42486737/119488585-5bdf0600-bd78-11eb-90ef-feadc121beeb.png)\n\nc) Split 
data set (based on ‘Communication’)\nCommunication = ‘Good’    Communication = ‘Bad’\nTotal Entropy = 0.63                  Information Gain = 0.36\n\nd) Split data set (based on ‘Aptitude’)\nAptitude = ‘High’         Aptitude = ‘Low’\nTotal Entropy = 0.52                  Information Gain = 0.47(Entropy=0)\n\ne) Split data set (based on ‘Programming Skills’)\nProgramming Skills = ‘Good’     Programming Skills = ‘Bad’\nTotal Entropy = 0.95                  Information Gain = 0.04\n\n## Avoiding Overfitting in Decision Trees: Pruning \n- The decision tree algorithm, unless a stopping criterion is applied, may keep growing indefinitely.\n- To prevent a decision tree from getting overfitted to the training data, pruning of the decision tree is essential.\n- Pruning a decision tree reduces the size of the tree such that the model is more generalized and can classify unknown and unlabeled data in a better way.\n- Pre-pruning: Stop growing the tree before it reaches perfection.\n- Post-pruning: Allow the tree to grow entirely and then post-prune some of the branches from it.\n\n## Random Forest Model \n- It is an ensemble classifier, i.e., a combining classifier that uses and combines many decision tree classifiers.\n- Ensembling is usually done using the concept of bagging with different feature sets.\n- The reason for using a large number of trees in a random forest is to train the trees enough such that the contribution from each feature comes in a number of models.\n- After the random forest is generated by combining the trees, a majority vote is applied to combine the output of the different trees.\n- An ensembled model usually yields better results than a single decision tree.\n\n![image](https://user-images.githubusercontent.com/42486737/119497035-b761c180-bd81-11eb-80bd-da270fc008bc.png)\n\n## Random Forest Algorithm \n\nThe algorithm works as follows:\n1. If there are N variables or features in the input data set, select a subset of ‘m’ (m\u003cN) features at random out of the N features.\n2. 
Use the best split principle on these ‘m’ features to calculate the number of nodes ‘d’.\n3. Keep splitting the nodes into child nodes till the tree is grown to the maximum possible extent.\n4. Select a different subset of the training data ‘with replacement’ to train another DT with steps (1) to (3). Repeat this to build and train ‘n’ decision trees.\n5. Final class assignment is done on the basis of the majority votes from the ‘n’ trees.\n\n## Drawbacks of Random Forest Algorithms \n- As it combines many decision trees, it is not as easy to understand as a single decision tree model.\n- Computationally, it is much more expensive than a simple decision tree.\n\n## Support Vector Machines \n- SVM is a model which can perform linear classification as well as regression.\n- It is based on the concept of a surface called a hyperplane, which draws a boundary between data instances plotted on a multi-dimensional feature space.\n- The output prediction is one of the two classes defined in the training data.\n\n![image](https://user-images.githubusercontent.com/42486737/119497422-2e975580-bd82-11eb-841e-cdc39d39e98b.png)\n\n## Principle of Support Vector Machine \n![image](https://user-images.githubusercontent.com/42486737/119497495-45d64300-bd82-11eb-84b8-ee27d5856957.png)\n\n### Scenario 1 : Identify the right hyperplane \n- Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify star and circle.\n- Select the hyper-plane which segregates the two classes better.\n![image](https://user-images.githubusercontent.com/42486737/119497669-7b7b2c00-bd82-11eb-9b9d-13df653a4ed1.png)\n- In this scenario, hyper-plane “B” has performed this job excellently.\n\n### Scenario 2: Identify the right hyperplane \n- Here, we have three hyper-planes (A, B and C) and all are segregating the classes well. 
Now, how can we identify the right hyper-plane?\n- Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the margin.\n\n![image](https://user-images.githubusercontent.com/42486737/119498107-e62c6780-bd82-11eb-87b9-14b07e91b89b.png)\n\n- We can see that the margin for hyper-plane C is higher than for both A and B. Hence, the right hyper-plane is C.\n\n### Scenario 3: Identify the right hyperplane \n\n![image](https://user-images.githubusercontent.com/42486737/119498343-25f34f00-bd83-11eb-81a4-86624ffee48e.png)\n\n- Here, hyper-plane B may look like the right choice, as it has a higher margin than A.\n- However, SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin.\n- Hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.\n\n### Scenario 4: Identify the right hyperplane \n\n![image](https://user-images.githubusercontent.com/42486737/119498529-576c1a80-bd83-11eb-92ab-cc6c1c2452ec.png)\n\n- Here, we are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.\n- SVM has a feature to ignore outliers and find the hyper-plane that has the maximum margin. In this sense, SVM is robust to outliers.\n\n## Strengths of SVM \n- SVM can be used for both classification \u0026 regression.\n- It is robust, i.e. not much impacted by data with noise or outliers.\n- The prediction results using this model are very strong.\n\n## Weaknesses of SVM \n- A basic SVM is applicable only for binary classification, i.e. when there are only two classes in the problem.\n- While dealing with high dimensional data, it becomes very complex.\n- It is slow for a large dataset, i.e. 
a data set with more features or instances.\n- It is memory-intensive (throughput is bounded by the device memory bandwidth).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelixfaisal%2Fml-notes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffelixfaisal%2Fml-notes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelixfaisal%2Fml-notes/lists"}