{"id":20356562,"url":"https://github.com/madhurimarawat/machine-learning-using-python","last_synced_at":"2025-04-12T02:51:36.041Z","repository":{"id":186509334,"uuid":"675290169","full_name":"madhurimarawat/Machine-Learning-Using-Python","owner":"madhurimarawat","description":"This repository contains machine learning programs in the Python programming language.","archived":false,"fork":false,"pushed_at":"2025-04-08T15:45:41.000Z","size":2201,"stargazers_count":5,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T16:38:15.170Z","etag":null,"topics":["breast-cancer-dataset","data-cleaning","decision-tree","deep-learning","iris-dataset","kaggle-datasets","knn","machine-learning-algorithms","matplotlib","pandas","pca","python","random-forest","scikit-learn","single-neuron-neural-network","supervised-learning","supervised-learning-algorithms","svm","unsupervised-learning-algorithms","wine-dataset"],"latest_commit_sha":null,"homepage":"https://ml-model-datasets-using-apps-3gy37ndiancjo2nmu36sls.streamlit.app/","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/madhurimarawat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-06T12:37:57.000Z","updated_at":"2025-04-08T15:45:46.000Z","dependencies_parsed_at":"2025-04-08T16:41:34.573Z","dependency_job_id":null,"html_url":"https://github.com/madhurimarawat/Machine-Learning-Using-Python","commit_stats":{"total_commits":35,"total_committers":1,"mean_commits":35.0,"dds":0.0,"last_synced_commit":"1be7c39bbd83e3366609697a00b954347380be2f"},"previous_names":["madhurimarawat/machine-learning-using-python"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhurimarawat%2FMachine-Learning-Using-Python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhurimarawat%2FMachine-Learning-Using-Python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhurimarawat%2FMachine-Learning-Using-Python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhurimarawat%2FMachine-Learning-Using-Python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/madhurimarawat","download_url":"https://codeload.github.com/madhurimarawat/Machine-Learning-Using-Python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248509371,"owners_count":21116006,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at
":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["breast-cancer-dataset","data-cleaning","decision-tree","deep-learning","iris-dataset","kaggle-datasets","knn","machine-learning-algorithms","matplotlib","pandas","pca","python","random-forest","scikit-learn","single-neuron-neural-network","supervised-learning","supervised-learning-algorithms","svm","unsupervised-learning-algorithms","wine-dataset"],"created_at":"2024-11-14T23:17:02.838Z","updated_at":"2025-04-12T02:51:36.017Z","avatar_url":"https://github.com/madhurimarawat.png","language":"Jupyter Notebook","readme":"# Machine-Learning-Using-Python\nThis repository contains machine learning programs in the Python programming language.\n\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"https://i.morioh.com/52c215bc5f.png\" height=400 width=700\u003e\n\n---\n\n# About Python Programming\n- Python is a high-level, general-purpose, and very popular programming language.\u003cbr\u003e\n- The Python programming language (currently Python 3) is used in web development, machine learning applications, and other cutting-edge technologies in the software industry.\u003cbr\u003e\n- Python is available across widely used platforms like Windows, Linux, and macOS.\u003cbr\u003e\n- The biggest strength of Python is its huge standard library.\u003cbr\u003e\n\n---\n# Mode of Execution Used \u003cimg src=\"https://th.bing.com/th/id/R.c936445e15a65dfdba20a63e14e7df39?rik=fqWqO9kKIVlK7g\u0026riu=http%3a%2f%2fassets.stickpng.com%2fimages%2f58481537cef1014c0b5e4968.png\u0026ehk=dtrTKn1QsJ3%2b2TFlSfLR%2fxHdNYHdrqqCUUs8voipcI8%3d\u0026risl=\u0026pid=ImgRaw\u0026r=0\" title=\"PyCharm\" alt=\"PyCharm\" width=\"40\" 
height=\"40\"\u003e\n\n\u003ch2\u003ePyCharm\u003c/h2\u003e\n\n- Visit the official website of PyCharm: \u003ca href=\"https://www.jetbrains.com/pycharm/\"\u003e\u003cimg src=\"https://th.bing.com/th/id/R.c936445e15a65dfdba20a63e14e7df39?rik=fqWqO9kKIVlK7g\u0026riu=http%3a%2f%2fassets.stickpng.com%2fimages%2f58481537cef1014c0b5e4968.png\u0026ehk=dtrTKn1QsJ3%2b2TFlSfLR%2fxHdNYHdrqqCUUs8voipcI8%3d\u0026risl=\u0026pid=ImgRaw\u0026r=0\" title=\"PyCharm\" alt=\"PyCharm\" width=\"40\" height=\"40\"\u003e\u003c/a\u003e\u003cbr\u003e\n- Download according to the platform that will be used, like Linux, macOS, or Windows.\u003cbr\u003e\n- Two editions of PyCharm are available:\u003cbr\u003e\u003cbr\u003e\n1. Community Version\u003cbr\u003e\n- The Community version is open source and we can use it for free without any paid plan.\u003cbr\u003e\n- We can download it at the bottom of the PyCharm website.\u003cbr\u003e\n- After downloading the Community version, we can directly follow the setup wizard and it will be set up.\u003cbr\u003e\u003cbr\u003e\n\n2. Professional Version\u003cbr\u003e\n- This is available at the top of the website; we can download it directly from there.\u003cbr\u003e\n- After downloading the Professional version, follow the steps below.\u003cbr\u003e\n- Follow the setup wizard and sign up for the free (trial) version, or else continue with the premium or paid version.\u003cbr\u003e\n\n### Using PyCharm\n- First, in PyCharm we have the concept of a virtual environment. 
In a virtual environment we can install all the required libraries or frameworks.\u003cbr\u003e\n- Each project has its own virtual environment, so that we can install requirements like libraries or frameworks for that project only.\u003cbr\u003e\n- After this we can create a new file; various file types are available in PyCharm, like script files, text files, and also Jupyter Notebooks.\u003cbr\u003e\n- After selecting the required file type, we can continue the execution of that file by saving it and using the shortcut Shift+F10 (on Windows).\u003cbr\u003e\n- Output is given in the Console, while installation happens in the Terminal in PyCharm.\n\n---\n\n# Machine learning 🤖 🛠🧠\n\n\u003cimg src=\"https://www.analytixlabs.co.in/blog/wp-content/uploads/2018/10/Artboard-20-1.png\" height=400 width=700\u003e\n\n- Machine learning is a method of data analysis that automates analytical model building.\u003cbr\u003e\n- It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.\u003cbr\u003e\n- A machine learning algorithm is said to learn from experience E with respect to some task T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.\u003cbr\u003e\n\n---\n\n# Steps of Machine learning\n\n\u003cimg src=\"https://github.com/madhurimarawat/Machine-Learning-Using-Python/assets/105432776/9e6a29ba-b1b2-4c54-b5c2-1f33f9389bdb\" height=400 width=700\u003e\n\n---\n\n# Types of Machine Learning\n\n\u003cimg src=\"https://github.com/madhurimarawat/Machine-Learning-Using-Python/assets/105432776/5ebb4d39-1515-4d7a-a9c2-ab75242af166\" height=400 width=700\u003e\n\u003cbr\u003e\n\u003ch2\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/Supervised%20Learning\"\u003e1. 
Supervised Learning\u003c/a\u003e\u003c/h2\u003e\n\n- Basically, supervised learning is when we teach or train the machine using data that is well-labelled. \u003cbr\u003e\n- This means some data is already tagged with the correct answer.\u003cbr\u003e\n- After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from labeled data.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_KNN-algorithm.ipynb\"\u003ei) K-Nearest Neighbors (KNN) \u003c/a\u003e \u003c/h3\u003e\n\u003cbr\u003e\n\n- K-Nearest Neighbours is one of the most basic yet essential classification algorithms in Machine Learning.\u003cbr\u003e\n- It belongs to the supervised learning domain and finds intense application in pattern recognition, data mining, and intrusion detection.\u003cbr\u003e\n- In this algorithm, we identify the category of a new data point based on its nearest neighbors.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_SVM-algorithm.ipynb\"\u003eii) Support Vector Machines (SVM) \u003c/a\u003e\u003c/h3\u003e\n\u003cbr\u003e\n\n- The main idea behind SVMs is to find a hyperplane that maximally separates the different classes in the training data. \u003cbr\u003e\n- This is done by finding the hyperplane that has the largest margin, which is defined as the distance between the hyperplane and the closest data points from each class. \u003cbr\u003e\n- Once the hyperplane is determined, new data can be classified by determining on which side of the hyperplane it falls. 
\u003cbr\u003e\n- SVMs are particularly useful when the data has many features, and/or when there is a clear margin of separation in the data.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_Naive-Bayes-Classifier.ipynb\"\u003eiii) Naive Bayes Classifiers\u003c/a\u003e\u003c/h3\u003e\n\u003cbr\u003e\n\n- Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. \u003cbr\u003e\n- It is not a single algorithm but a family of algorithms where all of them share a common principle, i.e. every pair of features being classified is independent of each other.\u003cbr\u003e\n- The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_Decision-Tree-Algorithm.ipynb\"\u003eiv) Decision Tree\u003c/a\u003e\u003c/h3\u003e\n\u003cbr\u003e\n  \n- It builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.\u003cbr\u003e\n- It is constructed by recursively splitting the training data into subsets based on the values of the attributes until a stopping criterion is met, such as the maximum depth of the tree or the minimum number of samples required to split a node.\u003cbr\u003e\n- The goal is to find the attribute that maximizes the information gain or the reduction in impurity after the split.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_Random-Forest.ipynb\"\u003ev) Random Forest\u003c/a\u003e\u003c/h3\u003e\n\u003cbr\u003e\n\n- It is based on the concept of ensemble learning, which is a process of 
combining multiple classifiers to solve a complex problem and to improve the performance of the model.\u003cbr\u003e\n- Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of predictions, predicts the final output.\u003cbr\u003e\n- A greater number of trees in the forest generally leads to higher accuracy and helps prevent overfitting.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_Linear-Regression.ipynb\"\u003evi) Linear Regression\u003c/a\u003e\u003c/h3\u003e\n\u003cbr\u003e\n\n- Regression predicts continuous output variables based on the independent input variables, like the prediction of house prices based on different parameters such as house age, distance from the main road, location, area, etc.\u003cbr\u003e\n- It computes the linear relationship between a dependent variable and one or more independent features. \u003cbr\u003e\n- The goal of the algorithm is to find the best linear equation that can predict the value of the dependent variable based on the independent variables.\u003cbr\u003e\n\u003ch3\u003eTypes of Linear Regression\u003c/h3\u003e\n\n\u003ch4\u003e1. Univariate/Simple Linear regression\u003c/h4\u003e\n\n- When the number of independent features is 1, it is known as univariate linear regression.\u003cbr\u003e\n\u003ch4\u003e2. 
Multivariate/Multiple Linear regression\u003c/h4\u003e\n\n- In the case of more than one feature, it is known as multivariate linear regression.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Supervised%20Learning/ML_Logistic-Regression.ipynb\"\u003evii) Logistic Regression\u003c/a\u003e\u003c/h3\u003e\n\u003cbr\u003e\n\n- Logistic regression is a supervised machine learning algorithm mainly used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class or not. \u003cbr\u003e\n- It is a kind of statistical algorithm which analyzes the relationship between a set of independent variables and a dependent binary variable. \u003cbr\u003e\n- It is a powerful tool for decision-making.\u003cbr\u003e\n- For example, classifying an email as spam or not. \u003cbr\u003e\n\n\u003ch3\u003eTypes of Logistic Regression\u003c/h3\u003e\n\u003ch4\u003e1. Binomial Logistic regression\u003c/h4\u003e\n\n- In binomial Logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.\u003cbr\u003e\n\n\u003ch4\u003e2. Multinomial Logistic regression\u003c/h4\u003e\n\n- In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.\u003cbr\u003e\n\n\u003ch4\u003e3. Ordinal Logistic regression\u003c/h4\u003e\n\n- In ordinal Logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as “low”, “medium”, or “high”.\u003cbr\u003e\n\n\u003ch2\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/tree/main/Unsupervised%20Learning\"\u003e2. 
Unsupervised Learning\u003c/a\u003e\u003c/h2\u003e\n\n- Unsupervised learning is the training of a machine using information that is neither classified nor labeled and allowing the algorithm to act on that information without guidance.\u003cbr\u003e\n- Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training of data. \u003cbr\u003e\n- Unsupervised learning models are utilized for three main tasks— association, clustering and dimensionality reduction.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Unsupervised%20Learning/Market_Basket_Optimisation.csv\"\u003ei) Association Rules\u003c/a\u003e\u003c/h3\u003e\n\n- An association rule is a rule-based method for finding relationships between variables in a given dataset.\u003cbr\u003e\n- These methods are frequently used for market basket analysis, allowing companies to better understand relationships between different products.\u003cbr\u003e\n- Understanding consumption habits of customers enables businesses to develop better cross-selling strategies and recommendation engines.\u003cbr\u003e\n- Examples of this can be seen in Amazon’s “Customers Who Bought This Item Also Bought” or Spotify’s \"Discover Weekly\" playlist.\u003cbr\u003e\n\n\u003ch3\u003eTypes of Association Rules\u003c/h3\u003e\n\n\u003ch4\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Unsupervised%20Learning/ML_Apriori-Algorithm.ipynb\"\u003e1. 
Apriori Algorithm\u003c/a\u003e\u003c/h4\u003e\n\n- Apriori is an algorithm for frequent item set mining and association rule learning over relational databases.\u003cbr\u003e\n- It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.\u003cbr\u003e\n- The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database.\u003cbr\u003e\n- This has applications in domains such as market basket analysis.\u003cbr\u003e\n\n\u003ch3\u003e\u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/blob/main/Unsupervised%20Learning/ML_K-means-Clustering-Algorithm.ipynb\"\u003eii) Clustering\u003c/a\u003e\u003c/h3\u003e\n\n- Clustering is a data mining technique which groups unlabeled data based on their similarities or differences.\u003cbr\u003e\n- Clustering algorithms are used to process raw, unclassified data objects into groups represented by structures or patterns in the information.\u003cbr\u003e\n- Clustering algorithms can be categorized into a few types, specifically exclusive, overlapping, hierarchical, and probabilistic.\u003cbr\u003e\n\n\u003ch3\u003eTypes of Clustering\u003c/h3\u003e\n\n\u003ch4\u003e1. 
K Means Clustering\u003c/h4\u003e\n\n- K-Means Clustering is an unsupervised machine learning algorithm.\u003cbr\u003e\n- Its objective is to group data points into K clusters to minimize the variance within each cluster.\u003cbr\u003e\n- The process involves iteratively assigning data points to the nearest cluster centroid and updating the centroids until convergence.\u003cbr\u003e\n- K-Means is commonly applied in various domains such as customer segmentation, image compression, and anomaly detection.\u003cbr\u003e\n\n\u003ch3\u003eiii) Dimensionality Reduction\u003c/h3\u003e\n\n- Dimensionality reduction is a technique used when the number of features, or dimensions, in a given dataset is too high.\u003cbr\u003e\n- It reduces the number of data inputs to a manageable size while also preserving the integrity of the dataset as much as possible.\u003cbr\u003e\n- It is commonly used in the data preprocessing stage.\u003cbr\u003e\n\n\u003ch3\u003eTypes of Dimensionality Reduction\u003c/h3\u003e\n\n\u003ch4\u003e1. Principal component analysis\u003c/h4\u003e\n\n- Principal component analysis (PCA) is a type of dimensionality reduction algorithm which is used to reduce redundancies and to compress datasets through feature extraction.\u003cbr\u003e\n- This method uses a linear transformation to create a new data representation, yielding a set of \"principal components.\"\u003cbr\u003e\n- The first principal component is the direction which maximizes the variance of the dataset.\u003cbr\u003e\n- While the second principal component also finds the maximum variance in the data, it is completely uncorrelated to the first principal component, yielding a direction that is perpendicular, or orthogonal, to the first component.\u003cbr\u003e\n\n---\n# Dataset Used\n\n\u003ch2\u003eIris Dataset\u003c/h2\u003e\n\n- The Iris Dataset is a part of the sklearn library.\u003cbr\u003e\n- Sklearn comes loaded with datasets to practice machine learning techniques and iris is one of them. 
\u003cbr\u003e\n- Iris has 4 numerical features and a three-class target variable.\u003cbr\u003e\n- This dataset can be used for classification as well as clustering.\u003cbr\u003e\n- In this dataset, there are 4 features: sepal length, sepal width, petal length, and petal width, and the target variable has 3 classes, namely ‘setosa’, ‘versicolor’, and ‘virginica’.\u003cbr\u003e\n- The objective for a multiclass classifier is to predict the target class given the values for the four features.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- K-Nearest Neighbor and Support Vector Machine are implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eBreast Cancer Dataset\u003c/h2\u003e\n\n- The breast cancer dataset is a classification dataset that contains 569 samples of malignant and benign tumor cells. \u003cbr\u003e\n- The samples are described by 30 features such as mean radius, texture, perimeter, area, smoothness, etc. \u003cbr\u003e\n- The target variable has 2 classes, namely ‘benign’ and ‘malignant’.\u003cbr\u003e\n- The objective for a binary classifier is to predict the target class given the values for the features.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- K-Nearest Neighbor and Support Vector Machine are implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eWine Dataset\u003c/h2\u003e\n\n- The wine dataset is a classic and very easy multi-class classification dataset that is available in the sklearn library.\u003cbr\u003e\n- It contains 178 samples of wine with 13 features and 3 classes.\u003cbr\u003e\n- The goal is to predict the class of wine based on the features.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- K-Nearest Neighbor and Support Vector Machine are implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eNaive Bayes classification data\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca 
href=\"https://www.kaggle.com/datasets/himanshunakrani/naive-bayes-classification-data\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Naive Bayes classification data\"\u003e \u003c/a\u003e\u003cbr\u003e\n- Contains diabetes data for classification.\u003cbr\u003e\n- The dataset has 3 columns (glucose, blood pressure, and diabetes) and 995 entries.\u003cbr\u003e\n- The glucose and blood pressure columns are used to classify whether the patient has diabetes or not.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- Naive Bayes classifier is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eRed Wine Quality Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Red Wine Quality Dataset\" alt=\"Red Wine Quality Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n\n- Input variables (based on physicochemical tests):\u003cbr\u003e\n\n\u003ctable\u003e\n  \u003ctd\u003e1. fixed acidity \u003c/td\u003e     \u003ctd\u003e2. volatile acidity\u003c/td\u003e    \u003ctd\u003e3. citric acid \u003c/td\u003e  \u003ctd\u003e4. residual sugar\u003c/td\u003e  \u003ctd\u003e5. 
chlorides\u003c/td\u003e\n\u003ctd\u003e6. free sulfur dioxide\u003c/td\u003e  \u003ctd\u003e7. total sulfur dioxide\u003c/td\u003e   \u003ctd\u003e8. density \u003c/td\u003e     \u003ctd\u003e9. pH\u003c/td\u003e             \u003ctd\u003e10. sulphates\u003c/td\u003e  \u003ctd\u003e11. alcohol\u003c/td\u003e \u003c/table\u003e\n\n- Output variable (based on sensory data):\u003cbr\u003e\n\n\u003ctable\u003e\u003ctd\u003e12. quality (score between 0 and 10)\u003c/td\u003e\u003c/table\u003e \n\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- Decision Tree and Random Forest are implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eCars Evaluation Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/datasets/elikplim/car-evaluation-data-set\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Cars Evaluation Dataset\" alt=\"Cars Evaluation Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n- Contains information about cars with the following attribute values:\u003cbr\u003e\n\u003ctable\u003e\n\u003ctd\u003e1. buying: v-high, high, med, low \u003c/td\u003e\n\u003ctd\u003e2. maint: v-high, high, med, low \u003c/td\u003e\n\u003ctd\u003e3. doors: 2, 3, 4, 5-more \u003c/td\u003e\n\u003ctd\u003e4. persons: 2, 4, more \u003c/td\u003e\n\u003ctd\u003e5. lug_boot: small, med, big\u003c/td\u003e  \n\u003ctd\u003e6. safety: low, med, high\u003c/td\u003e  \u003c/table\u003e\n\n- Target categories are:\u003cbr\u003e\n\n\u003ctable\u003e\n  \u003ctd\u003e1. unacc 1210 (70.023 %)\u003c/td\u003e\n  \u003ctd\u003e2. acc 384 (22.222 %)\u003c/td\u003e\n  \u003ctd\u003e3. good 69 (3.993 %)\u003c/td\u003e\n  \u003ctd\u003e4. 
v-good 65 (3.762 %)\u003c/td\u003e\u003c/table\u003e\n  \n- Contains values in string format.\u003cbr\u003e\n- Dataset is not cleaned, preprocessing is required.\u003cbr\u003e\n- Random Forest is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eCensus/Adult Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/code/prashant111/naive-bayes-classifier-in-python/input\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Census/Adult Dataset\" alt=\"Census/Adult Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n\n- Contains population data with various parameters like employment, marital status, gender, ethnicity, etc.\u003cbr\u003e\n- The model needs to predict whether income is greater than 50K or not.\u003cbr\u003e\n- Contains values in string format.\u003cbr\u003e\n- Dataset is not cleaned, preprocessing is required.\u003cbr\u003e\n- Naive Bayes classifier is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eSalary Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Salary Dataset\" alt=\"Salary Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n\n- Contains salary data for regression.\u003cbr\u003e\n- The dataset has 2 columns (Years of Experience and Salary) and 30 entries.\u003cbr\u003e\n- The Years of Experience column is used to fit a regression for Salary.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- Linear Regression is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eUSA Housing Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca 
href=\"https://www.kaggle.com/code/gantalaswetha/usa-housing-dataset-linear-regression/input\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"USA Housing Dataset\" alt=\"USA Housing Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n\n- Contains housing data for regression.\u003cbr\u003e\n- This dataset has multiple columns (Area Population, Address, etc.), a Price column (output), and 5000 entries.\u003cbr\u003e\n- The rest of the columns are used to fit a regression for Price.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- Linear Regression and Principal Component Analysis are implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eCredit Card Fraud Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/code/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets/input\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Credit Card Fraud Dataset\" alt=\"Credit Card Fraud Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n- Contains fraud data for classification.\u003cbr\u003e\n- The dataset has 31 columns.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- Logistic Regression is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eMarket Basket Optimisation Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/datasets/dragonheir/basket-optimisation\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Market Basket Optimisation Dataset\" alt=\"Market Basket Optimisation Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n- Contains various product data for the Apriori or association algorithm.\u003cbr\u003e\n- The dataset has 20 columns of data about various products.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing 
required.\u003cbr\u003e\n- The Apriori Algorithm is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eCIFAR-10 Dataset\u003c/h2\u003e\n\n- CIFAR-10 is a dataset used in computer vision tasks.\u003cbr\u003e\n- It consists of 60,000 color images.\u003cbr\u003e\n- These images are divided into 10 different classes.\u003cbr\u003e\n- Each class contains 6,000 images.\u003cbr\u003e\n- The dataset is typically split into 50,000 training images and 10,000 test images.\u003cbr\u003e\n- Common classes in CIFAR-10 include airplanes, automobiles, birds, cats, dogs, and more.\u003cbr\u003e\n- The primary purpose of CIFAR-10 is for image classification and object recognition.\u003cbr\u003e\n- Researchers and developers often use it to benchmark and evaluate machine learning and deep learning algorithms.\u003cbr\u003e\n- A Neural Network is implemented on this dataset.\u003cbr\u003e\n\n\u003ch2\u003eMall Customers Dataset\u003c/h2\u003e\n\n- Dataset is taken from: \u003ca href=\"https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python/data\"\u003e\u003cimg src=\"https://cdn4.iconfinder.com/data/icons/logos-and-brands/512/189_Kaggle_logo_logos-1024.png\" height=40 width=40 title=\"Mall Customers Dataset\" alt=\"Mall Customers Dataset\"\u003e \u003c/a\u003e\u003cbr\u003e\n- Contains mall customer data for clustering.\u003cbr\u003e\n- The Gender, Age, Annual Income (k$), and Spending Score (1-100) columns are used to cluster data points.\u003cbr\u003e\n- Dataset is already cleaned, no preprocessing required.\u003cbr\u003e\n- K Means Clustering is implemented on this dataset.\n\n---\n\n### \u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/tree/main/Deep%20Learning\"\u003eDeep Learning 🤖🛠🧠🕸️\u003c/a\u003e\n- Deep learning is a subset of machine learning, which is essentially a neural network with three or more layers.\u003cbr\u003e\n- These neural networks attempt to simulate the behavior of the human brain—albeit far from matching its 
ability—allowing it to “learn” from large amounts of data.\u003cbr\u003e\n- While a neural network with a single layer can still make approximate predictions, additional hidden layers can help to optimize and refine for accuracy.\u003cbr\u003e\n- Deep learning drives many artificial intelligence (AI) applications and services that improve automation, performing analytical and physical tasks without human intervention.\u003cbr\u003e\n- Deep learning technology lies behind everyday products and services (such as digital assistants, voice-enabled TV remotes, and credit card fraud detection) as well as emerging technologies (such as self-driving cars).\u003cbr\u003e\n\n---\n\n# Libraries Used 📚 💻\n\u003cp\u003eShort description of all the libraries used.\u003c/p\u003e\nTo install a Python library, this command is used: \u003cbr\u003e\u003cbr\u003e\n\n```\npip install library_name\n```\n\n\u003cul\u003e\n\u003cli\u003eNumPy (Numerical Python) – Provides a collection of mathematical functions\nto operate on arrays and matrices. 
\u003c/li\u003e\n  \u003cli\u003ePandas (Panel Data/ Python Data Analysis) - This library is mostly used for analyzing,\ncleaning, exploring, and manipulating data.\u003c/li\u003e\n  \u003cli\u003eMatplotlib - It is a data visualization and graphical plotting library.\u003c/li\u003e\n\u003cli\u003eScikit-learn - It is a machine learning library that provides tools for many machine learning tasks such as classification, regression, and clustering.\u003c/li\u003e\n  \u003cli\u003eMlxtend (Machine Learning Extensions) - It is a library of extension and helper modules for Python's data analysis and machine learning libraries.\u003c/li\u003e\n  \u003cli\u003eTensorFlow (tf) - TensorFlow is an open-source machine learning framework developed by Google.\u003c/li\u003e\n\u003cli\u003eKeras - Keras is an open-source deep learning framework that serves as an interface for TensorFlow and other backends, making it easier to build and train neural networks.\u003c/li\u003e\n\u003c/ul\u003e\n\n---\n### Additional Resources 🧮📚📓🌐\n\n1. p2j - This Python library is used to convert Python script files to Jupyter notebooks. The syntax is:\n    \n  ```\np2j python_script.py\n  ```\nHere python_script.py is the name of the script file.\u003cbr\u003e\u003cbr\u003e\n- After executing this command in the console or command prompt at the file's location, the Jupyter notebook will be written in the same location.\n\n2. Flask - This Python framework is used to deploy machine learning models.\u003cbr\u003e\u003cbr\u003e\n   If you want to see introductory Flask code, visit my repository: https://github.com/madhurimarawat/Machine-Learning-Projects-In-Python\n\n3. 
Streamlit - This framework is used to create websites using Python without having to worry about the frontend.\u003cbr\u003e\u003cbr\u003e\n   I deployed the ML models from this repository using Streamlit:\u003cbr\u003e\nVisit the website: \u003ca href=\"https://ml-model-datasets-using-apps-3gy37ndiancjo2nmu36sls.streamlit.app/\"\u003eML Algorithms on Inbuilt and Kaggle Datasets\u003c/a\u003e\u003cbr\u003e\u003cbr\u003e\nTo see the code: https://github.com/madhurimarawat/ML-Model-Datasets-Using-Streamlits\n\u003cbr\u003e\u003cbr\u003eAlso, if you want to see introductory Streamlit code, visit my repository: https://github.com/madhurimarawat/Streamlit-Programs\n\n\n---\n\n## Thanks for Visiting 😄\n\n- Drop a 🌟 if you find this repository useful.\u003cbr\u003e\u003cbr\u003e\n- If you have any doubts or suggestions, feel free to reach out to me.\u003cbr\u003e\u003cbr\u003e\n📫 How to reach me:  \u0026nbsp; [![Linkedin Badge](https://img.shields.io/badge/-madhurima-blue?style=flat\u0026logo=Linkedin\u0026logoColor=white)](https://www.linkedin.com/in/madhurima-rawat/) \u0026nbsp; \u0026nbsp;\n\u003ca href =\"mailto:rawatmadhurima@gmail.com\"\u003e\u003cimg src=\"https://github.com/madhurimarawat/Machine-Learning-Using-Python/assets/105432776/b6a0873a-e961-42c0-8fbf-ab65828c961a\" height=35 width=30 title=\"Mail Illustration\" alt=\"Mail Illustration📫\" \u003e \u003c/a\u003e\u003cbr\u003e\u003cbr\u003e\n- **Contribute and Discuss:** Feel free to open \u003ca href= \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/issues\"\u003eissues 🐛\u003c/a\u003e, submit \u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/pulls\"\u003epull requests 🛠️\u003c/a\u003e, or start \u003ca href = \"https://github.com/madhurimarawat/Machine-Learning-Using-Python/discussions\"\u003ediscussions 💬\u003c/a\u003e to help improve this 
repository!\n"}
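The K-Means clustering described for the Mall Customers dataset above can be sketched as follows. This is a minimal, hypothetical sketch, not the repository's actual notebook code: it uses synthetic data standing in for the Age, Annual Income (k$), and Spending Score (1-100) columns, and scikit-learn's `KMeans` with an assumed choice of 5 clusters.

```python
# Minimal sketch: K-Means clustering of customer-style data with scikit-learn.
# The columns mimic the Mall Customers dataset; the values here are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-ins for Age, Annual Income (k$), Spending Score (1-100)
X = np.column_stack([
    rng.integers(18, 70, 200),
    rng.integers(15, 140, 200),
    rng.integers(1, 100, 200),
]).astype(float)

# Scale features so no single column dominates the Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means with an assumed k=5; in practice k is often picked via the elbow method
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_          # cluster assignment for each customer
centers = kmeans.cluster_centers_  # centroids in scaled feature space
```

On the real dataset, the columns would be read with pandas (e.g. `pd.read_csv`) and the resulting labels used to profile customer segments.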