{"id":19469716,"url":"https://github.com/niloth-p/density-based-outlier-detection","last_synced_at":"2025-04-25T11:33:25.205Z","repository":{"id":130152621,"uuid":"131353477","full_name":"Niloth-p/Density-Based-Outlier-Detection","owner":"Niloth-p","description":"An implementation of a density based outlier detection method - the Local Outlier Factor Technique, to find frauds in credit card transactions. For detecting both local and global outliers.","archived":false,"fork":false,"pushed_at":"2020-08-08T22:26:19.000Z","size":29,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-10-04T12:44:59.964Z","etag":null,"topics":["accuracy","density","distancematrix","k-nearest-neighbors","local-outlier-factor","local-reachability-density","mahalanobis-distance","outlier-detection","performance-metrics","precision-recall-curve"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Niloth-p.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-04-27T23:26:34.000Z","updated_at":"2023-10-04T12:45:02.398Z","dependencies_parsed_at":"2023-04-01T05:33:09.476Z","dependency_job_id":null,"html_url":"https://github.com/Niloth-p/Density-Based-Outlier-Detection","commit_stats":null,"previous_names":["niloth-p/density-based-outlier-detection"],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Niloth-p%2FDensity-Based-Outlier-Detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Niloth
-p%2FDensity-Based-Outlier-Detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Niloth-p%2FDensity-Based-Outlier-Detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Niloth-p%2FDensity-Based-Outlier-Detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Niloth-p","download_url":"https://codeload.github.com/Niloth-p/Density-Based-Outlier-Detection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224000812,"owners_count":17239000,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accuracy","density","distancematrix","k-nearest-neighbors","local-outlier-factor","local-reachability-density","mahalanobis-distance","outlier-detection","performance-metrics","precision-recall-curve"],"created_at":"2024-11-10T18:53:41.801Z","updated_at":"2024-11-10T18:53:43.219Z","avatar_url":"https://github.com/Niloth-p.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Density-Based-Outlier-Detection\n## Description\n\n    An implementation of a density based outlier detection method - the Local Outlier Factor Technique,\n    to find frauds in credit card transactions (here) using Python. For detecting both local and global outliers.\n\n## Dataset Used:\n    \n    German Credit Data\n    Professor Dr. 
Hans Hofmann  \n    Institut für Statistik und Ökonometrie  \n    Universität Hamburg  \n    numerical attributes - file \"german.data-numeric\", by Strathclyde University \n    (1 = Good, 2 = Bad)\n\n## How to run:\n\n    After setting the correct values of the global variables\n    python Outlier.py\n\n## Global variables:\n    \n    first_time - If set to true (first run), the distance matrix is computed after reading the data from the dataset, and is written to a pickle file called \"distancematrix\". If it is set to false, the distance matrix is read directly from the pickle file.\n\n    filename - The name of the data file\n\n    parameters - a list of pairs of k and O values, for which the outlier detection is done\n\n## Parameters: \n   \n    k - used to get kNN, kdist, and hence LOF (here, it doesn't seem to affect the accuracy!?)\n    O - Number of outliers (the higher the number of outliers, the higher the precision and recall seem to be)\n    N - len(data) (= 1000 for the given dataset)\n\n    The Precision averages around 0.62, and the Recall increases linearly with increase in O (!?)\n\n    The plotting of the PR curve has been commented out.\n\n## Functions:\n### def readData():\n    \n    Reads data from a text file and stores it as a data frame using pandas.\n    df(X) is taken as the entire table except the last column, which contains the classification of points as outliers or not\n    The last column is taken as Y for measuring the accuracy of the detection\n    The data is normalized\n\n### def mahalanobisdist(a, b):\n    \n    Calculates the Mahalanobis distance between 2 points of the data\n    d(x,y) = sqrt((x-y)T . S^-1 . 
(x-y))\n    Sinv, the inverse of the covariance matrix, is computed using pinv(), which avoids singularity issues\n    \n### def createDistanceMatrix(data, first_timeval, N):\n    \n    Computes the distance matrix (the Mahalanobis distance between all pairs of points) and \n    writes it to a pickle file to save time on future runs; whether this is done is controlled by the global variable first_time\n    \n### def getLRD(N, distancematrix, k, data):\n    \n    Finds, for each point,\n    1. The kNN\n    2. The k-distance, i.e. the distance to its kth nearest neighbour\n    3. The number of points that fall within the k-distance neighbourhood (Nk)\n    4. Reachability distances (between the point in focus and every other point, 1 at a time)\n    i.e. max{k-distance of point in focus, distance between point in focus and the other point in consideration}\n    5. LRD (local reachability density)\n    i.e. Nk/(sum of reachability distances)\n    The lower the LRD, the higher the LOF =\u003e the LRD value is used for detecting outliers\n    \n### def getReachabilityDistances(N, data, kdist, distancematrix):\n    \n    Calculates the reachability distance between all pairs of points\n    \n### def getAccuracy(outliers, Y, N, PrecisionList, RecallList):\n    \n    Gets the performance metrics of the outlier detection done,\n    in terms of Accuracy, Precision, Recall, F1-Score\n    using true and false +ves and -ves, by comparing the obtained classification vs the given good and bad points\n    True +ve : Good point classified good\n    True -ve : Bad point classified bad\n    False +ve : Bad point classified good\n    False -ve : Good point classified bad\n    These 4 types are not all equally important. 
\n    For example, false -ves are more acceptable than false +ves.\n    Accuracy = (tp + tn)/(tp + tn + fp + fn)\n    Precision = tp/(tp + fp)\n    Recall = tp/(tp + fn)\n    F1 = harmonic mean of Precision and Recall\n\n### def main():\n    \n    Calls the functions to \n    1. get the distance matrix,\n    2. get the LRD, \n    3. take the first O points after sorting by LRD, and \n    4. get the performance metrics\n    \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniloth-p%2Fdensity-based-outlier-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fniloth-p%2Fdensity-based-outlier-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniloth-p%2Fdensity-based-outlier-detection/lists"}