## What is MAD()
MAD() is a Vertica User Defined Analytic Function that uses the [Median Absolute Deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) to implement a robust anomaly detection system.

We often have to deal with signals whose statistical properties change over time. Think, for example, of a room exposed to sunlight: we can expect the average temperature to drop during the night. To find "anomalies" in these cases we have to "follow" the *natural* drift of the signal.

![graph with outliers](images/outliers.png)

One possible way to find anomalies while "following" the signal is to compare each value with the Moving Average of the last N measurements.
The problem with Moving Averages is that the very anomaly we want to identify also influences the average itself.

Suppose you have a signal like this:

![sample signal](images/graph.png)

As you can see, this graph contains a few anomalies (outliers). If we describe the data distribution with the mean and set a threshold at two standard deviations (2&sigma;), which covers about 95% of the values of a normal distribution, we get:

![mean and 2sigma](images/mean.png)

Only one data point exceeds this threshold, because the outliers themselves inflate both the mean and the standard deviation and push the threshold up. If we replace the *mean* with the *median* and the 2&sigma; threshold with 2 MAD, we catch all of them:

![median and 2mad](images/mad.png)

The Median Absolute Deviation is defined as:

**MAD = *Const* &times; *median*( |X<sub>i</sub> - X<sub>med</sub>| )**

Where:
- **X<sub>i</sub>** is the i-th value in the data block
- **X<sub>med</sub>** is the median of the data block
- **Const** is a *scale factor* that depends on the input data distribution (1.4826 for a normal distribution)

Since we want to *follow* the signal by calculating a *moving* Median Absolute Deviation, we also need to set the **size** of the "rolling window" used to calculate the MAD.

## How MAD() works
The `mad()` function uses the following syntax:
```sql
SELECT mad(<col> [USING PARAMETERS [setsize=N] [, const=M]])
       OVER ([PARTITION BY <col1>] ORDER BY <col2>);
```
and returns the following columns:
- `rownum`: row number starting from 1 (it can be used to join the `mad()` output with the source table; see the Makefile)
- `median`: median of the last `setsize` rows
- `mad`: Median Absolute Deviation of the last `setsize` rows
- `cutoff`: `abs(value-median)/mad` for the current row, where `median` and `mad` are computed over the *rolling window* of the last `setsize` rows

If not specified, the function parameters default to:
- `const` = 1.4826 (scale factor for a normal distribution)
- `setsize` = 10

## How to install MAD()
Please check (and, if needed, edit) the Makefile before using it. Pay special attention to the C++ compiler executable name.
Then:

- Compile the source code: `make`
- As `dbadmin`, deploy the library to Vertica: `make deploy`
- Optionally, run `make test` to check that everything works

Expected compile/deploy/test output:
```bash
$ make
g++-4.8 -O3 -D HAVE_LONG_INT_64 -std=c++11 -Wall -shared -Wno-unused-value -fPIC -I /opt/vertica/sdk/include -o /tmp/lmad.so lmad.cpp /opt/vertica/sdk/include/Vertica.cpp
$ make deploy
CREATE OR REPLACE LIBRARY lmad AS '/tmp/lmad.so' LANGUAGE 'C++';
CREATE LIBRARY
CREATE OR REPLACE ANALYTIC FUNCTION mad AS LANGUAGE 'C++' NAME 'MadFactory' LIBRARY lmad ;
CREATE ANALYTIC FUNCTION
CREATE OR REPLACE ANALYTIC FUNCTION cutoff AS LANGUAGE 'C++' NAME 'CutoffFactory' LIBRARY lmad ;
CREATE ANALYTIC FUNCTION
GRANT EXECUTE ON ANALYTIC FUNCTION mad(x FLOAT) TO PUBLIC ;
GRANT PRIVILEGE
GRANT EXECUTE ON ANALYTIC FUNCTION cutoff(x FLOAT) TO PUBLIC ;
GRANT PRIVILEGE
$ make test
           ts       | val  |   mad   | cutoff  | outlier
--------------------+------+---------+---------+---------
2019-10-11 09:12:00 |  3.0 |         |         |
2019-10-11 09:12:01 |  4.0 |         |         |
2019-10-11 09:12:02 |  5.0 |         |         |
2019-10-11 09:12:03 |  6.0 |         |         |
2019-10-11 09:12:04 |  7.0 |  1.4826 |  1.3490 |
2019-10-11 09:12:05 | 21.0 |  1.4826 | 10.1174 | *
2019-10-11 09:12:06 | 10.0 |  2.9652 |  1.0117 |
2019-10-11 09:12:07 |  9.0 |  2.9652 |  0.0000 |
2019-10-11 09:12:08 | 24.0 |  4.4478 |  3.1476 | *
2019-10-11 09:12:09 |  3.0 | 10.3782 |  0.6745 |
2019-10-11 09:12:10 |  3.0 |  8.8956 |  0.6745 |
2019-10-11 09:12:11 |  5.0 |  2.9652 |  0.0000 |
2019-10-11 09:12:12 |  8.0 |  2.9652 |  1.0117 |
2019-10-11 09:12:13 | 31.0 |  2.9652 |  8.7684 | *
2019-10-11 09:12:14 |  8.0 |  4.4478 |  0.0000 |
2019-10-11 09:12:15 | 23.0 |  4.4478 |  3.3725 | *
2019-10-11 09:12:16 |  9.0 |  1.4826 |  0.0000 |
2019-10-11 09:12:17 |  4.0 |  7.4130 |  0.6745 |
2019-10-11 09:12:18 |  3.0 |  5.9304 |  0.8431 |
2019-10-11 09:12:19 |  2.0 |  2.9652 |  0.6745 |
2019-10-11 09:12:20 |  1.0 |  1.4826 |  1.3490 |
(21 rows)
```