{"id":22072393,"url":"https://github.com/jadelhelm/autoad","last_synced_at":"2026-02-27T18:09:31.738Z","repository":{"id":265310491,"uuid":"867556447","full_name":"JAdelhelm/AutoAD","owner":"JAdelhelm","description":"AutoAD - A framework for the rapid detection of anomalies in (big) datasets","archived":false,"fork":false,"pushed_at":"2024-12-04T06:59:21.000Z","size":2540,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-23T19:26:09.720Z","etag":null,"topics":["anomaly-detection","anomalydetection","automated","preprocessing","preprocessing-pipeline","pyod","python","sklearn","sklearn-pipeline"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JAdelhelm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-04T09:38:10.000Z","updated_at":"2025-01-05T11:37:17.000Z","dependencies_parsed_at":"2024-11-29T09:11:55.544Z","dependency_job_id":null,"html_url":"https://github.com/JAdelhelm/AutoAD","commit_stats":null,"previous_names":["jadelhelm/autoad"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JAdelhelm%2FAutoAD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JAdelhelm%2FAutoAD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JAdelhelm%2FAutoAD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JAdelhelm%2FAutoA
D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JAdelhelm","download_url":"https://codeload.github.com/JAdelhelm/AutoAD/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248075519,"owners_count":21043602,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","anomalydetection","automated","preprocessing","preprocessing-pipeline","pyod","python","sklearn","sklearn-pipeline"],"created_at":"2024-11-30T21:12:27.060Z","updated_at":"2026-02-27T18:09:31.709Z","avatar_url":"https://github.com/JAdelhelm.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AutoAD - A framework for the rapid detection of anomalies in (big) datasets\n\n\n\n\n\n## Basic Usage\n\n```bash\nconda create -n auto_ad python=3.11\nconda activate auto_ad\n\ncd AutoAD\n\npip install -r requirements.txt\n```\n---\n\n## Examples\n\nAlso check out the examples in ``examples.ipynb``.\n\n````python\nimport pandas as pd\nimport numpy as np\nfrom sklearn import set_config\n\nset_config(transform_output=\"pandas\")\n\nX_train = pd.DataFrame({\n\n    'ID': [1, 2, 3, 4],                 \n    'Name': ['Alice', 'Alice', 'Alice', \"Alice\"],  \n    'Rank': ['A','B','C','D'],\n    'Age': [25, 30, 35, 40],                 \n    'Salary': [50000.00, 60000.50, 75000.75, 8_000], \n    'Hire Date': pd.to_datetime(['2020-01-15', '2019-05-22', '2018-08-30', '2021-04-12']), \n    'Is Manager': [False, True, False, \"\"]  \n})\nX_test = pd.DataFrame({\n\n    'ID': [1, 2, 3, 4],     
            \n    'Name': ['Alice', 'Alice', 'Alice', \"Bob\"],  \n    'Rank': ['A','B','C','D'],\n    'Age': [25, 30, 35, np.nan],                 \n    'Salary': [50000.00, 60000.50, 75000.75, 8_000_000], \n    'Hire Date': pd.to_datetime(['2020-01-15', '2019-05-22', '2018-08-30', '2021-04-12']), \n    'Is Manager': [False, True, False, \"\"]  \n})\n\n########################################\nfrom autoad.autoad import AutoAD\nfrom pyod.models.iforest import IForest\n\n\npipeline_ad = AutoAD()\n\n\npipeline_ad.fit(X=X_train, clf_ad=IForest())\nX_transformed = pipeline_ad.transform(X=X_test)\nX_transformed\n````\n\n## Highlights ⭐\n\n\n#### 📌 Implementation of univariate methods / *Detection of univariate anomalies*\nBoth methods (the modified Z-score and Tukey's method) are robust to outliers because they are based on the median and quartiles rather than the mean, so the estimate of location is not biased by extreme values. They also help multivariate anomaly detection algorithms identify univariate anomalies.\n\n#### 📌 BinaryEncoder instead of OneHotEncoder for nominal columns / *Big Data and Performance*\n   Recent research shows that binary encoding gives results similar to one-hot encoding for nominal columns while using significantly fewer dimensions.\n   - (John T. Hancock and Taghi M. Khoshgoftaar. \"Survey on categorical data for neural networks.\" In: Journal of Big Data 7.1 (2020), pp. 1–41.), Tables 2, 4\n   - (Diogo Seca and João Mendes-Moreira. \"Benchmark of Encoders of Nominal Features for Regression.\" In: World Conference on Information Systems and Technologies. 2021, pp. 146–155.), P. 
151\n\n#### 📌 Transformation of time series and standardization / *Normalization for better prediction results*\n\n#### 📌 Labeling of NaN values instead of removing them / *No loss of information*\n\n\n\n---\n\n\n\n\n\n## Pipeline - Built-in Logic\n\u003c!-- ![Logic of Pipeline](./AutoAD/autoad/img/decision_rules.png) --\u003e\n![Logic of Pipeline](https://github.com/JAdelhelm/AutoAD/blob/a51a441dc9b0c3502fd1056f9fddbfbf5d661308/AutoAD/autoad/img/decision_rules.png) \n\n- I used sklearn's Pipeline and Transformer concept to create this preprocessing pipeline\n    - Pipeline: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html\n    - Transformer: https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html\n\n\n\n\n\n---\n\n\u003c!--\n### Reference\n- https://www.researchgate.net/publication/379640146_Detektion_von_Anomalien_in_der_Datenqualitatskontrolle_mittels_unuberwachter_Ansatze (German Thesis)\n--\u003e\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjadelhelm%2Fautoad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjadelhelm%2Fautoad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjadelhelm%2Fautoad/lists"}