{"id":13471884,"url":"https://github.com/logpai/loglizer","last_synced_at":"2026-02-20T17:11:42.800Z","repository":{"id":8549331,"uuid":"58811148","full_name":"logpai/loglizer","owner":"logpai","description":"A machine learning toolkit for log-based anomaly detection [ISSRE'16]","archived":false,"fork":false,"pushed_at":"2024-04-24T05:32:52.000Z","size":11102,"stargazers_count":1376,"open_issues_count":32,"forks_count":435,"subscribers_count":91,"default_branch":"master","last_synced_at":"2025-09-22T16:49:49.572Z","etag":null,"topics":["aiops","anomaly-detection","failure-diagnosis","log-analysis","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/logpai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-05-14T14:08:20.000Z","updated_at":"2025-09-08T10:38:44.000Z","dependencies_parsed_at":"2024-11-29T04:01:48.113Z","dependency_job_id":null,"html_url":"https://github.com/logpai/loglizer","commit_stats":{"total_commits":117,"total_committers":6,"mean_commits":19.5,"dds":0.4358974358974359,"last_synced_commit":"ac67f9727acb660687a77b9c8042553aaa185cd3"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/logpai/loglizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logpai%2Floglizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logpai%2Floglizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logpai%2Floglizer/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logpai%2Floglizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/logpai","download_url":"https://codeload.github.com/logpai/loglizer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logpai%2Floglizer/sbom","scorecard":{"id":597433,"data":{"date":"2025-08-11","repo":{"name":"github.com/logpai/loglizer","commit":"ac67f9727acb660687a77b9c8042553aaa185cd3"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.9,"checks":[{"name":"Code-Review","score":0,"reason":"Found 1/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release 
artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":9,"reason":"1 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2020-73"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 1 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-20T23:29:56.926Z","repository_id":8549331,"created_at":"2025-08-20T23:29:56.927Z","updated_at":"2025-08-20T23:29:56.927Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29658171,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T16:33:43.953Z","status":"ssl_error","status_checked_at":"2026-02-20T16:33:43.598Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aiops","anomaly-detection","failure-diagnosis","log-analysis","machine-learning"],"created_at":"2024-07-31T16:00:50.044Z","updated_at":"2026-02-20T17:11:42.774Z","avatar_url":"https://github.com/logpai.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","AI for *Ops"],"sub_categories":["Observability \u0026 Monitoring with AI"],"readme":"\u003cp align=\"center\"\u003e \u003ca href=\"https://github.com/logpai\"\u003e \u003cimg src=\"https://github.com/logpai/logpai.github.io/blob/master/img/logpai_logo.jpg\" width=\"425\"\u003e\u003c/a\u003e\u003c/p\u003e\n\n\n# loglizer\n\n\n**Loglizer is a machine learning-based log analysis toolkit for automated anomaly detection**. 
\n\u003e Loglizer is an AI-based toolkit for large-scale log data analysis that can be used for automated anomaly detection, intelligent fault diagnosis, and similar scenarios.\n  \n\nLogs are essential to the development and maintenance of many software systems. They record detailed\nruntime information during system operation, allowing developers and support engineers to monitor their systems and track abnormal behaviors and errors. Loglizer provides a toolkit that implements a number of machine learning-based log analysis techniques for automated anomaly detection. \n\n:telescope: If you use loglizer in your research for publication, please kindly cite the following paper.\n+ Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu. [Experience Report: System Log Analysis for Anomaly Detection](https://jiemingzhu.github.io/pub/slhe_issre2016.pdf), *IEEE International Symposium on Software Reliability Engineering (ISSRE)*, 2016. [[Bibtex](https://dblp.org/rec/bibtex/conf/issre/HeZHL16)][[Chinese version](https://github.com/AmateurEvents/article/issues/2)]\n**(ISSRE Most Influential Paper)**\n\n## Framework\n\n![Framework of Anomaly Detection](/docs/img/framework.png)\n\nThe log analysis framework for anomaly detection usually comprises the following components:\n\n1. **Log collection:** Logs are generated at runtime and aggregated into a centralized place with a data streaming pipeline, such as Flume and Kafka. \n2. **Log parsing:** The goal of log parsing is to convert unstructured log messages into a map of structured events, based on which sophisticated machine learning models can be applied. The details of log parsing can be found at [our logparser project](https://github.com/logpai/logparser).\n3. **Feature extraction:** Structured logs can be sliced into short log sequences using interval windows, sliding windows, or session windows. Then, feature extraction is performed to vectorize each log sequence, for example, using an event counting vector. \n4. **Anomaly detection:** Anomaly detection models are trained to check whether a given feature vector is an anomaly or not.\n\n\n## Models\n\nAnomaly detection models currently available:\n\n| Model | Paper reference |\n| :--- | :--- |\n| **Supervised models** |\n| LR | [**EuroSys'10**] [Fingerprinting the Datacenter: Automated Classification of Performance Crises](https://www.microsoft.com/en-us/research/wp-content/uploads/2009/07/hiLighter.pdf), by Peter Bodík, Moises Goldszmidt, Armando Fox, Hans Andersen. [**Microsoft**] |\n| Decision Tree | [**ICAC'04**] [Failure Diagnosis Using Decision Trees](http://www.cs.berkeley.edu/~brewer/papers/icac2004_chen_diagnosis.pdf), by Mike Chen, Alice X. Zheng, Jim Lloyd, Michael I. Jordan, Eric Brewer. [**eBay**] |\n| SVM | [**ICDM'07**] [Failure Prediction in IBM BlueGene/L Event Logs](https://www.researchgate.net/publication/4324148_Failure_Prediction_in_IBM_BlueGeneL_Event_Logs), by Yinglung Liang, Yanyong Zhang, Hui Xiong, Ramendra Sahoo. [**IBM**]|\n| **Unsupervised models** |\n| LOF | [**SIGMOD'00**] [LOF: Identifying Density-Based Local Outliers](), by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander. |\n| One-Class SVM | [**Neural Computation'01**] [Estimating the Support of a High-Dimensional Distribution](), by John Platt, Bernhard Schölkopf, John Shawe-Taylor, Alex J. Smola, Robert C. Williamson. |\n| Isolation Forest | [**ICDM'08**] [Isolation Forest](https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf), by Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou. |\n| PCA | [**SOSP'09**] [Large-Scale System Problems Detection by Mining Console Logs](http://iiis.tsinghua.edu.cn/~weixu/files/sosp09.pdf), by Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan. [**Intel**] |\n| Invariants Mining | [**ATC'10**] [Mining Invariants from Console Logs for System Problem Detection](https://www.usenix.org/legacy/event/atc10/tech/full_papers/Lou.pdf), by Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, Jiang Li. [**Microsoft**]|\n| Clustering | [**ICSE'16**] [Log Clustering based Problem Identification for Online Service Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/ICSE-2016-2-Log-Clustering-based-Problem-Identification-for-Online-Service-Systems.pdf), by Qingwei Lin, Hongyu Zhang, Jian-Guang Lou, Yu Zhang, Xuewei Chen. [**Microsoft**]|\n| DeepLog (coming)| [**CCS'17**] [DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning](https://www.cs.utah.edu/~lifeifei/papers/deeplog.pdf), by Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar. |\n| AutoEncoder (coming)| [**Arxiv'18**] [Anomaly Detection using Autoencoders in High Performance Computing Systems](https://arxiv.org/abs/1811.05269), by Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini. |\n\n\n## Log data\nWe have collected a set of labeled log datasets in [loghub](https://github.com/logpai/loghub) for research purposes. If you are interested in the datasets, please follow the link to submit your access request.\n\n## Install\n```bash\ngit clone https://github.com/logpai/loglizer.git\ncd loglizer\npip install -r requirements.txt\n```\n\n## API usage\n\n```python\nfrom loglizer import dataloader, preprocessing\nfrom loglizer.models import PCA\n\n# Load HDFS dataset. To try your own logs, you need to write your own load function.\n(x_train, y_train), (x_test, y_test) = dataloader.load_HDFS(...)\n\n# Feature extraction and transformation\nfeature_extractor = preprocessing.FeatureExtractor()\nx_train = feature_extractor.fit_transform(...)
\n\n# Model training\nmodel = PCA()\nmodel.fit(...)\n\n# Feature transform after fitting\nx_test = feature_extractor.transform(...)\n# Model evaluation with labeled data\nmodel.evaluate(...)\n\n# Anomaly prediction\nx_test = feature_extractor.transform(...)\nmodel.predict(...) # predict anomalies on given data\n```\n\nFor more details, please follow [the demo](./docs/demo.md) in the docs to get started. Please note that ML models are not magic: you will need to tune their parameters to make them work on your own data. \n\n## Benchmarking results \n\nTo reproduce the following results, please run [benchmarks/HDFS_bechmark.py](./benchmarks/HDFS_bechmark.py) on the full HDFS dataset (HDFS100k is for demo only).\n\n| Model | Precision | Recall | F1 |\n| :----: | :----: | :----: | :----: |\n| LR | 0.955 | 0.911 | 0.933 |\n| Decision Tree | 0.998 | 0.998 | 0.998 |\n| SVM | 0.959 | 0.970 | 0.965 |\n| LOF | 0.967 | 0.561 | 0.710 |\n| One-Class SVM | 0.995 | 0.222 | 0.363 |\n| Isolation Forest | 0.830 | 0.776 | 0.802 |\n| PCA | 0.975 | 0.635 | 0.769 |\n| Invariants Mining | 0.888 | 0.945 | 0.915 |\n| Clustering | 1.000 | 0.720 | 0.837 |\n\n## Contributors\n+ [Shilin He](https://shilinhe.github.io), The Chinese University of Hong Kong\n+ [Jieming Zhu](https://jiemingzhu.github.io), The Chinese University of Hong Kong, currently at Huawei Noah's Ark Lab\n+ [Pinjia He](https://pinjiahe.github.io/), The Chinese University of Hong Kong, currently at ETH Zurich\n\n\n## Feedback\nFor any questions or feedback, please post to [the issue page](https://github.com/logpai/loglizer/issues/new).
\n\n\n## History\n* May 14, 2016: initial commit \n* Sep 21, 2017: update code and readme \n* Mar 21, 2018: rewrite most of the code and add detailed comments\n* Feb 18, 2019: restructure the repository with hands-on demo\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flogpai%2Floglizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flogpai%2Floglizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flogpai%2Floglizer/lists"}