{"id":13474379,"url":"https://github.com/WeibinMeng/FT-Tree","last_synced_at":"2025-03-26T21:31:28.398Z","repository":{"id":201876496,"uuid":"153569366","full_name":"WeibinMeng/FT-Tree","owner":"WeibinMeng","description":"IWQoS 2017: A toolkit for log template extraction","archived":false,"fork":false,"pushed_at":"2022-09-21T03:38:27.000Z","size":433,"stargazers_count":155,"open_issues_count":3,"forks_count":28,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-30T07:47:42.829Z","etag":null,"topics":["log-analysis","log-template"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WeibinMeng.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-10-18T05:42:36.000Z","updated_at":"2024-10-01T08:39:09.000Z","dependencies_parsed_at":"2024-03-28T17:02:02.374Z","dependency_job_id":null,"html_url":"https://github.com/WeibinMeng/FT-Tree","commit_stats":null,"previous_names":["weibinmeng/ft-tree"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeibinMeng%2FFT-Tree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeibinMeng%2FFT-Tree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeibinMeng%2FFT-Tree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeibinMeng%2FFT-Tree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WeibinMeng","download_url":"https://codeload.github.com/WeibinMeng/FT-Tree/tar.gz/refs/heads/master","host":{"name":"GitHub
","url":"https://github.com","kind":"github","repositories_count":245738630,"owners_count":20664320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["log-analysis","log-template"],"created_at":"2024-07-31T16:01:11.931Z","updated_at":"2025-03-26T21:31:28.020Z","avatar_url":"https://github.com/WeibinMeng.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Paper\n\nOur papers were published at the IEEE/ACM International Symposium on Quality of Service ([IWQoS 2017](http://iwqos2017.ieee-iwqos.org/)) and in [IEEE Access 2020](http://ieeeaccess.ieee.org/). The references are:\n\n* Shenglin Zhang, Weibin Meng, Jiahao Bu, Sen Yang, Ying Liu, Dan Pei, Jun (Jim) Xu, Yu Chen, Hui Dong, Xianping Qu, Lei Song. **Syslog Processing for Switch Failure Diagnosis and Prediction in Datacenter Networks**. IWQoS 2017, Vilanova i la Geltrú, Barcelona, Spain, 14-16 June 2017. [paper link](https://netman.aiops.org/wp-content/uploads/2015/12/IWQOS_2017_zsl.pdf)\n* Shenglin Zhang, Ying Liu, Weibin Meng, Jiahao Bu, Sen Yang, Yongqian Sun, Dan Pei, Jun Xu, Yuzhi Zhang, Lei Song, Ming Zhang. 
**Efficient and Robust Syslog Parsing for Network Devices in Datacenter Networks.** IEEE Access, 2020. [paper link](https://netman.aiops.org/wp-content/uploads/2020/02/FT-tree-IEEE-Access20.pdf)\n\n## Environment:\n\tpython3, pygraphviz (only needed to draw the tree)\n\n## Quick Start:\n### Train:\n* python main\\_train.py -train\\_log\\_path training.log -out\\_seq\\_path output.seq -templates output.template\n\t* Parameters:\n\t\t* -train\\_log\\_path: raw log file used for training\n\t\t* -out\\_seq\\_path: output file with the sequence of template indices\n\t\t* -templates: output template file\n\n\n### Match:\n* python main\\_match.py -templates ./output.template -logs training.log\n\t* Parameters:\n\t\t* -templates: template file\n\t\t* -logs: logs to be matched\n\t\n\n## Combined training \u0026 matching:\n* Command: python main\\_train.py -train\\_log\\_path training.log -out\\_seq\\_path output.seq -templates output.template\n\t* Parameters:\n\t\t* -train\\_log\\_path: raw logs used for training\n\t\t* -out\\_seq\\_path: sequence of template IDs produced after the logs are matched\n\t\t* -templates: output template file\n\t* **Note: before running the algorithm, it is best to delete the leading columns of the dataset (timestamp, message type, etc.); otherwise the resulting templates will be messy.**\n\n\n## Purpose of each file\n### Training log templates:\n* Output files: templates and a word-frequency list\n\t* Command:\n\t\t* python ft\\_tree.py -FIRST\\_COL 0 -NO\\_CUTTING 1 -CUTTING\\_PERCENT 0.3 -data\\_path ./training.log -template\\_path ./output.template -fre\\_word\\_path ./output.fre -picture\\_path ./tree.png -leaf\\_num 4 -short\\_threshold 5 -plot\\_flag 1\n\t* Example parameters:\n\t\t* FIRST\\_COL: index of the first column of each log line that is used as input (default 0)\n\t\t* NO\\_CUTTING = 0 # global switch. When set to 1, the first CUTTING\\_PERCENT of the words in each log (e.g. 30%) are not pruned; when 0, pruning is applied globally according to min\\_threshold\n\t\t* CUTTING\\_PERCENT = 0.6 # fraction of leading words that are not pruned\n\t\t* train\\_log\\_path = 'input.txt' # raw log input\n\t\t* template\\_path = \"./logTemplate.txt\" # output template file\n\t\t* fre\\_word\\_path = \"./fre_word.txt\" # output word-frequency list\n\t\t* leaf\\_num = 4 # pruning threshold k\n\t\t* picture\\_path = './tree.png'\n\t\t* short\\_threshold = 2 # logs shorter than this threshold are filtered out (the command above uses 5)\n\t\t* plot\\_flag: default 0 (no plot). If 1, the FT-tree is drawn, with 'short templates' shown in blue and pruned nodes in red\n\n\t\n### Matching logs against FT-tree templates:\n* Command:\n\t* python3 matchTemplate.py -short\\_threshold 5 -leaf\\_num 6 -template\\_path ./output.template -fre\\_word\\_path ./output.fre 
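-log\\_path ./training.log -out\\_seq\\_path ./output.seq -plot\\_flag 0 -CUTTING\\_PERCENT 0.3 -NO\\_CUTTING 1 **-match\\_model 1**\n\t\t\n* Example parameters:\n\t* short\\_threshold = 2 # logs shorter than this threshold are filtered out (the command above uses 5)\n\t* leaf\\_num: pruning threshold used during incremental learning. (Changing it from 6 to 10 on the sample data shows how the matching modes differ, since LearnTemplateByIntervals prunes newly arriving data.)\n\t* template\\_path = './output.template'\n\t* fre\\_word\\_path = './output.fre'\n\t* runtime\\_log\\_path = './new.log'\n\t* out\\_seq\\_path = './output.seq'\n\t* plot\\_flag: 0 = no plot, 1 = plot (default 0). Do not plot a very large tree; the program will hang.\n\t* CUTTING\\_PERCENT: fraction of the leading words in each log that are not pruned. Used only for incremental learning, not for normal matching.\n\t* NO\\_CUTTING: whether the leading words of each log are exempt from pruning; 0 = normal pruning, 1 = no pruning (default 1). Used only for incremental learning, not for normal matching.\n\t* match\\_model: 1 = normal matching; 2 = one-by-one incremental learning \u0026 matching; 3 = batch incremental learning \u0026 matching; 4 = matching with ordered templates (used below)\n* Incremental template learning:\n\t* matchLogsAndLearnTemplateOneByOne(): matches logs one at a time; when a log matches no template, a new template is learned and appended to the end of the template file.\n\t* matchLogsFromFile(): normal matching; a log that matches no template is assigned template ID 0.\n\t* LearnTemplateByIntervals(): takes the logs of one time interval as input and learns new templates incrementally on top of the existing ones. The new templates are also pruned with the configured threshold and are finally appended to the end of the template file.\n\t\tFor example, in the sample data, suppose the new logs are newlogs.dat and the original template tree is Trace\\_train.png. With pruning threshold k=6 (see figure reBuildTree\\_k6), pruning takes place; with k=10 (see figure reBuildTree\\_k10), some variables are kept.\n\n\n### Ordering template words by the original logs:\nReorders the words in each template to follow their order in the original logs, producing **ordered templates**.\n\n* Command:\n\t* python3 orderWords.py -templates ./output.template -sequences ./output.seq -rawlog ./training.log -order\\_templates ./output.template\\_order\n\n### Matching logs with ordered templates:\nLogs are matched according to their original word order.\n\n* Command:\n\t* python3 matchTemplate.py -short\\_threshold 5 -leaf\\_num 6 -template\\_path ./output.template\\_order -log\\_path ./training.log -out\\_seq\\_path ./output2.seq -plot\\_flag 1 -CUTTING\\_PERCENT 0.3 -NO\\_CUTTING 1 **-match\\_model 4**\n\t\n### splitTimeWindows.py:\n Template analysis: splits the logs into time windows, counts the top-10 templates in normal periods, anomalous periods, and all periods, and plots the results.\n\n\n### countFreTemplates.py:\n Template analysis: outputs the 10 most frequent templates and the logs corresponding to each template.\n \n \nThis code was written by [@Weibin 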
Meng](https://github.com/WeibinMeng).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWeibinMeng%2FFT-Tree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FWeibinMeng%2FFT-Tree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWeibinMeng%2FFT-Tree/lists"}