https://github.com/zhangruochi/logcluster
- Host: GitHub
- URL: https://github.com/zhangruochi/logcluster
- Owner: zhangruochi
- Created: 2018-01-30T01:53:14.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-04T01:47:36.000Z (over 5 years ago)
- Last Synced: 2024-04-16T03:56:53.512Z (7 months ago)
- Language: Python
- Size: 12.7 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## log clustering project
### file specification
**cluster.py**
- implements log clustering
- command: python3 cluster.py
- to change the parameters, modify the [CLUSTER] section in config.cof

**counting.py**
- a helper algorithm for cluster.py; use it to find abnormal logs that contain low-frequency words
- command: python3 counting.py
- to change the parameters, modify the [COUNTING] section in config.cof

**database.py**
- the clustering result is saved in a database (pickle); you can open it to get more details
- command: python3 database.py database_name

**config.cof**
- the config file

### parameters specification of config.cof
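
As a rough illustration of how the `[CLUSTER]` and `[COUNTING]` sections might be laid out and read, the sketch below parses INI-style text with Python's standard `configparser`. It is not the project's code: the assignment of parameters to sections, the sample values, and the helpers `parse_df_bound` and `load_settings` are all assumptions made for illustration. The `chars` parameter is left out because this README does not say how the list is written in the file.

```python
import configparser

# Hypothetical contents of config.cof. The section names come from the README;
# which parameter lives in which section, and every value, are assumed.
SAMPLE_CONFIG = """
[CLUSTER]
filename = example.log
clusters = 5
max = 0.9
min = 2

[COUNTING]
filename = example.log
threshold = 3
num = 2
"""


def parse_df_bound(raw):
    """Return an int for absolute counts, otherwise a float proportion."""
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def load_settings(text):
    """Parse INI-style text into typed dictionaries, one per section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cluster = parser["CLUSTER"]
    counting = parser["COUNTING"]
    return {
        "cluster": {
            "filename": cluster["filename"],
            "clusters": cluster.getint("clusters"),
            "max": parse_df_bound(cluster["max"]),
            "min": parse_df_bound(cluster["min"]),
        },
        "counting": {
            "filename": counting["filename"],
            "threshold": counting.getint("threshold"),
            "num": counting.getint("num"),
        },
    }


settings = load_settings(SAMPLE_CONFIG)
```

Trying an integer first and falling back to a float keeps the "float proportion or int count" distinction that **max** and **min** rely on.
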
- **filename**:
    1. string
    2. the name of the log file you want to process
- **clusters**:
    1. int
    2. the number of clusters you want to get
- **max**:
    1. float in range [0.0, 1.0] or int
    2. when building the vocabulary, ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float, the parameter represents a proportion of documents; if int, an absolute count.
- **min**:
    1. float in range [0.0, 1.0] or int
    2. when building the vocabulary, ignore terms that have a document frequency strictly lower than the given threshold. This value is also called the cut-off in the literature. If float, the parameter represents a proportion of documents; if int, an absolute count.
- **chars**:
    1. list of strings
    2. the characters that are not used to split a log line
- **threshold**:
    1. int
    2. if the frequency of a word is lower than threshold, the word is treated as a low-frequency word
- **num**:
    1. int
    2. if the number of low-frequency words in a log is larger than this parameter, the log is treated as abnormal
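
The frequency-based parameters above can be sketched in plain Python. The example below is an illustration under stated assumptions, not the project's implementation: `filter_vocabulary` applies the **min**/**max** document-frequency rule, and `find_abnormal_logs` applies the **threshold**/**num** rule. The function names, the whitespace tokenizer, the choice to count word frequency across the whole log file, and the sample logs are all hypothetical.

```python
from collections import Counter


def tokenize(line):
    """Split a log line on whitespace (an assumed, simplified tokenizer)."""
    return line.lower().split()


def to_count(bound, n_docs):
    """Treat a float bound as a proportion of documents, an int as a count."""
    return bound * n_docs if isinstance(bound, float) else bound


def filter_vocabulary(logs, min_df, max_df):
    """Keep terms whose document frequency lies within [min_df, max_df]."""
    doc_freq = Counter()
    for line in logs:
        doc_freq.update(set(tokenize(line)))  # count each term once per log
    low = to_count(min_df, len(logs))
    high = to_count(max_df, len(logs))
    return {term for term, df in doc_freq.items() if low <= df <= high}


def find_abnormal_logs(logs, threshold, num):
    """Return logs containing more than `num` words rarer than `threshold`."""
    # Assumption: a word's frequency is its total count over all logs.
    word_freq = Counter(word for line in logs for word in tokenize(line))
    abnormal = []
    for line in logs:
        rare = [w for w in tokenize(line) if word_freq[w] < threshold]
        if len(rare) > num:
            abnormal.append(line)
    return abnormal


sample_logs = [
    "connection opened from host",
    "connection closed from host",
    "connection opened from host",
    "kernel panic unexpected fault",
]

vocabulary = filter_vocabulary(sample_logs, min_df=2, max_df=0.9)
abnormal = find_abnormal_logs(sample_logs, threshold=2, num=2)
```

With these sample logs, the vocabulary keeps only the terms that appear in at least two lines, and the single line made up entirely of one-off words is flagged as abnormal.
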