https://github.com/mrrazor22/draindotnet
C# port of LogPai’s Drain log parser - fast, reliable log template mining for .NET
https://github.com/mrrazor22/draindotnet
csharp dotnet drain log-analysis log-mining log-templating logging logpai logparser
Last synced: 29 days ago
JSON representation
C# port of LogPai’s Drain log parser - fast, reliable log template mining for .NET
- Host: GitHub
- URL: https://github.com/mrrazor22/draindotnet
- Owner: MrRazor22
- License: mit
- Created: 2025-09-07T20:59:37.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-10-05T16:39:41.000Z (4 months ago)
- Last Synced: 2025-12-21T18:51:25.543Z (about 2 months ago)
- Topics: csharp, dotnet, drain, log-analysis, log-mining, log-templating, logging, logpai, logparser
- Language: C#
- Homepage: https://github.com/logpai/logparser
- Size: 1.33 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# DrainDotNet
DrainDotNet is a C# port and improvement of [LogPai’s Drain log parser](https://github.com/logpai/logparser), with several improvements to make it faster, more reliable, and more user-friendly. It takes raw logs and automatically groups them into templates so you can easily see log patterns.
## Key Improvements over the original Drain
- Core + Wrapper Split: The code is cleanly split into:
- `DrainCore` → the pure clustering algorithm (tree, similarity, templates). No I/O.
- `LogParser` → a wrapper that handles regex-based parsing, preprocessing, saving to CSV, and reloading later.
This makes it easier to maintain and test the core logic separately.
- **UniqueEventPatterns**: You can provide regex patterns that mark certain tokens as *important*. If a log contains these tokens and they change, DrainDotNet will always create a new event/template instead of merging them. This gives you more control over clustering.
- **Faster Parameter Extraction**: The original Drain used regex-heavy logic for extracting parameters. DrainDotNet uses a simpler, token-based method that:
- Runs much faster (no heavy regex overhead).
- Handles tricky cases like `time: 15> ms`, which used to confuse Drain and produce broken templates like `time: <*>>`.
- **Edge Case Handling**: Robust against logs with odd punctuation or mixed tokens.
- **Strongly Typed Output**: `Parse()` returns a `List` in code (with `LineId`, `Content`, `EventId`, `EventTemplate`, `ParameterList`, and extra fields), so you don’t have to re-parse CSVs if you want to use results directly.
- **Optional Auto-Save**: Results are saved to CSV by default. You can disable this with `autoSave: false` if you only want in-memory results.
- **Deterministic Reloading**: Use ReloadResults() to rehydrate ParsedLog objects from CSV after an app restart.
- **MD5 Hash Event IDs**: Templates get stable 8-character Event IDs. Collisions are theoretically possible, but for typical datasets (even 100k+ templates) it’s practically safe.
## How to use
1. Put your log file in the `data` folder (see `Program.cs` for path).
2. Build and run the project.
3. Results will be written into the `outputDir` path specified:
- `*_structured.csv` — each log line matched with a template (includes `ParameterList`).
- `*_templates.csv` — unique log templates with counts.
Or use directly in code:
```csharp
using DrainDotNet;
var logFormat = "
var parser = new LogParser(logFormat, indir: "./data/", outdir: "./result/");
// Parse logs and also save CSVs (default)
var parsedLogs = parser.Parse("HDFS.log");
// Parse logs but keep results in memory only
var parsedInMemory = parser.Parse("HDFS.log", autoSave: false);
// Reload results later (if auto saved) from saved CSVs
var reloaded = parser.ReloadResults("HDFS.log");
```
## Using as a CLI Tool **(new)**
DrainDotNet is also available as a .NET global tool, so you can parse logs directly from the command line without writing code.
### Install
```sh
dotnet tool install -g DrainDotNet.Tool
```
### Usage
```sh
draindotnet parse --log --format "" [--indir ] [--out ]
```
## Example (HDFS sample)
```sh
draindotnet parse --log HDFS_2k.log --format "
This will generate:
- HDFS_2k.log_structured.csv → structured logs with parameters
- HDFS_2k.log_templates.csv → unique log templates with counts
## License
Apache 2.0 (same as the original Drain).