https://github.com/jtmoon79/super-speedy-syslog-searcher
Speedily search and merge log messages by datetime
- Host: GitHub
- URL: https://github.com/jtmoon79/super-speedy-syslog-searcher
- Owner: jtmoon79
- License: mit
- Created: 2021-09-18T07:50:30.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-23T06:34:16.000Z (7 months ago)
- Last Synced: 2024-04-23T11:08:37.971Z (7 months ago)
- Topics: log, log-parser, log-parsing, logging, logs, merge, rust, sort, syslog, syslog-messages, syslog-parser
- Language: Rust
- Homepage:
- Size: 19.8 MB
- Stars: 17
- Watchers: 1
- Forks: 0
- Open Issues: 78
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
# Super Speedy Syslog Searcher! (`s4`)
Speedily search and merge log messages by datetime.
[![MSRV](https://img.shields.io/crates/msrv/super_speedy_syslog_searcher/0.7.75?logo=rust&logoColor=800000&cacheSeconds=6000)](https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.7.75/Cargo.toml#L21)
[![License](https://img.shields.io/crates/l/super-speedy-syslog-searcher?style=flat-square)](https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/main/LICENSE.txt)
[![docs.rs](https://img.shields.io/docsrs/super_speedy_syslog_searcher/0.7.75?badge.svg&style=flat-square&logo=docsdotrs)](https://docs.rs/super_speedy_syslog_searcher/0.7.75/)[![crates.io version](https://img.shields.io/crates/v/super-speedy-syslog-searcher.svg?style=flat-square&logo=rust&logoColor=800000?branch=0.7.75&version=0.7.75)](https://crates.io/crates/super-speedy-syslog-searcher/0.7.75)
[![crates.io downloads](https://img.shields.io/crates/d/super-speedy-syslog-searcher.svg?style=flat-square&logo=rust&logoColor=800000)](https://crates.io/crates/super-speedy-syslog-searcher#:~:text=Downloads%20all%20time)
[![crates.io downloads (version)](https://img.shields.io/crates/dv/super_speedy_syslog_searcher/0.7.75?style=flat-square&logo=rust&logoColor=800000)](https://crates.io/crates/super-speedy-syslog-searcher/0.7.75)
[![CHANGELOG](https://img.shields.io/badge/CHANGELOG-blue?style=flat-square&logo=keep-a-changelog&logoColor=FFFFFF&color=E05735)](https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/main/CHANGELOG.md#0775)
[![lib.rs](https://img.shields.io/badge/lib.rs-white?style=flat-square&logo=rust&logoColor=202020)](https://lib.rs/crates/super_speedy_syslog_searcher/)[![Build status](https://img.shields.io/github/actions/workflow/status/jtmoon79/super-speedy-syslog-searcher/rust.yml?branch=0.7.75&style=flat-square&logo=github&logoColor=000000)](https://github.com/jtmoon79/super-speedy-syslog-searcher/actions?query=workflow%3Arust)
[![coveralls.io](https://img.shields.io/coverallsCoverage/github/jtmoon79/super-speedy-syslog-searcher?style=flat-square&logo=coveralls&logoColor=b94947&branch=0.7.75&version=0.7.75)](https://coveralls.io/github/jtmoon79/super-speedy-syslog-searcher?branch=0.7.75)
[![Commits since](https://img.shields.io/github/commits-since/jtmoon79/super-speedy-syslog-searcher/0.7.75.svg?logo=github&logoColor=000000)](https://github.com/jtmoon79/super-speedy-syslog-searcher/commits/main)

_Super Speedy Syslog Searcher_ (`s4`) is a command-line tool to search
and merge varying log messages from varying log files, sorted by datetime.
Datetime filters may be passed to narrow the search to a datetime range.

`s4` can read standardized log message formats like RFC 3164 and RFC 5424
("syslog"),
Red Hat Audit logs, strace output, and can read many non-standardized ad-hoc log
message formats, including multi-line log messages.
It also parses binary accounting records acct, lastlog, and utmp
(`acct`, `pacct`, `lastlog`, `utmp`, `utmpx`, `wtmp`),
systemd journal logs (`.journal`), and Microsoft Event Logs (`.evtx`).
`s4` can read logs that are compressed (`.bz2`, `.gz`, `.lz4`, `.xz`), or archived logs (`.tar`).

`s4` aims to be very fast.
---
- [Use](#use)
- [Install `super_speedy_syslog_searcher`](#install-super_speedy_syslog_searcher)
- [allocator `mimalloc` or `jemalloc`](#allocator-mimalloc-or-jemalloc)
- [Alpine](#alpine)
- [Debian and Ubuntu](#debian-and-ubuntu)
- [OpenSUSE](#opensuse)
- [Red Hat and CentOS](#red-hat-and-centos)
- [feature `mimalloc` on Windows](#feature-mimalloc-on-windows)
- [Run `s4`](#run-s4)
- [`--help`](#--help)
- [About](#about)
- [Why `s4`?](#why-s4)
- [Features](#features)
- [File name guessing](#file-name-guessing)
- [Directory walks](#directory-walks)
- [Limitations](#limitations)
- [Hacks](#hacks)
- [More](#more)
- [Comparisons](#comparisons)
- [General Features](#general-features)
- [Formal Log DateTime Supported](#formal-log-datetime-supported)
- [Other Log or File Formats Supported](#other-log-or-file-formats-supported)
- [Archive Formats Supported](#archive-formats-supported)
- [Speed Comparison](#speed-comparison)
- [Building locally](#building-locally)
- [Parsing `.journal` files](#parsing-journal-files)
- [Requesting Support For DateTime Formats; your particular log file](#requesting-support-for-datetime-formats-your-particular-log-file)
- ["syslog" and other project definitions](#syslog-and-other-project-definitions)
- [syslog](#syslog)
- [log message](#log-message)
- [logging chaos: the problem `s4` solves](#logging-chaos-the-problem-s4-solves)
- [open-source software examples](#open-source-software-examples)
- [nginx webserver](#nginx-webserver)
- [Debian 11](#debian-11)
- [binary files](#binary-files)
- [commercial software examples](#commercial-software-examples)
- [Synology DiskStation](#synology-diskstation)
- [Mac OS 12](#mac-os-12)
- [Microsoft Windows 10](#microsoft-windows-10)
- [Summary](#summary)
- [Further Reading](#further-reading)
- [Stargazers](#stargazers)

---
## Use
### Install `super_speedy_syslog_searcher`
Assuming [rust is installed], run
```lang-text
cargo install --locked super_speedy_syslog_searcher
```

A C compiler is required.
[rust is installed]: https://www.rust-lang.org/tools/install
#### allocator `mimalloc` or `jemalloc`
The default allocator is the System allocator.
Allocator [`mimalloc`] is feature `mimalloc` and allocator [`jemalloc`] is feature `jemalloc`.
Allocator `mimalloc` [is the fastest according to `mimalloc` project benchmarks].
`jemalloc` is also very good.
`mimalloc`
```lang-text
cargo install --locked super_speedy_syslog_searcher --features mimalloc
```

Error `Bus error` is a known issue on some `aarch64-unknown-linux-gnu` systems.
```lang-text
$ s4 --version
Bus error
```

Either use `jemalloc` or the default System allocator.
`jemalloc`
```lang-text
cargo install --locked super_speedy_syslog_searcher --features jemalloc
```
[`jemalloc`]: http://jemalloc.net/
[`mimalloc`]: https://microsoft.github.io/mimalloc/bench.html
[is the fastest according to `mimalloc` project benchmarks]: https://github.com/microsoft/mimalloc#Performance
Here are the packages for building `super_speedy_syslog_searcher` with `jemalloc` or `mimalloc`
on various Operating Systems.

##### Alpine
```lang-text
apk add gcc make musl-dev
```

##### Debian and Ubuntu
```lang-text
apt install gcc make libc6-dev
```

or

```lang-text
apt install build-essential
```

##### OpenSUSE
```lang-text
zypper install gcc glibc-devel make
```

##### Red Hat and CentOS
```lang-text
yum install gcc glibc-devel make
```

##### feature `mimalloc` on Windows
Compiling `mimalloc` on Windows requires `lib.exe` which is part of _Visual Studio Build Tools_.
Instructions at [rustup.rs].

[rustup.rs]: https://rustup.rs/
### Run `s4`
For example, print all the log messages in syslog files under `/var/log/`
```lang-text
s4 /var/log
```

On Windows, print the ad-hoc logs under `C:\Windows\Logs`
```lang-text
s4.exe C:\Windows\Logs
```

On Windows, print all `.log` files under `C:\Windows` (with the help of PowerShell)
```lang-powershell
Get-ChildItem -Filter '*.log' -File -Path "C:\Windows" -Recurse -ErrorAction SilentlyContinue `
| Select-Object -ExpandProperty FullName `
| s4.exe -
```

- note that UTF-16 encoded logs cannot be parsed, see [Issue #16]
- note that opening too many files causes error _too many files open_, see [Issue #270], so `Get-ChildItem -Filter` lessens the number of files opened by `s4.exe`

On Windows, print the [Windows Event logs]
```lang-text
s4.exe C:\Windows\System32\winevt\Logs
```

Print the log messages after January 1, 2022 at 00:00:00
```lang-text
s4 /var/log -a 20220101
```

Print the log messages from January 1, 2022 00:00:00 to January 2, 2022
```lang-text
s4 /var/log -a 20220101 -b 20220102
```

or

```lang-text
s4 /var/log -a 20220101 -b @+1d
```

Print the log messages on January 1, 2022, from 12:00:00 to 16:00:00
```lang-text
s4 /var/log -a 20220101T120000 -b 20220101T160000
```

Print the record-keeping log messages from up to a day ago
(with the help of `find`)

```lang-text
find /var -xdev -type f \( \
-name 'lastlog' \
-or -name 'wtmp' \
-or -name 'wtmpx' \
-or -name 'utmp' \
-or -name 'utmpx' \
-or -name 'acct' \
-or -name 'pacct' \
\) \
2>/dev/null \
| s4 - -a=-1d
```

Print the journal log messages from up to an hour ago,
prepending the journal file name
(with the help of `find`)

```lang-text
find / -xdev -name '*.journal' -type f 2>/dev/null \
| s4 - -a=-1h -n
```

Print only the log messages that occurred two days ago
(with the help of GNU `date`)

```lang-text
s4 /var/log -a $(date -d "2 days ago" '+%Y%m%d') -b @+1d
```

Print only the log messages that occurred two days ago during the noon hour
(with the help of GNU `date`)

```lang-text
s4 /var/log -a $(date -d "2 days ago 12" '+%Y%m%dT%H%M%S') -b @+1h
```

Print only the log messages that occurred two days ago during the noon hour in
Bengaluru, India (timezone offset +05:30) and prepended with equivalent UTC
datetime (with the help of GNU `date`)

```lang-text
s4 /var/log -u -a $(date -d "2 days ago 12" '+%Y%m%dT%H%M%S+05:30') -b @+1h
```

[Windows Event logs]: https://github.com/libyal/libevtx/blob/126297f7f0e325f9e2cd27b0b60d3cf02ffdfd04/documentation/Windows%20XML%20Event%20Log%20(EVTX).asciidoc
[Issue #16]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/16
[Issue #270]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/270

### `--help`
```lang-text
Speedily search and merge log messages by datetime.
DateTime filters may be passed to narrow the search.
s4 aims to be very fast.

Usage: s4 [OPTIONS] <PATHS>...

Arguments:
  <PATHS>...  Path(s) of log files or directories.
              Directories will be recursed. Symlinks will be followed.
              Paths may also be passed via STDIN, one per line. The user must
              supply argument "-" to signify PATHS are available from STDIN.

Options:
-a, --dt-after
DateTime Filter After: print log messages with a datetime that is at
or after this datetime. For example, "20200102T120000" or "-5d".
-b, --dt-before
DateTime Filter Before: print log messages with a datetime that is at
or before this datetime.
For example, "2020-01-03T23:00:00.321-05:30" or "@+1d+11h"
-t, --tz-offset
Default timezone offset for datetimes without a timezone.
For example, log message "[20200102T120000] Starting service" has a
datetime substring "20200102T120000".
That datetime substring does not have a timezone offset
so this TZ_OFFSET value would be used.
Example values, "+12", "-0800", "+02:00", or "EDT".
To pass a value with leading "-" use "=" notation, e.g. "-t=-0800".
If not passed then the local system timezone offset is used.
[default: -07:00]
-z, --prepend-tz
Prepend a DateTime in the timezone PREPEND_TZ for every line.
Used in PREPEND_DT_FORMAT.
-u, --prepend-utc
Prepend a DateTime in the UTC timezone offset for every line.
This is the same as "--prepend-tz Z".
Used in PREPEND_DT_FORMAT.
-l, --prepend-local
Prepend DateTime in the local system timezone offset for every line.
This is the same as "--prepend-tz +XX" where +XX is the local system
timezone offset.
Used in PREPEND_DT_FORMAT.
-d, --prepend-dt-format
Prepend a DateTime using the strftime format string.
If PREPEND_TZ is set then that value is used for any timezone offsets,
i.e. strftime "%z" "%:z" "%Z" values, otherwise the timezone offset value
is the local system timezone offset.
[Default: %Y%m%dT%H%M%S%.3f%z]
-n, --prepend-filename
Prepend file basename to every line.
-p, --prepend-filepath
Prepend file full path to every line.
-w, --prepend-file-align
Align column widths of prepended data.
--prepend-separator
Separator string for prepended data.
[default: :]
--separator
An extra separator string between printed log messages.
Per log message not per line of text.
Accepts a basic set of backslash escape sequences,
e.g. "\0" for the null character, "\t" for tab, etc.
--journal-output
The format for .journal file log messages.
Matches journalctl --output options.
[default: short]
[possible values: short, short-precise, short-iso, short-iso-precise,
short-full, short-monotonic, short-unix, verbose, export, cat]
-c, --color
Choose to print to terminal using colors.
[default: auto]
[possible values: always, auto, never]
--blocksz
Read blocks of this size in bytes.
May pass value as any radix (hexadecimal, decimal, octal, binary).
Using the default value is recommended.
Most useful for developers.
[default: 65536]
-s, --summary
Print a summary of files processed to stderr.
Most useful for developers.
-h, --help
Print help
-V, --version
Print version

Given a file path, the file format will be processed based on a best guess of
the file name.
If the file format is not guessed then it will be treated as a UTF8 text file.
Given a directory path, found file names that have well-known non-log file name
extensions will be skipped.

DateTime Filters may be strftime specifier patterns:
    "%Y%m%dT%H%M%S*"
    "%Y-%m-%d %H:%M:%S*"
    "%Y-%m-%dT%H:%M:%S*"
    "%Y/%m/%d %H:%M:%S*"
    "%Y%m%d"
    "%Y-%m-%d"
    "%Y/%m/%d"
    "+%s"
Each * is an optional trailing 3-digit fractional sub-seconds,
or 6-digit fractional sub-seconds, and/or timezone.

Pattern "+%s" is Unix epoch timestamp in seconds with a preceding "+".
For example, value "+946684800" is be January 1, 2000 at 00:00, GMT.

DateTime Filters may be custom relative offset patterns:
    "+DwDdDhDmDs" or "-DwDdDhDmDs"
    "@+DwDdDhDmDs" or "@-DwDdDhDmDs"

Custom relative offset pattern "+DwDdDhDmDs" and "-DwDdDhDmDs" is the offset
from now (program start time) where "D" is a decimal number.
Each lowercase identifier is an offset duration:
"w" is weeks, "d" is days, "h" is hours, "m" is minutes, "s" is seconds.
For example, value "-1w22h" is one week and twenty-two hours in the past.
Value "+30s" is thirty seconds in the future.

Custom relative offset pattern "@+DwDdDhDmDs" and "@-DwDdDhDmDs" is relative
offset from the other datetime.
Arguments "-a 20220102 -b @+1d" are equivalent to "-a 20220102 -b 20220103".
Arguments "-a @-6h -b 20220101T120000" are equivalent to
"-a 20220101T060000 -b 20220101T120000".

Without a timezone, the Datetime Filter is presumed to be the local
system timezone.

Command-line passed timezones may be numeric timezone offsets,
e.g. "+09:00", "+0900", or "+09", or named timezone offsets, e.g. "JST".
Ambiguous named timezones will be rejected, e.g. "SST".

--prepend-tz and --dt-offset function independently:
--dt-offset is used to interpret processed log message datetime stamps that
do not have a timezone offset.
--prepend-tz affects what is pre-printed before each printed log message line.

--separator accepts backslash escape sequences:
    "\0", "\a", "\b", "\e", "\f", "\n", "\r", "\\", "\t", "\v"

Resolved values of "--dt-after" and "--dt-before" can be reviewed in
the "--summary" output.

s4 uses file naming to determine the file type.
s4 can process files compressed and named .bz2, .gz, .lz4, .xz, and files
archived within a .tar file.

Log messages from different files with the same datetime are printed in order
of the arguments from the command-line.

Datetimes printed for .journal file log messages may differ from datetimes
printed by program journalctl.
See Issue #101

DateTime strftime specifiers are described at
https://docs.rs/chrono/latest/chrono/format/strftime/

DateTimes supported are only of the Gregorian calendar.
DateTimes supported language is English.

Further background and examples are at the project website:
https://github.com/jtmoon79/super-speedy-syslog-searcher/

Is s4 failing to parse a log file? Report an Issue at
https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/new/choose
```

---
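The relative offset patterns described in the `--help` text can be illustrated with a short Python sketch. This is not how `s4` implements them (`s4` is written in Rust); the helper names `parse_offset` and `resolve` are hypothetical, and the sketch accepts the unit letters in any order, which may be looser than `s4` itself.

```python
import re
from datetime import datetime, timedelta

# Unit letters named in the --help text: weeks, days, hours, minutes, seconds.
_UNITS = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
_OFFSET_RE = re.compile(r"^(@?)([+-])((?:\d+[wdhms])+)$")


def parse_offset(value):
    """Return (relative_to_other, timedelta) for values like "-1w22h" or "@+1d".

    Hypothetical helper for illustration; raises ValueError for other input.
    """
    match = _OFFSET_RE.match(value)
    if match is None:
        raise ValueError(f"not a relative offset: {value!r}")
    at, sign, body = match.groups()
    kwargs = {}
    for number, unit in re.findall(r"(\d+)([wdhms])", body):
        name = _UNITS[unit]
        kwargs[name] = kwargs.get(name, 0) + int(number)
    delta = timedelta(**kwargs)
    if sign == "-":
        delta = -delta
    return at == "@", delta


def resolve(value, now, other=None):
    """Resolve an offset against `now`, or against `other` when prefixed by "@"."""
    relative_to_other, delta = parse_offset(value)
    base = other if relative_to_other else now
    if base is None:
        raise ValueError("an '@' offset needs the other datetime")
    return base + delta


now = datetime(2022, 1, 5, 12, 0, 0)  # stand-in for "program start time"
print(resolve("@+1d", now, other=datetime(2022, 1, 2)))  # 2022-01-03 00:00:00
print(resolve("-1w22h", now))                            # 2021-12-28 14:00:00
```

The first call mirrors the `--help` statement that `-a 20220102 -b @+1d` is equivalent to `-a 20220102 -b 20220103`.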
## About
### Why `s4`?
_Super Speedy Syslog Searcher_ (`s4`) is meant to aid Engineers in reviewing
varying log files in a datetime-sorted manner.
The primary use-case is to aid investigating problems wherein the time of
a problem occurrence is known and there are many available logs
but otherwise there is little source evidence.

Currently, log file formats vary widely. _Most_ logs are an ad-hoc format.
Even separate log files on the same system for the same service may have
different message formats!
Sorting these logged messages by datetime may be prohibitively difficult.
The result is that an engineer may have to "hunt and peck" among many log files,
looking for problem clues around some datetime; so tedious!

Enter _Super Speedy Syslog Searcher_ 🦸 ‼

`s4` will print log messages from multiple log files in datetime-sorted order.
A "window" of datetimes may be passed, to constrain the period of printed
messages. This will assist an engineer that, for example, needs to view all
log messages that occurred two days ago between 12:00 and 12:05 among log files taken from multiple
systems.

The ulterior motive for _Super Speedy Syslog Searcher_ was that the [primary
developer](https://github.com/jtmoon79) wanted an excuse to learn Rust 🦀,
and wanted to create an open-source tool for a recurring need of some
Software Test Engineers.

See the real-world example rationale in the section below,
[_logging chaos: the problem `s4` solves_].

[_logging chaos: the problem `s4` solves_]: #logging-chaos-the-problem-s4-solves
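The idea of printing messages from several files in datetime-sorted order, limited to a datetime "window", can be sketched in Python as a k-way merge. This is only an illustration and not `s4`'s implementation; the function name `merge_by_datetime` and the sample messages are made up, and each input stream is assumed to be already sorted by datetime.

```python
import heapq
from datetime import datetime


def merge_by_datetime(streams, after=None, before=None):
    """Yield (datetime, text) pairs from datetime-sorted streams in one sorted order.

    `after` and `before` are inclusive bounds, like "at or after" and
    "at or before" in the --help text.
    """
    merged = heapq.merge(*streams, key=lambda message: message[0])
    for when, text in merged:
        if after is not None and when < after:
            continue
        if before is not None and when > before:
            break  # the merged stream is sorted, so nothing later can match
        yield when, text


# Made-up messages standing in for two already-parsed log files.
file_a = [
    (datetime(2022, 1, 1, 12, 0, 0), "a.log: service starting"),
    (datetime(2022, 1, 1, 12, 0, 5), "a.log: service ready"),
]
file_b = [
    (datetime(2022, 1, 1, 12, 0, 0), "b.log: connection opened"),
    (datetime(2022, 1, 1, 12, 0, 9), "b.log: connection closed"),
]

window = merge_by_datetime(
    [file_a, file_b],
    after=datetime(2022, 1, 1, 12, 0, 0),
    before=datetime(2022, 1, 1, 12, 0, 5),
)
for when, text in window:
    print(when.isoformat(), text)
```

In this sketch, messages with equal datetimes come out in the order the streams were passed, which resembles the `--help` note that messages with the same datetime are printed in order of the command-line arguments.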
### Features
- Parses:
  - Ad-hoc log messages using formal datetime formats:
    - [Internet Message Format (RFC 2822)]
      e.g. _Wed, 1 Jan 2020 22:00:00 PST message…_
    - [The BSD syslog Protocol (RFC 3164)]
      e.g. _\<8\>Jan 1 22:00:00 message…_
    - [Date and Time on the Internet: Timestamps (RFC 3339)]
      e.g. _2020-01-01T22:00:00-08:00 message…_
    - [The Syslog Protocol (RFC 5424)]
      e.g. _2020-01-01T22:00:00-08:00 message…_
    - [ISO 8601]
      e.g. _2020-01-01T22:00:00-08:00 message…_, _20200101T220000-0800 message…_, etc. \[1\]
  - [Red Hat Audit Log] files
  - [strace] output files with options `-ttt` or `--timestamps`,
    i.e. Unix epoch plus optional milliseconds, microseconds, or nanoseconds
  - binary user accounting records files
    ([`acct`, `pacct`], [`lastlog`], [`utmp`, `utmpx`])
    from multiple Operating Systems and CPU architectures
  - binary [Windows Event Log] files
  - binary [systemd journal] files with printing options matching [`journalctl`]
  - many varying text log messages with ad-hoc datetime formats
  - multi-line log messages
- Inspects `.tar` archive files for parseable log files \[2\]
- Can process `.bz2`, `.gz`, `.lz4`, or `.xz` containing log files.
- Tested against "in the wild" log files from varying sources
  (see project path [`./logs/`])
- Prepends datetime and file paths, for easy programmatic parsing or
  visual traversal of varying log messages
- [Speed comparable to GNU `grep` and `sort`](#speed-comparison)
- Processes invalid UTF-8
- Accepts arbitrarily large files (see _Hacks_)

[`acct`, `pacct`]: https://www.man7.org/linux/man-pages/man5/acct.5.html
[`lastlog`]: https://man.netbsd.org/lastlog.5
[`utmp`, `utmpx`]: https://en.wikipedia.org/w/index.php?title=Utmp&oldid=1143684808#utmpx,_wtmpx_and_btmpx
[Internet Message Format (RFC 2822)]: https://www.rfc-editor.org/rfc/rfc2822#section-3.3
[The BSD syslog Protocol (RFC 3164)]: https://www.rfc-editor.org/rfc/rfc3164#section-4.1.2
[Date and Time on the Internet: Timestamps (RFC 3339)]: https://www.rfc-editor.org/rfc/rfc3339#section-5.8
[The Syslog Protocol (RFC 5424)]: https://www.rfc-editor.org/rfc/rfc5424#section-6.2.3
[ISO 8601]: https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=1113067353#General_principles
[Red Hat Audit Log]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/security_guide/sec-understanding_audit_log_files
[strace]: https://www.man7.org/linux/man-pages/man1/strace.1.html
[Windows Event Log]: https://learn.microsoft.com/en-us/windows/win32/wes/windows-event-log
[systemd journal]: https://systemd.io/JOURNAL_FILE_FORMAT/
[`journalctl`]: https://www.man7.org/linux/man-pages/man1/journalctl.1.html
[`./logs/`]: https://github.com/jtmoon79/super-speedy-syslog-searcher/tree/main/logs

#### File name guessing
Given a file path, `s4` will attempt to parse it. The type of file must be in
the name. Guesses are made about files with non-standard names.

For example, standard file name `utmp` will always be treated as a `utmp` record
file. But non-standard name `log.utmp.1` is guessed to be a `utmp` record file.
Similar guesses are applied to `lastlog`, `wtmp`, `acct`, `pacct`,
`journal`, and `evtx` files.
When combined with compression or archive file name extensions,
e.g. `.bz2`, `.gz`, `.lz4`, or `.xz`, then `s4` makes a best attempt at
guessing the compression or archive type and the file within the archive based
on the name.
For example, `user.journal.gz` is guessed to be a systemd journal file within a
gzip compressed file. However, if that same file is named something unusual like
`user.systemd-journal.gz` then it is guessed to be a text log file within a gzip
compressed file.

When a file type cannot be guessed then it is treated as a UTF8 text log file.
For example, a file named just `unknown` has no obvious type, so `s4` attempts
to parse it as a UTF8 ad-hoc text log file.

`tar` files are inspected for parseable files.\[2\]
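The guessing described above can be approximated with a small Python sketch. The rules here are an assumption chosen to reproduce the examples in this section; they are not taken from `s4`'s source, and the function name `guess_file_type` is hypothetical.

```python
from pathlib import PurePath

# Compression extensions and record-file names mentioned in this README.
COMPRESSION = {".bz2", ".gz", ".lz4", ".xz"}
BINARY_NAMES = ("utmp", "utmpx", "wtmp", "lastlog", "acct", "pacct", "journal", "evtx")


def guess_file_type(path):
    """Return (compression, kind) guessed only from the file name.

    Assumed heuristic: strip one compression extension, then look for a
    well-known name among the dot-separated parts; otherwise fall back to text.
    """
    name = PurePath(path).name
    compression = None
    suffix = PurePath(name).suffix
    if suffix in COMPRESSION:
        compression = suffix
        name = name[: -len(suffix)]
    if name.endswith(".tar"):
        return compression, "tar"
    parts = name.split(".")
    for kind in BINARY_NAMES:
        if kind in parts:
            return compression, kind
    return compression, "text"


for example in ("utmp", "log.utmp.1", "user.journal.gz",
                "user.systemd-journal.gz", "unknown"):
    print(example, guess_file_type(example))
```

With these assumed rules, `user.journal.gz` is guessed as a journal inside gzip while `user.systemd-journal.gz` falls back to text inside gzip, matching the behavior described above.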
#### Directory walks
Given a directory path, `s4` will walk the directory and all subdirectories and
follow symbolic links and cross file system paths.
`s4` will ignore files with extensions that are known to be non-log files.
For example, files with extensions `.dll`, `.mp3`, `.png`, or `.so`, are
unlikely to be log files and so are not processed.

So given a file `/tmp/file.mp3`, an invocation of `s4 /tmp` will not attempt
to process `file.mp3`. An invocation of `s4 /tmp/file.mp3` will attempt to
process `file.mp3`. It will be treated as a UTF8 text log file.

### Limitations
- Only processes UTF-8 or ASCII encoded syslog files. ([Issue #16])
- Cannot process multi-file `.gz` files (only processes first stream found).
([Issue #8])
- Cannot process multi-file `.xz` files (only processes first stream found).
([Issue #11])
- Cannot process `.zip` archives ([Issue #39])
- \[1\] ISO 8601
  - ISO 8601 forms recognized (using [ISO descriptive format])
    - `YYYY-MM-DDThh:mm:ss`
    - `YYYY-MM-DDThhmmss`
    - `YYYYMMDDThhmmss`
      (may use date-time separator character `'T'` or character blank space `' '`)
  - ISO 8601 forms not recognized:
    - Absent seconds
    - [_Ordinal dates_], i.e. "day of the year", format `YYYY-DDD`, e.g. `"2022-321"`
    - [_Week dates_], i.e. "week-numbering year", format `YYYY-Www-D`, e.g. `"2022-W25-1"`
    - times [without minutes and seconds] (i.e. only `hh`)
- \[2\] Cannot process archive files or compressed files within
  other archive files or compressed files ([Issue #14])
  e.g. cannot process `logs.tar.xz`, nor file `log.gz` within `logs.tar`

[Issue #8]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/8
[Issue #11]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/11
[Issue #14]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/14
[Issue #12]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/12
[Issue #39]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/39
[Issue #86]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/86
[ISO descriptive format]: https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=1114310323#Calendar_dates
[_Ordinal dates_]: https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=1114310323#Ordinal_dates
[_Week dates_]: https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=1114310323#Week_dates
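The ISO 8601 forms listed in note \[1\] can be expressed as a regular expression. The following Python sketch is only an approximation written for this README's list; the name `looks_recognized` is hypothetical, and applying the `'T'` or blank-space separator to all three forms is an assumption.

```python
import re

# One pattern per recognized form from note [1]; separator may be "T" or a space.
FORMS = [
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}",  # YYYY-MM-DDThh:mm:ss
    r"\d{4}-\d{2}-\d{2}[T ]\d{6}",              # YYYY-MM-DDThhmmss
    r"\d{8}[T ]\d{6}",                          # YYYYMMDDThhmmss
]
RECOGNIZED = re.compile("^(?:" + "|".join(FORMS) + ")")


def looks_recognized(text):
    """Return True if `text` starts with one of the listed ISO 8601 forms."""
    return RECOGNIZED.match(text) is not None


for sample in ("2020-01-01T22:00:00-08:00 message",
               "20200101T220000-0800 message",
               "2020-01-01T22:00 message",   # absent seconds
               "2022-321 message",           # ordinal date
               "2022-W25-1 message"):        # week date
    print(sample, looks_recognized(sample))
```

The sketch only checks the leading digits and separators; it does not validate ranges (for example month 13) or parse any trailing timezone.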
[without minutes and seconds]: https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=1114310323#Times

### Hacks
- Entire `.bz2` files are read once before processing ([Issue #300])
- Entire `.lz4` files are read once before processing ([Issue #293])
- Entire `.xz` files are read into memory before printing ([Issue #12])
- Entire `.evtx` files are read into memory before printing ([Issue #86])
- Entire files within a `.tar` file are read into memory before printing ([Issue #13])
- Entire [user accounting record files are read into memory] before printing
- Compressed `.journal` and `.evtx` files are extracted to a temporary file ([Issue #284])

[user accounting record files are read into memory]: https://docs.rs/super_speedy_syslog_searcher/0.6.70/s4lib/readers/fixedstructreader/struct.FixedStructReader.html#summary-of-operation
[Issue #13]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/13
[Issue #284]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/284
[Issue #293]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/293
[Issue #300]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/300
---
## More
### Comparisons
An overview of features of varying log mergers including GNU tools.
- GNU `grep` piped to GNU `sort`
- _Super Speedy Syslog Searcher_; `s4`
- [_lnav_](https://github.com/tstack/lnav); `lnav`
- [_logmerger_](https://github.com/ptmcg/logmerger); `logmerger`
- [_Toolong_](https://github.com/Textualize/toolong); `tl`
- [_logdissect_](https://github.com/dogoncouch/logdissect); `logdissect.py`

|Symbol| |
|- |-|
|β |_Yes_ |
|⬀ |_Most_ |
|β |_Some_ |
|β |_No_ |
|β |_with an accompanying GNU program_ |
|! |_with user input_ |
|‼ |_with complex user input_ |

---
#### General Features
|Program |Source|CLI|TUI|Interactive|live tail|merge varying log formats|datetime search range|
|- |- |- |- |- |- |- |- |
|`grep \| sort` |C |β |β |β |β `tail`|β |‼|
|`s4` |Rust |β |β |β |β |β |β|
|`lnav` |C++ |β |β |β |β |β |‼|
|`logmerger` |Python|β |β |β |β |‼ |β|
|`tl` |Python|β |β |β |β |β |β|
|`logdissect.py`|Python|β |β |β |β |β |β|

---
#### Formal Log DateTime Supported
|Program |RFC 2822|RFC 3164|RFC 3339|RFC 5424|ISO 8601|
|- |- |- |- |- |- |
|`grep \| sort` |β |‼ |! |! |! |
|`s4` |β |β |β |β |β |
|`lnav` |‼ |β |β |β |β |
|`logmerger` |β |β |! |! |β |
|`tl` |β |β |β |β |β |

- [RFC 2822]: _Internet Message Format: Date and Time Specification_; e.g. _Wed, 1 Jan 2020 22:00:00 PST message…_
- [RFC 3164]: _The BSD syslog Protocol: HEADER Part of a syslog Packet_; e.g. _\<8\>Jan 1 22:00:00 message…_
- [RFC 3339]: _Date and Time on the Internet: Internet Date/Time Format_; e.g. _2020-01-01T22:00:00-08:00 message…_
- [RFC 5424]: _The Syslog Protocol: TIMESTAMP_; e.g. _2020-01-01T22:00:00-08:00 message…_
- [ISO 8601]: _Data elements and interchange formats – Information interchange – Representation of dates and times_; e.g. _2020-01-01T22:00:00-08:00 message…_, _20200101T220000-0800 message…_, etc.

[RFC 2822]: https://www.rfc-editor.org/rfc/rfc2822#section-3.3
[RFC 3164]: https://www.rfc-editor.org/rfc/rfc3164#section-4.1.2
[RFC 3339]: https://www.rfc-editor.org/rfc/rfc3339#section-5.8
[RFC 5424]: https://www.rfc-editor.org/rfc/rfc5424#section-6.2.3

---
#### Other Log or File Formats Supported
Binary formats supported:
|Program |journal|`acct`/`lastlog`/`utmp`|`.evtx`|`.pcap`/`.pcapng`|`.jsonl`|
|- |- |- |- |- |- |
|`grep \| sort` |β |β |β |β |β |
|`s4` |β |β |β |[β](https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/255)|β |
|`lnav` |β |β |β |β |β |
|`logmerger` |β |β |β |β |β |
|`tl` |β |β |β |β |β |

Ad-hoc text formats:

|Program |Ad-hoc text formats|Red Hat Audit Log|strace|Apache Common Log Format|
|- |- |- |- |- |
|`grep \| sort` |‼ |! |β |‼ |
|`s4` |β |β |β |β |
|`lnav` |‼ |‼ |‼ |β |
|`logmerger` |‼ |‼ |β |‼ |
|`tl` |β |β |β |β |

All programs besides `s4` fail to merge different text log formats.
---
#### Archive Formats Supported
|Program |`.gz` |`.lz` |`.lz4` |`.bz` |`.bz2` |`.xz` |`.tar`|`.zip`|
|- |- |- |- |- |- |- |- |- |
|`grep \| sort` |β `zgrep`|β `lz`|β `lz4` |β `bzip`|β `bzip2` |β `xz` |β |β |
|`s4` |β |β |β |β |β |β |β |[β](https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/39)|
|`lnav` |β |β |β |? |β |β |β |β |
|`logmerger` |β |β |β |β |β |β |β |β |
|`tl` |β |β |β |β |β |β |β |β |
|`logdissect.py`|β |β |β |β |β |β |β |β |

---
#### Speed Comparison
A comparison of merging three large log files running on Ubuntu 22 on WSL2.
The three log files have 5000 lines, 2158138 bytes (≈2.1 MB) each, with high-plane unicode.
Each program had 30 runs except `toolong`.

|Command |Mean (ms) |Min (ms) |Max (ms) |Max RSS (KB)|CPU %|
|:--- |---: |---: |---: |---: |---: |
|`grep \| sort` |16.5 ± 0.6 |15.7 |18.6 |5512 |41% |
|`s4 (system)` |37.0 ± 1.8 |34.3 |40.9 |48060 |182% |
|`s4 (jemalloc)`|37.2 ± 2.0 |33.9 |43.0 |71536 |165% |
|`s4 (mimalloc)`|32.0 ± 2.1 |27.4 |36.1 |75776 |182% |
|`lnav` |155.9 ± 1.8 |153.0 |162.7 |37320 |94% |
|`logmerger` |779.3 ± 10.4 |760.3 |803.2 |55288 |99% |
|`toolong` | | | |53208 |40% |

- _Mean_ is mean runtime in milliseconds
- _Min_ is minimum runtime in milliseconds
- _Max_ is maximum runtime in milliseconds
- _Max RSS_ is maximum Resident Set Size in Kilobytes
- _CPU %_ is an average of CPU used over the runtime

Programs tested:
- GNU `grep` 3.7, GNU `sort` 8.32
- `s4` 0.7.75
- `logmerger` 0.9.0 on Python 3.10.12
- `tl` 1.5.0 on Python 3.10.12

Using `hyperfine` to measure timing and GNU `time` to measure RSS and CPU.
See directory results in [`compare-log-mergers.txt`].
[`compare-log-mergers.txt`]: https://github.com/jtmoon79/super-speedy-syslog-searcher/tree/0.7.75/releases/0.7.75
---
### Building locally
See section [_Install `super_speedy_syslog_searcher`_].
[_Install `super_speedy_syslog_searcher`_]: #install-super_speedy_syslog_searcher
### Parsing `.journal` files
Requires `libsystemd` to be installed to use `libsystemd.so` at runtime.
### Requesting Support For DateTime Formats; your particular log file
If you have found a log file that _Super Speedy Syslog Searcher_ does not parse
then you may create a [new Issue type _Feature request (datetime format)_].

Here is [an example user-submitted Issue].
[new Issue type _Feature request (datetime format)_]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/new/choose
[an example user-submitted Issue]: https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/81

### "syslog" and other project definitions
#### syslog
In this project, the term "_syslog_" is used generously to refer to any
log message that has a datetime stamp on the first line of log text.

Technically, "_syslog_" is [defined among several RFCs]
prescribing fields, formats, lengths, and other technical constraints.
In this project, the term "_syslog_" is interchanged with "_log_".

The term "_sysline_" refers to one log message which may comprise
multiple text lines.

See [docs section _Definitions of data_] for more project definitions.
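To illustrate the "_sysline_" idea, here is a minimal Python sketch that groups text lines into log messages, starting a new message whenever a line begins with a datetime stamp. The single datetime pattern, the function name `group_syslines`, and the sample lines are assumptions for illustration; `s4` itself recognizes many more datetime formats.

```python
import re

# Assumed datetime shape for this sketch: "YYYY-MM-DD hh:mm:ss" at line start.
STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def group_syslines(lines):
    """Group text lines into messages; a line starting with a datetime begins a new one."""
    current = []
    for line in lines:
        if STAMP.match(line) and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current


# Made-up log lines: the first message spans three text lines.
lines = [
    "2022-10-10 15:15:02 service failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1',
    "2022-10-10 15:15:03 service restarted",
]
for sysline in group_syslines(lines):
    print(len(sysline), sysline[0])
```

Here the traceback lines carry no datetime stamp, so they stay attached to the message that precedes them.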
[defined among several RFCs]: https://en.wikipedia.org/w/index.php?title=Syslog&oldid=1219545533#Internet_standard_documents
[docs section _Definitions of data_]: https://docs.rs/super_speedy_syslog_searcher/latest/s4lib/data/index.html

#### log message
A "log message" is a single log entry for any type of logging scheme;
an entry in a utmpx file, an entry in a systemd journal, an entry in a
Windows Event Log, a formal RFC 5424 syslog message, or an ad-hoc log message.

---
## logging chaos: the problem `s4` solves
In practice, most log file formats are an ad-hoc format. And among formally
defined log formats, there are many variations. The result is that merging varying
log messages by datetime is prohibitively tedious.
If an engineer is investigating a problem that is symptomatic among many log
files then the engineer must "hunt and peck" among those many log files.
Log files cannot be merged for a single coherent view.

The following real-world example log files are available in project directory
`./logs`.

### open-source software examples
#### nginx webserver
For example, the open-source nginx web server
[logs access attempts in an ad-hoc format] in the file `access.log`

```text
192.168.0.115 - - [08/Oct/2022:22:26:35 +0000] "GET /DOES-NOT-EXIST HTTP/1.1" 404 0 "-" "curl/7.76.1" "-"
```

which is an entirely dissimilar log format to the neighboring nginx log file,
`error.log`

```text
2022/10/08 22:26:35 [error] 6068#6068: *3 open() "/usr/share/nginx/html/DOES-NOT-EXIST" failed (2: No such file or directory), client: 192.168.0.115, server: _, request: "GET /DOES-NOT-EXIST HTTP/1.0", host: "192.168.0.100"
```

nginx is following the bad example set by the Apache web server.
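
To make the mismatch concrete, here is a minimal Rust sketch that reduces both nginx datetime stamps to one comparable tuple. It is an illustration only and not how `s4` parses datetimes; the names `DateTimeKey`, `parse_access_log`, and `parse_error_log` are hypothetical, and the sketch ignores the timezone offset (it assumes both files log in the same timezone).

```rust
/// (year, month, day, hour, minute, second); tuples compare chronologically.
type DateTimeKey = (u32, u32, u32, u32, u32, u32);

/// Characters that separate the datetime fields in both nginx formats.
fn is_separator(c: char) -> bool {
    matches!(c, '/' | ':' | ' ')
}

/// Parses the next field as a number, or returns None.
fn next_number<'a>(parts: &mut impl Iterator<Item = &'a str>) -> Option<u32> {
    parts.next()?.parse().ok()
}

/// Maps an English month abbreviation such as "Oct" to 1..=12.
fn month_number(name: &str) -> Option<u32> {
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    MONTHS.iter().position(|m| *m == name).map(|i| i as u32 + 1)
}

/// Parses a stamp like `[08/Oct/2022:22:26:35 +0000]` found in `access.log`.
/// The timezone offset is ignored (an assumption of this sketch).
fn parse_access_log(line: &str) -> Option<DateTimeKey> {
    let start = line.find('[')? + 1;
    let end = start + line[start..].find(']')?;
    let mut parts = line[start..end].split(is_separator);
    let day = next_number(&mut parts)?;
    let month = month_number(parts.next()?)?;
    let year = next_number(&mut parts)?;
    let hour = next_number(&mut parts)?;
    let minute = next_number(&mut parts)?;
    let second = next_number(&mut parts)?;
    Some((year, month, day, hour, minute, second))
}

/// Parses a stamp like `2022/10/08 22:26:35` at the start of `error.log` lines.
fn parse_error_log(line: &str) -> Option<DateTimeKey> {
    let mut parts = line.split(is_separator);
    let year = next_number(&mut parts)?;
    let month = next_number(&mut parts)?;
    let day = next_number(&mut parts)?;
    let hour = next_number(&mut parts)?;
    let minute = next_number(&mut parts)?;
    let second = next_number(&mut parts)?;
    Some((year, month, day, hour, minute, second))
}
```

Under those assumptions, both example lines above reduce to the same key, `(2022, 10, 8, 22, 26, 35)`. Every additional log format needs another hand-written parser of this kind.
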
#### Debian 11
Here are log snippets from a Debian 11 host.

file `/var/log/alternatives.log`

```text
update-alternatives 2022-10-10 23:59:47: run with --quiet --remove rcp /usr/bin/ssh
```

file `/var/log/dpkg.log`

```text
2022-10-10 15:15:02 upgrade gpgv:amd64 2.2.27-2 2.2.27-2+deb11u1
```

file `/var/log/kern.log`

```text
Oct 10 23:07:16 debian11-b kernel: [ 0.10034] Linux version 5.10.0-11-amd64
```

file `/var/log/unattended-upgrades/unattended-upgrades-shutdown.log`

```text
2022-10-10 23:07:16,775 WARNING - Unable to monitor PrepareForShutdown() signal, polling instead.
```

#### binary files

And then there are binary files, such as the `wtmp` file on Linux and other
Unix Operating Systems.
Using tool `utmpdump`, a `utmp` record structure is converted to text like:

```text
[7] [12103] [ts/0] [user] [pts/0] [172.1.2.1] [172.1.2.2] [2023-03-05T23:12:36,270185+00:00]
```

And from a _systemd_ `.journal` file, read using `journalctl`

```text
Mar 03 10:26:10 host systemd[1]: Started OpenBSD Secure Shell server.
░░ Subject: A start job for unit ssh.service has finished successfully
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit ssh.service has finished successfully.
░░
░░ The job identifier is 120.
Mar 03 10:31:23 host sshd[4559]: Accepted login for user1 from 172.1.2.1 port 51730 ssh2
```

Try merging those two log messages by datetime using GNU `grep`, `sort`, `sed`,
or `awk`!

Additionally, if the `wtmp` file is from a different architecture
or Operating System, then the binary record structure is likely not parseable
by the resident `utmpdump` tool. What then!?

### commercial software examples

Commercial software and computer hardware vendors nearly always use
ad-hoc log message formatting, which is even less predictable from one log
file to the next on the same system.

#### Synology DiskStation

Here are log file snippets from a Synology DiskStation host.

file `DownloadStation.log`

```text
2019/06/23 21:13:34 (system) trigger DownloadStation 3.8.13-3519 Begin start-stop-status start
```

file `sfdisk.log`

```text
2019-04-06T01:07:40-07:00 dsnet sfdisk: Device /dev/sdq change partition.
```

file `synobackup.log`

```text
info 2018/02/24 02:30:04 SYSTEM: [Local][Backup Task Backup1] Backup task started.
```

(yes, those are tab characters)

#### Mac OS 12
Here are log file snippets from a Mac OS 12.6 host.

file `/var/log/system`

```text
Oct 11 15:04:55 localhost syslogd[110]: Configuration Notice:
ASL Module "com.apple.cdscheduler" claims selected messages.
Those messages may not appear in standard system log files or in the ASL database.
```

file `/var/log/wifi`

```text
Thu Sep 21 23:05:35.850 Usb Host Notification NOT activated
```

file `/var/log/fsck_hs.log`

```text
/dev/rdisk2s2: fsck_hfs started at Thu Sep 21 21:31:05 2023
QUICKCHECK ONLY; FILESYSTEM CLEAN
```

file `/var/log/anka.log`

```text
Fri Sep 22 00:06:05 UTC 2023: Checking /Library/Developer/CoreSimulator/Volumes/watchOS_20S75...
```

file `/var/log/displaypolicyd.log`

```text
2023-09-15 04:26:56.330256-0700: Started at Fri Sep 15 04:26:56 2023
```

file `/var/log/com.apple.xpc.launchd/launchd.log.1`

```text
2023-10-26 16:56:23.287770 : swap enabled
```

file `/var/log/asl/logs/aslmanager.20231026T170200+00`

```text
Oct 26 17:02:00: aslmanager starting
```

Did you also notice how the log file _names_ differ in unexpected ways?

#### Microsoft Windows 10
Here are log snippets from a Windows 10 host.

file `${env:SystemRoot}\debug\mrt.log`

```text
Microsoft Windows Malicious Software Removal Tool v5.83, (build 5.83.13532.1)
Started On Thu Sep 10 10:08:35 2020
```

file `${env:SystemRoot}\comsetup.log`

```text
COM+[12:24:34]: ********************************************************************************
COM+[12:24:34]: Setup started - [DATE:05,27,2020 TIME: 12:24 pm]
```

file `${env:SystemRoot}\DirectX.log`

```text
11/01/19 20:03:40: infinst: Installed file C:\WINDOWS\system32\xactengine2_1.dll
```

file `${env:SystemRoot}/Microsoft.NET/Framework/v4.0.30319/ngen.log`

```text
09/15/2022 14:13:22.951 [515]: 1>Warning: System.IO.FileNotFoundException: Could not load file or assembly
```

file `${env:SystemRoot}/Performance/WinSAT/winsat.log`

```text
68902359 (21103) - exe\logging.cpp:0841: --- START 2022\5\17 14:26:09 PM ---
68902359 (21103) - exe\main.cpp:4363: WinSAT registry node is created or present
```

(yes, it reads hour `14`, and `PM`…)

### Summary
This chaotic logging approach is typical of commercial and open-source software,
__*AND IT'S A MESS!*__
Attempting to merge log messages by their natural sort mechanism,
a datetime stamp, is difficult to impossible.

Hence the need for _Super Speedy Syslog Searcher_! 🦸
`s4` merges varying log files into a single coherent datetime-sorted log.

[logs access attempts in an ad-hoc format]: https://docs.nginx.com/nginx/admin-guide/monitoring/logging/#setting-up-the-access-log
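
Once every message carries a comparable datetime key, merging is a standard k-way merge. The following Rust sketch shows the idea with a binary heap; it is a simplified illustration, not a description of `s4` internals. The `Record` type, the `merge_by_datetime` function, and the use of seconds since the Unix epoch as the key are assumptions.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical record: (seconds since the Unix epoch, message text).
type Record = (u64, String);

/// Merges several sources, each already sorted by datetime, into one
/// datetime-sorted sequence. Ties are broken by message text, then by the
/// index of the source.
fn merge_by_datetime(sources: Vec<Vec<Record>>) -> Vec<Record> {
    let mut iters: Vec<_> = sources.into_iter().map(|s| s.into_iter()).collect();
    // `Reverse` turns the max-heap into a min-heap on (record, source index).
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first record of each source.
    for (index, iter) in iters.iter_mut().enumerate() {
        if let Some(record) = iter.next() {
            heap.push(Reverse((record, index)));
        }
    }
    let mut merged = Vec::new();
    // Repeatedly take the earliest record, then refill from the same source.
    while let Some(Reverse((record, index))) = heap.pop() {
        merged.push(record);
        if let Some(next) = iters[index].next() {
            heap.push(Reverse((next, index)));
        }
    }
    merged
}
```
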
---
## Further Reading
- [`CHANGELOG.md`]
- [`Extended-Thoughts.md`]

[`CHANGELOG.md`]: ./CHANGELOG.md
[`Extended-Thoughts.md`]: ./Extended-Thoughts.md

## Stargazers

[![Stargazers over time](https://starchart.cc/jtmoon79/super-speedy-syslog-searcher.svg?variant=adaptive)](https://starchart.cc/jtmoon79/super-speedy-syslog-searcher)

---