Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/darule0/yarndiff
A rudimentary command line utility for contrasting Apache Yarn container logs.
Last synced: 10 days ago
- Host: GitHub
- URL: https://github.com/darule0/yarndiff
- Owner: darule0
- License: apache-2.0
- Created: 2021-11-02T22:21:36.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-08T19:06:55.000Z (10 months ago)
- Last Synced: 2024-10-10T20:40:52.656Z (about 1 month ago)
- Topics: diff, difference, diffing, hadoop, hadoop-mapreduce, hive, log4j, mapreduce, pig, spark, yarn, yarn2
- Language: Shell
- Homepage:
- Size: 59.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# yarndiff
A rudimentary command line utility for contrasting Apache Yarn container logs.

## Motivation
I have been troubleshooting Apache Yarn application issues full-time since around 2015. When a yarn application slows down or stops working, I try to find out more, for example: were there any new errors or other kinds of log messages in the container logs which were not there before?

Yarn logs from two runs of the same application cannot be usefully contrasted with a general-purpose diff tool. Values such as timestamps and IDs change from run to run, so the diff would report thousands of changes that do not help with troubleshooting.
I decided to automate this part of my job in the form of a bash script that examines yarn logs and identifies the differences I find useful when troubleshooting yarn application performance and functionality problems.
## Description
yarndiff is a Linux command line utility that contrasts the yarn logs from two runs of a yarn application and displays one sample log entry for each kind of log entry found in only one of the two log files.

For example, if a yarn application has been running without problems for years and then suddenly slows down or stops working, I pass in the container logs from a known working run together with the container logs from the run that had problems. With a little luck, the yarndiff output guides me toward the root cause and a solution.
## Online Installation w/ CI
```console
mkdir ~/bin
chmod u+rx ~/bin
wget -O ~/bin/yarndiff "https://github.com/darule0/yarndiff/blob/main/yarndiff?raw=true"
chmod u+rx ~/bin/yarndiff
source ~/.profile
```
## Offline Installation w/o CI
```console
sudo mkdir /opt/yarndiff
sudo chmod o+rx /opt/yarndiff
sudo git clone https://github.com/darule0/yarndiff.git /opt/yarndiff
sudo chmod o+rx /opt/yarndiff/yarndiff.sh
sudo ln -s /opt/yarndiff/yarndiff.sh /usr/bin/yarndiff
```
## How to obtain container logs for a yarn application run?
The `logs` subcommand of the `yarn` command can be used to obtain the container logs of a recently run yarn application:
```console
yarn logs -applicationId APP_ID > APP_ID_yarn.log
```

## Tutorial
```console
# install yarndiff w/ CI
mkdir ~/bin
chmod u+rx ~/bin
wget -O ~/bin/yarndiff "https://github.com/darule0/yarndiff/blob/main/yarndiff?raw=true"
chmod u+rx ~/bin/yarndiff
source ~/.profile

# display yarndiff usage
yarndiff

# contrast container logs from two runs of the same yarn application
yarndiff container_log1 container_log2
```
![yarndiff tutorial screenshot](https://raw.githubusercontent.com/darule0/yarndiff/main/yarndiff.png)
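Fetching both runs' logs with `yarn logs` and then contrasting them with yarndiff can be wrapped in one small bash function. This is a sketch, not part of yarndiff: the function name `fetch_and_contrast` is hypothetical, and it assumes both the `yarn` command and `yarndiff` are on the `PATH`.

```shell
# Hypothetical helper, not part of yarndiff:
#   fetch_and_contrast GOOD_APP_ID BAD_APP_ID
# Saves each run's container logs to APP_ID_yarn.log (the same naming as the
# yarn logs example) and then contrasts the two files with yarndiff.
fetch_and_contrast() {
  local good_app_id="$1" bad_app_id="$2" app_id
  if [ -z "$good_app_id" ] || [ -z "$bad_app_id" ]; then
    echo "usage: fetch_and_contrast GOOD_APP_ID BAD_APP_ID" >&2
    return 1
  fi
  for app_id in "$good_app_id" "$bad_app_id"; do
    # fetch the container logs of one run; stop at the first failure
    yarn logs -applicationId "$app_id" > "${app_id}_yarn.log" || return 1
  done
  # known working run first, problem run second
  yarndiff "${good_app_id}_yarn.log" "${bad_app_id}_yarn.log"
}
```

Usage: `fetch_and_contrast GOOD_APP_ID BAD_APP_ID`, where both arguments are placeholders for real application IDs. The log files are written to the current directory, so they can be reused for another yarndiff run.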
## Directories Used
| directory | purpose |
| :--- | :--- |
| $HOME/.yarndiff.dd4b66ed-a43d-48ec-8e32-1b901bc8ea8e | The latest yarndiff is downloaded here automatically when yarndiff is installed via Online Installation w/ CI. |
| $HOME/.yarndiff | Intermediate data for yarndiff processing. |

## Container Log Parsing Logic
| special entry | purpose |
| :--- | :--- |
| container.log.file | name of input log4j file |
| container.count | number of lines which begin with the word "Container: " |

| pseudo code: regular expression list generation derived from each container log file |
| :--- |
```console
for each line in a container log
    keep only lines which contain CRIT or ERROR or WARN or INFO or DEBUG or TRACE
    remove lines which contain the phrase "has been replaced by"
    replace each of the following characters with the wildcard character '.'
        0123456789^$*+-?()[]{}|—/\\
    keep only the first 80 characters
sort the regular expressions
remove any duplicate regular expressions
```
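The pseudo code above maps fairly directly onto standard text tools. The sketch below is one assumed way to write it in bash and is not the code in yarndiff.sh. The function names `build_regex_list` and `count_containers` are hypothetical. The bracket expression covers only the ASCII characters from the list (the non-ASCII dash is left out), and `LC_ALL=C` keeps the sort order independent of the locale.

```shell
# Sketch only: hypothetical helper names, not yarndiff.sh itself.

# build_regex_list LOG_FILE
# Pipeline stages, in order:
#   1. keep only lines containing one of the log levels
#   2. drop lines containing "has been replaced by"
#   3. replace digits and ^$*+-?()[]{}|/\ with the wildcard '.'
#      (sed uses '#' as its delimiter because '/' and '|' are in the set)
#   4. keep only the first 80 characters
#   5. sort and remove duplicates
build_regex_list() {
  local log_file="$1"
  grep -E 'CRIT|ERROR|WARN|INFO|DEBUG|TRACE' "$log_file" \
    | grep -vF 'has been replaced by' \
    | sed 's#[][0-9^$*+?(){}|/\\-]#.#g' \
    | cut -c1-80 \
    | LC_ALL=C sort -u
}

# count_containers LOG_FILE
# Counts lines that begin with "Container: ", like the container.count entry.
# grep -c still prints 0 (with exit status 1) when nothing matches.
count_containers() {
  grep -c '^Container: ' "$1"
}
```

Because timestamps, durations, and IDs collapse to runs of `.`, two log lines that differ only in those values produce the same regular expression, and `sort -u` keeps just one of them.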
| pseudo code: contrast both regular expression lists and report the differences |
| :--- |
```console
diff the two regular expression lists
for each regular expression found in only one of the lists
    obtain a sample line from the applicable container log
    obtain the number of matching lines in the applicable container log
```
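One possible bash sketch of the contrast step, assuming each list was produced as in the generation pseudo code (one pattern per line, sorted under `LC_ALL=C`, duplicates removed). `comm -23` keeps the lines that appear only in its first input. Each remaining pattern is then used as a basic regular expression against the matching log to get a match count and a sample line. The names `report_unique` and `contrast_logs` and the tab-separated output format are hypothetical, and yarndiff's own output may look different.

```shell
# Sketch only: hypothetical helper names, not yarndiff.sh itself.

# report_unique LIST_A LIST_B LOG_A
# For every pattern in LIST_A that is missing from LIST_B, prints the number
# of matching lines in LOG_A, a tab, and the first matching line as a sample.
# comm requires sorted input, so both lists must be sorted under LC_ALL=C.
report_unique() {
  local list_a="$1" list_b="$2" log_a="$3" pattern count sample
  LC_ALL=C comm -23 "$list_a" "$list_b" | while IFS= read -r pattern; do
    # -e ensures the pattern is never parsed as a grep option
    count="$(grep -c -e "$pattern" "$log_a")"
    sample="$(grep -m 1 -e "$pattern" "$log_a")"
    printf '%s\t%s\n' "$count" "$sample"
  done
}

# contrast_logs OLD_LOG NEW_LOG OLD_LIST NEW_LIST
# Reports the patterns unique to each side, one direction at a time.
contrast_logs() {
  local old_log="$1" new_log="$2" old_list="$3" new_list="$4"
  echo "only in $old_log:"
  report_unique "$old_list" "$new_list" "$old_log"
  echo "only in $new_log:"
  report_unique "$new_list" "$old_list" "$new_log"
}
```

Running `report_unique` twice with the arguments swapped is what covers "for both regular expression lists". Using `comm` rather than `diff` here is a design choice of the sketch: on sorted, de-duplicated input it yields the set difference directly, without diff's hunk formatting.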