Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/mrbaseman/parse_yaml

a simple yaml parser implemented in bash
https://github.com/mrbaseman/parse_yaml

Last synced: 1 day ago
JSON representation

a simple yaml parser implemented in bash

Awesome Lists containing this project

README

        

# parse_yaml
a simple yaml parser implemented in bash

`parse_yaml` provides a bash function that allows parsing simple YAML files.
The output is shell code that defines shell variables which contain the parsed values.
`bash` doesn't support multidimensional arrays. Therefore a separate variable is created for each value, and the name of the variable consists of the names of all levels in the yaml file, glued together with a separator character which defaults to `_`.

## Usage
first source the script that defines `parse_yaml`
```
source parse_yaml.sh
```
then, you can parse yaml files and assign shell variables
```
eval $(parse_yaml sample.yml)
```
or postprocess the output by other shell scripts or tools that take their input from stdin
```
parse_yaml sample.yml | some_script
```
also, you can load defaults from one yaml file and overwrite the values with the ones of a specific file
```
eval $(parse_yaml defaults.yml)
eval $(parse_yaml sample.yml)
```
a prefix can be supplied as second argument. This prefix may also be an empty string, which allows you to supply a third argument which changes the separator string (e.g. from underscore to dash):
```
eval $(parse_yaml sample.yml "" "-")
```

## A simple example input file:
```
---
global:
input:
- "main.c"
- "main.h"
flags: [ "-O3", "-fpic" ]
sample_input:
- { property1: value1, property2: value2 }
- { property1: "value 3", property2: 'value 4' }
licence: |
this is published under
open source license
in the hope that it would
be useful
...
```
and here the parsed output:
```
global_input_1="main.c"
global_input_2="main.h"
global_flags_1="-O3"
global_flags_2="-fpic"
global_sample_input_1_property1="value1"
global_sample_input_1_property2="value2"
global_sample_input_2_property1="value 3"
global_sample_input_2_property2="value 4"
global_licence="this is published under\nopen source license\nin the hope that it would \nbe useful\n"
__=" global"
global_=" global_input global_flags global_sample_input global_licence"
global_flags_=" global_flags_1 global_flags_2"
global_input_=" global_input_1 global_input_2"
global_sample_input_=" global_sample_input_1 global_sample_input_2"
global_sample_input_1_=" global_sample_input_1_property1 global_sample_input_1_property2"
global_sample_input_2_=" global_sample_input_2_property1 global_sample_input_2_property2"
```
Apart from the values themselves, there are also lists of the variable names that live below each level (for instance `$global_flags_` contains two variable names, `global_flags_1` and `global_flags_2`. These names can be used to iterate over all members of `$global_flags`:
```
for f in $global_flags_ ; do eval echo \$f \$${f} ; done
```
produces the following output
```
global_flags_1 -O3
global_flags_2 -fpic
```

For more examples see the sample.yml file included in the src directory.

## Features
The following yaml features are currently supported:

* comments (`# comment`)
* dictionaries, mappings or collections (`key: value`) with indentation to denote the level
* short notation of ditctionaries (` dict: { key: value, ... }`)
* lists or sequences (`- entry`) with indentation to denote the level
* short notation of lists (`list: [ value, ... ]`)
* unordered lists or sometimes called sets (`? entry`)
* keys may be single words (i.e. containing only alphanumeric characters)
* values (strings) can be enclosed in single or double quotes
* multiline values (`multiline: | ...`) where the following lines are indented one level deeper than the key
* wrapped content (`wrapped: > ...`) where line breaks are converted to spaces and empty lines to newlines
* plain and quoted multiline flow scalars are supported ( `key: ...` where `...` is a quoted or unquoted string that may span multiple lines).
* anchors (`&anchor`) and references to them (`*anchor`) are supported, to some extend even in a nested way

## Known limitations

* special characters are interpreted by the bash. Backticks `\`...\`` and expressions starting with `$` which trigger command substitution or parameter expansion may cause unwanted effects - use with caution!
* directives and document boundaries (`---`, `...`) are simply ignored
* the parsed data is put into shell variables and thus multidimensional arrays can not be used. For each value on each level a separate shell variable is defined.
* comments may not be correctly filtered out if quotes are used on the same line inside and outside of the comment
* yaml tags (`!tag`) and types (`!!type`), especially `!!binary` are not supported.
* complex mapping keys (e.g. sequences as an index of a mapping) are not supported.
* keys in mappings are assumed to be words (consisting of alphanumerical characters), dots are replaced by the separator character, but other special characters are not supported.
* unordered lists are converted to ordered lists for simplicity
* strings enclosed in quotes should work, but when double- and single quotes are nested in a too complex manner, the regex used for parsing might not correctly capture the value
* multiple quotes inside a string are not correctly "unfolded". Single quotes in a string should be preserved such that they are properly escaped in the variable assignment. There might be problems with quotes masked by backslash in a quoted string.
* plain and quoted multi-line flow scalars produce output for each line to be appended
* anchors are not fully dereferenced twice, i.e. when an anchor is defined and it contains references to other anchors, those are dereferenced when the anchor is processed. If those anchors are re-defined later on, and the main anchor that contains the references on the re-defined anchors, is later dereferenced, it still contains the outdated values.
* if a quoted string starts with a `'*'` character and an anchor exists which is denoted by the following characters in the string, this is currently treated as a dereference, even if the string is enclosed in single quotes

## credits:
this work is based on Stefan Farestam's [answer on stackoverflow](https://stackoverflow.com/questions/5014632/how-can-i-parse-a-yaml-file-from-a-linux-shell-script)