https://github.com/theasp/parseit

Parseit - Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format
https://github.com/theasp/parseit

abnf ebnf instaparse

Last synced: 2 months ago
JSON representation

Parseit - Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format

Host: GitHub
URL: https://github.com/theasp/parseit
Owner: theasp
License: apache-2.0
Created: 2019-12-20T20:23:16.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-12-05T02:42:48.000Z (over 2 years ago)
Last Synced: 2025-04-03T08:51:25.991Z (3 months ago)
Topics: abnf, ebnf, instaparse
Language: Clojure
Size: 376 KB
Stars: 14
Watchers: 2
Forks: 0
Open Issues: 5
Metadata Files:
- Readme: README.org
- License: LICENSE

Awesome Lists containing this project

README

        #+TITLE: Parseit

* Introduction

Parseit is command line tool to parse data using [[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form][EBNF]] or [[https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_form][ABNF]] using the excellent [[https://github.com/Engelberg/instaparse][Instaparse]] library, and serializing the result into [[https://www.json.org/json-en.html][JSON]], [[https://github.com/edn-format/edn][EDN]], [[https://yaml.org/][YAML]] or [[https://github.com/cognitect/transit-format][Transit]] format.

* Usage

#+begin_example

Usage: parseit [options]  [input]

       parseit [options] --preset  [input]

       parseit --help

Options

  -p, --preset PRESET           none         Preset grammar to use

  -f, --format FORMAT           json-pretty  Select the output format

  -S, --start RULE                           Start processing at this rule

  -t, --transform RULE:TYPE[+]               Transform a rule into a type, and keep wrapped with + suffix

  -T, --no-standard-tx                       Do use the standard transformations

  -a, --all                                  Return all parses rather than the best match

  -s, --split REGEX                          Process input as a stream, parsing each chunk seperated by REGEX

  -l, --split-lines                          Split on newlines, same as --split '(?<=\r?\n)'

  -e, --encoding ENCODING       utf8         Use the specified encoding when reading the input, or raw

  -X, --style TYPE              hiccup       Build the parsed tree in the style of hiccup or enlive

  -h, --help                                 Help

Output Formats

  edn              Extensible Data Format

  edn-pretty       Extensible Data Format (pretty)

  json             JavaScript Object Notation

  json-pretty      JavaScript Object Notation (pretty)

  transit          Transit JSON

  transit-verbose  Transit JSON Verbose

  yaml             YAML Ain't Markup Language

Transformation Types

  array, list, unwrap, vec        Create list from children

  decimal, double, float, number  Convert to floating point

  dict, map, object               Create map from children

  first                           No conversion, just the first item

  int, integer                    Convert to integer

  keyword                         Create a keyword (only useful for EDN or Transit)

  map-kv                          Transform a list of key value pairs into a map

  merge                           Merge multiple maps into a single map

  nil, null                       Convert to nil

  str, string, text               Convert to string

Presets

  csv     Comma Seperated Value

  group   NSS group(5), i.e. /etc/group

  hosts   NSS hosts(5), i.e. /etc/hosts

  hpl     High Performance Linpack benchmark results

  passwd  NSS passwd(5), i.e. /etc/passwd

#+end_example

* Example

** Parsing ~passwd~

You can parse ~passwd~ entries using the following as ~passwd.ebnf~:

#+NAME: passwd-ebnf

#+HEADER: :exports code

#+HEADER: :results silent

#+HEADER: :tangle passwd.ebnf

#+BEGIN_SRC conf

passwd = (user )*

user = name  pw  uid  gid  gecos  home  shell

name = STRING

pw = STRING

uid = INTEGER

gid = INTEGER

gecos = STRING?

home = STRING

shell = STRING

STRING = #'[^:\r\n]+'

INTEGER = #'[0-9]+'

SEP = ':'

EOL = #'(?:\r\n|\r|\n)'

#+END_SRC

We will use ~getent passwd root~ to get the entry for the root user:

#+begin_example

$ getent passwd root

root:x:0:0:root:/root:/bin/bash

#+end_example

We can then pipe that into Parseit and see the result:

#+begin_example

$ getent passwd root | parseit passwd.ebnf 

["passwd",["user",["name","root"],["pw","x"],["uid",0],["gid",0],["gecos","root"],["home","/root"],["shell","/bin/bash"]]]

#+end_example

Not a bad start!  You will notice that the UID and GID values were converted into integers.  There is a library of standard transformations and the ~INTEGER~ rule will be transformed into an integer, and the ~STRING~ rule will be turned into a string.  You can avoid this by using ~--no-standard-tx~:

#+begin_example

$ getent passwd root | node target/parseit.js --no-standard-tx passwd.ebnf 

["passwd",["user",["name",["STRING","root"]],["pw",["STRING","x"]],["uid",["INTEGER","0"]],["gid",["INTEGER","0"]],["gecos",["STRING","root"]],["home",["STRING","/root"]],["shell",["STRING","/bin/bash"]]]]

#+end_example

If you use the options ~--transform user:map~ and ~--transform passwd:list~, Parseit will turn the ~user~ rule into a map (dictionary) and ~passwd~ rule into a list of users:

#+BEGIN_EXAMPLE

$ getent passwd root | parseit --transform user:map --transform passwd:list passwd.ebnf 

[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]

#+END_EXAMPLE

You don't actually need the EBNF file or any of those options though!  There is a ~passwd~ preset which does it all for your:

#+begin_example

$ getent passwd root | parseit --preset passwd

[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]

#+end_example

We are parsing a single user, but it's currently being returned in an array.  We can tell Parseit to start parsing at a specific rule using the argument ~--start RULE~:

#+begin_example

$ getent passwd root | parseit --preset passwd --start user

Parse error at line 1, column 23:

root:x:0:0:root:/root:/bin/bash

                      ^

Expected:

#"[^:\r\n]+" (followed by end-of-string)

#+end_example

Hey, that didn't work!  The problem is that the input ends with a newline and the user rule does not allow that.  We can use the ~--split REGEX~ argument to split the input on newlines.

#+begin_example

$ getent passwd root | parseit --preset passwd --start user --split '\n'

{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}

#+end_example

With ~--split~, each chunk will be processed as they are read. You can see this by doing:

#+begin_example

$ (getent passwd root; sleep 10; getent passwd bin) | parseit --preset passwd --start user --split '\n'

{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}

{"name":"bin","pw":"x","uid":2,"gid":2,"gecos":"bin","home":"/bin","shell":"/usr/sbin/nologin"}

#+end_example

Maybe you don't like reading JSON?  You can use the YAML output format to make it more readable:

#+begin_example

$ getent passwd root | parseit --preset passwd --format yaml

---

- name: root

  pw: x

  uid: 0

  gid: 0

  gecos: root

  home: /root

  shell: /bin/bash

#+end_example

* Grammar

Parseit uses [[https://github.com/Engelberg/instaparse][Instaparse]], so the [[https://github.com/Engelberg/instaparse#notation][notation section of the tutorial]] has a good description of the grammar syntax.  Keep in mind that you will not need to escape strings as you would in Clojure as the grammar will be read out of a text file.

* Building

This will install Shadow CLJS and then build the JavaScript as ~target/parseit.js~ and a native executable (using nexe) as ~parseit~:

#+begin_example

$ npm install -g shadow-cljs

$ npm install --save-dev shadow-cljs

$ shadow-cljs release cli

#+end_example

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/theasp/parseit

Awesome Lists containing this project

README