https://github.com/theasp/parseit
Parseit - Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format
- Host: GitHub
- URL: https://github.com/theasp/parseit
- Owner: theasp
- License: apache-2.0
- Created: 2019-12-20T20:23:16.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-05T02:42:48.000Z (about 2 years ago)
- Last Synced: 2024-01-30T04:20:26.735Z (11 months ago)
- Topics: abnf, ebnf, instaparse
- Language: Clojure
- Size: 376 KB
- Stars: 12
- Watchers: 2
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.org
- License: LICENSE
#+TITLE: Parseit
* Introduction
Parseit is a command line tool that parses data with an [[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form][EBNF]] or [[https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_form][ABNF]] grammar using the excellent [[https://github.com/Engelberg/instaparse][Instaparse]] library, and serializes the result into [[https://www.json.org/json-en.html][JSON]], [[https://github.com/edn-format/edn][EDN]], [[https://yaml.org/][YAML]] or [[https://github.com/cognitect/transit-format][Transit]] format.

* Usage
#+begin_example
Usage: parseit [options] [input]
       parseit [options] --preset [input]
       parseit --help

Options
  -p, --preset PRESET           none         Preset grammar to use
  -f, --format FORMAT           json-pretty  Select the output format
  -S, --start RULE                           Start processing at this rule
  -t, --transform RULE:TYPE[+]               Transform a rule into a type, and keep wrapped with + suffix
  -T, --no-standard-tx                       Do not use the standard transformations
  -a, --all                                  Return all parses rather than the best match
  -s, --split REGEX                          Process input as a stream, parsing each chunk separated by REGEX
  -l, --split-lines                          Split on newlines, same as --split '(?<=\r?\n)'
  -e, --encoding ENCODING       utf8         Use the specified encoding when reading the input, or raw
  -X, --style TYPE              hiccup       Build the parsed tree in the style of hiccup or enlive
  -h, --help                                 Help

Output Formats
  edn              Extensible Data Format
  edn-pretty       Extensible Data Format (pretty)
  json             JavaScript Object Notation
  json-pretty      JavaScript Object Notation (pretty)
  transit          Transit JSON
  transit-verbose  Transit JSON Verbose
  yaml             YAML Ain't Markup Language

Transformation Types
  array, list, unwrap, vec        Create list from children
  decimal, double, float, number  Convert to floating point
  dict, map, object               Create map from children
  first                           No conversion, just the first item
  int, integer                    Convert to integer
  keyword                         Create a keyword (only useful for EDN or Transit)
  map-kv                          Transform a list of key value pairs into a map
  merge                           Merge multiple maps into a single map
  nil, null                       Convert to nil
  str, string, text               Convert to string

Presets
  csv     Comma Separated Value
  group   NSS group(5), i.e. /etc/group
  hosts   NSS hosts(5), i.e. /etc/hosts
  hpl     High Performance Linpack benchmark results
  passwd  NSS passwd(5), i.e. /etc/passwd
#+end_example

* Example
** Parsing ~passwd~

You can parse ~passwd~ entries using the following as ~passwd.ebnf~:
#+NAME: passwd-ebnf
#+HEADER: :exports code
#+HEADER: :results silent
#+HEADER: :tangle passwd.ebnf
#+BEGIN_SRC conf
passwd = (user <EOL>)*
user = name <SEP> pw <SEP> uid <SEP> gid <SEP> gecos <SEP> home <SEP> shell

name = STRING
pw = STRING
uid = INTEGER
gid = INTEGER
gecos = STRING?
home = STRING
shell = STRING
STRING = #'[^:\r\n]+'
INTEGER = #'[0-9]+'
SEP = ':'
EOL = #'(?:\r\n|\r|\n)'
#+END_SRC

We will use ~getent passwd root~ to get the entry for the root user:
#+begin_example
$ getent passwd root
root:x:0:0:root:/root:/bin/bash
#+end_example

We can then pipe that into Parseit and see the result:
#+begin_example
$ getent passwd root | parseit passwd.ebnf
["passwd",["user",["name","root"],["pw","x"],["uid",0],["gid",0],["gecos","root"],["home","/root"],["shell","/bin/bash"]]]
#+end_example

Not a bad start! You will notice that the UID and GID values were converted into integers. Parseit applies a library of standard transformations by default: the ~INTEGER~ rule is converted into an integer and the ~STRING~ rule into a string. You can turn this off with ~--no-standard-tx~:
#+begin_example
$ getent passwd root | node target/parseit.js --no-standard-tx passwd.ebnf
["passwd",["user",["name",["STRING","root"]],["pw",["STRING","x"]],["uid",["INTEGER","0"]],["gid",["INTEGER","0"]],["gecos",["STRING","root"]],["home",["STRING","/root"]],["shell",["STRING","/bin/bash"]]]]
#+end_example

If you use the options ~--transform user:map~ and ~--transform passwd:list~, Parseit will turn the ~user~ rule into a map (dictionary) and the ~passwd~ rule into a list of users:
#+BEGIN_EXAMPLE
$ getent passwd root | parseit --transform user:map --transform passwd:list passwd.ebnf
[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]
#+END_EXAMPLE

You don't actually need the EBNF file or any of those options though! There is a ~passwd~ preset which does it all for you:
#+begin_example
$ getent passwd root | parseit --preset passwd
[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]
#+end_example

We are parsing a single user, but it's currently being returned in an array. We can tell Parseit to start parsing at a specific rule using the argument ~--start RULE~:
#+begin_example
$ getent passwd root | parseit --preset passwd --start user
Parse error at line 1, column 23:
root:x:0:0:root:/root:/bin/bash
^
Expected:
#"[^:\r\n]+" (followed by end-of-string)
#+end_example

Hey, that didn't work! The problem is that the input ends with a newline and the ~user~ rule does not allow that. We can use the ~--split REGEX~ argument to split the input on newlines.
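The failure described above, and the effect of splitting, can be imitated in a few lines of Python. This is only a sketch of the idea, not Parseit's implementation: ~USER~ is a hand-written regular expression standing in for the ~user~ rule, and ~chunks~ is a hypothetical helper that splits the input on a regular expression. Dropping a trailing empty chunk is an assumption of the sketch, not documented Parseit behaviour.

```python
import re

# Stand-in for the `user` rule: seven ':'-separated fields, gecos may be empty.
USER = re.compile(
    r"[^:\r\n]+:[^:\r\n]+:[0-9]+:[0-9]+:[^:\r\n]*:[^:\r\n]+:[^:\r\n]+"
)


def chunks(data, split_regex):
    """Split the input on split_regex, ignoring a trailing empty chunk."""
    parts = re.split(split_regex, data)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


data = "root:x:0:0:root:/root:/bin/bash\n"

# The whole input, newline included, is not a valid `user`.
print(USER.fullmatch(data) is not None)  # False

# After splitting on newlines, each chunk matches on its own.
print([USER.fullmatch(c) is not None for c in chunks(data, r"\n")])  # [True]
```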
#+begin_example
$ getent passwd root | parseit --preset passwd --start user --split '\n'
{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}
#+end_example

With ~--split~, each chunk is processed as soon as it is read. You can see this by doing:
#+begin_example
$ (getent passwd root; sleep 10; getent passwd bin) | parseit --preset passwd --start user --split '\n'
{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}
{"name":"bin","pw":"x","uid":2,"gid":2,"gecos":"bin","home":"/bin","shell":"/usr/sbin/nologin"}
#+end_example

Maybe you don't like reading JSON? You can use the YAML output format to make it more readable:
#+begin_example
$ getent passwd root | parseit --preset passwd --format yaml
---
- name: root
  pw: x
  uid: 0
  gid: 0
  gecos: root
  home: /root
  shell: /bin/bash
#+end_example

* Grammar
Parseit uses [[https://github.com/Engelberg/instaparse][Instaparse]], so the [[https://github.com/Engelberg/instaparse#notation][notation section of the tutorial]] has a good description of the grammar syntax. Because the grammar is read from a text file, you do not need to escape strings as you would inside Clojure source code.
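As a rough illustration of what the ~passwd.ebnf~ grammar from the example section describes, the following Python sketch hand-codes the same rules with the standard ~re~ module. It is not how Instaparse or Parseit work internally. The names ~parse_user~ and ~FIELDS~ are made up for this illustration, and the sketch assumes that ~SEP~ separates the fields without appearing in the output tree, as the example output suggests.

```python
import re

# Terminals from passwd.ebnf, written as Python regular expressions.
STRING = re.compile(r"[^:\r\n]+")
INTEGER = re.compile(r"[0-9]+")

# (rule name, terminal, optional?) in the order given by the `user` rule.
FIELDS = [
    ("name", STRING, False),
    ("pw", STRING, False),
    ("uid", INTEGER, False),
    ("gid", INTEGER, False),
    ("gecos", STRING, True),  # gecos = STRING?
    ("home", STRING, False),
    ("shell", STRING, False),
]


def parse_user(line):
    """Return a hiccup-style tree for one passwd entry, or raise ValueError."""
    parts = line.split(":")  # SEP = ':'
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    tree = ["user"]
    for (rule, terminal, optional), text in zip(FIELDS, parts):
        if text == "" and optional:
            tree.append([rule])  # optional rule matched nothing
        elif terminal.fullmatch(text):
            # Mirror the standard transformation of INTEGER into a number.
            value = int(text) if terminal is INTEGER else text
            tree.append([rule, value])
        else:
            raise ValueError(f"{rule}: cannot match {text!r}")
    return tree


print(parse_user("root:x:0:0:root:/root:/bin/bash"))
```

A line that still carries its trailing newline raises ~ValueError~ in this sketch, because the ~STRING~ pattern excludes ~\r~ and ~\n~.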
* Building
This will install Shadow CLJS and then build the JavaScript as ~target/parseit.js~ and a native executable (using nexe) as ~parseit~:
#+begin_example
$ npm install -g shadow-cljs
$ npm install --save-dev shadow-cljs
$ shadow-cljs release cli
#+end_example