{"id":19427254,"url":"https://github.com/theasp/parseit","last_synced_at":"2025-09-01T07:35:43.268Z","repository":{"id":42959105,"uuid":"229331070","full_name":"theasp/parseit","owner":"theasp","description":"Parseit - Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format","archived":false,"fork":false,"pushed_at":"2022-12-05T02:42:48.000Z","size":385,"stargazers_count":14,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-24T17:47:22.190Z","etag":null,"topics":["abnf","ebnf","instaparse"],"latest_commit_sha":null,"homepage":null,"language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/theasp.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-20T20:23:16.000Z","updated_at":"2025-02-24T06:50:14.000Z","dependencies_parsed_at":"2023-01-23T07:16:00.014Z","dependency_job_id":null,"html_url":"https://github.com/theasp/parseit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/theasp/parseit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theasp%2Fparseit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theasp%2Fparseit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theasp%2Fparseit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theasp%2Fparseit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/theasp","download_url":
"https://codeload.github.com/theasp/parseit/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theasp%2Fparseit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267272276,"owners_count":24062439,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abnf","ebnf","instaparse"],"created_at":"2024-11-10T14:11:01.595Z","updated_at":"2025-07-27T00:04:56.247Z","avatar_url":"https://github.com/theasp.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"#+TITLE: Parseit\n\n* Introduction\nParseit is command line tool to parse data using [[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form][EBNF]] or [[https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_form][ABNF]] using the excellent [[https://github.com/Engelberg/instaparse][Instaparse]] library, and serializing the result into [[https://www.json.org/json-en.html][JSON]], [[https://github.com/edn-format/edn][EDN]], [[https://yaml.org/][YAML]] or [[https://github.com/cognitect/transit-format][Transit]] format.\n\n* Usage\n#+begin_example\nUsage: parseit [options] \u003cgrammar\u003e [input]\n       parseit [options] --preset \u003cpreset\u003e [input]\n       parseit --help\n\nOptions\n  -p, --preset PRESET           none       
  Preset grammar to use\n  -f, --format FORMAT           json-pretty  Select the output format\n  -S, --start RULE                           Start processing at this rule\n  -t, --transform RULE:TYPE[+]               Transform a rule into a type, and keep wrapped with + suffix\n  -T, --no-standard-tx                       Do use the standard transformations\n  -a, --all                                  Return all parses rather than the best match\n  -s, --split REGEX                          Process input as a stream, parsing each chunk seperated by REGEX\n  -l, --split-lines                          Split on newlines, same as --split '(?\u003c=\\r?\\n)'\n  -e, --encoding ENCODING       utf8         Use the specified encoding when reading the input, or raw\n  -X, --style TYPE              hiccup       Build the parsed tree in the style of hiccup or enlive\n  -h, --help                                 Help\n\nOutput Formats\n  edn              Extensible Data Format\n  edn-pretty       Extensible Data Format (pretty)\n  json             JavaScript Object Notation\n  json-pretty      JavaScript Object Notation (pretty)\n  transit          Transit JSON\n  transit-verbose  Transit JSON Verbose\n  yaml             YAML Ain't Markup Language\n\nTransformation Types\n  array, list, unwrap, vec        Create list from children\n  decimal, double, float, number  Convert to floating point\n  dict, map, object               Create map from children\n  first                           No conversion, just the first item\n  int, integer                    Convert to integer\n  keyword                         Create a keyword (only useful for EDN or Transit)\n  map-kv                          Transform a list of key value pairs into a map\n  merge                           Merge multiple maps into a single map\n  nil, null                       Convert to nil\n  str, string, text               Convert to string\n\nPresets\n  csv     Comma Seperated Value\n  group   NSS group(5), 
i.e. /etc/group\n  hosts   NSS hosts(5), i.e. /etc/hosts\n  hpl     High Performance Linpack benchmark results\n  passwd  NSS passwd(5), i.e. /etc/passwd\n#+end_example\n\n* Example\n** Parsing ~passwd~\n\nYou can parse ~passwd~ entries using the following as ~passwd.ebnf~:\n#+NAME: passwd-ebnf\n#+HEADER: :exports code\n#+HEADER: :results silent\n#+HEADER: :tangle passwd.ebnf\n#+BEGIN_SRC conf\npasswd = (user \u003cEOL\u003e)*\nuser = name \u003cSEP\u003e pw \u003cSEP\u003e uid \u003cSEP\u003e gid \u003cSEP\u003e gecos \u003cSEP\u003e home \u003cSEP\u003e shell\n\nname = STRING\npw = STRING\nuid = INTEGER\ngid = INTEGER\ngecos = STRING?\nhome = STRING\nshell = STRING\nSTRING = #'[^:\\r\\n]+'\nINTEGER = #'[0-9]+'\nSEP = ':'\nEOL = #'(?:\\r\\n|\\r|\\n)'\n#+END_SRC\n\nWe will use ~getent passwd root~ to get the entry for the root user:\n#+begin_example\n$ getent passwd root\nroot:x:0:0:root:/root:/bin/bash\n#+end_example\n\nWe can then pipe that into Parseit and see the result:\n#+begin_example\n$ getent passwd root | parseit passwd.ebnf \n[\"passwd\",[\"user\",[\"name\",\"root\"],[\"pw\",\"x\"],[\"uid\",0],[\"gid\",0],[\"gecos\",\"root\"],[\"home\",\"/root\"],[\"shell\",\"/bin/bash\"]]]\n#+end_example\n\nNot a bad start!  Notice that the UID and GID values were converted into integers.  Parseit applies a library of standard transformations: the ~INTEGER~ rule is converted into an integer, and the ~STRING~ rule is converted into a string.  
You can disable the standard transformations with ~--no-standard-tx~:\n#+begin_example\n$ getent passwd root | node target/parseit.js --no-standard-tx passwd.ebnf \n[\"passwd\",[\"user\",[\"name\",[\"STRING\",\"root\"]],[\"pw\",[\"STRING\",\"x\"]],[\"uid\",[\"INTEGER\",\"0\"]],[\"gid\",[\"INTEGER\",\"0\"]],[\"gecos\",[\"STRING\",\"root\"]],[\"home\",[\"STRING\",\"/root\"]],[\"shell\",[\"STRING\",\"/bin/bash\"]]]]\n#+end_example\n\nIf you use the options ~--transform user:map~ and ~--transform passwd:list~, Parseit will turn the ~user~ rule into a map (dictionary) and the ~passwd~ rule into a list of users:\n#+BEGIN_EXAMPLE\n$ getent passwd root | parseit --transform user:map --transform passwd:list passwd.ebnf \n[{\"name\":\"root\",\"pw\":\"x\",\"uid\":0,\"gid\":0,\"gecos\":\"root\",\"home\":\"/root\",\"shell\":\"/bin/bash\"}]\n#+END_EXAMPLE\n\nYou don't actually need the EBNF file or any of those options though!  There is a ~passwd~ preset which does it all for you:\n#+begin_example\n$ getent passwd root | parseit --preset passwd\n[{\"name\":\"root\",\"pw\":\"x\",\"uid\":0,\"gid\":0,\"gecos\":\"root\",\"home\":\"/root\",\"shell\":\"/bin/bash\"}]\n#+end_example\n\nWe are parsing a single user, but the result is still wrapped in an array.  We can tell Parseit to start parsing at a specific rule using the argument ~--start RULE~:\n#+begin_example\n$ getent passwd root | parseit --preset passwd --start user\nParse error at line 1, column 23:\nroot:x:0:0:root:/root:/bin/bash\n                      ^\nExpected:\n#\"[^:\\r\\n]+\" (followed by end-of-string)\n#+end_example\n\nHey, that didn't work!  The problem is that the input ends with a newline and the ~user~ rule does not allow that.  
We can use the ~--split REGEX~ argument to split the input on newlines, so that each line is parsed separately:\n#+begin_example\n$ getent passwd root | parseit --preset passwd --start user --split '\\n'\n{\"name\":\"root\",\"pw\":\"x\",\"uid\":0,\"gid\":0,\"gecos\":\"root\",\"home\":\"/root\",\"shell\":\"/bin/bash\"}\n#+end_example\n\nWith ~--split~, each chunk is processed as soon as it is read.  You can see this by running:\n#+begin_example\n$ (getent passwd root; sleep 10; getent passwd bin) | parseit --preset passwd --start user --split '\\n'\n{\"name\":\"root\",\"pw\":\"x\",\"uid\":0,\"gid\":0,\"gecos\":\"root\",\"home\":\"/root\",\"shell\":\"/bin/bash\"}\n{\"name\":\"bin\",\"pw\":\"x\",\"uid\":2,\"gid\":2,\"gecos\":\"bin\",\"home\":\"/bin\",\"shell\":\"/usr/sbin/nologin\"}\n#+end_example\n\nMaybe you don't like reading JSON?  You can use the YAML output format to make it more readable:\n#+begin_example\n$ getent passwd root | parseit --preset passwd --format yaml\n---\n- name: root\n  pw: x\n  uid: 0\n  gid: 0\n  gecos: root\n  home: /root\n  shell: /bin/bash\n#+end_example\n\n* Grammar\n\nParseit uses [[https://github.com/Engelberg/instaparse][Instaparse]], so the [[https://github.com/Engelberg/instaparse#notation][notation section of the tutorial]] has a good description of the grammar syntax.  Keep in mind that you do not need to escape strings as you would in Clojure, because the grammar is read from a text file.\n\n* Building\nThe following commands install Shadow CLJS and then build the JavaScript as ~target/parseit.js~ and a native executable (using nexe) as ~parseit~:\n#+begin_example\n$ npm install -g shadow-cljs\n$ npm install --save-dev shadow-cljs\n$ shadow-cljs release cli\n#+end_example\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftheasp%2Fparseit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftheasp%2Fparseit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftheasp%2Fparseit/lists"}