{"id":18422467,"url":"https://github.com/sri-csl/pvspackrat","last_synced_at":"2026-01-30T14:19:56.779Z","repository":{"id":137662398,"uuid":"229364998","full_name":"SRI-CSL/PVSPackrat","owner":"SRI-CSL","description":"PVS proofs for PEG grammars and Packrat parsers.","archived":false,"fork":false,"pushed_at":"2024-12-18T14:17:13.000Z","size":4889,"stargazers_count":4,"open_issues_count":1,"forks_count":3,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-06-04T07:27:00.709Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SRI-CSL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-21T02:20:55.000Z","updated_at":"2025-05-12T07:43:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"fba95a7c-d7c1-41bf-97a6-bdd352faa570","html_url":"https://github.com/SRI-CSL/PVSPackrat","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SRI-CSL/PVSPackrat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2FPVSPackrat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2FPVSPackrat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2FPVSPackrat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2FPVSPackrat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SRI-CSL","download_url":"https://codeload.github.com/SRI
-CSL/PVSPackrat/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2FPVSPackrat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28914048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-30T12:13:43.263Z","status":"ssl_error","status_checked_at":"2026-01-30T12:13:22.389Z","response_time":66,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:30:14.635Z","updated_at":"2026-01-30T14:19:56.761Z","avatar_url":"https://github.com/SRI-CSL.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# PEG parser generator\n\n## Overview\n\nThis repository explores verified PEG parsing in PVS. It contains two main parts: static analysis of grammars (to ensure termination) and several parser generator variants (memoized, with semantic interpretation, tail-recursive), all shown equivalent to each other. A third part, dedicated to applications and examples, is still in progress. This repository is associated with this [CPP2020 paper](https://dl.acm.org/doi/10.1145/3372885.3373836).\n\n### 1. Static analysis\n\nPEG grammars are unambiguous by construction. However, they are not necessarily complete, i.e., terminating on all inputs. *Wellformed* grammars form a simple subset of complete grammars. 
The `wf_peg` theory (in `delta.pvs`) implements a wellformedness check together with its associated properties. The `static_analysis` theory (in `static_analysis.pvs`) refines the wellformedness check so that grammar nodes may be reordered according to a topological ordering.\n\n### 2. Parser generators\n\nSeveral parser generators are defined, all shown equivalent:\n\n* `parsing` is a straightforward reference implementation (in `parser.pvs`)\n* `packrat_parser` is a *memoized* version of the reference parser (in `packrat_parser.pvs`)\n* `sempp` is a memoized parser with *semantic actions* (in `semantic_parser.pvs`)\n* `sempp_tl` is a memoized *tail-recursive* parser with semantic actions (in `semantic_tlparser.pvs`)\n\n### 3. Applications (in progress)\n\n* A small *arithmetic expression* parser is defined in `ascii.pvs`. Its semantic actions compute the value of the expression.\n* A small *JSON* parser is defined in `json.pvs`.\n\nBoth applications have a `correct_grammar` theorem showing that their grammars are wellformed.\n\n## Detailed description\n\n#### nT_pred_order.pvs\n*Technical definitions used for predicate manipulation*\n\nThis file defines a boolean triple `nTinst` (short for *non-terminal properties instance*) that represents the properties of a given grammar node: whether it can fail, succeed without consuming input, or succeed while consuming input. The properties of all non-terminal interpretations are stored in an `nTprop` object. We define an order on both and establish the usual properties (reflexivity and transitivity), plus the distributivity of the order on `nTprop` over instances of `nTinst`.\n\n#### array_sum.pvs\n*Technical definitions used for counting the number of times a predicate is satisfied in an `nTprop` object*\n\nThe `aux` function is a tail-recursive function that iterates over the array. 
Two main results are shown: the `aux` function is monotonic (with respect to the non-terminal ordering) and injective.\n\n#### delta.pvs\n*Main file for PEG grammar definition, property computation, and wellformedness*\n\n##### `peg THEORY`\nThis theory is parameterized by the type of the terminals, their ordering, and the number of non-terminals.\nThe constructors of the `peg` datatype directly represent the possible patterns of a PEG grammar.\nThe `pegMeasure` function is a simple `reduce_nat`. Three results are shown: the measure is monotonic and injective with respect to the *subterm* relation, and the *subterm* relation is transitive.\n\n##### `wf_peg THEORY`\nThis theory is parameterized by the same parameters as `peg`.\nIt expresses what a wellformed grammar is, based on property computation.\nThe `grammar_props` function recursively computes the properties of a grammar node from the *already-known* properties of the non-terminals (given as an argument). The `recompute_nonTerminals_properties` function then iterates over all non-terminals, adding newly derivable properties as it goes. The `compute_properties` function obtains the fixed point of derivable properties by calling `recompute_nonTerminals_properties` repeatedly until no new properties are found. The resulting set of properties, of type `fix_point`, is *coherent* (not contradicted by itself) and *maximal* (recomputing properties from it yields no new ones).\n\nTwo kinds of wellformedness are defined:\n\n* **complete**: a completely wellformed grammar node is a node with no subterm of the form `star(e)` or `plus(e)` where `e` can succeed without consuming any input (such a subterm would loop forever). 
In other words: *a completely wellformed grammar has no structural loops*.\n* **strong**: a strongly wellformed grammar node (relative to a non-terminal A) is a node that only uses non-terminals strictly less than A, unless it reaches a sequence whose first argument cannot succeed without consuming input; from there on, the check switches to complete wellformedness. In other words: *a strongly wellformed grammar may use higher non-terminals only once it is certain that at least one character has been consumed*.\n\nThe `grammar_wellformedness` function handles both kinds, with the last argument selecting which one. A *wellformed set of non-terminals* (`WF_nT`) is also defined.\n\n#### pre_ast.pvs\n*Definition of an abstract syntax tree (AST) structure*\n\nThe `pre_ast` datatype aims to capture all possible parsing paths. A few structural conditions are already built into the constructors (for example, the leftmost branch corresponds to a path that the parser has to explore, and thus cannot contain any skip node).\n\n#### ast.pvs\n*Main file for AST properties: definition of a meaningful and of a wellformed AST*\n\nThe `astType` type captures the fact that a tree is either a coherent proof of success or failure, or is incoherent (undefined). The check is performed by the recursive `astType?` function. An `astMeaningful` tree is a tree whose type is either `success` or `failure`.\n\nThe `astWellformed?` function recursively checks that the tree corresponds to a valid proof of parse. 
Two results are shown: a wellformed tree is also meaningful, and a wellformed tree starting at `s` and ending at `e` satisfies `s \u003c= e`.\n\n#### lex3.pvs\n*Technical definition of the triple lexicographic ordering*\n\n#### lex4.pvs\n*Technical definition of the quadruple lexicographic ordering*\n\n#### parser.pvs\n*Main file for the reference parser generator definition*\n\nThe parser takes the following arguments:\n\n* `P_exp`: the set of non-terminal interpretations, of type `WF_nT` (strongly wellformed)\n* `A`: the current non-terminal\n* `G`: the current grammar node (must be a subterm of `P_exp(A)`)\n* `inp`: the input array\n* `b`: the bound of the input\n* `s`: the current index\n* `s_T`: the index at which parsing of the current non-terminal started\n\nThe output type carries a lot of information. A returned tree T satisfies:\n\n* `s(T) = s`: it starts at the current index\n* `e(T) \u003c= b`: it ends at or before the bound\n* if G is a star node, then so is T\n* if G is a plus node, then so is T\n* T is not a skip (a skip does not represent a parsing result)\n* if T is a success that consumed no input, then G has the property `P_0`\n* if T is a success that consumed some input, then G has the property `P_\u003e0`\n* if T is a failure, then G has the property `P_f`\n\nTermination of the parser is proved using a quadruple lexicographic order:\n\n* the parser descends into the grammar (decreasing `pegMeasure(G)`), unless it encounters a star, a plus, or a non-terminal operator;\n* if a star or a plus is encountered, the grammar node stays the same, but at least one character must be consumed, so `s` increases;\n* if a non-terminal node is encountered, `s` does not change, and the new grammar node may be arbitrary. 
However, since `P_exp` is strongly wellformed, either the new non-terminal is strictly lower than the current one, or at least one character has been consumed, so that `s` is greater than `s_T` (which justifies the recursive call with `s_T` set to `s`).\n\nMost of the proof work is done in the TCCs (type-correctness conditions).\n\n#### packrat_parser.pvs\n*Definition of a parser generator that uses memoization*\n\nThe `packrat_parser` theory defines a similar parser that works with a table mapping each start index and non-terminal to an AST matching the one returned by the reference parser; it returns its parse result together with an updated table.\n\n#### semantic_interp.pvs\n*Definition of the semantic interpretation of result trees*\n\n#### semantic_parser.pvs\n*Definition of a packrat parser with built-in semantic interpretation*\n\n#### semantic_tlparser.pvs\n*Tail-recursive version, written with continuations*\n\nThe proof trick is to show that the continuation applied to the result of the parser is equal to the result of the reference parser.\n\n#### static_analysis.pvs\n*Wellformedness up to reordering of non-terminal nodes*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsri-csl%2Fpvspackrat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsri-csl%2Fpvspackrat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsri-csl%2Fpvspackrat/lists"}