{"id":26247904,"url":"https://github.com/dokwork/parcom","last_synced_at":"2026-05-18T13:05:58.695Z","repository":{"id":278528448,"uuid":"935901305","full_name":"dokwork/parcom","owner":"dokwork","description":"Parser combinators for Zig, ready to parse on-the-fly. Consume input, not memory.","archived":false,"fork":false,"pushed_at":"2025-03-12T11:39:45.000Z","size":64,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-12T12:28:22.159Z","etag":null,"topics":["parser","parser-combinators","zig","zig-library","ziglang"],"latest_commit_sha":null,"homepage":"https://dokwork.github.io/parcom/index.html","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dokwork.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-20T07:51:40.000Z","updated_at":"2025-03-10T06:39:28.000Z","dependencies_parsed_at":"2025-02-23T14:23:36.991Z","dependency_job_id":"54d349ab-8d24-4d52-a812-a0bc5c0887be","html_url":"https://github.com/dokwork/parcom","commit_stats":null,"previous_names":["vladimir-popov/parcom","dokwork/parcom"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dokwork%2Fparcom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dokwork%2Fparcom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dokwork%2Fparcom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dokwork%2Fparcom/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dokwork","download_url":"https://codeload.github.com/dokwork/parcom/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243419127,"owners_count":20287806,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","parser-combinators","zig","zig-library","ziglang"],"created_at":"2025-03-13T14:15:55.943Z","updated_at":"2025-12-29T13:24:39.948Z","avatar_url":"https://github.com/dokwork.png","language":"Zig","funding_links":[],"categories":["Zig"],"sub_categories":[],"readme":"# parcom\n\n[![parcom ci](https://github.com/dokwork/parcom/actions/workflows/ci.yml/badge.svg)](https://github.com/dokwork/parcom/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/dokwork/parcom/branch/main/graph/badge.svg?token=OP8OVU42LV)](https://codecov.io/gh/dokwork/parcom)\n![zig version](https://img.shields.io/badge/zig%20version-0.14.0-fcca77)\n\n_Consume input, not memory._\n\nThis library provides an implementation of the parser combinators.\n\n`Parcom` offers two options for consuming data:\n - parse the entire input string at once,\n - or consume and parse byte by byte from `AnyReader`.\n\nWhen the input is a reader, `Parcom` works as a buffered reader. 
It reads a few\nbytes into the buffer and then parses them.\n\n## Installation\n\nFetch `Parcom` from GitHub:\n```sh\nzig fetch --save git+https://github.com/dokwork/parcom\n```\nCheck that it was added to the list of dependencies in your `build.zig.zon` file:\n```zig\n...\n    .dependencies = .{\n        .parcom = .{\n            .url = \"git+https://github.com/dokwork/parcom#b93b8fb14f489007f27d42f8254f12b7d57d07da\",\n            .hash = \"parcom-0.3.0-Hs8wfHFUAQBhhH-swYl1wrMLSh76uApvVzYBl56t90Ua\",\n        },\n    },\n```\nAdd the `Parcom` module to your `build.zig`:\n```zig\n    const parcom = b.dependency(\"parcom\", .{\n        .target = target,\n        .optimize = optimize,\n    });\n    ...\n    exe.root_module.addImport(\"parcom\", parcom.module(\"parcom\"));\n```\n\n## Documentation\n[https://dokwork.github.io/parcom/index.html](https://dokwork.github.io/parcom/index.html)\n\n## Examples\n\n - [The parser of a math expression](examples/expression.zig)\n - [The json parser](examples/json.zig)\n\n## Quick start\n\nLet's create a parser that parses and evaluates a simple math expression with the following\ngrammar:\n```\n# The `number` is a sequence of decimal digits forming an unsigned integer\nNumber := [0-9]+\n# The `value` is a `number` or an `expression` in brackets\nValue  := Number / '(' Expr ')'\n# The `sum` is an addition or subtraction of one or more values\nSum    := Value (('+' / '-') Value)*\n# The `expression` is the result of evaluating the combination of values and operations\nExpr   := evaluate(Sum)\n```\nOur parser will be capable of parsing and evaluating mathematical expressions\nthat include addition and subtraction operations, unsigned integers, and nested\nexpressions within brackets.\n\n### A short API overview\n\nThree different types of parser implementations exist:\n\n - The base parser implementations contain the logic for parsing input and serve\n   as the fundamental building blocks;\n - The `ParserCombinator` provides methods to combine parsers 
and create new ones;\n - The `TaggedParser` erases the type of the underlying parser and simplifies\n   the parser's type declaration.\n\nEvery parser provides the type of the parsing result as a constant `ResultType:\ntype`.\n\nThe result of parsing by any parser is either a value of type `ResultType` on success,\nor `null` if parsing failed. A successful parse does not necessarily consume the whole\ninput. If you need to be sure that every byte was consumed and parsed, use the\n[`end()`](https://dokwork.github.io/parcom/index.html#parcom.end) parser explicitly.\n\n### Base parser\n\nThe `number` from the grammar above is a sequence of symbols from the range ['0', '9'].\nParcom has a constructor for a parser of bytes in a range, but we will create\nour own parser starting from the base parser `AnyChar`. `AnyChar` is the simplest\nparser that consumes input. It returns the next byte from the input, or\n`null` if the input is empty.\n\nTo parse only numeric symbols we should provide a classifier, a function that\nreceives the result of a parser and returns true only if it is an expected value:\n```zig\nconst parcom = @import(\"parcom\");\n\n// ResultType: u8\nconst num_char = parcom.anyChar().suchThat({}, struct {\n    fn condition(_: void, ch: u8) bool {\n        return switch (ch) {\n            '0' ... '9' =\u003e true,\n            else =\u003e false,\n        };\n    }\n}.condition);\n```\nEvery function that the combinators in the `Parcom` library require takes a `context` parameter.\nThis gives more flexibility for possible implementations of those functions.\n\n### Repeat parsers\n\nNext, we should continue applying our parser until we encounter the first\nnon-numeric symbol or reach the end of the input. To achieve this, we need to\nstore the parsed results. 
The simplest solution is to use a sentinel array:\n```zig\n// ResultType: [10:0]u8\nconst number = num_char.repeatToSentinelArray(.{ .max_count = 10 });\n```\nBut that option is available only for parsers with scalar result types. For more\ngeneral cases a regular array can be used. If you know the exact count of elements\nin the parsed sequence, you can specify it to get an array of that exact length\nas the result:\n```zig\n// ResultType: [3]u8\nconst number = num_char.repeatToArray(3);\n```\nHowever, this is a rare case. More often, the exact number of elements is\nunknown, but the maximum number can be estimated:\n```zig\n// ResultType: struct { [10]u8, usize }\nconst number = num_char.repeatToArray(.{ .max_count = 10 });\n```\nIn such cases, the result is a tuple consisting of the array and a count of the\nparsed items within it.\n\nFor cases when it is impossible to predict the maximum count, we can allocate a slice\nto store the parsed results:\n```zig\n// ResultType: []u8\nconst number = num_char.repeat(allocator, .{});\n\n// Don't forget to free the memory allocated for the slice!\n```\nor use arbitrary storage and a function that adds an item to it:\n```zig\nvar list = std.ArrayList(u8).init(allocator);\ndefer list.deinit();\n// ResultType: *std.ArrayList(u8)\nconst p = parcom.anyChar().repeatTo(\u0026list, .{}, std.ArrayList(u8).append);\n```\n\nNote that no matter which combinator you use to collect the repeated digits,\nyou have to set `.min_count` to 1, because an empty collection of chars is\nnot a number!\n```zig\n// ResultType: []u8\nconst number = num_char.repeat(allocator, .{ .min_count = 1 });\n```\n\n**RepeatOptions**\n\nAll repeating combinators except `repeatToArray(usize)` receive `RepeatOptions`,\na structure with the minimum and maximum counts of elements in the sequence. 
All\nparsers stop when they reach the maximum count and fail if they don't reach the minimum.\n\n### Try one or try another\n\nWe'll postpone the `value` parser for now and focus instead on\ncreating parsers for the '+' and '-' symbols.\n```zig\n// ResultType: i32\nconst value: ParserCombinator(???) = ???;\n```\n\nFirst of all, we should be able to parse every symbol separately. The `char`\nparser is the best candidate for it:\n```zig\nconst plus = parcom.char('+');\nconst minus = parcom.char('-');\n```\nNext, we have to choose one of them. To accomplish this, let's combine the parsers\ninto a new one that first attempts one and, if it fails, tries the other:\n```zig\n// ResultType: parcom.Either(u8, u8)\nconst plus_or_minus = plus.orElse(minus);\n```\nThe result type of the new parser is `parcom.Either(L, R)`, an alias for the\n`union(enum) { left: L, right: R }` type.\n\n### Combine results\n\nWe have a parser for operations and we assume that we have a parser for\nvalues as well. This is sufficient to build the `Sum` parser, which, as you\nmay recall, follows this structure:\n```\nSum := Value (('+' / '-') Value)*\n```\nLet's start with the part in brackets. We have to combine the `plus_or_minus` parser\nwith the `value` parser and repeat the result:\n```zig\n// ResultType: []struct{ parcom.Either(u8, u8), i32 }\nplus_or_minus.andThen(value).repeat(allocator, .{});\n```\nThe `andThen` combinator runs the left parser and then the right. If both\nparsers succeed, it returns a tuple of their results. Finally, we can combine\nthe value with the new parser to get the `sum` parser that follows the grammar:\n```zig\n// ResultType: struct{ i32, []struct{ parcom.Either(u8, u8), i32 } }\nconst sum = value.andThen(plus_or_minus.andThen(value).repeat(allocator, .{}));\n```\n\n### Transform the result\n\nSo far so good. 
We are ready to create a parser that will not only parse the input, but\nalso evaluate the sum of the parsed values:\n```zig\nconst expr = sum.transform(i32, {}, struct {\n    // `v` is the tuple produced by `sum`: the first value and the list of (operation, value) pairs\n    fn evaluate(_: void, v: struct{ i32, []struct{ parcom.Either(u8, u8), i32 } }) !i32 {\n        var result: i32 = v[0];\n        for (v[1]) |op_and_arg| {\n            switch(op_and_arg[0]) {\n                .left =\u003e result += op_and_arg[1],\n                .right =\u003e result -= op_and_arg[1],\n            }\n        }\n        return result;\n    }\n}.evaluate);\n```\nThe combinator `transform` requires the result type, a context, and a transformation function. It\nruns the underlying parser and applies the function to the parsed result.\n\n### Tagged parser\n\nNow it is time to build the `value` parser:\n```\nValue  := Number / '(' Expr ')'\n```\nThis is a recursive parser that not only forms part of the `expression` parser but\nalso depends on it. How can we implement this? First of all, let's wrap the\n`expression` parser in a function:\n```zig\nconst std = @import(\"std\");\nconst parcom = @import(\"parcom\");\n\nfn expression(allocator: std.mem.Allocator) ??? {\n\n    // ResultType: u8\n    const num_char = parcom.anyChar().suchThat({}, struct {\n        fn condition(_: void, ch: u8) bool {\n            return switch (ch) {\n                '0' ... 
'9' =\u003e true,\n                else =\u003e false,\n            };\n        }\n    }.condition);\n\n    // ResultType: i32\n    const number = num_char.repeat(allocator, .{ .min_count = 1 }).transform(i32, {}, struct {\n        fn parseInt(_: void, value: []u8) !i32 {\n            return try std.fmt.parseInt(i32, value, 10);\n        }\n    }.parseInt);\n\n    // ResultType: i32\n    const value = ???;\n\n    // ResultType: parcom.Either(u8, u8)\n    const plus_or_minus = parcom.char('+').orElse(parcom.char('-'));\n\n    // ResultType: struct{ i32, []struct{ parcom.Either(u8, u8), i32 } }\n    const sum = value.andThen(plus_or_minus.andThen(value).repeat(allocator, .{}));\n\n    const expr = sum.transform(i32, {}, struct {\n        fn evaluate(_: void, v: struct{ i32, []struct{ parcom.Either(u8, u8), i32 } }) !i32 {\n            var result: i32 = v[0];\n            for (v[1]) |op_and_arg| {\n                switch(op_and_arg[0]) {\n                    .left =\u003e result += op_and_arg[1],\n                    .right =\u003e result -= op_and_arg[1],\n                }\n            }\n            return result;\n        }\n    }.evaluate);\n\n    return expr;\n}\n```\nThe type of a `ParserCombinator` in `Parcom` can be very cumbersome, and it is\noften impractical to declare it manually as a function's return type. However, Zig\nrequires this type to allocate enough memory for the parser instance.\nWhile most parsers in `Parcom` are simply namespaces, this is not true for all\nof them. What we can do is move our parser to the heap and replace the concrete type\nwith a pointer to it. This is exactly how the `TaggedParser` works. It has a\npointer to the original parser, and a pointer to a function responsible for\nparsing the input. 
Moreover, the `TaggedParser` has an explicit `ResultType`:\n```zig\nconst std = @import(\"std\");\nconst parcom = @import(\"parcom\");\n\nfn expression(allocator: std.mem.Allocator) !parcom.TaggedParser(i32) {\n    ...\n    return expr.taggedAllocated(allocator);\n}\n```\n\n### Deferred parser\n\nLet's go ahead and finally build the `value` parser:\n```zig\nconst value = number.orElse(\n    parcom.char('(').rightThen(try expression(allocator)).leftThen(parcom.char(')'))\n);\n```\nPay attention to the `rightThen` and `leftThen` combinators. Unlike the `andThen`\ncombinator, these two do not produce a tuple. Instead, they ignore one value and\nreturn the other. `rightThen` uses only the result of the right parser, and\n`leftThen` only that of the left parser. This means that both brackets will be\nparsed but ignored in the example above.\n\nBut this is not all. Unfortunately, such an implementation of the `value`\nparser leads to an infinite loop of invocations of the `expression` function. We\ncan solve this by invoking the function only when we need to parse an expression\nwithin brackets. `Parcom` has the `deferred` parser for such purposes.\nIt receives the `ResultType` of the `TaggedParser` that the function should return,\na context that should be passed to the function, and a pointer to the function:\n\n```zig\nconst value = number.orElse(\n    parcom.char('(').rightThen(parcom.deferred(i32, allocator, expression)).leftThen(parcom.char(')'))\n);\n```\nWhen the tagged parser completes its deferred work, the `deinit` method is\ninvoked and the memory is freed. 
But, do not forget to invoke `deinit`\nmanually, when you create the `TaggedParser` outside the `deferred` parser!\n\n\u003cdetails\u003e\n  \u003csummary\u003eComplete solution\u003c/summary\u003e\n  \n```zig\nconst std = @import(\"std\");\nconst parcom = @import(\"parcom\");\n\nfn expression(allocator: std.mem.Allocator) !parcom.TaggedParser(i32) {\n\n    // ResultType: u8\n    const num_char = parcom.anyChar().suchThat({}, struct {\n        fn condition(_: void, ch: u8) bool {\n            return switch (ch) {\n                '0' ... '9' =\u003e true,\n                else =\u003e false,\n            };\n        }\n    }.condition);\n\n    // ResultType: i32\n    const number = num_char.repeat(allocator, .{ .min_count = 1 }).transform(i32, {}, struct {\n        fn parseInt(_: void, value: []u8) !i32 {\n            return try std.fmt.parseInt(i32, value, 10);\n        }\n    }.parseInt);\n\n    // ResultType: i32\n    const value = number.orElse(\n        parcom.char('(').rightThen(parcom.deferred(i32, allocator, expression)).leftThen(parcom.char(')'))\n    )\n    .transform(i32, {}, struct {\n        fn getFromEither(_: void, v: parcom.Either(i32, i32)) !i32 {\n            return switch (v) {\n                .left =\u003e v.left,\n                .right =\u003e v.right,\n            };\n        }\n    }.getFromEither);\n\n    // ResultType: parcom.Either(u8, u8)\n    const plus_or_minus = parcom.char('+').orElse(parcom.char('-'));\n\n    // ResultType: struct{ i32, []struct{ parcom.Either(u8, u8), i32 } }\n    const sum = value.andThen(plus_or_minus.andThen(value).repeat(allocator, .{}));\n\n    // ResultType: i32\n    const expr = sum.transform(i32, {}, struct {\n        fn evaluate(_: void, v: struct{ i32, []struct{ parcom.Either(u8, u8), i32 } }) !i32 {\n            var result: i32 = v[0];\n            for (v[1]) |op_and_arg| {\n                switch(op_and_arg[0]) {\n                    .left =\u003e result += op_and_arg[1],\n                    .right 
 =\u003e result -= op_and_arg[1],\n                }\n            }\n            return result;\n        }\n    }.evaluate);\n\n    return expr.taggedAllocated(allocator);\n}\n\ntest \"9-(5+2) == 2\" {\n    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);\n    defer arena.deinit();\n    const parser = try expression(arena.allocator());\n    try std.testing.expectEqual(2, try parser.parseString(\"9-(5+2)\"));\n}\n```\n  \n\u003c/details\u003e\n\n### Cutting the input\n\nIn some cases it is reasonable not to read the entire input into a string, and\ninstead parse it on-the-fly. For such cases, the `Parcom` library provides the\n`parseFromReader` method, which takes a `std.io.AnyReader` as the input. During\nparsing, all consumed bytes are stored in an internal buffer to make it possible\nto roll back the input and try another parser (such as with the `orElse` combinator).\nWhile this approach may lead to the same result as reading the whole input into a string,\nrollback may not make sense for some parsers. For example, when parsing JSON,\nencountering the '{' symbol means the entire JObject must be parsed. If parsing\ncannot proceed, it indicates that the input is malformed, and all parsers will\nfail. It means that the input can be cropped right before the '{' symbol.\n\nIn the example above, it is reasonable to cut the input once the left bracket is\nparsed:\n```zig\n...\nconst value = number.orElse(\n    parcom.char('(').cut().rightThen(parcom.deferred(i32, allocator, expression)).leftThen(parcom.char(')'))\n//         added this ^\n)\n...\n```\n\nCropping the input, when possible, can significantly reduce required memory and\nmay improve the speed of parsing. See [this example](examples/json.zig) for more details.\n\n### Debug\n\nWhen something goes wrong during parsing and a parser that looks correct at first glance\nreturns null, it can be difficult to understand the root cause without\nadditional insights. 
In `Parcom` you can turn on logging for any particular\nparser to see how it works during the parsing. For example, let's turn on\nlogging for the expression parser from the example above (with added `cut`\ncombinator)\n```zig\n...\n    return expr.logged(.{ .label = \"EXPR\", .scope = .example }).taggedAllocated(allocator);\n}\n```\nand run it on a string with unexpected symbol '!':\n```zig\ntest \"parse unexpected symbol\" {\n    // don't forget to turn on debug level for the test\n    std.testing.log_level = .debug;\n    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);\n    defer arena.deinit();\n    const parser = try expression(arena.allocator());\n    try std.testing.expectEqual(2, try parser.parseString(\"9-(!5+2)\"));\n}\n```\nNow, we have enough insights to understand what happened and where it occurred:\n```\nerror: 'expression.test.parse unexpected symbol' failed: [example] (debug):\nThe parsing by the \u003cEXPR\u003e has been started from position 0:\n[9]-(!5+2)\n[example] (debug):\nThe parsing by the \u003cEXPR\u003e has been started from position 3:\n…[!]5+2)\n[example] (debug): The parsing is failed at position 3:\n…[!]5+2)\n[example] (debug): End parsing by the \u003cEXPR\u003e. Cut 3 items during the parsing process.\n[parcom] (warn): Imposible to reset the input from 3 to 2 at position 3:\n…[!]5+2).\n[example] (debug): An error error.ResetImposible occured on parsing by \u003cEXPR\u003e at position 3:\n…[!]5+2)\n[example] (debug): End parsing by the \u003cEXPR\u003e. Cut 3 items during the parsing process.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdokwork%2Fparcom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdokwork%2Fparcom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdokwork%2Fparcom/lists"}