{"id":13503007,"url":"https://github.com/matklad/tom","last_synced_at":"2025-04-12T13:53:00.801Z","repository":{"id":31342770,"uuid":"127601597","full_name":"matklad/tom","owner":"matklad","description":"tom: a format-preserving TOML parser in Rust","archived":false,"fork":false,"pushed_at":"2022-06-17T01:36:50.000Z","size":1294,"stargazers_count":38,"open_issues_count":27,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-26T08:37:15.095Z","etag":null,"topics":["parser","toml"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matklad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-01T05:54:31.000Z","updated_at":"2023-10-10T18:03:28.000Z","dependencies_parsed_at":"2022-08-24T14:21:30.306Z","dependency_job_id":null,"html_url":"https://github.com/matklad/tom","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matklad%2Ftom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matklad%2Ftom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matklad%2Ftom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matklad%2Ftom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matklad","download_url":"https://codeload.github.com/matklad/tom/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248358634,"owners_count":21090407,"icon
_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","toml"],"created_at":"2024-07-31T22:02:33.274Z","updated_at":"2025-04-12T13:53:00.781Z","avatar_url":"https://github.com/matklad.png","language":"Rust","funding_links":[],"categories":["Rust","TOML"],"sub_categories":["PHP - Docblock Parser"],"readme":"# TOM\n\n[![CI](https://github.com/matklad/tom/workflows/CI/badge.svg)](https://github.com/matklad/tom/actions)\n[![Crates.io](https://img.shields.io/crates/v/tom.svg)](https://crates.io/crates/tom)\n[![API reference](https://docs.rs/tom/badge.svg)](https://docs.rs/tom/)\n\n\n**Status:** a rewrite to the [rowan](https://github.com/rust-analyzer/rowan) library is in progress.\nNothing works, but we have a language server in the `crates/tom` dir now!\nThe docs below may be outdated!\n\nYet another TOML parser. Preserves whitespace for real this time!\n\nWork in progress; take a look at\n[Molten](https://github.com/LeopoldArkham/Molten) or\n[toml-edit](https://github.com/ordian/toml_edit) for something\nrelatively more ready.\n\nThe best documentation at the moment is\n[./crates/tom_syntax/examples/api-walkthrough.rs](./crates/tom/examples/api-walkthrough.rs).\n\nThere's a WASM demo of the parser here: [https://matklad.github.io/tom/](https://matklad.github.io/tom/).\n\n\n# Contributing\n\nContributions are very much welcome! Keep in mind that the code is\nvery much in an experimental state, and so good contributing guides are\nmissing, formatting is artisanal, etc. 
Feel free to ask questions by\ncreating issues/PRs, or by pinging @matklad at the\n[rust-analyzer zulip](https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Fwg-rls-2.2E0).\n\nCheck out the [E-easy](https://github.com/matklad/tom/issues?q=is%3Aopen+is%3Aissue+label%3AE-easy)\nand [E-has-instructions](https://github.com/matklad/tom/issues?q=is%3Aopen+is%3Aissue+label%3AE-has-instructions) labels.\n\n\n# Architecture\n\n## Building the Code\n\nCurrently, a beta version of Rust is required.\n\n`cargo test`, as usual, runs the tests.\n\nCode generation is used heavily:\n\n  * `cargo xtask gen-symbols` generates the `symbol` module,\n  * `cargo xtask gen-ast` generates the `ast` module,\n  * `cargo xtask gen-tests` generates tests from special comments.\n\nThe generated code is committed: this way, clients of the library don't\nneed to build the code generator, which has a lot of dependencies.\n\nSee the `.cargo/config` file and the `xtask` subdirectories to understand how\ncodegen works.\n\n## Data Structures Walkthrough\n\nThe entry point of the library is the `TomlDoc` type.\n\nThe core data structure is `tree::Tree`, a generic, mutable, arena/index-based\ntree. The design is inspired by\n[indextree](https://github.com/saschagrunert/indextree). Indices make it\npossible to store parent links and to offer a flexible editing API: you can mutate the\ntree without invalidating existing node indices and without running\ninto borrow-checker errors. The price for this flexibility is that\nclients have to pass `\u0026Tree` or `\u0026mut Tree` to every method of a node,\nbecause a node is just a 32-bit index and all the actual data are\nstored in the `Tree`. This beauty/quirk of the API bleeds into all\nhigher-level layers.\n\nOn top of the `tree::Tree` a Concrete Syntax Tree data structure is\nbuilt (see the `cst` module). Each node in the CST has a `Symbol`,\nwhich is the \"type\" of the node: is it a key-value pair, or a table, or\na whole document. 
There are about 30 different symbols in TOML; see\nthe generated `symbol` module for the whole list. Additionally, each\nleaf node, including comments and whitespace, contains a `\u0026str` with its\ntext (string interning is used to avoid allocating each token\nseparately). Thus, it is possible to reconstruct the text of each CST\nnode exactly by recursively walking its children and concatenating\nthe texts of the leaves.\n\nThe CST is stored as a part of `TomlDoc`, which also contains the list\nof syntax errors and the cache of text ranges of nodes. Ranges are\nrecalculated by recursively walking the tree and summing the lengths of\nthe leaves.\n\nTwo smaller \"data structures\" are the `intern::Intern` string interner\nand the `chunked_text::ChunkedText` trait. The latter allows\nprocessing the text of internal nodes without materializing it into a\nsingle contiguous `String`.\n\n## Parser\n\nParsing is not too unusual: a regular-expression-based lexer plus a\nhand-written recursive descent parser. The lexer is in `parser/lexer.rs`, the\nparser is in `parser/grammar.rs`.\n\nHowever, neither the parser nor the lexer aborts on error; both *always*\nparse the document to the end. A special `ERROR` CST node is created\nfor those parts of the input which can't be recognized as TOML.\n\nParsing and the actual tree construction are decoupled via the\n`EventSink`. The parser notifies the sink when it starts/finishes\nreading a particular node, and the sink takes care of actually\nconstructing the tree. `EventSink` also takes care of whitespace and\ncomment handling. The CST for `foo = 92 #comment` would include\nthe `#comment` token as a child of the `foo = 92` key-value pair, based on the\nsame-line heuristic (see `EventSink::trailing_ws`).\n\nThe grammar in `parser/grammar.rs` is interspersed with `// test`\ncomments. 
These comments help map the grammar code to the TOML\nsyntax, and they are real regression tests as well: `cargo xtask gen-tests`\ncollects all such comments and dumps them as test cases to\n`tests/data/inline`. Additional parser/lexer tests are found in\n`tests/data/**`. Each test is a pair of a `.toml` file and a `.txt`\nfile with the serialized CST representation.\n\nThe parser detects only strictly syntactic errors. Problems like \"no\nnewlines are allowed in inline tables\" are detected by an additional\nvalidation pass over the CST. See `validator` for details.\n\n## AST\n\nThe AST is layered on top of the CST: each AST node is just a CST node\nthat remembers, at the type level, the node's `Symbol`. As with the CST,\nyou'll need to pass `\u0026TomlDoc` as an argument to get anything useful.\n\nThe AST lives in the `ast` module, which is generated by the `cargo\nxtask gen-ast` command.\n\n## Editing\n\nThe underlying `tree::Tree` is mutable, and the document-editing API builds\non that. It is specified in the `edit.rs` file and is more or less\njust a wrapper around the corresponding `tree::Tree` API.\n\nOne interesting bit is that to create a completely new node, we just\nparse it from text. That way, arbitrary comments and whitespace are\nsupported.\n\nBecause edits can create intermediate invalid documents, an edit\noperation has to be explicitly delimited (`start/finish _edit`).\n\n## License\n\nTom is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).\n\nSee LICENSE-APACHE and LICENSE-MIT for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatklad%2Ftom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatklad%2Ftom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatklad%2Ftom/lists"}