{"id":13671871,"url":"https://github.com/olson-sean-k/wax","last_synced_at":"2025-04-05T20:01:50.984Z","repository":{"id":38891282,"uuid":"398440351","full_name":"olson-sean-k/wax","owner":"olson-sean-k","description":"Opinionated and portable globs that can be matched against paths and directory trees.","archived":false,"fork":false,"pushed_at":"2024-04-01T19:04:03.000Z","size":444,"stargazers_count":118,"open_issues_count":17,"forks_count":10,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T02:48:26.265Z","etag":null,"topics":["file-system","glob","pattern","rust"],"latest_commit_sha":null,"homepage":"https://glob.guide","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/olson-sean-k.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-21T01:38:47.000Z","updated_at":"2025-03-25T14:19:49.000Z","dependencies_parsed_at":"2024-02-14T23:25:14.053Z","dependency_job_id":"a770f87a-f6dc-4835-b6e0-ca8c0156db4a","html_url":"https://github.com/olson-sean-k/wax","commit_stats":{"total_commits":127,"total_committers":3,"mean_commits":"42.333333333333336","dds":0.03149606299212604,"last_synced_commit":"800917d6e40b0d2b9066ceda1671ea8c57504d99"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olson-sean-k%2Fwax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olson-sean-k%2Fwax/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olson-sean-k%2F
wax/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olson-sean-k%2Fwax/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/olson-sean-k","download_url":"https://codeload.github.com/olson-sean-k/wax/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393545,"owners_count":20931811,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["file-system","glob","pattern","rust"],"created_at":"2024-08-02T09:01:20.689Z","updated_at":"2025-04-05T20:01:50.966Z","avatar_url":"https://github.com/olson-sean-k.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003cimg alt=\"Wax\" src=\"https://raw.githubusercontent.com/olson-sean-k/wax/master/doc/wax.svg?sanitize=true\" width=\"320\"/\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n**Wax** is a Rust library that provides opinionated and portable globs that can\nbe matched against file paths and directory trees. 
Globs use a familiar syntax\nand support expressive features with semantics that emphasize component\nboundaries.\n\n[![GitHub](https://img.shields.io/badge/GitHub-olson--sean--k/wax-8da0cb?logo=github\u0026style=for-the-badge)](https://github.com/olson-sean-k/wax)\n[![docs.rs](https://img.shields.io/badge/docs.rs-wax-66c2a5?logo=rust\u0026style=for-the-badge)](https://docs.rs/wax)\n[![crates.io](https://img.shields.io/crates/v/wax.svg?logo=rust\u0026style=for-the-badge)](https://crates.io/crates/wax)\n\n## Basic Usage\n\nMatch a path against a glob:\n\n```rust\nuse wax::{Glob, Program};\n\nlet glob = Glob::new(\"*.png\").unwrap();\nassert!(glob.is_match(\"logo.png\"));\n```\n\nMatch a path against a glob with matched text (captures):\n\n```rust\nuse wax::{CandidatePath, Glob, Program};\n\nlet glob = Glob::new(\"**/{*.{go,rs}}\").unwrap();\n\nlet path = CandidatePath::from(\"src/main.go\");\nlet matched = glob.matched(\u0026path).unwrap();\n\nassert_eq!(\"main.go\", matched.get(2).unwrap());\n```\n\nMatch a directory tree against a glob:\n\n```rust\nuse wax::Glob;\n\nlet glob = Glob::new(\"**/*.{md,txt}\").unwrap();\nfor entry in glob.walk(\"doc\") {\n    let entry = entry.unwrap();\n    // ...\n}\n```\n\nMatch a directory tree against a glob with negations:\n\n```rust\nuse wax::walk::{FileIterator, LinkBehavior};\nuse wax::Glob;\n\nlet glob = Glob::new(\"**/*.{md,txt}\").unwrap();\nfor entry in glob\n    .walk_with_behavior(\"doc\", LinkBehavior::ReadTarget)\n    .not(\"**/secret/**\")\n    .unwrap()\n{\n    let entry = entry.unwrap();\n    // ...\n}\n```\n\nMatch a path against multiple globs:\n\n```rust\nuse wax::{Glob, Program};\n\nlet any = wax::any([\n    \"src/**/*.rs\",\n    \"tests/**/*.rs\",\n    \"doc/**/*.md\",\n    \"pkg/**/PKGBUILD\",\n]).unwrap();\nassert!(any.is_match(\"src/token/mod.rs\"));\n```\n\nSee more details below.\n\n## Construction\n\nGlobs are encoded as UTF-8 strings called glob expressions that resemble Unix\npaths consisting of nominal 
components delimited by separators. The most\nfundamental type in the Wax API is `Glob`, which is constructed from a glob\nexpression via inherent functions or standard conversion traits. Data is\nborrowed where possible in most APIs, but can be copied into owned instances\nusing an `into_owned` method with most types.\n\n```rust\nuse wax::Glob;\n\nlet glob = Glob::new(\"site/img/logo.svg\").unwrap();\n```\n\nNot only are APIs designed for portability, **but so too are glob expressions**.\nRegardless of platform or operating system, globs support the same features and\nuse the same syntax. **Glob expressions are distinct from paths**, which [differ\nin syntax and features](#schemes-and-prefixes) on each platform.\n\nIn glob expressions, forward slash `/` is the only path component separator and\nback slashes `\\` are forbidden (back slash is used for escape sequences, but the\nliteral sequence `\\\\` is not supported). This means that it is impossible to\nrepresent `\\` in nominal path components, but this character is generally\nforbidden as such and its disuse avoids confusion.\n\nGlobs enforce various rules regarding meta-characters, patterns, and component\nboundaries that reject [nonsense expressions](#errors-and-diagnostics). While\nthese rules can sometimes make glob expressions a bit more difficult to compose,\nthey also make glob expressions more consistent, easier to reason about, and\nless prone to errors.\n\n## Patterns\n\nGlobs resemble Unix paths, but additionally support patterns that can be matched\nagainst paths and directory trees. Patterns use a syntax that resembles globbing\nin Unix shells and tools like `git`, though there are some important\ndifferences.\n\n```rust\nuse wax::Glob;\n\nlet glob = Glob::new(\"**/*.{go,rs}\").unwrap();\nassert!(glob.is_match(\"src/lib.rs\"));\n```\n\nPatterns form captures that can be used to extract matched text (as seen in many\nregular expression engines). 
In the above example, there are three patterns that\ncan be queried for matched text: `**/`, `*`, and `{go,rs}`. Every glob\nexpression has an implicit capture for the complete matched text.\n\nGlobs use a consistent and opinionated format and patterns are **not**\nconfigurable; the semantics of a particular glob are always the same. For\nexample, `*` **never** matches across component boundaries. Components are an\nimportant part of paths and file system trees, and only the tree wildcard `**`\n(see below) implicitly matches across them.\n\n### Wildcards\n\nWildcards match some amount of arbitrary text in paths and are the most\nfundamental pattern provided by globs (and likely the most familiar).\n\nThe zero-or-more wildcards `*` and `$` match zero or more of any character\nwithin a component (**never path separators**). Zero-or-more wildcards cannot be\nadjacent to other zero-or-more wildcards. The `*` wildcard is eager and will\nmatch the longest possible text while the `$` wildcard is lazy and will match\nthe shortest possible text. When followed by a literal, `*` stops at the last\noccurrence of that literal while `$` stops at the first occurrence.\n\nThe exactly-one wildcard `?` matches any single character within a component\n(**never path separators**). Exactly-one wildcards do not group automatically,\nso a pattern of contiguous wildcards such as `???` forms distinct captures for\neach `?` wildcard. [An alternation](#alternations) can be used to group\nexactly-one wildcards into a single capture, such as `{???}`.\n\nThe tree wildcard `**` matches any characters across zero or more components.\n**This is the only pattern that implicitly matches across arbitrary component\nboundaries**; all other patterns do **not** implicitly match across component\nboundaries. When a tree wildcard participates in a match and does not terminate\nthe pattern, its captured text includes the trailing separator. 
If a tree\nwildcard does not participate in a match, then its captured text is an empty\nstring.\n\nTree wildcards must be delimited by forward slashes or terminations (the\nbeginning and/or end of an expression). **Tree wildcards and path separators are\ndistinct** and any adjacent forward slashes that form a tree wildcard are parsed\ntogether. Rooting forward slashes in tree wildcards are meaningful and the glob\nexpressions `**/*.txt` and `/**/*.txt` differ in that the former is relative\n(has no root) and the latter has a root.\n\nIf a glob expression consists solely of a tree wildcard, then it matches any and\nall paths and the complete contents of any and all directory trees, including\nthe root.\n\n### Character Classes\n\nCharacter classes match any single character from a group of literals and ranges\nwithin a component (**never path separators**). Classes are delimited by square\nbrackets `[...]`. Individual character literals are specified as is, such as\n`[ab]` to match either `a` or `b`. Character ranges are formed from two\ncharacters separated by a hyphen, such as `[x-z]` to match `x`, `y`, or `z`.\nCharacter classes match characters exactly and are always case-sensitive, so the\nexpressions `[ab]` and `{a,b}` are not necessarily the same.\n\nAny number of character literals and ranges can be used within a single\ncharacter class. For example, `[qa-cX-Z]` matches any of `q`, `a`, `b`, `c`,\n`X`, `Y`, or `Z`.\n\nCharacter classes may be negated by including an exclamation mark `!` at the\nbeginning of the class pattern. For example, `[!a]` matches any character except\nfor `a`. **These are the only patterns that support negation.**\n\nIt is possible to escape meta-characters like `*`, `$`, etc., using character\nclasses though globs also support escaping via a backslash `\\`. 
To match the\ncontrol characters `[`, `]`, and `-` within a character class, they must be\nescaped via a backslash, such as `[a\\-]` to match `a` or `-`.\n\nCharacter classes have notable platform-specific behavior, because they match\narbitrary characters in native paths but never match path separators. This means\nthat if a character class consists of **only** path separators on a given\nplatform, then the character class is considered empty and matches nothing. For\nexample, in the expression `a[/]b` the character class `[/]` matches nothing on\nUnix and Windows. Such character classes are not rejected, because the role of\narbitrary characters depends on the platform. In practice, this is rarely a\nconcern, but **such patterns should be avoided**.\n\nCharacter classes have limited utility on their own, but compose well with\n[repetitions](#repetitions).\n\n### Alternations\n\nAlternations match an arbitrary sequence of one or more comma separated\nsub-globs delimited by curly braces `{...,...}`. For example, `{a?c,x?z,foo}`\nmatches any of the alternative globs `a?c`, `x?z`, or `foo`. Alternations may be\narbitrarily nested and composed with [repetitions](#repetitions).\n\nAlternations form a single capture group regardless of the contents of their\nsub-globs. This capture is formed from the complete match of the sub-glob, so if\nthe alternation `{a?c,x?z}` matches the path `abc`, then the captured text will\nbe `abc` (**not** `b`). Alternations can be used to group captures using a\nsingle sub-glob, such as `{*.{go,rs}}` to capture an entire file name with a\nparticular extension or `{???}` to group a sequence of exactly-one wildcards.\n\nAlternations must consider adjacency rules and neighboring patterns. For\nexample, `*{a,b*}` is allowed but `*{a,*b}` is not. 
Additionally, they may not\ncontain a sub-glob consisting of a singular tree wildcard `**` and cannot root a\nglob expression as this could cause the expression to match or walk overlapping\ntrees.\n\n### Repetitions\n\nRepetitions match a sub-glob a specified number of times. Repetitions are\ndelimited by angle brackets with a separating colon `\u003c...:...\u003e` where a sub-glob\nprecedes the colon and an optional bounds specification follows it. For example,\n`\u003ca*/:0,\u003e` matches the sub-glob `a*/` zero or more times. Though not implicit\nlike tree [wildcards](#wildcards), **repetitions can match across component\nboundaries** (and can themselves include tree wildcards). Repetitions may be\narbitrarily nested and composed with [alternations](#alternations).\n\nBound specifications are formed from inclusive lower and upper bounds separated\nby a comma `,`, such as `:1,4` to match between one and four times. The upper\nbound is optional and may be omitted. For example, `:1,` matches one or more\ntimes (note the trailing comma `,`). A singular bound is convergent, so `:3`\nmatches exactly three times (both the lower and upper bounds are three). If no\nlower or upper bound is specified, then the sub-glob matches one or more times,\nso `\u003ca:\u003e` and `\u003ca:1,\u003e` are equivalent. Similarly, if the colon `:` is also\nomitted, then the sub-glob matches zero or more times, so `\u003ca\u003e` and `\u003ca:0,\u003e` are\nequivalent.\n\nRepetitions form a singular capture group regardless of the contents of their\nsub-glob. The capture is formed from the complete match of the sub-glob. If the\nrepetition `\u003cabc/\u003e` matches `abc/abc/`, then the captured text will be\n`abc/abc/`.\n\nRepetitions compose well with [character classes](#character-classes). Most\noften, a glob expression like `{????}` is sufficient, but the more specific\nexpression `\u003c[0-9]:4\u003e` further constrains the matched characters to digits, for\nexample. 
Repetitions may also be more terse, such as `\u003c?:8\u003e`. Furthermore,\nrepetitions can form tree expressions that further constrain components, such as\n`\u003c[!.]*/\u003e[!.]*` to match paths that contain no leading dots `.` in any\ncomponent.\n\nRepetitions must consider adjacency rules and neighboring patterns. For example,\n`a/\u003cb/**:1,\u003e` is allowed but `\u003ca/**:1,\u003e/b` is not. Additionally, they may not\ncontain a sub-glob consisting of a singular separator `/`, a singular\nzero-or-more wildcard `*` or `$`, nor a singular tree wildcard `**`. Repetitions\nwith a lower bound of zero may not root a glob expression, as this could cause\nthe expression to match or walk overlapping trees.\n\n## Combinators\n\nGlob patterns can be combined and matched together using the `any` combinator.\n`any` accepts an `IntoIterator` of `Pattern`s, such as compiled `Program`s like\n`Glob` or pattern text like `str` slices. The output is an `Any`, which\nimplements `Program` and efficiently matches any of its input patterns.\n\n```rust\nuse wax::{Glob, Program};\n\nlet any = wax::any([\"**/*.txt\", \"src/**/*.rs\"]).unwrap();\nassert!(any.is_match(\"src/lib.rs\"));\n```\n\n`Any` and the `any` combinator can be used anywhere a `Pattern` or `Program`\ncan.\n\n```rust\nuse wax::walk::FileIterator;\nuse wax::Glob;\n\nlet glob = Glob::new(\"**/*.{md,rs,toml,txt,yaml,yml}\").unwrap();\nfor entry in glob\n    .walk(\"projects\")\n    .not(wax::any([\n        \"**/{.git,.github,target}/**\",\n        \"**/{lib,main,mod}.rs\",\n    ]))\n    .unwrap()\n{\n    let entry = entry.unwrap();\n    // ...\n}\n```\n\nUnlike `Glob`, an `Any` cannot be matched against a directory tree (as with\n`Glob::walk`). However, `Any` supports features that alternations do not and\n`Glob`s cannot represent, such as overlapping trees.\n\n## Flags and Case Sensitivity\n\nFlags toggle the matching behavior of globs. Importantly, flags are a part of a\nglob expression rather than an API. 
Behaviors are toggled immediately following\nflags in the order in which they appear in glob expressions. Flags are delimited\nby parentheses with a leading question mark `(?...)` and may appear anywhere\nwithin a glob expression so long as they do not split tree wildcards (e.g.,\n`a/*(?i)*` is not allowed). Each flag is represented by a single character and\ncan be negated by preceding the corresponding character with a minus `-`. Flags\nare toggled in the order in which they appear within `(?...)`.\n\nThe only supported flag is the case-insensitivity flag `i`. By default, glob\nexpressions use the same case sensitivity as the target platform's file system\nAPIs (case-sensitive on Unix and case-insensitive on Windows), but `i` can be\nused to toggle this explicitly as needed. For example,\n`(?-i)photos/**/*.(?i){jpg,jpeg}` matches file paths beneath a `photos`\ndirectory with a case-**sensitive** base and a case-**insensitive** extension\n`jpg` or `jpeg`.\n\nWax considers literals, their configured case sensitivity, and the case\nsensitivity of the target platform's file system APIs [when partitioning glob\nexpressions](#partitioning-and-semantic-literals) with `Glob::partition`.\nPartitioning is unaffected in glob expressions with no flags.\n\n## Errors and Diagnostics\n\nThe `GlobError` type represents error conditions that can occur when building a\npattern or walking a directory tree. `GlobError` and its sub-errors implement\nthe standard `Error` and `Display` traits via [`thiserror`][thiserror].\n\nWax optionally integrates with the [`miette`][miette] crate, which can be used\nto capture and display diagnostics. This can be useful for reporting errors to\nusers that provide glob expressions. 
When enabled, error types implement the\n`Diagnostic` trait.\n\n```\nError: wax::glob::adjacent_zero_or_more\n\n  x malformed glob expression: adjacent zero-or-more wildcards `*` or `$`\n   ,----\n 1 | doc/**/*{.md,.tex,*.txt}\n   :        |^^^^^^^^|^^^^^^^\n   :        |        | `-- here\n   :        |        `-- in this alternation\n   :        `-- here\n   `----\n```\n\nWax also provides inspection APIs that allow code to query glob metadata, such\nas captures and variance.\n\n```rust\nuse wax::Glob;\n\nlet glob = Glob::new(\"videos/**/{*.{mp4,webm}}\").unwrap();\nassert_eq!(2, glob.captures().count());\n```\n\n## Cargo Features\n\nWax provides some optional integrations and features that can be toggled via\nthe Cargo features described below.\n\n| Feature  | Default | Dependencies       | Description                                                                   |\n|----------|---------|--------------------|-------------------------------------------------------------------------------|\n| `miette` | No      | `miette`, `tardar` | Integrates with `miette` and provides `Diagnostic` error types and reporting. |\n| `walk`   | Yes     | `walkdir`          | Provides APIs for matching globs against directory trees.                     |\n\nFeatures can be configured in a crate's `Cargo.toml` manifest.\n\n```toml\n[dependencies.wax]\nversion = \"^0.x.0\"\ndefault-features = false\nfeatures = [\n    \"miette\",\n    \"walk\"\n]\n```\n\n## Unsupported Path Features\n\nAny components not recognized as separators nor patterns are interpreted as\nliterals. In combination with strict rules, this means **some platform-specific\npath features cannot be used directly in globs**. This limitation is by design\nand additional code may be necessary to bridge this gap for some use cases.\n\n### Partitioning and Semantic Literals\n\nGlobs support no notion of a current or parent directory. 
The path components\n`.` and `..` are interpreted as literals and only match paths with the\ncorresponding components (even on Unix and Windows). For example, the glob\n`src/../*.rs` matches the path `src/../lib.rs` but does **not** match the\nsemantically equivalent path `lib.rs`.\n\nParent directory components have unclear meaning and far less utility when they\nfollow patterns in a glob. However, such components are intuitive and are often\nimportant for escaping a working directory when they precede variant patterns\n(i.e., as a prefix). For example, the glob `../src/**/*.rs` has more obvious\nintended meaning than the glob `src/**/../*.rs`. As seen above though, the first\nglob would only match the literal path component `..` and not paths that replace\nthis with a parent directory.\n\n`Glob::partition` can be used to isolate semantic components that precede\npatterns and apply semantic path operations to them (namely `..`).\n`Glob::partition` partitions a glob into an invariant `PathBuf` prefix and a\nvariant `Glob` postfix. Here, invariant means that the partition contains no\nglob patterns that resolve differently than an equivalent native path using the\ntarget platform's file system APIs. The prefix can be used as needed in\ncombination with the glob.\n\n```rust\nuse dunce; // Avoids UNC paths on Windows.\nuse std::path::Path;\nuse wax::{Glob, Program};\n\nlet path: \u0026Path = /* ... */; // Candidate path.\n\nlet directory = Path::new(\".\"); // Working directory.\nlet (prefix, glob) = Glob::new(\"../../src/**\").unwrap().partition();\nlet prefix = dunce::canonicalize(directory.join(\u0026prefix)).unwrap();\nif dunce::canonicalize(path)\n    .unwrap()\n    .strip_prefix(\u0026prefix)\n    .map(|path| glob.is_match(path))\n    .unwrap_or(false)\n{\n    // ...\n}\n```\n\nAdditionally, `Glob::has_semantic_literals` can be used to detect literal\ncomponents in a glob that have special semantics on the target platform. 
When\nthe `miette` feature is enabled, such literals are reported as warnings.\n\n```rust\nuse wax::Glob;\n\nlet glob = Glob::new(\"../**/src/**/main.rs\").unwrap();\nassert!(glob.has_semantic_literals());\n```\n\n### Schemes and Prefixes\n\nWhile globs can be rooted, they cannot include schemes nor Windows path\nprefixes. For example, the Windows UNC share path `\\\\server\\share\\src` cannot be\nrepresented directly as a glob.\n\nThis can be limiting, but the design of Wax explicitly forbids this: Windows\nprefixes and other volume components are not portable. Instead, when this is\nneeded, an additional native path or working directory must be used, such as\n[the `--tree` option provided by Nym][nym]. In most contexts, globs are applied\nrelative to some such working directory.\n\n### Non-nominal Constraints\n\nGlobs are strictly nominal and do not support any non-nominal constraints. It is\nnot possible to directly filter or otherwise select paths or files based on\nadditional metadata (such as a modification timestamp) in a glob expression.\nHowever, it is possible for user code to query any such metadata for a matching\npath or efficiently apply such filtering when matching directory trees using\n`FileIterator::filter_tree`.\n\nFor such additional features, including metadata filters and transformations\nusing matched text, see [Nym][nym].\n\n### Encoding\n\nGlobs operate exclusively on UTF-8 encoded text. However, this encoding is not\nused for paths on all platforms. Wax uses the `CandidatePath` type to re-encode\nnative paths via lossy conversions that use Unicode replacement codepoints\nwhenever a part of a path cannot be represented as valid UTF-8. In practice,\nmost paths can be losslessly encoded in UTF-8, but this means that Wax cannot\nmatch nor capture some literal byte strings.\n\n## Stability\n\nAt the time of writing, Wax is experimental and unstable. 
It is possible that\nglob expression syntax and semantics may change between versions in the `0.y.z`\nseries without warning nor deprecation.\n\n[miette]: https://github.com/zkat/miette\n[nym]: https://github.com/olson-sean-k/nym\n[thiserror]: https://github.com/dtolnay/thiserror\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Folson-sean-k%2Fwax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Folson-sean-k%2Fwax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Folson-sean-k%2Fwax/lists"}