{"id":30136484,"url":"https://github.com/dk26/soft-canonicalize-rs","last_synced_at":"2026-04-19T00:18:48.515Z","repository":{"id":305139376,"uuid":"1022071408","full_name":"DK26/soft-canonicalize-rs","owner":"DK26","description":"Path canonicalization that works with non-existing paths.","archived":false,"fork":false,"pushed_at":"2026-04-18T22:14:55.000Z","size":642,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-18T23:30:42.705Z","etag":null,"topics":["canonicalization","canonicalize","cross-platform","filesystem","non-existing-paths","path-canonicalization","path-manipulation","realpath","rust","rust-crate","security","symlink-resolution","utilities"],"latest_commit_sha":null,"homepage":"https://docs.rs/soft-canonicalize","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DK26.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-07-18T12:03:00.000Z","updated_at":"2026-04-18T22:10:59.000Z","dependencies_parsed_at":"2025-08-11T22:06:51.328Z","dependency_job_id":"d559af9f-22bc-4aa7-907c-67788260b380","html_url":"https://github.com/DK26/soft-canonicalize-rs","commit_stats":null,"previous_names":["dk26/soft-canonicalize-rs"],"tags_count":34,"template":false,"template_full_name":null,"purl":"pkg:github/DK26/soft-canonicalize-rs","repository_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/DK26%2Fsoft-canonicalize-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DK26%2Fsoft-canonicalize-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DK26%2Fsoft-canonicalize-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DK26%2Fsoft-canonicalize-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DK26","download_url":"https://codeload.github.com/DK26/soft-canonicalize-rs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DK26%2Fsoft-canonicalize-rs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31989341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"ssl_error","status_checked_at":"2026-04-18T20:23:29.375Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["canonicalization","canonicalize","cross-platform","filesystem","non-existing-paths","path-canonicalization","path-manipulation","realpath","rust","rust-crate","security","symlink-resolution","utilities"],"created_at":"2025-08-10T23:09:08.647Z","updated_at":"2026-04-19T00:18:48.487Z","avatar_url":"https://github.com/DK26.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
soft-canonicalize\r\n\r\n[![Crates.io](https://img.shields.io/crates/v/soft-canonicalize.svg)](https://crates.io/crates/soft-canonicalize)\r\n[![License](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](LICENSE-MIT)\r\n[![Documentation](https://docs.rs/soft-canonicalize/badge.svg)](https://docs.rs/soft-canonicalize)\r\n[![CI](https://github.com/DK26/soft-canonicalize-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/DK26/soft-canonicalize-rs/actions)\r\n[![Security audit](https://github.com/DK26/soft-canonicalize-rs/actions/workflows/audit.yml/badge.svg)](https://github.com/DK26/soft-canonicalize-rs/actions/workflows/audit.yml)\r\n[![MSRV](https://img.shields.io/badge/MSRV-1.70.0-blue.svg)](https://blog.rust-lang.org/2023/06/01/Rust-1.70.0.html)\r\n\r\n**Path canonicalization that works with non-existing paths.**\r\n\r\nRust implementation inspired by Python 3.6+ `pathlib.Path.resolve(strict=False)`, providing the same functionality as `std::fs::canonicalize` (Rust's equivalent to Unix `realpath()`) but extended to handle non-existing paths, with optional features for simplified Windows output (`dunce`) and virtual filesystem semantics (`anchored`).\r\n\r\n## Why Use This?\r\n\r\n**🚀 Works with non-existing paths** - Plan file locations before creating them  \r\n**⚡ Fast** - Mixed workload median performance: Windows ~1.8x (13,840 paths/s), Linux ~3.0x (379,119 paths/s) faster than Python's pathlib (see [benchmark methodology](benches/README.md) for 5-run protocol and environment details)  \r\n**✅ Compatible** - 100% behavioral match with `std::fs::canonicalize` for existing paths, with optional UNC simplification via `dunce` feature (Windows)  \r\n**🎯 Virtual filesystem support** - Optional `anchored` feature for bounded canonicalization within directory boundaries  \r\n**🔒 Robust** - 500+ comprehensive tests including symlink cycle protection, malicious stream validation, and edge case handling  \r\n**🛡️ Safe traversal** - Proper 
`..` and symlink resolution with cycle detection  \r\n**🌍 Cross-platform** - Windows, macOS, Linux with comprehensive UNC/symlink handling  \r\n**💾 Exotic filesystem support** - Works on RAM disks, network drives, Docker volumes ([rust-lang/rust#45067](https://github.com/rust-lang/rust/issues/45067), [#48249](https://github.com/rust-lang/rust/issues/48249))  \r\n**🔧 Zero dependencies** - Optional features may add minimal dependencies\r\n\r\n## Lexical vs. Filesystem-Based Resolution\r\n\r\nPath resolution libraries fall into two categories:\r\n\r\n**Lexical Resolution** (no I/O):\r\n- **Performance**: Fast - no filesystem access\r\n- **Accuracy**: Incorrect if symlinks are present (doesn't resolve them)\r\n- **Use when**: You're 100% certain no symlinks exist and need maximum performance\r\n- **Examples**: `std::path::absolute`, `normpath::normalize`\r\n\r\n**Filesystem-Based Resolution** (performs I/O):\r\n- **Performance**: Slower - requires filesystem syscalls to resolve symlinks\r\n- **Accuracy**: Correct - follows symlinks to their targets\r\n- **Use when**: Safety takes priority over performance, or symlinks may be present\r\n- **Examples**: `std::fs::canonicalize`, `soft_canonicalize`, `dunce::canonicalize`\r\n\r\n**Rule of thumb**: If you cannot guarantee symlinks won't be introduced, or if correctness is critical, use filesystem-based resolution.\r\n\r\n## Use Cases\r\n\r\n### Path Comparison\r\n\r\n- **Equality**: Determine if two different path strings point to the same location\r\n- **Containment**: Check if one path is inside another directory\r\n\r\n### Common Applications\r\n\r\n- **Build Systems**: Resolve output paths during build planning before directories exist\r\n- **Configuration Validation**: Ensure user-provided paths stay within allowed boundaries\r\n- **Deduplication**: Detect when different path strings refer to the same planned location\r\n- **Cross-Platform Normalization**: Handle Windows UNC paths and symlinks consistently\r\n\r\n## Quick 
Start\r\n\r\n### Cargo.toml\r\n```toml\r\n[dependencies]\r\nsoft-canonicalize = \"0.5\"\r\n```\r\n\r\n### Code Example\r\n\r\n```rust\r\nuse soft_canonicalize::soft_canonicalize;\r\n\r\nlet non_existing_path = r\"C:\\Users\\user\\documents\\..\\non\\existing\\config.json\";\r\n\r\n// Using Rust's own std canonicalize function:\r\nlet result = std::fs::canonicalize(non_existing_path);\r\nassert!(result.is_err());\r\n\r\n// Using our crate's function:\r\nlet result = soft_canonicalize(non_existing_path);\r\nassert!(result.is_ok());\r\n\r\n// Shows the UNC path conversion and path normalization\r\nassert_eq!(\r\n    result.unwrap().to_string_lossy(),\r\n    r\"\\\\?\\C:\\Users\\user\\non\\existing\\config.json\"\r\n);\r\n\r\n// With `dunce` feature enabled, paths are simplified when safe\r\n// assert_eq!(\r\n//     result.unwrap().to_string_lossy(),\r\n//     r\"C:\\Users\\user\\non\\existing\\config.json\"\r\n// );\r\n```\r\n\r\n## Optional Features\r\n\r\n- **`anchored`** - Virtual filesystem/bounded canonicalization (cross-platform)\r\n- **`dunce`** - Simplified Windows path output (Windows-only target-conditional dependency)\r\n\r\n### Anchored Canonicalization (`anchored` feature)\r\n\r\nFor **correct symlink resolution within virtual/constrained directory spaces**, use `anchored_canonicalize`. 
This function implements true virtual filesystem semantics by clamping ALL paths (including absolute symlink targets) to the anchor directory:\r\n\r\n```toml\r\n[dependencies]\r\nsoft-canonicalize = { version = \"0.5\", features = [\"anchored\"] }\r\n```\r\n\r\n```rust\r\nuse soft_canonicalize::anchored_canonicalize;\r\nuse std::fs;\r\n\r\n// Set up an anchor/root directory (no need to pre-canonicalize)\r\nlet anchor = std::env::temp_dir().join(\"workspace_root\");\r\nfs::create_dir_all(\u0026anchor)?;\r\n\r\n// Canonicalize paths relative to the anchor (anchor is soft-canonicalized internally)\r\nlet resolved_path = anchored_canonicalize(\u0026anchor, \"../../../etc/passwd\")?;\r\n// Result: /tmp/workspace_root/etc/passwd (lexical .. clamped to anchor)\r\n\r\n// Absolute symlinks are also clamped to the anchor\r\n// If there's a symlink: workspace_root/config -\u003e /etc/config\r\n// It resolves to: workspace_root/etc/config (clamped to anchor)\r\nlet symlink_path = anchored_canonicalize(\u0026anchor, \"config\")?;\r\n// Safe: always stays within workspace_root, even if symlink points to /etc/config\r\n```\r\n\r\nKey features of `anchored_canonicalize`:\r\n- **Virtual filesystem semantics**: All absolute paths (including symlink targets) are clamped to anchor\r\n- **Anchor-relative canonicalization**: Resolves paths relative to a specific anchor directory\r\n- **Complete symlink clamping**: Follows symlink chains with clamping at each step\r\n- **Component-by-component**: Processes path components in proper order\r\n- **Absolute results**: Always returns absolute canonical paths within the anchor boundary\r\n\r\n**For a complete multi-tenant security example**, see:\r\n```bash\r\ncargo run --example virtual_filesystem_demo --features anchored\r\n```\r\n\r\n### Simplified Path Output (`dunce` feature, Windows-only)\r\n\r\nBy default on Windows, `soft_canonicalize` returns paths in extended-length UNC format (`\\\\?\\C:\\foo`) for maximum robustness and 
compatibility with long paths, reserved names, and other Windows filesystem edge cases.\r\n\r\nIf you need simplified paths (`C:\\foo`) for compatibility with legacy Windows applications or user-facing output, enable the **`dunce` feature**:\r\n\r\n```toml\r\n[dependencies]\r\nsoft-canonicalize = { version = \"0.5\", features = [\"dunce\"] }\r\n```\r\n\r\n**Example:**\r\n\r\n```rust\r\nuse soft_canonicalize::soft_canonicalize;\r\n\r\nlet path = soft_canonicalize(r\"C:\\Users\\user\\documents\\..\\config.json\")?;\r\n\r\n// Without dunce feature (default):\r\n// Returns: \\\\?\\C:\\Users\\user\\config.json (extended-length UNC)\r\n\r\n// With dunce feature enabled:\r\n// Returns: C:\\Users\\user\\config.json (simplified when safe)\r\n```\r\n\r\n**When to use:**\r\n- ✅ Legacy applications that don't support UNC paths\r\n- ✅ User-facing output requiring familiar path format\r\n- ✅ Tools expecting traditional Windows path format\r\n\r\n**How it works:**\r\nThe [dunce](https://crates.io/crates/dunce) crate intelligently simplifies Windows UNC paths (`\\\\?\\C:\\foo` → `C:\\foo`) **only when safe**:\r\n- Automatically keeps UNC for paths \u003e260 chars\r\n- Automatically keeps UNC for reserved names (CON, PRN, NUL, COM1-9, LPT1-9)\r\n- Automatically keeps UNC for paths with trailing spaces/dots\r\n- Automatically keeps UNC for paths containing `..` (literal interpretation)\r\n\r\n## When Paths Must Exist: `proc-canonicalize`\r\n\r\nSince v0.5.0, `soft_canonicalize` uses [`proc-canonicalize`](https://crates.io/crates/proc-canonicalize) by default for existing-path canonicalization instead of `std::fs::canonicalize`. 
This fixes a critical issue with Linux namespace boundaries.\r\n\r\n### The Problem with `std::fs::canonicalize`\r\n\r\nOn Linux, `std::fs::canonicalize` resolves \"magic symlinks\" like `/proc/PID/root` to their targets:\r\n\r\n```rust\r\n// std::fs::canonicalize follows magic symlinks incorrectly\r\nlet path = std::fs::canonicalize(\"/proc/1/root\")?; // Returns \"/\" (wrong!)\r\n// This loses the namespace boundary - dangerous for container tooling\r\n```\r\n\r\n### The Solution\r\n\r\n`proc-canonicalize` preserves namespace boundaries:\r\n\r\n```rust\r\nuse proc_canonicalize::canonicalize;\r\n\r\nlet path = canonicalize(\"/proc/1/root\")?; // Returns \"/proc/1/root\" (correct!)\r\n// Namespace boundary is preserved\r\n```\r\n\r\n### When to Use Which\r\n\r\n| Use Case                      | Function                          | Reason                     |\r\n| ----------------------------- | --------------------------------- | -------------------------- |\r\n| Paths that may not exist      | `soft_canonicalize`               | Handles non-existing paths |\r\n| Existing paths (general)      | `proc_canonicalize::canonicalize` | Correct namespace handling |\r\n| Existing paths (std behavior) | `std::fs::canonicalize`           | Legacy compatibility only  |\r\n\r\n**Recommendation**: If you need to canonicalize paths that **must exist** (and would previously use `std::fs::canonicalize`), use `proc_canonicalize::canonicalize` for correct Linux namespace handling:\r\n\r\n```toml\r\n[dependencies]\r\nproc-canonicalize = \"0.0\"\r\n```\r\n\r\n## Comparison with Alternatives\r\n\r\n### Feature Comparison\r\n\r\n| Feature                          | `soft_canonicalize`           | `normpath` | `proc_canonicalize` | `std::fs::canonicalize` | `std::path::absolute` | `dunce::canonicalize` |\r\n| -------------------------------- | ----------------------------- | ---------- | ------------------- | ----------------------- | --------------------- | --------------------- |\r\n| 
Resolution type                  | Filesystem-based              | Lexical    | Filesystem-based    | Filesystem-based        | Lexical               | Filesystem-based      |\r\n| Works with non-existing paths    | ✅                             | ✅          | ❌                   | ❌                       | ✅                     | ❌                     |\r\n| Resolves symlinks                | ✅                             | ❌          | ✅                   | ✅                       | ❌                     | ✅                     |\r\n| RAM disk / network drive support | ✅ (graceful fallback)         | ✅ (no I/O) | ❌                   | ❌                       | ✅                     | ❌                     |\r\n| Preserves Linux namespaces       | ✅ (default)                   | N/A        | ✅                   | ❌                       | N/A                   | ❌                     |\r\n| Simplified Windows paths         | ✅ (opt-in `dunce` feature)    | ✅ (opt-in) | ✅ (opt-in)          | ❌ (UNC)                 | ❌ (varies)            | ✅                     |\r\n| Virtual/bounded canonicalization | ✅ (opt-in `anchored` feature) | ❌          | ❌                   | ❌                       | ❌                     | ❌                     |\r\n| Zero dependencies                | ✅ (default)                   | ✅          | ✅                   | ✅                       | ✅                     | ✅                     |\r\n\r\n### When to Use Each\r\n\r\n**Choose `soft_canonicalize` when:**\r\n- ✅ You need `std::fs::canonicalize` behavior for paths that don't exist yet\r\n- ✅ Planning file locations before creating them (build systems, config generation)\r\n- ✅ You want virtual filesystem/bounded canonicalization (with `anchored` feature)\r\n- ✅ You need simplified Windows paths for legacy apps (with `dunce` feature)\r\n\r\n**Choose alternatives when:**\r\n- **`normpath::normalize`** - Maximum performance needed, you can guarantee no symlinks exist, and you want 
pure lexical normalization (no I/O). Note: lacks symlink resolution, so not suitable when security against symlink-based attacks is required\r\n- **`proc_canonicalize::canonicalize`** - All paths exist and you need correct Linux namespace handling (recommended over `std::fs::canonicalize`)\r\n- **`std::fs::canonicalize`** - All paths exist; only when you specifically need the legacy behavior that resolves `/proc/PID/root` to `/`\r\n- **`std::path::absolute`** - You only need absolute paths without symlink resolution (lexical, fast)\r\n- **`dunce::canonicalize`** - Windows-only, all paths exist, just need UNC simplification\r\n- **`path_absolutize`** - Absolute path resolution without symlink following, with CWD caching optimizations\r\n\r\n## Related Projects\r\n\r\n- **[strict-path](https://crates.io/crates/strict-path)** - Type-safe path restriction with compile-time guarantees. Uses `soft-canonicalize` internally for path validation and boundary enforcement.\r\n\r\n## Security \u0026 CVE Coverage\r\n\r\nSecurity does not depend on enabling features. The core API is secure-by-default; the optional `anchored` feature is a convenience for virtual roots. 
We test all modes (no features; `--features anchored`; `--features anchored,dunce`).\r\n\r\n**Built-in protections include:**\r\n- **NTFS Alternate Data Stream (ADS) validation** - Blocks malicious stream placements and traversal attempts\r\n- **Symlink cycle detection** - Bounded depth tracking prevents infinite loops\r\n- **Path traversal clamping** - Never ascends past root/share/device boundaries\r\n- **Null byte rejection** - Early validation prevents injection attacks\r\n- **UNC/device semantics** - Preserves Windows extended-length and device namespace integrity\r\n- **TOCTOU race resistance** - Tested against time-of-check-time-of-use attacks\r\n\r\nSee [`docs/SECURITY.md`](docs/SECURITY.md) for detailed analysis, attack scenarios, and test references.\r\n\r\n## Known Limitations\r\n\r\n### Windows Short Filename Equivalence\r\n\r\nOn Windows, the filesystem may generate short filenames (8.3 format) for long directory names. For **non-existing paths**, this library cannot determine if a short filename form (e.g., `PROGRA~1`) and its corresponding long form (e.g., `Program Files`) refer to the same future location:\r\n\r\n```rust\r\nuse soft_canonicalize::soft_canonicalize;\r\n\r\n// These non-existing paths are treated as different (correctly)\r\nlet short_form = soft_canonicalize(\"C:/PROGRA~1/MyApp/config.json\")?;\r\nlet long_form = soft_canonicalize(\"C:/Program Files/MyApp/config.json\")?;\r\n\r\n// They will NOT be equal because we cannot determine equivalence\r\n// without filesystem existence\r\nassert_ne!(short_form, long_form);\r\n```\r\n\r\n**This is a fundamental limitation** shared by Python's `pathlib.Path.resolve(strict=False)` and other path canonicalization libraries across languages. Short filename mapping only exists when files/directories are actually created by the filesystem.\r\n\r\n## Contributing\r\n\r\nContributions are welcome! Please feel free to submit a Pull Request. 
For major changes, please open an issue first to discuss what you would like to change.\r\n\r\n## License\r\n\r\nLicensed under either of:\r\n\r\n- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))\r\n- MIT license ([LICENSE-MIT](LICENSE-MIT))\r\n\r\n## Changelog\r\n\r\nSee [CHANGELOG.md](CHANGELOG.md) for a detailed history of changes.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdk26%2Fsoft-canonicalize-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdk26%2Fsoft-canonicalize-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdk26%2Fsoft-canonicalize-rs/lists"}