{"id":16826289,"url":"https://github.com/burntsushi/bstr","last_synced_at":"2025-10-19T09:24:36.817Z","repository":{"id":41844264,"uuid":"170155321","full_name":"BurntSushi/bstr","owner":"BurntSushi","description":"A string type for Rust that is not required to be valid UTF-8.","archived":false,"fork":false,"pushed_at":"2025-04-08T16:37:05.000Z","size":2485,"stargazers_count":930,"open_issues_count":26,"forks_count":60,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-05-13T00:17:56.437Z","etag":null,"topics":["byte-string","bytes","graphemes","substring","substring-search","unicode","utf-8"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BurntSushi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["BurntSushi"]}},"created_at":"2019-02-11T15:46:10.000Z","updated_at":"2025-05-10T17:26:29.000Z","dependencies_parsed_at":"2024-06-18T13:58:19.992Z","dependency_job_id":"4c9d0589-4440-4266-b82c-fef11bab600a","html_url":"https://github.com/BurntSushi/bstr","commit_stats":{"total_commits":240,"total_committers":37,"mean_commits":6.486486486486487,"dds":"0.36250000000000004","last_synced_commit":"cbe2c692782d772f7c2fd8be0972bf9b91889a97"},"previous_names":[],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fbstr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fbstr/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fbstr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fbstr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BurntSushi","download_url":"https://codeload.github.com/BurntSushi/bstr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253843225,"owners_count":21972874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["byte-string","bytes","graphemes","substring","substring-search","unicode","utf-8"],"created_at":"2024-10-13T11:16:56.690Z","updated_at":"2025-10-19T09:24:36.705Z","avatar_url":"https://github.com/BurntSushi.png","language":"Rust","readme":"bstr\n====\nThis crate provides extension traits for `\u0026[u8]` and `Vec\u003cu8\u003e` that enable\ntheir use as byte strings, where byte strings are _conventionally_ UTF-8. 
This\ndiffers from the standard library's `String` and `str` types in that they are\nnot required to be valid UTF-8, but may be fully or partially valid UTF-8.\n\n[![Build status](https://github.com/BurntSushi/bstr/workflows/ci/badge.svg)](https://github.com/BurntSushi/bstr/actions)\n[![crates.io](https://img.shields.io/crates/v/bstr.svg)](https://crates.io/crates/bstr)\n\n\n### Documentation\n\nhttps://docs.rs/bstr\n\n\n### When should I use byte strings?\n\nSee this part of the documentation for more details:\n\u003chttps://docs.rs/bstr/1.*/bstr/#when-should-i-use-byte-strings\u003e.\n\nThe short story is that byte strings are useful when it is inconvenient or\nincorrect to require valid UTF-8.\n\n\n### Usage\n\n`cargo add bstr`\n\n### Examples\n\nThe following examples exhibit both the API features of byte strings and\nthe I/O convenience functions provided for reading line-by-line quickly.\n\nThis first example simply shows how to efficiently iterate over lines in stdin,\nand print out lines containing a particular substring:\n\n```rust\nuse std::{error::Error, io::{self, Write}};\nuse bstr::{ByteSlice, io::BufReadExt};\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn Error\u003e\u003e {\n    let stdin = io::stdin();\n    let mut stdout = io::BufWriter::new(io::stdout());\n\n    stdin.lock().for_byte_line_with_terminator(|line| {\n        if line.contains_str(\"Dimension\") {\n            stdout.write_all(line)?;\n        }\n        Ok(true)\n    })?;\n    Ok(())\n}\n```\n\nThis example shows how to count all of the words (Unicode-aware) in stdin,\nline-by-line:\n\n```rust\nuse std::{error::Error, io};\nuse bstr::{ByteSlice, io::BufReadExt};\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn Error\u003e\u003e {\n    let stdin = io::stdin();\n    let mut words = 0;\n    stdin.lock().for_byte_line_with_terminator(|line| {\n        words += line.words().count();\n        Ok(true)\n    })?;\n    println!(\"{}\", words);\n    Ok(())\n}\n```\n\nThis example shows 
how to convert a stream on stdin to uppercase without\nperforming UTF-8 validation _and_ amortizing allocation. On standard ASCII\ntext, this is quite a bit faster than what you can (easily) do with standard\nlibrary APIs. (N.B. Any invalid UTF-8 bytes are passed through unchanged.)\n\n```rust\nuse std::{error::Error, io::{self, Write}};\nuse bstr::{ByteSlice, io::BufReadExt};\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn Error\u003e\u003e {\n    let stdin = io::stdin();\n    let mut stdout = io::BufWriter::new(io::stdout());\n\n    let mut upper = vec![];\n    stdin.lock().for_byte_line_with_terminator(|line| {\n        upper.clear();\n        line.to_uppercase_into(\u0026mut upper);\n        stdout.write_all(\u0026upper)?;\n        Ok(true)\n    })?;\n    Ok(())\n}\n```\n\nThis example shows how to extract the first 10 visual characters (as grapheme\nclusters) from each line, where invalid UTF-8 sequences are generally treated\nas a single character and are passed through correctly:\n\n```rust\nuse std::{error::Error, io::{self, Write}};\nuse bstr::{ByteSlice, io::BufReadExt};\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn Error\u003e\u003e {\n    let stdin = io::stdin();\n    let mut stdout = io::BufWriter::new(io::stdout());\n\n    stdin.lock().for_byte_line_with_terminator(|line| {\n        let end = line\n            .grapheme_indices()\n            .map(|(_, end, _)| end)\n            .take(10)\n            .last()\n            .unwrap_or(line.len());\n        stdout.write_all(line[..end].trim_end())?;\n        stdout.write_all(b\"\\n\")?;\n        Ok(true)\n    })?;\n    Ok(())\n}\n```\n\n\n### Cargo features\n\nThis crate comes with a few features that control standard library, serde and\nUnicode support.\n\n* `std` - **Enabled** by default. This provides APIs that require the standard\n  library, such as `Vec\u003cu8\u003e` and `PathBuf`. Enabling this feature also enables\n  the `alloc` feature.\n* `alloc` - **Enabled** by default. 
This provides APIs that require allocations\n  via the `alloc` crate, such as `Vec\u003cu8\u003e`.\n* `unicode` - **Enabled** by default. This provides APIs that require sizable\n  Unicode data compiled into the binary. This includes, but is not limited to,\n  grapheme/word/sentence segmenters. When this is disabled, basic support such\n  as UTF-8 decoding is still included. Note that currently, enabling this\n  feature also requires enabling the `std` feature. It is expected that this\n  limitation will be lifted at some point.\n* `serde` - Enables implementations of serde traits for `BStr`, and also\n  `BString` when `alloc` is enabled.\n\n\n### Minimum Rust version policy\n\nThis crate's minimum supported `rustc` version (MSRV) is `1.73`.\n\nIn general, this crate will be conservative with respect to the minimum\nsupported version of Rust. MSRV may be bumped in minor version releases.\n\n\n### Future work\n\nSince it is plausible that some of the types in this crate might end up in your\npublic API (e.g., `BStr` and `BString`), we will commit to being very\nconservative with respect to new major version releases. It's difficult to say\nprecisely how conservative, but unless there is a major issue with the `1.0`\nrelease, I wouldn't expect a `2.0` release to come out any sooner than some\nperiod of years.\n\nA large part of the API surface area was taken from the standard library, so\nfrom an API design perspective, a good portion of this crate should be on solid\nground. The main differences from the standard library are in how the various\nsubstring search routines work. 
The standard library provides generic\ninfrastructure for supporting different types of searches with a single method,\nwhereas this library prefers to define new methods for each type of search and\ndrop the generic infrastructure.\n\nSome _probable_ future considerations for APIs include, but are not limited to:\n\n* Unicode normalization.\n* More sophisticated support for dealing with Unicode case, perhaps by\n  combining the use cases supported by [`caseless`](https://docs.rs/caseless)\n  and [`unicase`](https://docs.rs/unicase).\n\nHere are some examples that are _probably_ out of scope for this crate:\n\n* Regular expressions.\n* Unicode collation.\n\nThe exact scope isn't quite clear, but I expect we can iterate on it.\n\nIn general, as stated below, this crate brings lots of related APIs together\ninto a single crate while simultaneously attempting to keep the total number of\ndependencies low. Indeed, every dependency of `bstr`, except for `memchr`, is\noptional.\n\n\n### High level motivation\n\nStrictly speaking, the `bstr` crate provides very little that can't already be\nachieved with the standard library `Vec\u003cu8\u003e`/`\u0026[u8]` APIs and the ecosystem of\nlibrary crates. For example:\n\n* The standard library's\n  [`Utf8Error`](https://doc.rust-lang.org/std/str/struct.Utf8Error.html) can be\n  used for incremental lossy decoding of `\u0026[u8]`.\n* The\n  [`unicode-segmentation`](https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/index.html)\n  crate can be used for iterating over graphemes (or words), but is only\n  implemented for `\u0026str` types. One could use `Utf8Error` above to implement\n  grapheme iteration with the same semantics as what `bstr` provides (automatic\n  Unicode replacement codepoint substitution).\n* The [`twoway`](https://docs.rs/twoway) crate can be used for fast substring\n  searching on `\u0026[u8]`.\n\nSo why create `bstr`? 
Part of the point of the `bstr` crate is to provide a\nuniform API of coupled components instead of relying on users to piece together\nloosely coupled components from the crate ecosystem. For example, if you wanted\nto perform a search and replace in a `Vec\u003cu8\u003e`, then writing the code to do\nthat with the `twoway` crate is not that difficult, but it's still additional\nglue code you have to write. This work adds up depending on what you're doing.\nConsider, for example, trimming and splitting, along with their different\nvariants.\n\nIn other words, `bstr` is partially a way of pushing back against the\nmicro-crate ecosystem that appears to be evolving. Namely, it is a goal of\n`bstr` to keep its dependency list lightweight. For example, `serde` is an\noptional dependency because there is no feasible alternative. In service of\nthis philosophy, currently, the only required dependency of `bstr` is `memchr`.\n\n\n### License\n\nThis project is licensed under either of\n\n * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or\n   https://www.apache.org/licenses/LICENSE-2.0)\n * MIT license ([LICENSE-MIT](LICENSE-MIT) or\n   https://opensource.org/licenses/MIT)\n\nat your option.\n\nThe data in `src/unicode/data/` is licensed under the Unicode License Agreement\n([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)), although\nthis data is only used in tests.\n","funding_links":["https://github.com/sponsors/BurntSushi"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fburntsushi%2Fbstr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fburntsushi%2Fbstr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fburntsushi%2Fbstr/lists"}