{"id":18968975,"url":"https://github.com/finnbear/rustrict","last_synced_at":"2025-04-05T06:09:57.990Z","repository":{"id":57665783,"uuid":"405729061","full_name":"finnbear/rustrict","owner":"finnbear","description":"rustrict is a profanity filter for Rust","archived":false,"fork":false,"pushed_at":"2024-03-16T02:43:33.000Z","size":1211,"stargazers_count":83,"open_issues_count":5,"forks_count":9,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-05-01T16:45:05.855Z","etag":null,"topics":["crate","profanity-check","profanity-detection","profanity-filter","rust","rust-lang","rust-library"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/rustrict","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finnbear.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE-MIT","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["finnbear"]}},"created_at":"2021-09-12T19:02:24.000Z","updated_at":"2024-05-04T17:42:13.067Z","dependencies_parsed_at":"2023-01-30T19:45:47.059Z","dependency_job_id":"0aaff5e4-85d2-4c73-a4f6-2d8398c72ba3","html_url":"https://github.com/finnbear/rustrict","commit_stats":{"total_commits":179,"total_committers":3,"mean_commits":"59.666666666666664","dds":"0.027932960893854775","last_synced_commit":"9de4f72eb96de43410c9ee14dba3328bada5361a"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finnbear%2Frustrict","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finnbear%2Frustrict/tags","rel
eases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finnbear%2Frustrict/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finnbear%2Frustrict/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/finnbear","download_url":"https://codeload.github.com/finnbear/rustrict/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294541,"owners_count":20915340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crate","profanity-check","profanity-detection","profanity-filter","rust","rust-lang","rust-library"],"created_at":"2024-11-08T14:48:58.505Z","updated_at":"2025-04-05T06:09:57.971Z","avatar_url":"https://github.com/finnbear.png","language":"Rust","funding_links":["https://github.com/sponsors/finnbear"],"categories":[],"sub_categories":[],"readme":"# rustrict\n\n[![Documentation](https://docs.rs/rustrict/badge.svg)](https://docs.rs/rustrict)\n[![crates.io](https://img.shields.io/crates/v/rustrict.svg)](https://crates.io/crates/rustrict)\n[![Build](https://github.com/finnbear/rustrict/actions/workflows/build.yml/badge.svg)](https://github.com/finnbear/rustrict/actions/workflows/build.yml) \n[![Test Page](https://img.shields.io/badge/Test-page-green)](https://finnbear.github.io/rustrict/)\n\n\n`rustrict` is a profanity filter for Rust.\n\n\u003csup\u003eDisclaimer: Multiple source files (`.txt`, `.csv`, `.rs` test cases) contain profanity. 
Viewer discretion is advised.\u003c/sup\u003e\n\n## Features\n\n- Multiple types (profane, offensive, sexual, mean, spam)\n- Multiple levels (mild, moderate, severe)\n- Resistant to evasion\n  - Alternative spellings (like \"fck\")\n  - Repeated characters (like \"craaaap\")\n  - Confusable characters (like 'ᑭ', '𝕡', and '🅿')\n  - Spacing (like \"c r_a-p\")\n  - Accents (like \"pÓöp\")\n  - Bidirectional Unicode ([related reading](https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html))\n  - Self-censoring (like \"f*ck\")\n  - Safe phrase list for known bad actors\n  - Censors invalid Unicode characters\n  - Battle-tested in [Mk48.io](https://mk48.io)\n- Resistant to false positives\n  - One word (like \"**ass**assin\")\n  - Two words (like \"pu**sh it**\")\n- Flexible\n  - Censor and/or analyze\n  - Input `\u0026str` or `Iterator\u003cItem = char\u003e`\n  - Can track per-user state with `context` feature\n  - Can add words with the `customize` feature\n  - Accurately reports the width of Unicode via the `width` feature\n  - Plenty of options\n- Performant\n  - O(n) analysis and censoring\n  - No `regex` (uses custom trie)\n  - 3 MB/s in `release` mode\n  - 100 KB/s in `debug` mode\n\n## Limitations\n\n- Mostly English/emoji\n- Censoring removes most diacritics (accents)\n- Does not detect right-to-left profanity while analyzing, so...\n- Censoring forces Unicode to be left-to-right\n- Doesn't understand context\n- Not resistant to false positives affecting profanities added at runtime\n\n## Usage\n\n### Strings (`\u0026str`)\n\n```rust\nuse rustrict::CensorStr;\n\nlet censored: String = \"hello crap\".censor();\nlet inappropriate: bool = \"f u c k\".is_inappropriate();\n\nassert_eq!(censored, \"hello c***\");\nassert!(inappropriate);\n```\n\n### Iterators (`Iterator\u003cItem = char\u003e`)\n\n```rust\nuse rustrict::CensorIter;\n\nlet censored: String = \"hello crap\".chars().censor().collect();\n\nassert_eq!(censored, \"hello c***\");\n```\n\n### 
Advanced\n\nBy constructing a `Censor`, one can avoid scanning text multiple times to get a censored `String` and/or\nanswer multiple `is` queries. This also opens up more customization options (defaults are below).\n\n```rust\nuse rustrict::{Censor, Type};\n\nlet (censored, analysis) = Censor::from_str(\"123 Crap\")\n    .with_censor_threshold(Type::INAPPROPRIATE)\n    .with_censor_first_character_threshold(Type::OFFENSIVE \u0026 Type::SEVERE)\n    .with_ignore_false_positives(false)\n    .with_ignore_self_censoring(false)\n    .with_censor_replacement('*')\n    .censor_and_analyze();\n\nassert_eq!(censored, \"123 C***\");\nassert!(analysis.is(Type::INAPPROPRIATE));\nassert!(analysis.isnt(Type::PROFANE \u0026 Type::SEVERE | Type::SEXUAL));\n```\n\nIf you cannot afford to let anything slip through, or have reason to believe a particular user\nis trying to evade the filter, you can check if their input matches a [short list of safe strings](src/safe.txt):\n\n```rust\nuse rustrict::{CensorStr, Type};\n\n// Figure out if a user is trying to evade the filter.\nassert!(\"pron\".is(Type::EVASIVE));\nassert!(\"porn\".isnt(Type::EVASIVE));\n\n// Only let safe messages through.\nassert!(\"Hello there!\".is(Type::SAFE));\nassert!(\"nice work.\".is(Type::SAFE));\nassert!(\"yes\".is(Type::SAFE));\nassert!(\"NVM\".is(Type::SAFE));\nassert!(\"gtg\".is(Type::SAFE));\nassert!(\"not a common phrase\".isnt(Type::SAFE));\n```\n\nIf you want to add custom profanities or safe words, enable the `customize` feature.\n\n```rust\n#[cfg(feature = \"customize\")]\n{\n    use rustrict::{add_word, CensorStr, Type};\n\n    // You must take care not to call these when the crate is being\n    // used in any other way (to avoid concurrent mutation).\n    unsafe {\n        add_word(\"reallyreallybadword\", (Type::PROFANE \u0026 Type::SEVERE) | Type::MEAN);\n        add_word(\"mybrandname\", Type::SAFE);\n    }\n    \n    assert!(\"Reallllllyreallllllybaaaadword\".is(Type::PROFANE));\n    
assert!(\"MyBrandName\".is(Type::SAFE));\n}\n```\n\nIf your use-case is chat moderation, and you store data on a per-user basis, you can use `rustrict::Context` as a reference implementation:\n\n```rust\n#[cfg(feature = \"context\")]\n{\n    use rustrict::{BlockReason, Context};\n    use std::time::Duration;\n    \n    pub struct User {\n        context: Context,\n    }\n    \n    let mut bob = User {\n        context: Context::default()\n    };\n    \n    // Ok messages go right through.\n    assert_eq!(bob.context.process(String::from(\"hello\")), Ok(String::from(\"hello\")));\n    \n    // Bad words are censored.\n    assert_eq!(bob.context.process(String::from(\"crap\")), Ok(String::from(\"c***\")));\n\n    // Can take user reports (After many reports or inappropriate messages,\n    // will only let known safe messages through.)\n    for _ in 0..5 {\n        bob.context.report();\n    }\n   \n    // If many bad words are used or reports are made, the first letter of\n    // future bad words starts getting censored too.\n    assert_eq!(bob.context.process(String::from(\"crap\")), Ok(String::from(\"****\")));\n    \n    // Can manually mute.\n    bob.context.mute_for(Duration::from_secs(2));\n    assert!(matches!(bob.context.process(String::from(\"anything\")), Err(BlockReason::Muted(_))));\n}\n```\n\n## Comparison\n\nTo compare filters, the first 100,000 items of [this list](https://raw.githubusercontent.com/vzhou842/profanity-check/master/profanity_check/data/clean_data.csv)\nare used as a dataset. Positive accuracy is the percentage of profanity detected as profanity. 
Negative accuracy is the percentage of clean text detected as clean.\n\n| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |\n|-------|----------|-------------------|-------------------|------|\n| [rustrict](https://crates.io/crates/rustrict) | 80.00%   | 93.98%            | 76.52%            | 9s   |\n| [censor](https://crates.io/crates/censor) | 76.16%   | 72.76%            | 77.01%            | 23s  |\n| [stfu](https://crates.io/crates/stfu) | 91.74% | 77.69% | 95.25% | 45s |\n| [profane-rs](https://crates.io/crates/profane-rs) | 80.47% | 73.79% | 82.14% | 52s |\n\n## Development\n\n[![Build](https://github.com/finnbear/rustrict/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/finnbear/rustrict/actions/workflows/build.yml)\n\nIf you make an adjustment that would affect false positives, such as adding profanity,\nyou will need to run `false_positive_finder`:\n1. Run `make downloads` to download the required word lists and dictionaries\n2. Run `make false_positives` to automatically find false positives\n\nIf you modify `replacements_extra.csv`, run `make replacements` to rebuild `replacements.csv`.\n\nFinally, run `make test` for a full test or `make test_debug` for a fast test.\n\n## License\n\nLicensed under either of\n\n * Apache License, Version 2.0\n   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)\n * MIT license\n   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)\n\nat your option.\n\n## Contribution\n\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in the work by you, as defined in the Apache-2.0 license, shall be\ndual licensed as above, without any additional terms or 
conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinnbear%2Frustrict","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinnbear%2Frustrict","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinnbear%2Frustrict/lists"}