{"id":18709105,"url":"https://github.com/jojoee/leo-profanity","last_synced_at":"2025-04-04T11:07:55.736Z","repository":{"id":41274784,"uuid":"83965329","full_name":"jojoee/leo-profanity","owner":"jojoee","description":":tiger: Profanity filter, based on \"Shutterstock\" dictionary","archived":false,"fork":false,"pushed_at":"2025-02-12T12:35:43.000Z","size":1582,"stargazers_count":56,"open_issues_count":0,"forks_count":13,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-28T10:02:35.571Z","etag":null,"topics":["bad","curse","dirty","obscene","profanity","swear"],"latest_commit_sha":null,"homepage":"https://jojoee.github.io/leo-profanity/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jojoee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-05T11:35:19.000Z","updated_at":"2025-02-12T12:35:47.000Z","dependencies_parsed_at":"2024-11-07T12:32:19.850Z","dependency_job_id":"20a95b8f-bb77-49f7-a64c-f57953540d38","html_url":"https://github.com/jojoee/leo-profanity","commit_stats":{"total_commits":138,"total_committers":11,"mean_commits":"12.545454545454545","dds":"0.14492753623188404","last_synced_commit":"958c999d470888d0f08025ef4d691d8b6466a81f"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jojoee%2Fleo-profanity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jojoee%2Fleo-profanity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/jojoee%2Fleo-profanity/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jojoee%2Fleo-profanity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jojoee","download_url":"https://codeload.github.com/jojoee/leo-profanity/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247166144,"owners_count":20894652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bad","curse","dirty","obscene","profanity","swear"],"created_at":"2024-11-07T12:26:18.484Z","updated_at":"2025-04-04T11:07:55.719Z","avatar_url":"https://github.com/jojoee.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# leo-profanity\n\n![continuous integration](https://github.com/jojoee/leo-profanity/workflows/continuous%20integration/badge.svg?branch=master)\n![release](https://github.com/jojoee/leo-profanity/workflows/release/badge.svg?branch=master)\n![runnable](https://github.com/jojoee/leo-profanity/workflows/runnable/badge.svg?branch=master)\n![runnable old node](https://github.com/jojoee/leo-profanity/workflows/runnable%20old%20node/badge.svg?branch=master)\n![runnable without optional dependencies](https://github.com/jojoee/leo-profanity/workflows/runnable%20without%20optional%20dependencies/badge.svg?branch=master)\n[![Codecov](https://img.shields.io/codecov/c/github/jojoee/leo-profanity.svg)](https://codecov.io/github/jojoee/leo-profanity)\n[![Version - 
npm](https://img.shields.io/npm/v/leo-profanity.svg)](https://www.npmjs.com/package/leo-profanity)\n[![License - npm](https://img.shields.io/npm/l/leo-profanity.svg)](http://opensource.org/licenses/MIT)\n[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg?style=flat-square)](https://github.com/semantic-release/semantic-release)\n[![Greenkeeper badge](https://badges.greenkeeper.io/jojoee/leo-profanity.svg)](https://greenkeeper.io/)\n[![Mutation testing badge](https://img.shields.io/endpoint?style=flat\u0026url=https%3A%2F%2Fbadge-api.stryker-mutator.io%2Fgithub.com%2Fjojoee%2Fleo-profanity%2Fmaster)](https://dashboard.stryker-mutator.io/reports/github.com/jojoee/leo-profanity/master)\n\nProfanity filter, based on \"Shutterstock\" dictionary. [Demo page](https://jojoee.github.io/leo-profanity/), [API document page](https://jojoee.github.io/leo-profanity/doc/LeoProfanity.html)\n\n## Installation\n\n```\n// npm\nnpm install leo-profanity\nnpm install leo-profanity --no-optional # install only English bad word dictionary\n\n// yarn\nyarn add leo-profanity\nyarn add leo-profanity --ignore-optional # install only English bad word dictionary\n\n// Bower\nbower install leo-profanity\n// dictionary/default.json\n\n// githack\n\u003cscript src=\"https://raw.githack.com/jojoee/bahttext/master/src/index.js\"\u003e\u003c/script\u003e\nconst filter = LeoProfanity\nfilter.clearList()\nfilter.add([\"boobs\", \"butt\"])\n```\n\n## Example usage for npm\n\n```javascript\n// support languages\n// - en\n// - fr\n// - ru\n\nvar filter = require('leo-profanity');\n\n// output: I have ****, etc.\nfilter.clean('I have boob, etc.');\n\n// replace current dictionary with the french\nfilter.loadDictionary('fr');\n\n// create new dictionary\nfilter.addDictionary('th', ['หนึ่ง', 'สอง', 'สาม', 'สี่', 'ห้า'])\n```\n\nSee more here [LeoProfanity - Documentation](https://jojoee.github.io/leo-profanity/doc/LeoProfanity.html)\n\n## 
Algorithm\n\nThis project splits the work into 2 parts, `Sanitize` and `Filter`.\nThe approaches considered for each part are listed below.\n\n### Sanitize\n\n```\nAttempt 1 (1.1): Convert everything to a lowercase string\nExample:\n- \"SomeThing\" to \"something\"\nAdvantage:\n- Simple to understand\n- Simple to implement\nDisadvantage or Caution:\n- Ignores \"case sensitive\" words\n\nAttempt 2 (1.2): Turn \"similar-like\" symbols into letters\nExample:\n- \"@\" to \"a\"\n- \"5\" or \"$\" to \"s\"\n- \"@ss\" to \"ass\"\n- \"b00b\" to \"boob\"\n- \"a$$a$$in\" to \"assassin\"\nAdvantage:\n- Detects some trick words\nDisadvantage or Caution:\n- False positives\n- Subjective, it depends on how each person reads the symbol\n- Limits user imagination (users cannot play with words)\n  e.g. \"joe@ssociallife.com\"\n  e.g. a user wants to try something funny like \"a$$a$$in\"\n\nAttempt 3 (1.3): Replace \".\" and \",\" with space to separate words\nPeople often use \".\" and \",\" to connect or end sentences\nExample:\n- \"I like a55,b00b.t1ts\" to \"I like a55 b00b t1ts\"\nAdvantage:\n- Increases the chance of finding words e.g. \"I like a55,b00b.t1ts\"\nDisadvantage or Caution:\n- Splits some words apart e.g. \"john.doe@gmail.com\"\n```\n\n### Filter\n\n```\nAttempt 1 (2.1): Split into array (or using regex)\nUse spaces to split a \"word string\" into a \"word array\", then check each word against the profanity word list\nExample:\n- \"I like ass boob\" to [\"I\", \"like\", \"ass\", \"boob\"]\nAdvantage:\n- Simple to implement\nDisadvantage:\n- Needs a proper list of profanity words\n- Some \"false positives\" e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)\n\nAttempt 2 (2.2): Filter word inside (with or without space)\nDetect any run of letters that contains a \"profanity word\"\nExample:\n- \"thistextisfunnyboobsanda55\" which contains suspicious words: \"boobs\", \"a55\"\nAdvantage:\n- Can detect \"un-spaced\" profanity words\nDisadvantage:\n- Many \"false positives\" e.g. 
http://www.morewords.com/contains/ass/, Clbuttic mistake (filter mistake)\n```\n\n### In Summary\n- We don't know every method that can produce a profanity word\n  (e.g. how many different ways can you enter a55 ?)\n- There is no algorithm-based approach that achieves it (yet)\n- People will always find a way to connect with each other\n  (e.g. [Leet](https://en.wikipedia.org/wiki/Leet))\n\n**So, this project goes with 1.1, 1.3 and 2.1.**\n\n(note - you can find other attempts in the \"Reference\" section)\n\n## CMD\n\n```\nnpm run test.watch\nnpm run validate\nnpm run doc.generate\n\n# test npm publish\nnpm publish --dry-run\n\n# mutation test\nnpm install -g stryker-cli\nstryker init\nexport STRYKER_DASHBOARD_API_KEY=\u003cthe_project_api_token\u003e\necho $STRYKER_DASHBOARD_API_KEY\nnpx stryker run\n```\n\n## Other languages\n- [x] JavaScript on [npmjs.com/package/leo-profanity](https://www.npmjs.com/package/leo-profanity)\n- [x] PHP on [packagist.org/packages/jojoee/leo-profanity](https://packagist.org/packages/jojoee/leo-profanity)\n- [x] Python on [pypi.org/project/leoprofanity](https://pypi.org/project/leoprofanity)\n- [ ] Java on [Maven](https://maven.apache.org/)\n- [ ] WordPress on [wordpress.org](https://wordpress.org/)\n\n## Reference\n- Inspired by [jwils0n/profanity-filter](https://github.com/jwils0n/profanity-filter)\n- Algorithm / Discussion\n  - [\"similar-like\" symbol to alphabet](http://stackoverflow.com/questions/24515/bad-words-filter#answer-24615)\n  - [Replace Bad words using Regex](http://stackoverflow.com/questions/3342011/replace-bad-words-using-regex)\n  - [Clbuttic](http://www.computerhope.com/jargon/c/clbuttic.htm)\n  - [The Clbuttic Mistake](http://thedailywtf.com/articles/The-Clbuttic-Mistake-)\n  - [The Clbuttic Mistake: When obscenity filters go wrong](http://www.telegraph.co.uk/news/newstopics/howaboutthat/2667634/The-Clbuttic-Mistake-When-obscenity-filters-go-wrong.html)\n  - [Obscenity Filters: Bad Idea, or Incredibly 
Intercoursing Bad Idea?](https://blog.codinghorror.com/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea/)\n  - [How do you implement a good profanity filter?](http://stackoverflow.com/questions/273516/how-do-you-implement-a-good-profanity-filter)\n  - [The Untold History of Toontown’s SpeedChat (or BlockChattm from Disney finally arrives)](http://habitatchronicles.com/2007/03/the-untold-history-of-toontowns-speedchat-or-blockchattm-from-disney-finally-arrives/)\n  - [Profanity Filter Performance in Java](http://softwareengineering.stackexchange.com/questions/91177/profanity-filter-performance-in-java)\n- Resource bad-word list\n  - [Bad words list (458 words) by Alejandro U. Alvarez](https://urbanoalvarez.es/blog/2008/04/04/bad-words-list/)\n  - DansGuardian - [dansguardian.org](http://dansguardian.org/), [DansGuardian Phraselists](http://contentfilter.futuragts.com/phraselists/)\n  - [Seven dirty words](https://en.wikipedia.org/wiki/Seven_dirty_words)\n  - [Shutterstock](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words)\n  - [MauriceButler/badwords](https://github.com/MauriceButler/badwords)\n  - http://www.cs.cmu.edu/~biglou/resources/bad-words.txt\n- Tool\n  - [RegExr](http://regexr.com/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjojoee%2Fleo-profanity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjojoee%2Fleo-profanity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjojoee%2Fleo-profanity/lists"}