https://github.com/snguyenthanh/better_profanity

Blazingly fast cleaning swear words (and their leetspeak) in strings
https://github.com/snguyenthanh/better_profanity

censor censored-words censorship leetspeak profanity python words

Last synced: over 1 year ago
JSON representation

Blazingly fast cleaning swear words (and their leetspeak) in strings

Host: GitHub
URL: https://github.com/snguyenthanh/better_profanity
Owner: snguyenthanh
License: mit
Created: 2018-11-11T12:35:33.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-05-23T08:27:31.000Z (about 2 years ago)
Last Synced: 2024-10-28T17:19:12.580Z (over 1 year ago)
Topics: censor, censored-words, censorship, leetspeak, profanity, python, words
Language: Python
Homepage:
Size: 352 KB
Stars: 211
Watchers: 5
Forks: 70
Open Issues: 29
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

awesome-blazingly-fast - better_profanity - Blazingly fast cleaning swear words (and their leetspeak) in strings (Python)

README

          # better_profanity

*Blazingly fast cleaning swear words (and their leetspeak) in strings*

[![release](https://img.shields.io/badge/dynamic/json.svg?label=release&url=https%3A%2F%2Fpypi.org%2Fpypi%2Fbetter-profanity%2Fjson&query=%24.info.version&colorB=blue)](https://github.com/snguyenthanh/better_profanity/releases/latest)

[![Build Status](https://travis-ci.com/snguyenthanh/better_profanity.svg?branch=master)](https://travis-ci.com/snguyenthanh/better_profanity)

![python](https://img.shields.io/badge/python-3-blue.svg)

[![license](https://img.shields.io/github/license/mashape/apistatus.svg?style=popout)](https://github.com/snguyenthanh/better_profanity/blob/master/LICENSE)

> Currently there is a [performance issue with the latest version (0.7.0)](https://github.com/snguyenthanh/better_profanity/issues/19). It is recommended to use the [last stable version 0.6.1](https://pypi.org/project/better-profanity/0.6.1/).

Inspired from package [profanity](https://github.com/ben174/profanity) of [Ben Friedland](https://github.com/ben174), this library is significantly faster than the original one, by using string comparison instead of regex.

It supports [modified spellings](https://en.wikipedia.org/wiki/Leet) (such as `p0rn`, `h4NDjob`, `handj0b` and `b*tCh`).

## Requirements

This package works with `Python 3.5+` and `PyPy3`.

## Installation

```sh

pip3 install better_profanity

```

## Unicode characters

Only Unicode characters from categories `Ll`, `Lu`, `Mc` and `Mn` are added. More on Unicode categories can be found [here][unicode category link].

[unicode category link]: https://en.wikipedia.org/wiki/Template:General_Category_(Unicode)

Not all languages are supported yet, such as *Chinese*.

## Usage

```python

from better_profanity import profanity

if __name__ == "__main__":

    profanity.load_censor_words()

    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)

    print(censored_text)

    # You **** of ****.

```

All modified spellings of words in [profanity_wordlist.txt](./better_profanity/profanity_wordlist.txt) will be generated. For example, the word `handjob` would be loaded into:

```python

'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',

'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'

```

The full mapping of the library can be found in [profanity.py](./better_profanity/better_profanity.py#L18-L26).

### 1. Censor swear words from a text

By default, `profanity` replaces each swear words with 4 asterisks `****`.

```python

from better_profanity import profanity

if __name__ == "__main__":

    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)

    print(censored_text)

    # You **** of ****.

```

### 2. Censor doesn't care about word dividers

The function `.censor()` also hide words separated not just by an empty space ` ` but also other dividers, such as `_`, `,` and `.`. Except for `@, $, *, ", '`.

```python

from better_profanity import profanity

if __name__ == "__main__":

    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)

    print(censored_text)

    # "...****...hello_cat_****,,,,123"

```

### 3. Censor swear words with custom character

4 instances of the character in second parameter in `.censor()` will be used to replace the swear words.

```python

from better_profanity import profanity

if __name__ == "__main__":

    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')

    print(censored_text)

    # You ---- of ----.

```

### 4. Check if the string contains any swear words

Function `.contains_profanity()` return `True` if any words in the given string has a word existing in the wordlist.

```python

from better_profanity import profanity

if __name__ == "__main__":

    dirty_text = "That l3sbi4n did a very good H4ndjob."

    profanity.contains_profanity(dirty_text)

    # True

```

### 5. Censor swear words with a custom wordlist

#### 5.1. Wordlist as a `List`

Function `load_censor_words` takes a `List` of strings as censored words.

The provided list will replace the default wordlist.

```python

from better_profanity import profanity

if __name__ == "__main__":

    custom_badwords = ['happy', 'jolly', 'merry']

    profanity.load_censor_words(custom_badwords)

    print(profanity.contains_profanity("Have a merry day! :)"))

    # Have a **** day! :)

```

#### 5.2. Wordlist as a file

Function `load_censor_words_from_file takes a filename, which is a text file and each word is separated by lines.

```python

from better_profanity import profanity

if __name__ == "__main__":

    profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')

```

### 6. Whitelist

Function `load_censor_words` and `load_censor_words_from_file` takes a keyword argument `whitelist_words` to ignore words in a wordlist.

It is best used when there are only a few words that you would like to ignore in the wordlist.

```python

# Use the default wordlist

profanity.load_censor_words(whitelist_words=['happy', 'merry'])

# or with your custom words as a List

custom_badwords = ['happy', 'jolly', 'merry']

profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])

# or with your custom words as a text file

profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])

```

### 7. Add more censor words

```python

from better_profanity import profanity

if __name__ == "__main__":

    custom_badwords = ['happy', 'jolly', 'merry']

    profanity.add_censor_words(custom_badwords)

    print(profanity.contains_profanity("Happy you, fuck!"))

    # **** you, ****!

```

## Limitations

1. As the library compares each word by characters, the censor could easily be bypassed by adding any character(s) to the word:

```python

profanity.censor('I just have sexx')

# returns 'I just have sexx'

profanity.censor('jerkk off')

# returns 'jerkk off'

```

2. Any word in [wordlist](https://github.com/snguyenthanh/better_profanity/blob/master/better_profanity/profanity_wordlist.txt) that have non-space separators cannot be recognised, such as `s & m`, and therefore, it won't be filtered out. This problem was raised in [#5](https://github.com/snguyenthanh/better_profanity/issues/5).

## Testing

```sh

python3 tests.py

```

## Contributing

Please read [CONTRIBUTING.md](./CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details

## Special thanks to

- [Andrew Grinevich](https://github.com/Derfirm) - Add support for Unicode characters.

- [Jaclyn Brockschmidt](https://github.com/jcbrockschmidt) - Optimize string comparison.

## Acknowledgments

- [Ben Friedland](https://github.com/ben174) - For the inspiring package [profanity](https://github.com/ben174/profanity).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/snguyenthanh/better_profanity

Awesome Lists containing this project

README