https://github.com/viddexa/safetext
Fast profanity word, curse word, swear word, bad word filtering tool for English, Spanish, Chinese, Turkish and more.
- Host: GitHub
- URL: https://github.com/viddexa/safetext
- Owner: viddexa
- License: mit
- Created: 2023-01-04T20:21:01.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-12-27T16:19:18.000Z (5 months ago)
- Last Synced: 2025-12-29T13:22:11.021Z (5 months ago)
- Topics: bad-words, badwords, chinese, context7, english, filter, german, llmstxt, mcp, moderation, portuguese, profanity, profanity-detection, profanity-filter, profanityfilter, russian, safety, spanish, swear-filter, turkish
- Language: Python
- Homepage:
- Size: 149 KB
- Stars: 44
- Watchers: 1
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
## 🤔 why safetext?
**Fast profanity detection and filtering for 13 languages.**
- **Multi-format Detection**: Single words, phrases, and contextual profanity
- **Custom Word Lists**: Extend built-in lists with your own profanity words
- **Whitelisting**: Exclude specific words from detection
- **Auto Language Detection**: From text or subtitle files
- **Precise Filtering**: Exact position tracking and custom censoring
- **Simple Integration**: One-line setup with clean API
## 📦 installation
easily install **safetext** with pip:
```bash
pip install safetext
```
for development setup, see our [scripts documentation](scripts/README.md).
## 🎯 quickstart
### check and censor profanity
```python
>>> from safetext import SafeText
>>> st = SafeText(language='en')
>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]
>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
"Some text with ***."
```
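The `start` and `end` values in each result can also drive your own censoring style. The sketch below assumes they are 0-based character offsets with an exclusive `end`, which is what the output above suggests. `mask_spans` is a hypothetical helper, not part of safetext, and the result list is hand-written so the snippet runs without the library installed.

```python
def mask_spans(text, results, mask_char="#"):
    """Replace every character inside a reported span with mask_char."""
    chars = list(text)
    for item in results:
        # Assumption: 'start' is inclusive and 'end' is exclusive.
        for i in range(item["start"], min(item["end"], len(chars))):
            chars[i] = mask_char
    return "".join(chars)


# Hand-written result in the same shape as check_profanity output.
text = "Some text with BADWORD."
results = [{"word": "BADWORD", "index": 4, "start": 15, "end": 22}]
masked = mask_spans(text, results)
print(masked)
```

Because the mask is applied character by character, the censored text keeps its original length, so offsets of later matches stay valid.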
### extending profanity lists with custom words
Add your own profanity words by providing a custom words directory:
```python
# Directory structure:
# custom_profanity_words/
# ├── en.txt # English custom words
# ├── tr.txt # Turkish custom words
# └── es.txt # Spanish custom words
>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')
>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]
```
Custom word files should contain one word/phrase per line:
```
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
```
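A custom word file can also be generated from code. This is a minimal sketch that follows the layout described above (one `<language>.txt` file per language, one entry per line); `write_custom_words` is a hypothetical helper, not a safetext function.

```python
from pathlib import Path


def write_custom_words(directory, language, words):
    """Write words to <directory>/<language>.txt, one entry per line."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Strip surrounding whitespace and skip blank entries.
    cleaned = [word.strip() for word in words if word.strip()]
    path = target_dir / f"{language}.txt"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


path = write_custom_words(
    "custom_profanity_words",
    "en",
    ["mycustomword", "inappropriate phrase", "  "],
)
print(path.read_text(encoding="utf-8"))
```

The resulting directory can then be passed as `custom_words_dir` when constructing `SafeText`.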
### using whitelist
exclude specific words from profanity detection:
```python
# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])
# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')
# Combining custom words with whitelist
>>> st = SafeText(
... language='en',
... custom_words_dir='custom_profanity_words',
... whitelist=['allowedcustomword']
... )
```
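If the allowed words are only known after detection, for instance per request rather than per `SafeText` instance, the results can be filtered by hand. A minimal sketch, assuming each result carries the `word` key shown in the quickstart; `drop_whitelisted` is a hypothetical helper, and the case-insensitive comparison is an assumption rather than documented safetext behavior.

```python
def drop_whitelisted(results, whitelist):
    """Return the results whose matched word is not on the whitelist."""
    allowed = {word.casefold() for word in whitelist}
    return [item for item in results if item["word"].casefold() not in allowed]


# Hand-written results in the same shape as check_profanity output.
results = [
    {"word": "word1", "index": 1, "start": 0, "end": 5},
    {"word": "Other", "index": 3, "start": 10, "end": 15},
]
kept = drop_whitelisted(results, ["WORD1"])
print([item["word"] for item in kept])
```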
### automated language detection
- from text:
```python
>>> from safetext import SafeText
>>> eng_text = "This story is about to take a dark turn."
>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)
>>> st.language
'en'
```
- from .srt (subtitle) file:
```python
>>> from safetext import SafeText
>>> turkish_srt_file_path = "turkish.srt"
>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)
>>> st.language
'tr'
```
## 🌍 supported languages
**safetext** currently supports profanity detection in 13 languages:
| Flag | ISO 639-1 Code | Language |
|------|----------------|----------|
| 🇸🇦 | `ar` | Arabic |
| 🇦🇿 | `az` | Azerbaijani |
| 🇩🇪 | `de` | German |
| 🇬🇧 | `en` | English |
| 🇪🇸 | `es` | Spanish |
| 🇮🇷 | `fa` | Persian (Farsi) |
| 🇫🇷 | `fr` | French |
| 🇮🇳 | `hi` | Hindi |
| 🇯🇵 | `ja` | Japanese |
| 🇵🇹 | `pt` | Portuguese |
| 🇷🇺 | `ru` | Russian |
| 🇹🇷 | `tr` | Turkish |
| 🇨🇳 | `zh` | Chinese |
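When the language code comes from user input or an external detector, it can help to check it against the table before constructing `SafeText`. The sketch below hard-codes the 13 codes listed above; `resolve_language` and its fallback behavior are hypothetical, not part of safetext.

```python
# Codes copied from the supported languages table.
SUPPORTED_LANGUAGES = {
    "ar", "az", "de", "en", "es", "fa", "fr",
    "hi", "ja", "pt", "ru", "tr", "zh",
}


def resolve_language(code, fallback="en"):
    """Return a supported ISO 639-1 code, or fallback if code is missing or unsupported."""
    if code:
        # Reduce region-qualified tags such as 'pt-BR' or 'zh_CN' to the base code.
        normalized = code.strip().lower().replace("_", "-").split("-")[0]
        if normalized in SUPPORTED_LANGUAGES:
            return normalized
    return fallback


print(resolve_language("pt-BR"))
print(resolve_language("sv"))
```

The returned value can be passed as the `language` argument, for example `SafeText(language=resolve_language(user_code))`.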
## 🤝 contribute to safetext
join our mission in refining content moderation!
contribute by:
- **adding new languages**: create a folder with the ISO 639-1 code and include a `words.txt`.
- **enhancing word lists**: improve detection accuracy.
- **sharing feedback**: your ideas can shape `safetext`.
see our [contributing guidelines](CONTRIBUTING.md) for development workflow, [test documentation](tests/README.md) for running tests, and [scripts guide](scripts/README.md) for automation tools.
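Before submitting a new or extended word list, a quick duplicate check can save a review round. This is a sketch only: it assumes `words.txt` holds one entry per line, like the custom word files, and `find_duplicates` is a hypothetical helper. The project's own scripts may already cover this, so check the scripts guide first.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Map each repeated entry (case-insensitive) to its 1-based line numbers."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        entry = line.strip().casefold()
        if entry:  # ignore blank lines
            seen[entry].append(number)
    return {entry: numbers for entry, numbers in seen.items() if len(numbers) > 1}


sample = ["foo", "bar", "", "Foo ", "baz", "bar"]
duplicates = find_duplicates(sample)
print(duplicates)
```

For a real file, pass `Path("words.txt").read_text(encoding="utf-8").splitlines()` (with `pathlib.Path` imported) instead of the sample list.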
______________________________________________________________________
## 🏆 contributors
meet our awesome contributors who make **safetext** better every day!
______________________________________________________________________