https://github.com/ismielabir/txtcleanen
txtcleanen
- Host: GitHub
- URL: https://github.com/ismielabir/txtcleanen
- Owner: IsmielAbir
- License: MIT
- Created: 2025-11-06T13:44:55.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-11-06T13:51:02.000Z (7 months ago)
- Last Synced: 2026-04-25T03:59:56.903Z (about 1 month ago)
- Topics: nlp, python-package, text-cleaning, text-preprocessing
- Language: Python
- Homepage: https://pypi.org/project/txtcleanen/
- Size: 6.84 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: License
README
# txtcleanen
**txtcleanen** is a small Python package for cleaning English text. It removes HTML tags, URLs, emojis, numbers, punctuation, and extra whitespace, which makes it a convenient first step in Natural Language Processing (NLP) and other text preprocessing pipelines.
---
## ✨ Features
- Remove HTML tags
- Remove URLs
- Remove emojis
- Remove digits and punctuation
- Normalize Unicode text
- Compact multiple spaces into one
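
The steps listed above can be approximated with the standard library alone. The sketch below is not txtcleanen's actual implementation: the function name `clean_text_sketch`, the regular expressions, and the order of the steps are all assumptions made for illustration, and the emoji ranges are deliberately rough.

```python
import re
import string
import unicodedata

# Hypothetical patterns for illustration; the package's own rules may differ.
TAG_RE = re.compile(r"<[^>]+>")                # anything that looks like an HTML tag
URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # http(s) and www-style links
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # emoji variation selector
    "]+"
)
DIGIT_PUNCT_RE = re.compile(r"[\d" + re.escape(string.punctuation) + r"]")
SPACE_RE = re.compile(r"\s+")


def clean_text_sketch(text: str) -> str:
    """Apply each cleaning step in turn and return the compacted result."""
    text = unicodedata.normalize("NFKC", text)  # normalize Unicode forms
    text = TAG_RE.sub(" ", text)                # remove HTML tags
    text = URL_RE.sub(" ", text)                # remove URLs
    text = EMOJI_RE.sub(" ", text)              # remove emojis
    text = DIGIT_PUNCT_RE.sub(" ", text)        # remove digits and ASCII punctuation
    return SPACE_RE.sub(" ", text).strip()      # compact runs of whitespace


print(clean_text_sketch("<p>Hello 😊 World! Visit https://example.com now!</p>"))
```

Each step substitutes a space rather than an empty string so that neighboring words do not merge when a tag, link, or emoji between them is removed; the final step then collapses the leftover whitespace.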
---
## 🚀 Installation
```bash
pip install txtcleanen
```
## Example
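```python
# Assumption: the package exposes a callable named `txtcleanen`. A bare
# `import txtcleanen` binds the module, which cannot be called directly.
from txtcleanen import txtcleanen

text = "Hello 😊 World! Visit https://example.com now!"
clean_text = txtcleanen(text)
print(clean_text)
# Output: Hello World Visit now
```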