Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/sharkutilities/nlpurify

A powerful collection library for feature extraction and text cleaning using Unicode translations, regular expressions, natural language processing, large language models and more.
https://github.com/sharkutilities/nlpurify

Last synced: about 1 month ago
JSON representation

A powerful collection library for feature extraction and text cleaning using Unicode translations, regular expressions, natural language processing, large language models and more.

Awesome Lists containing this project

README

        


favicon

NLPurify

[![Documentation Status](https://readthedocs.org/projects/nlpurify/badge/?version=latest&style=plastic)](https://nlpurify.readthedocs.io/en/latest/?badge=latest)
[![GitHub Issues](https://img.shields.io/github/issues/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/issues)
[![GitHub Forks](https://img.shields.io/github/forks/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/network)
[![GitHub Stars](https://img.shields.io/github/stars/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/stargazers)
[![LICENSE File](https://img.shields.io/github/license/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/blob/master/LICENSE)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/NLPurify?style=plastic)](https://pypistats.org/packages/pandas-wizard)
[![PyPI Latest Release](https://img.shields.io/pypi/v/NLPurify.svg?style=plastic)](https://pypi.org/project/NLPurify/)

[![GuardRails badge](https://api.guardrails.io/v2/badges/252951?token=2e1d82f6a737cdd3151ea0c869ee61c86196c3a05d17b0d91bf5a032e7766dc0)](https://dashboard.guardrails.io/gh/sharkutilities/repos/252951)

A text cleaning and extraction engine was developed using a combination of traditional techniques like Unicode translations,
cleaning using regular expressions, and modern tools like "natural language processing" and "large language models" to
detect and clean long texts and create word vectors.

## Getting Started

The source code is hosted at GitHub: [**sharkutilities/NLPurify**](https://github.com/sharkutilities/NLPurify).
The binary installers for the latest release are available at the [Python Package Index (PyPI)](https://pypi.org/project/NLPurify/).

```bash
pip install -U NLPurify
```

The module is currently under development, and new ideas are welcomed. Raise a new PR/issue for the same.
The changes between each release are available [here](./CHANGELOG.md).

---

> [!CAUTION]
> **This code depreciates the existing GitHub Gist which was previously designed.**
> Check [`#1`](https://github.com/sharkutilities/NLPurify/issues/1) for more details.

> [!NOTE]
> **_Legacy_ codes are available as a submodule.**
> Check [`#5`](https://github.com/sharkutilities/NLPurify/issues/5) for more details.