🍳 NLPrep - dataset tool for many natural language processing tasks
https://github.com/voidful/nlprep

## Features
- Handles over 100 datasets
- Generates a statistics report on the processed dataset
- Supports many pre-processing methods
- Provides a panel for entering your parameters at runtime
- Easy to adapt to your own dataset and pre-processing utility

## Online Explorer
[https://voidful.github.io/NLPrep-Datasets/](https://voidful.github.io/NLPrep-Datasets/)

## Documentation
Learn more from the [docs](https://voidful.github.io/NLPrep/).

## Quick Start
### Installing via pip
```bash
pip install nlprep
```
### Get one of the datasets
```bash
nlprep --dataset clas_udicstm --outdir sentiment
```

**You can also try nlprep in Google Colab: [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg "nlprep")](https://colab.research.google.com/drive/1EfVXa0O1gtTZ1xEAPDyvXMnyjcHxO7Jk?usp=sharing)**

## Overview
```
$ nlprep
arguments:
--dataset which dataset to use
--outdir processed result output directory

optional arguments:
-h, --help show this help message and exit
--util data preprocessing utility, multiple utilities are supported
--cachedir dir for caching raw dataset
--infile local dataset path
--report generate an HTML statistics report
```
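
The flags above can be combined in one call. The following is a minimal sketch, not an official recipe: it reuses the dataset name and output directory from Quick Start and adds `--cachedir` with a hypothetical path (`./nlprep_cache` is an assumption, not something the project prescribes). It only prints the command unless `RUN=1` is set, so you can inspect it first.

```shell
# Sketch only: build an nlprep command from the flags listed in the overview.
dataset="clas_udicstm"        # dataset name taken from Quick Start
outdir="sentiment"            # output directory taken from Quick Start
cachedir="./nlprep_cache"     # hypothetical cache path, not from the README

# Store the command as positional parameters so each argument stays quoted.
set -- nlprep --dataset "$dataset" --outdir "$outdir" --cachedir "$cachedir"

# Join the arguments into one line for inspection.
cmd_line="$*"
printf '%s\n' "$cmd_line"

# Set RUN=1 to actually execute the command (requires nlprep to be installed).
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Pointing `--cachedir` at a fixed location should let repeated runs reuse the raw dataset, going by the flag's description above.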

## Contributing
Thanks for your interest. There are many ways to contribute to this project. Get started [here](https://github.com/voidful/nlprep/blob/master/CONTRIBUTING.md).

## License ![PyPI - License](https://img.shields.io/github/license/voidful/nlprep)

* [License](https://github.com/voidful/nlprep/blob/master/LICENSE)

## Icons reference
Icons modified from Darius Dan from www.flaticon.com
Icons modified from Freepik from www.flaticon.com