Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Hironsan/natural-language-preprocessings
Some recipes of natural language pre-processing
machine-learning natural-language-processing preprocessnig
- Host: GitHub
- URL: https://github.com/Hironsan/natural-language-preprocessings
- Owner: Hironsan
- License: mit
- Created: 2017-03-28T08:11:21.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-07-06T21:13:03.000Z (over 1 year ago)
- Last Synced: 2024-06-17T23:42:27.652Z (5 months ago)
- Topics: machine-learning, natural-language-processing, preprocessnig
- Language: Python
- Homepage:
- Size: 59.6 KB
- Stars: 131
- Watchers: 6
- Forks: 28
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Natural Language Pre-processing
This repository includes some recipes for natural language pre-processing. The recipes are as follows:
* Data cleaner
* Word normalization
* Stopwords remover
* Tokenizer
* Word Vector

## Install
To install the required modules, simply run:

```
$ pip install -r requirements.txt
```

## Setup
First, download the [livedoor news corpus](http://www.rondhuit.com/download.html#ldcc) and extract it.
To download the corpus, run the following commands:

```
$ cd src/data
$ python make_dataset.py
```

Now you are ready for classification!
Start jupyter notebook:
```
$ jupyter notebook
```

And you can execute [notebooks/document_classification.ipynb](https://github.com/Hironsan/natural-language-preprocessings/blob/master/notebooks/document_classification.ipynb).
Good NLP Life!
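
For orientation, the first four recipes listed at the top of this README (data cleaner, word normalization, tokenizer, stopwords remover) can be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the code shipped in this repository: the function names, the tiny `STOPWORDS` set, and the regex-based tokenizer are all assumptions made for the example. Japanese text such as the livedoor news corpus has no spaces between words, so a real pipeline would replace `tokenize` with a morphological analyzer. Word vectors are left out because they need a trained model.

```python
import re
import unicodedata

# Hypothetical stopword list, for illustration only.
# A real list would be loaded from a file or a library.
STOPWORDS = {"the", "a", "an", "of", "is"}


def clean_text(text):
    """Data cleaner: drop HTML-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Word normalization: fold full-width characters (NFKC), lowercase,
    and replace every run of digits with a single 0."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\d+", "0", text)


def tokenize(text):
    """Tokenizer: naive stand-in that extracts runs of word characters."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Stopwords remover: keep only tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def preprocess(text):
    """Chain the four steps: clean, normalize, tokenize, filter."""
    return remove_stopwords(tokenize(normalize(clean_text(text))))


if __name__ == "__main__":
    sample = "<p>The price of Ｐｙｔｈｏｎ is ２０１７ yen</p>"
    print(preprocess(sample))  # ['price', 'python', '0', 'yen']
```

Keeping each step as a separate small function makes it easy to swap one of them out, for example replacing the tokenizer, without touching the rest of the chain.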
## Licence
[MIT](https://github.com/Hironsan/natural-language-preprocessings/blob/master/LICENCE)
## Author
[Hironsan](https://github.com/Hironsan)