npl-classification
===================

Natural language processing classification with Python and nltk.

https://github.com/iskyd/npl-classification
Setting up a virtual environment
===================

virtualenv my_app
source my_app/bin/activate

Get npl-classification
===================

cd my_app
git clone https://github.com/iskyd/npl-classification

This clones the repository into my_app/npl-classification, which is the layout the commands below assume.

Install requirements
===================

bin/pip install -r npl-classification/requirements.txt
bin/python -m nltk.downloader -d /path/to/my_app stopwords punkt
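To sanity-check the download, here is a quick snippet (illustrative only, not part of the repo) showing that the resources resolve from that directory:

    import nltk

    # point NLTK at the directory passed to -d above
    nltk.data.path.append('/path/to/my_app')

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    print(stopwords.words('english')[:5])          # needs the stopwords corpus
    print(word_tokenize('NLTK data is in place.')) # needs the punkt tokenizer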

Get with Docker
===================

docker pull iskyd/npl-classification
docker run -t -d --name npl-classification iskyd/npl-classification
docker exec -ti npl-classification bash
export NLTK_DATA=/app/nltk_data

To copy your data from the host to the container you can use docker cp:

docker cp data.json npl-classification:/app/data.json

Read more about [Docker](https://docs.docker.com/)

Usage
===================
python npl-classification/classify.py --help

**Create, train and dump a classifier**

python npl-classification/classify.py --create-classifier='/path/to/document.json' --dump-classifier='/path/to/dump.pickle'

Your document.json is simply a JSON document containing the text and the category:

{
    "text": "Test message",
    "category": "Test category"
}

You can also use a different structure and specify the text and category fields by adding:

--text-field='mycustomfield'
--classification-field='mycustomclassificationfield'
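
As a rough illustration (not the repo's actual code), here is how such records could be loaded into (text, category) pairs; --text-field and --classification-field would then simply change which keys are read. This sketch assumes document.json holds a list of objects like the one above:

    import json

    # hypothetical loader: the field names mirror the CLI options
    def load_records(path, text_field='text', classification_field='category'):
        with open(path) as f:
            records = json.load(f)  # assumes a top-level JSON array
        for record in records:
            yield record[text_field], record[classification_field]

    pairs = list(load_records('/path/to/document.json',
                              text_field='mycustomfield',
                              classification_field='mycustomclassificationfield'))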

The default language for the stemmer, stopwords and word tokenizer is English; you can specify a different one using:

--stemmer-language='italian'
--stopwords-language='italian'
--word-tokenize-language='italian'

If you want the stemmer to ignore stopwords, just specify the -i option.
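
For reference, here is how these options plausibly map onto NLTK calls; this is a reading of the flags, not necessarily the script's exact implementation:

    from nltk.corpus import stopwords
    from nltk.stem.snowball import SnowballStemmer
    from nltk.tokenize import word_tokenize

    # --stemmer-language; ignore_stopwords=True is what -i plausibly toggles
    stemmer = SnowballStemmer('italian', ignore_stopwords=True)
    stop_set = set(stopwords.words('italian'))   # --stopwords-language
    tokens = word_tokenize('Voglio classificare questo testo',
                           language='italian')   # --word-tokenize-language

    print([stemmer.stem(t) for t in tokens if t.lower() not in stop_set])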

By default stopwords are removed from the training set; if you want to keep them for training/accuracy/classification, just add the -k option.

You can specify the number of rows used for training with the --row-training-set option (default: 1000):
--row-training-set=100

With the -r option the rows of document.json are read in random order (default: False).
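
Put together, the two options presumably behave like this sketch (not the script's actual code):

    import random

    # --row-training-set=N limits the rows, -r randomizes the selection
    def pick_rows(records, n=1000, randomize=False):
        if randomize:
            return random.sample(records, min(n, len(records)))
        return records[:n]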

**Load classifier**

python npl-classification/classify.py --load-classifier='/path/to/dump.pickle'
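
The dump/load pair presumably boils down to pickle serialization, along these lines (a sketch, not the script's verbatim code):

    import pickle

    def dump_classifier(classifier, path):
        # --dump-classifier: serialize the trained classifier to disk
        with open(path, 'wb') as f:
            pickle.dump(classifier, f)

    def load_classifier(path):
        # --load-classifier: restore a previously dumped classifier
        with open(path, 'rb') as f:
            return pickle.load(f)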

**Testing the classifier**
For testing you can create/train a classifier or load one, in the same way as before.
The -a option calculates the accuracy, running the test against a JSON file.
You can specify the JSON file using --test-file-path.
As before, you can specify the number of rows used, whether they are read in random order, the text and classification fields of the JSON file, and the language for the stemmer, stopwords and word tokenizer.

python npl-classification/classify.py --load-classifier='/path/to/dump.pickle' -a --test-file-path='/path/to/data.json' --random-row-test-set

You can use some other options:

--test-text-field
--test-classification-field
--row-test-set (limits the number of rows used for testing, default: 500)
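
NLTK ships the accuracy helper this step most likely relies on. A self-contained toy example (the bag-of-words feature function is an assumption, not necessarily what the script uses):

    from nltk import NaiveBayesClassifier
    from nltk.classify.util import accuracy

    def features(text):
        # naive bag-of-words featureset (assumed, for illustration)
        return {w: True for w in text.lower().split()}

    train_set = [(features('good great fine'), 'pos'),
                 (features('bad awful poor'), 'neg')]
    test_set = [(features('great movie'), 'pos'),
                (features('awful plot'), 'neg')]

    classifier = NaiveBayesClassifier.train(train_set)
    print(accuracy(classifier, test_set))  # 1.0 on this toy data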

**Classify**
As before, you can either create or load the classifier.

python npl-classification/classify.py --load-classifier='/path/to/dump.pickle' --classify='I want to classify this text'
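
Under the hood this presumably reduces to NLTK's classify() call on a loaded classifier; a minimal sketch, again assuming bag-of-words features:

    import pickle

    def features(text):
        # same naive bag-of-words assumption as in the testing sketch
        return {w: True for w in text.lower().split()}

    # load a previously dumped classifier and label a single text
    with open('/path/to/dump.pickle', 'rb') as f:
        classifier = pickle.load(f)
    print(classifier.classify(features('I want to classify this text')))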

**More info**
To get more info, use the --help option.