https://github.com/heardacat/feature-rich-encoding

Feature Rich Encoding for Python Scikit Learn

Host: GitHub
URL: https://github.com/heardacat/feature-rich-encoding
Owner: HeardACat
Created: 2016-12-07T07:20:03.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2016-12-17T07:03:24.000Z (over 8 years ago)
Last Synced: 2025-03-26T16:40:37.047Z (2 months ago)
Language: TeX
Size: 2.15 MB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md

Feature Rich Encoding
=====================

This is a simple Python library for creating "feature-rich encodings", as described by Nallapati _et al._ (2016). It is built on top of the `scikit-learn` library.

The key idea is to concatenate the features derived from:

*  Word2Vec word embeddings
*  POS (part-of-speech) tags
*  NER (named-entity recognition) tags
*  TF-IDF weights

The words, POS tags, and NER tags are each converted to embeddings using the word2vec module in Gensim.
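The concatenation step can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the lookup tables, their made-up vector values, and the `encode_token` helper are all hypothetical.

```python
# Toy lookup tables. The vectors are made-up values for illustration only;
# in practice they would come from trained embedding models.
WORD_EMB = {"latin": [0.1, 0.3], "scholar": [0.7, 0.2]}
POS_EMB = {"JJ": [1.0, 0.0], "NN": [0.0, 1.0]}
NER_EMB = {"O": [0.0], "PERSON": [1.0]}


def encode_token(word, pos, ner, tfidf):
    """Concatenate the word, POS, and NER embeddings with a TF-IDF weight."""
    return WORD_EMB[word] + POS_EMB[pos] + NER_EMB[ner] + [tfidf]


vector = encode_token("latin", "JJ", "O", 0.42)
print(len(vector))  # 2 + 2 + 1 + 1 = 6 dimensions
```

Each feature type keeps its own embedding space, and the final vector is simply the blocks laid side by side.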

Usage
=====

See `fre.py` for a usage example. The encoder fits easily into a `sklearn.pipeline.Pipeline` workflow:

```py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline

from FeatureRichEncoding import FeatureRichEncoding

sentences = [
    "It is not known exactly when the text obtained its current standard form",
    "it may have been as late as the 1960s. Dr. Richard McClintock, a Latin scholar who was the publications director at College in Virginia",
    "discovered the source of the passage sometime before 1982 while searching for instances of the Latin word",
]

# Combine the word2vec, POS, NER, and TF-IDF features column-wise.
feature_rich_all = FeatureUnion([
    ('w2v', FeatureRichEncoding()),
    ('pos', FeatureRichEncoding(mode='pos')),
    ('ner', FeatureRichEncoding(mode='ner')),
    ('tfidf', TfidfVectorizer()),
])

combine_feats = feature_rich_all.fit_transform(sentences)
```
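For readers unfamiliar with `FeatureUnion`, here is a minimal pure-Python sketch of what it does conceptually: fit each transformer on the same input, then join the outputs column-wise for each document. The `TokenCount` and `CharCount` transformers and the `union_fit_transform` helper are hypothetical stand-ins, not part of this library or of scikit-learn.

```python
class TokenCount:
    """Toy transformer: one column holding the number of tokens."""

    def fit(self, docs):
        return self

    def transform(self, docs):
        return [[len(doc.split())] for doc in docs]


class CharCount:
    """Toy transformer: one column holding the number of characters."""

    def fit(self, docs):
        return self

    def transform(self, docs):
        return [[len(doc)] for doc in docs]


def union_fit_transform(transformers, docs):
    """Fit every (name, transformer) pair and join their columns per document."""
    blocks = [t.fit(docs).transform(docs) for _, t in transformers]
    # zip(*blocks) groups together the rows that belong to the same document.
    return [sum(rows, []) for rows in zip(*blocks)]


features = union_fit_transform(
    [("tokens", TokenCount()), ("chars", CharCount())],
    ["a bc", "def"],
)
print(features)  # [[2, 4], [1, 3]]
```

In the real pipeline, the outputs are numeric arrays or sparse matrices rather than nested lists, but the row-wise alignment is the same.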

Requirements
============

*  `gensim`

*  `nltk`: you may also need to download some of the relevant corpora.

*  `scikit-learn`
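The README does not say which NLTK resources are needed. As a rough sketch, assuming the library relies on NLTK's standard tokenizer, POS tagger, and named-entity chunker, the downloads might look like the following. The resource list and the `download_nltk_resources` helper are assumptions. Recent NLTK releases may use different resource names.

```python
# Assumed resource names; the README does not list them, so adjust as needed.
NLTK_RESOURCES = [
    "punkt",                       # tokenizer models
    "averaged_perceptron_tagger",  # POS tagger
    "maxent_ne_chunker",           # named-entity chunker
    "words",                       # word list used by the NE chunker
]


def download_nltk_resources(resources=NLTK_RESOURCES):
    """Fetch each resource with nltk.download; return the names that failed."""
    import nltk  # imported here so the list above can be inspected without NLTK

    return [name for name in resources if not nltk.download(name)]
```

Calling `download_nltk_resources()` once after installation should be enough. An empty return value means every download succeeded.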

Installation
============

```
python setup.py install
```

References
==========

Nallapati, R., Xiang, B., & Zhou, B. (2016). Sequence-to-sequence RNNs for text summarization. _arXiv preprint arXiv:1602.06023._ Retrieved from
