Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bees4ever/seaqube
Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. Acronym `SeaQuBe` or `seaqube`.
https://github.com/bees4ever/seaqube
augmentation benchmark fasttext gensim nlp spacy spacy-nlp wordembeddings
Last synced: 3 months ago
JSON representation
Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. Acronym `SeaQuBe` or `seaqube`.
- Host: GitHub
- URL: https://github.com/bees4ever/seaqube
- Owner: bees4ever
- License: mit
- Created: 2020-09-17T12:58:07.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-01-28T19:08:08.000Z (almost 4 years ago)
- Last Synced: 2024-09-30T09:13:13.984Z (4 months ago)
- Topics: augmentation, benchmark, fasttext, gensim, nlp, spacy, spacy-nlp, wordembeddings
- Language: Python
- Homepage:
- Size: 5.89 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SeaQuBe
Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. Acronym `SeaQuBe` or `seaqube`.
This python framework provides several text augmentation implementations and word embedding quality evaluation methods. It is designed to fit in your machine learning pipeline. The `BaseAugmentation` class provides the same api as the python package [nlpaug](https://github.com/makcedward/nlpaug/), so that this packages can used together smoothly. However `BaseAugmentation` provides also other methods. Detailed examples see beneath.
`SeaQuBe` provides also a toolkit to wrap a trained nlp model to a nice interactive tool.
[![PyPI version](https://badge.fury.io/py/seaqube.svg)](https://badge.fury.io/py/seaqube)
## Features
* Text Data Augmentation
* Chaining and Reducing of Text Data Augmentations
* Word Embedding Quality Methods
* Interactive NLM Model Wrapper## Demo
* [Augmentation in three lines](https://github.com/bees4ever/SeaQuBe#quick-demo)
* [Example of Basic Text Augmentation](https://github.com/bees4ever/SeaQuBe/blob/master/examples/basic_augmentation.ipynb)
* [Example of Text Augmentation Chaining](https://github.com/bees4ever/SeaQuBe/blob/master/examples/chained_augmentation.ipynb)
* [Example of Word Embedding Evaluation](https://github.com/bees4ever/SeaQuBe/blob/master/examples/word_embedding_evaluation.ipynb)
* [Example of Interactive NLP](https://github.com/bees4ever/SeaQuBe/blob/master/examples/nlp.ipynb)## Augmentation
| Level | Augmenter | Description |
|:---:|:---:|:---:|
| Character | QwertyAugmentation | Simulate keyboard distance error |
| Corpus | UnigramAugmentation | Replace ubiquitous words with other ubiquitous words |
| Word | Active2PassiveAugmentation | Change surface of document using an simple active-to-passive transformer |
| Word | EDAAugmentation | Augment document using the [EDA](https://github.com/jasonwei20/eda_nlp) algorithm |
| Word | EmbeddingAugmentation | Replace similar word using [WordNet](https://wordnet.princeton.edu/) |
| Word | TranslationAugmentation | Change surface of document using translation and back-translation (with [GoogleTranslate](https://translate.google.com/))|## Augmentation Chainer
The streaming feature of augmentation is implemented in the ``AugmentationStreamer`` class. One `Reduceing` class exist, more can implemented
extending the ``BaseReduction`` class.| Action | Class | Description |
|:---:|:---:|:---:|
|Streaming|AugmentationStreamer| Run augmentation for each document through all chained augmentations. |
|Reducing| UniqueCorpusReduction | Getting a list of documents, only unique documents are returned.## Word Embedding Evaluation
| Method | Description |
|:---:|:---:|
|WordAnalogyBenchmark|This method benchmark how go relations of the type: `a is to b as c is to d` can be solved correctly.|
|WordSimilarityBenchmark|This methods compares the similarity of a word pair, calculated by a model with a human estimated similarity score.|
|WordOutliersBenchmark|This method benchmark how good a outlier of a group of words can be detected.|
|SemanticWordnetBenchmark|Based on the WordNet graph, the goodnes of the semantic / similarity of a nlp model is benchmarked.|## Installation
`SeaQuBe` can be installed from PyPip using: `pip install seaqube` or run in the main directory: `python setup.py install`.
### External Dependencies
Some external dependencies are not installed automatically, but `seaqube` or `nltk` might throw errors with an instruction what to do.
For example ``seqube`` might ask you to run:````bash
python -c "from seaqube import download;download('vec4ir')"
````## Quick Demo
````python
from seaqube.augmentation.word import Active2PassiveAugmentation, EDAAugmentation, TranslationAugmentation, EmbeddingAugmentation
translate = TranslationAugmentation(max_length=2)
translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])
````## Setup Dev Environment
_TODO_