<p align="center">
    <br>
    <img width="200px" src="https://github.com/bees4ever/SeaQuBe/raw/master/logo/seaqube_logo_v1.png"/>
<br>
</p>

# SeaQuBe

Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. Acronym `SeaQuBe` or `seaqube`.

This Python framework provides several text augmentation implementations and methods for evaluating the quality of word embeddings. It is designed to fit into your machine learning pipeline. The `BaseAugmentation` class provides the same API as the Python package [nlpaug](https://github.com/makcedward/nlpaug/), so both packages can be used together smoothly. `BaseAugmentation` also provides further methods. Detailed examples are listed below.

`SeaQuBe` also provides a toolkit that wraps a trained NLP model into an interactive tool.

<a target="_blank" href="https://travis-ci.org/github/bees4ever/SeaQuBe/builds/"><img src="https://travis-ci.org/bees4ever/SeaQuBe.svg?branch=master&amp;status=started" alt="Travis build Status"></a> <a href="https://app.codacy.com/gh/bees4ever/seaqube?utm_source=github.com&utm_medium=referral&utm_content=bees4ever/seaqube&utm_campaign=Badge_Grad"><img src="https://api.codacy.com/project/badge/Grade/50fef8e32b794b65b10651de44637cf8" alt="code:quality"></a> [![PyPI version](https://badge.fury.io/py/seaqube.svg)](https://badge.fury.io/py/seaqube)


## Features

*  Text Data Augmentation
*  Chaining and Reduction of Text Data Augmentations
*  Word Embedding Quality Evaluation
*  Interactive Wrapper for Natural Language Models

## Demo
*   [Augmentation in three lines](https://github.com/bees4ever/SeaQuBe#quick-demo)
*   [Example of Basic Text Augmentation](https://github.com/bees4ever/SeaQuBe/blob/master/examples/basic_augmentation.ipynb)
*   [Example of Text Augmentation Chaining](https://github.com/bees4ever/SeaQuBe/blob/master/examples/chained_augmentation.ipynb)
*   [Example of Word Embedding Evaluation](https://github.com/bees4ever/SeaQuBe/blob/master/examples/word_embedding_evaluation.ipynb)
*   [Example of Interactive NLP](https://github.com/bees4ever/SeaQuBe/blob/master/examples/nlp.ipynb)

## Augmentation
| Level  | Augmenter  | Description |
|:---:|:---:|:---:|
| Character | QwertyAugmentation | Simulates typing errors based on keyboard key distance |
| Corpus | UnigramAugmentation | Replaces ubiquitous words with other ubiquitous words |
| Word | Active2PassiveAugmentation | Changes the surface form of a document using a simple active-to-passive transformer |
| Word | EDAAugmentation | Augments a document using the [EDA](https://github.com/jasonwei20/eda_nlp) algorithm |
| Word | EmbeddingAugmentation | Replaces words with similar words using [WordNet](https://wordnet.princeton.edu/) |
| Word | TranslationAugmentation | Changes the surface form of a document via translation and back-translation (with [GoogleTranslate](https://translate.google.com/)) |

## Augmentation Chainer
Chaining (streaming) augmentations is implemented in the `AugmentationStreamer` class. One reduction class exists so far. Further reductions can be implemented by extending the `BaseReduction` class.

| Action  | Class  | Description |
|:---:|:---:|:---:|
| Streaming | AugmentationStreamer | Runs each document through all chained augmentations. |
| Reducing | UniqueCorpusReduction | Given a list of documents, returns only the unique documents. |

## Word Embedding Evaluation
| Method  | Description |
|:---:|:---:|
| WordAnalogyBenchmark | Benchmarks how well relations of the type `a is to b as c is to d` are solved. |
| WordSimilarityBenchmark | Compares the similarity of word pairs computed by the model with human similarity scores. |
| WordOutliersBenchmark | Benchmarks how well the outlier in a group of words is detected. |
| SemanticWordnetBenchmark | Benchmarks the quality of an NLP model's semantic similarities against the WordNet graph. |

## Installation

`SeaQuBe` can be installed from PyPI using `pip install seaqube`. Alternatively, run `python setup.py install` in the main directory.

### External Dependencies

Some external dependencies are not installed automatically. In that case, `seaqube` or `nltk` may raise an error that explains what to do.
For example, `seaqube` might ask you to run:

````bash
python -c "from seaqube import download;download('vec4ir')"
````

## Quick Demo
````python
# Import the word-level augmenters
from seaqube.augmentation.word import (Active2PassiveAugmentation, EDAAugmentation,
                                       TranslationAugmentation, EmbeddingAugmentation)

# Augment one tokenized document by translation and back-translation
# (this uses Google Translate, so network access is required)
translate = TranslationAugmentation(max_length=2)
augmented = translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])
print(augmented)
````

## Setup Dev Environment
_TODO_

## License
MIT