{"id":23691077,"url":"https://github.com/sagorbrur/bnaug","last_synced_at":"2025-09-02T20:31:41.507Z","repository":{"id":63508801,"uuid":"346042963","full_name":"sagorbrur/bnaug","owner":"sagorbrur","description":"Bangla Text Augmentation","archived":false,"fork":false,"pushed_at":"2023-08-30T16:28:16.000Z","size":53,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-16T21:35:54.577Z","etag":null,"topics":["augmentation-libraries","back-translation","bangla","bangla-text-augmentation","bengali","bengali-nlp","text-augmentation"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sagorbrur.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-09T14:56:42.000Z","updated_at":"2025-02-18T09:10:22.000Z","dependencies_parsed_at":"2023-01-22T07:17:31.157Z","dependency_job_id":null,"html_url":"https://github.com/sagorbrur/bnaug","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/sagorbrur/bnaug","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sagorbrur%2Fbnaug","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sagorbrur%2Fbnaug/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sagorbrur%2Fbnaug/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sagorbrur%2Fbnaug/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sagorbrur","download_url":"https://codeload.github.com/sagorbrur/bnau
g/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sagorbrur%2Fbnaug/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273344639,"owners_count":25089020,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-02T02:00:09.530Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["augmentation-libraries","back-translation","bangla","bangla-text-augmentation","bengali","bengali-nlp","text-augmentation"],"created_at":"2024-12-30T02:54:10.291Z","updated_at":"2025-09-02T20:31:41.226Z","avatar_url":"https://github.com/sagorbrur.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bnaug (Bangla Text Augmentation)\n__bnaug__ is a text augmentation tool for Bangla text.\n\n## Installation\n```\npip install bnaug\n```\n- Dependencies\n    - pytorch \u003e=1.7.0\n    \n## Demo Notebook\n- [bnaug demo](https://github.com/sagorbrur/bnaug/blob/main/notebook/bnaug_demo.ipynb)\n\n## Necessary Model Links\n- [word2vec](https://huggingface.co/sagorsarker/bangla_word2vec/resolve/main/bangla_word2vec_gen4.zip)\n- [glove vector](https://huggingface.co/sagorsarker/bangla-glove-vectors/resolve/main/bn_glove.300d.zip)\n\n## Sentence Augmentation\n### Token Replacement\n- Mask generation based augmentation\n\n    ```py\n    from bnaug.sentence import 
TokenReplacement\n\n    tokr = TokenReplacement()\n    text = \"আমি ঢাকায় বাস করি।\"\n    output = tokr.masking_based(text, sen_n=5)\n    print(output)\n    ```\n\n- Word2Vec based augmentation\n\n    ```py\n    from bnaug.sentence import TokenReplacement\n\n    tokr = TokenReplacement()\n    text = \"আমি ঢাকায় বাস করি।\"\n    model = \"msc/bangla_word2vec/bnwiki_word2vec.model\"\n    output = tokr.word2vec_based(text, model=model, sen_n=5, word_n=5)\n    print(output)\n    ```\n\n- GloVe based augmentation\n\n    ```py\n    from bnaug.sentence import TokenReplacement\n\n    tokr = TokenReplacement()\n    text = \"আমি ঢাকায় বাস করি।\"\n    vector = \"msc/bn_glove.300d.txt\"\n    output = tokr.glove_based(text, vector_path=vector, sen_n=5, word_n=5)\n    print(output)\n    ```\n\n### Back Translation\nBack translation based augmentation first translates the Bangla sentence to English and then translates the English back to Bangla.\n\n```py\nfrom bnaug.sentence import BackTranslation\n\nbt = BackTranslation()\ntext = \"বাংলা ভাষা আন্দোলন তদানীন্তন পূর্ব পাকিস্তানে সংঘটিত একটি সাংস্কৃতিক ও রাজনৈতিক আন্দোলন। \"\noutput = bt.get_augmented_sentences(text)\nprint(output)\n```\n\n### Text Generation\n- Paraphrase generation\n\n```py\nfrom bnaug.sentence import TextGeneration\n\ntg = TextGeneration()\ntext = \"বিমানটি যখন মাটিতে নামার জন্য এয়ারপোর্টের কাছাকাছি আসছে, তখন ল্যান্ডিং গিয়ারের খোপের ঢাকনাটি খুলে যায়।\"\noutput = tg.parapharse_generation(text)\nprint(output)\n```\n\n### Random Augmentation\n- Randomly remove parts of a sentence to generate new sentences\n\n    At present it removes words, stopwords, punctuation, and numbers to generate new sentences.\n\n    ```py\n    from bnaug.sentence import RandomAugmentation\n\n    raug = RandomAugmentation()\n    sentence = \"আমি ১০০ বাকি দিলাম\"\n    output = raug.random_remove(sentence)\n    print(output)\n    ```\n\n    or apply each operation individually\n\n    ```py\n    from bnaug import randaug\n\n    text = \"১০০ বাকি দিলাম\"\n    output = 
randaug.remove_digits(text)\n    print(output)\n\n    text = \"১০০! বাকি দিলাম?\"\n    output = randaug.remove_punctuations(text)\n    print(output)\n\n    text = \"আমি ১০০ বাকি দিলাম\"\n    output = randaug.remove_stopwords(text)\n    print(output)\n\n    text = \"আমি ১০০ বাকি দিলাম\"\n    output = randaug.remove_random_word(text)\n    print(output)\n\n    text = \"আমি ১০০ বাকি দিলাম\"\n    output = randaug.remove_random_char(text)\n    print(output)\n    ```\n\n## Inspired by\n- [nlpaug](https://github.com/makcedward/nlpaug)\n- [amitness blog post](https://amitness.com/2020/05/data-augmentation-for-nlp/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsagorbrur%2Fbnaug","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsagorbrur%2Fbnaug","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsagorbrur%2Fbnaug/lists"}