{"id":16989756,"url":"https://github.com/sappho192/edmtranslator","last_synced_at":"2025-12-07T22:03:53.561Z","repository":{"id":243811409,"uuid":"812872918","full_name":"sappho192/EDMTranslator","owner":"sappho192","description":".NET Text translator library based on LLM models, especially EncoderDecoderModel in HuggingFace","archived":false,"fork":false,"pushed_at":"2024-09-06T16:34:57.000Z","size":89,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-04T15:47:14.649Z","etag":null,"topics":["dotnet","encoder-decoder-model","huggingface","library","llm"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sappho192.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-10T04:13:27.000Z","updated_at":"2024-09-06T16:35:00.000Z","dependencies_parsed_at":"2024-06-11T10:04:11.618Z","dependency_job_id":"c1a0fb65-15a8-4c39-a1b1-be5ab82c8aac","html_url":"https://github.com/sappho192/EDMTranslator","commit_stats":null,"previous_names":["sappho192/edmtranslator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sappho192%2FEDMTranslator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sappho192%2FEDMTranslator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sappho192%2FEDMTranslator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sapp
ho192%2FEDMTranslator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sappho192","download_url":"https://codeload.github.com/sappho192/EDMTranslator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239877133,"owners_count":19712020,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dotnet","encoder-decoder-model","huggingface","library","llm"],"created_at":"2024-10-14T03:07:47.351Z","updated_at":"2025-12-07T22:03:48.520Z","avatar_url":"https://github.com/sappho192.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"- [EDMTranslator](#edmtranslator)\n- [Nuget Package list](#nuget-package-list)\n- [Requirements](#requirements)\n- [Supported models](#supported-models)\n- [Quickstart](#quickstart)\n  - [Install the packages](#install-the-packages)\n  - [Prepare the required data](#prepare-the-required-data)\n    - [Japanese dictionary](#japanese-dictionary)\n    - [Fine-tuned translator model](#fine-tuned-translator-model)\n  - [Implement the driver code](#implement-the-driver-code)\n- [How to build](#how-to-build)\n\n# EDMTranslator\n\nText translator library based on LLM models, especially EncoderDecoderModel in HuggingFace\n\n# Nuget Package list\n\n| Package       | repo                                                                                                                            | description  |\n| ------------- | ------------------------------------------------------------------------------------------------------------------------------- 
| ------------ |\n| EDMTranslator | [![Nuget EDMTranslator](https://img.shields.io/nuget/v/EDMTranslator.svg?style=flat)](https://www.nuget.org/packages/EDMTranslator/) | Main library |\n\n# Requirements\n\n* .NET 6 or above\n* **At least 3.5 GB of free RAM** before running the translator\n\n# Supported models\n\n* JESCJaEnTranslator ([sappho192/jesc-ja-en-translator](https://huggingface.co/sappho192/jesc-ja-en-translator)): Japanese-to-English translator based on `tohoku-nlp/bert-base-japanese-v2` and `openai-community/gpt2`, fine-tuned on the JESC dataset\n* FF14JaKoTranslator ([sappho192/ffxiv-ja-ko-translator](https://github.com/sappho192/ffxiv-ja-ko-translator)): Japanese-to-Korean translator based on `tohoku-nlp/bert-base-japanese-v2` and `skt/kogpt2-base-v2`, fine-tuned on the FF14 dataset\n* AihubJaKoTranslator ([sappho192/aihub-ja-ko-translator](https://huggingface.co/sappho192/aihub-ja-ko-translator)): Japanese-to-Korean translator based on `tohoku-nlp/bert-base-japanese-v2` and `skt/kogpt2-base-v2`, fine-tuned on the AIHub dataset\n* More to be added...\n\n# Quickstart\n\nThe following guide **assumes that you are using the JESCJaEnTranslator** mentioned above.\n\n## Install the packages\n\n1. From NuGet, install the `EDMTranslator` package\n2. 
Then install the `Tokenizers.DotNet.runtime.win` package as well\n\n## Prepare the required data\n\n### Japanese dictionary\n\n* Download the UniDic MeCab dictionary `unidic-mecab-2.1.2_bin.zip` from https://clrd.ninjal.ac.jp/unidic_archive/cwj/2.1.2/ and extract the archive to a location of your choice\n\n### Fine-tuned translator model\n\n* Download the translator model from [sappho192/jesc-ja-en-translator](https://huggingface.co/sappho192/jesc-ja-en-translator/blob/main/onnx_jesc-ja-en.7z) (specifically `onnx_jesc-ja-en.7z`) and extract the archive to a location of your choice\n\n## Implement the driver code\n\nWrite code like the following and you are good to go 🫡\nNote that you need to set `encoderDictDir` and `modelDir` to the correct paths on your machine.\n\n```csharp\n// Console application that translates Japanese sentences to English with JESCJaEnTranslator\n\nusing EDMTranslator.Tokenization;\nusing EDMTranslator.Translation;\n\n// Prepare the tokenizer\nvar encoderVocabPath = await BertJapaneseTokenizer.HuggingFace.GetVocabFromHub(\"tohoku-nlp/bert-base-japanese-v2\");\nvar hubName = \"openai-community/gpt2\";\nvar decoderVocabFilename = \"tokenizer.json\";\nvar decoderVocabPath = await Tokenizers.DotNet.HuggingFace.GetFileFromHub(hubName, decoderVocabFilename, \"deps\");\n\nstring encoderDictDir = @\"D:\\DATASET\\unidic-mecab-2.1.2_bin\";\nvar tokenizer = new BertJa2GPTTokenizer(\n    encoderDictDir: encoderDictDir, encoderVocabPath: encoderVocabPath,\n    decoderVocabPath: decoderVocabPath);\n\nvoid TestTokenizer(ITokenizer tokenizer)\n{\n    Console.WriteLine(\"--Tokenizer test--\");\n    Console.WriteLine(\"[Encode]\");\n    var sentenceJa = \"打ち合わせが終わった後にご飯を食べましょう。\";\n    Console.WriteLine($\"Input: {sentenceJa}\");\n    var (embeddingsJa, attentionMask) = tokenizer.Encode(sentenceJa);\n    Console.WriteLine($\"Encoded: {string.Join(\", \", embeddingsJa)}\");\n\n    Console.WriteLine(\"[Decode]\");\n    // Tokens of \"i was nervous before the exam, and i had a fever.\"\n    var tokens = new uint[] { 72, 
373, 10927, 878, 262, 2814, 11, 290, 1312, 550, 257, 17372, 13 };\n    Console.WriteLine($\"Input: {string.Join(\", \", tokens)}\");\n    var decoded = tokenizer.Decode(tokens);\n    Console.WriteLine($\"Decoded: {decoded}\");\n}\nTestTokenizer(tokenizer);\n\n// Prepare the translator\nstring modelDir = @\"D:\\MODEL\\jesc-ja-en-translator\\onnx\"; // The folder should contain encoder_model.onnx and decoder_model_merged.onnx\nvar translator = new JESCJaEnTranslator(tokenizer, modelDir);\nvoid TestTranslator(JESCJaEnTranslator translator)\n{\n    Console.WriteLine(\"--Translator test--\");\n    Translate(translator, \"打ち合わせが終わった後にご飯を食べましょう。\");\n    Translate(translator, \"試験前に緊張したあまり、熱がでてしまった。\");\n    Translate(translator, \"山田は英語にかけてはクラスの誰にも負けない。\");\n    Translate(translator, \"この本によれば、最初の人工橋梁は新石器時代にさかのぼるという。\");\n}\nTestTranslator(translator);\n\nstatic void Translate(JESCJaEnTranslator translator, string sentence)\n{\n    Console.WriteLine($\"SourceText: {sentence}\");\n    string translated = translator.Translate(sentence);\n    Console.WriteLine($\"Translated: {translated}\");\n}\n```\n\n# How to build\n\n1. Prepare the following:\n   1. .NET build system (`dotnet 6.0, 7.0, 8.0`)\n   2. PowerShell (`7.4.2` or above recommended)\n2. Run `cbuild.ps1`\n\nThe build artifact will be saved in the `nuget` directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsappho192%2Fedmtranslator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsappho192%2Fedmtranslator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsappho192%2Fedmtranslator/lists"}