{"id":21769964,"url":"https://github.com/mideind/greynirt2t","last_synced_at":"2026-04-27T17:34:07.163Z","repository":{"id":66179665,"uuid":"298145949","full_name":"mideind/GreynirT2T","owner":"mideind","description":"Machine Translation between Icelandic and English using Tensor2Tensor","archived":false,"fork":false,"pushed_at":"2020-09-29T13:57:32.000Z","size":84,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-27T08:43:12.054Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mideind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-09-24T02:15:30.000Z","updated_at":"2021-06-03T14:27:18.000Z","dependencies_parsed_at":"2023-02-21T17:00:19.332Z","dependency_job_id":null,"html_url":"https://github.com/mideind/GreynirT2T","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/mideind/GreynirT2T","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mideind%2FGreynirT2T","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mideind%2FGreynirT2T/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mideind%2FGreynirT2T/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mideind%2FGreynirT2T/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mideind","download_url":"ht
tps://codeload.github.com/mideind/GreynirT2T/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mideind%2FGreynirT2T/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32348048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T17:12:42.749Z","status":"ssl_error","status_checked_at":"2026-04-27T17:12:41.658Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-26T14:10:50.825Z","updated_at":"2026-04-27T17:34:07.158Z","avatar_url":"https://github.com/mideind.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GreynirT2T\nMachine Translation between Icelandic and English using [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor).\nMost of the provided scripts assume they are running inside a docker container, although they should run outside containers as well.\nMany of the paths are hard-coded and need to be adapted (see e.g. 
greynirt2t/translate_enis.py).\n\n## Data ##\nBilingual parallel data and monolingual Icelandic data can be downloaded from [CLARIN](https://repository.clarin.is/repository/xmlui/handle/20.500.12537/16 \"CLARIN\").\nNote that a license must be accepted, and the OpenSubtitles2018 subcorpus must be downloaded and cleaned (by a provided script).\n\n### Data preparation ###\nUse the cleaning and preprocessing script filters.py to optionally clean data before training (see script).\n\n### Vocabulary ###\nTo run the pre-trained models, you must use the same vocabulary that was used at training time.\n\n## Training ##\nBefore training can begin, the training data must be binarized and a vocabulary must be generated (if one does not already exist).\n\nNote that batch size must be tuned to your GPU by trial and error, since it depends on available VRAM, model size, and maximum sequence length.\nIf you want a larger batch size than fits on your GPU, a larger effective batch size can be simulated (with multistep_adam, see scripts/env.sh).\nThe training script can be found at greynirt2t/scripts/train.sh.\n\n## Inference / translation ##\nTo view model predictions, see interactive_decode.sh or translate_file.sh (or the T2T repository for documentation).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmideind%2Fgreynirt2t","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmideind%2Fgreynirt2t","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmideind%2Fgreynirt2t/lists"}