{"id":17989496,"url":"https://github.com/evilfreelancer/enbeddrus","last_synced_at":"2026-02-27T20:37:24.635Z","repository":{"id":242377592,"uuid":"802883601","full_name":"EvilFreelancer/enbeddrus","owner":"EvilFreelancer","description":"Collection of scripts for training bert-based embedder for Russian\u003c\u003eEnglish embeddings extraction","archived":false,"fork":false,"pushed_at":"2024-06-10T18:33:55.000Z","size":772,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-06T14:21:24.726Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EvilFreelancer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-19T14:28:34.000Z","updated_at":"2025-04-02T14:30:48.000Z","dependencies_parsed_at":"2024-06-09T08:28:57.817Z","dependency_job_id":"e6f39cd9-31b0-4be3-bc5d-a4a73e9e0261","html_url":"https://github.com/EvilFreelancer/enbeddrus","commit_stats":null,"previous_names":["evilfreelancer/enbeddrus"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EvilFreelancer/enbeddrus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fenbeddrus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fenbeddrus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fenbeddrus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/EvilFreelancer%2Fenbeddrus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EvilFreelancer","download_url":"https://codeload.github.com/EvilFreelancer/enbeddrus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fenbeddrus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29912382,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-27T19:37:42.220Z","status":"ssl_error","status_checked_at":"2026-02-27T19:37:41.463Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T19:14:49.432Z","updated_at":"2026-02-27T20:37:24.615Z","avatar_url":"https://github.com/EvilFreelancer.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Enbeddrus - ENglish and RUSsian emBEDDer\n\nThis is a BERT (uncased) [sentence-transformers](https://www.SBERT.net) model: it maps sentences \u0026 paragraphs to a 768-dimensional\ndense vector space and can be used for tasks like clustering or semantic search.\n\n- **Parameters**: 168 million\n- **Layers**: 12\n- **Hidden Size**: 768\n- **Attention Heads**: 12\n- **Vocabulary Size**: 119,547\n- **Maximum Sequence Length**: 512 tokens\n\nThe Enbeddrus model is designed to extract similar embeddings for comparable English and 
Russian phrases. It is based on\nthe [bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-cased) model and was\ntrained over 20 epochs on the following datasets:\n\n- [evilfreelancer/opus-php-en-ru-cleaned](https://huggingface.co/datasets/evilfreelancer/opus-php-en-ru-cleaned) (train): 1.6k lines\n- [evilfreelancer/golang-en-ru](https://huggingface.co/datasets/evilfreelancer/golang-en-ru) (train): 554 lines\n- [Helsinki-NLP/opus_books](https://huggingface.co/datasets/Helsinki-NLP/opus_books/viewer/en-ru) (en-ru, train): 17.5k lines\n\nThe goal of this model is to generate identical or very similar embeddings regardless of whether the text is written in\nEnglish or Russian.\n\nAn [Enbeddrus GGUF](https://ollama.com/evilfreelancer/enbeddrus) version is available via Ollama.\n\n## Evaluation test\n\nModels were tested with [encodechka](https://github.com/avidale/encodechka).\n\n\n| Name                  | evilfreelancer/enbeddrus-v0.1 | evilfreelancer/enbeddrus-v0.1-domain | evilfreelancer/enbeddrus-v0.2 |\n| --------------------- | ----------------------------- | ------------------------------------ | ----------------------------- |\n| STSBTask              | 0.6418501890569303            | 0.6418501890569303                   | 0.6382642407246252            |\n| ParaphraserTask       | 0.5396186809125094            | 0.5396186809125094                   | 0.5491558495250873            |\n| XnliTask              | 0.37045908183632736           | 0.37045908183632736                  | 0.36666666666666664           |\n| SentimentTask         | 0.7306666666666667            | 0.7306666666666667                   | 0.7246666666666667            |\n| ToxicityTask          | 0.8923319999999999            | 0.8923319999999999                   | 0.894758                      |\n| InappropriatenessTask | 0.7092166782043772            | 0.7092166782043772                   | 0.719323712657756             |\n| IntentsTask           | 0.7086            
            | 0.7162                               | 0.7128                        |\n| IntentsXTask          | 0.5116                        | 0.46                                 | 0.5314                        |\n| FactRuTask            | n/a                           | n/a                                  | n/a                           |\n| RudrTask              | n/a                           | n/a                                  | n/a                           |\n| SpeedTask (cuda)      | 4.313722451527913             | 4.339381853739421                    | 4.251763025919597             |\n| SpeedTask (cpu)       | 34.0190052986145              | 34.990905125935875                   | 34.441959857940674            |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilfreelancer%2Fenbeddrus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevilfreelancer%2Fenbeddrus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilfreelancer%2Fenbeddrus/lists"}