{"id":34095146,"url":"https://github.com/dumitrescustefan/roner","last_synced_at":"2026-04-08T12:02:17.442Z","repository":{"id":57462622,"uuid":"450058100","full_name":"dumitrescustefan/roner","owner":"dumitrescustefan","description":"Named Entity Recognition for Romanian, based on transformer models","archived":false,"fork":false,"pushed_at":"2022-01-29T10:30:04.000Z","size":2508,"stargazers_count":13,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-16T18:49:47.084Z","etag":null,"topics":["ner","pip","romanian","romanian-bert","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dumitrescustefan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-20T10:50:54.000Z","updated_at":"2025-03-21T13:53:46.000Z","dependencies_parsed_at":"2022-09-05T17:20:59.542Z","dependency_job_id":null,"html_url":"https://github.com/dumitrescustefan/roner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dumitrescustefan/roner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumitrescustefan%2Froner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumitrescustefan%2Froner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumitrescustefan%2Froner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumitrescustefan%2Froner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dumitrescustefan","download_url":"h
ttps://codeload.github.com/dumitrescustefan/roner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumitrescustefan%2Froner/sbom","scorecard":{"id":359580,"data":{"date":"2025-08-11","repo":{"name":"github.com/dumitrescustefan/roner","commit":"adf3b70084b4856f43bab3a5152f95a982d8d3e1"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared 
and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Code-Review","score":0,"reason":"Found 0/22 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"24 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-3749-ghw9-m3mg","Warn: Project is vulnerable to: PYSEC-2022-43015 / GHSA-47fc-vmwq-366v","Warn: Project is vulnerable to: PYSEC-2025-41 / GHSA-53q9-r3pm-6pq6","Warn: Project is vulnerable to: PYSEC-2024-252 / GHSA-5pcm-hx3q-hm94","Warn: Project is vulnerable to: GHSA-887c-mr87-cxwp","Warn: Project is vulnerable to: PYSEC-2024-251 / 
GHSA-pg7h-5qx3-wjr3","Warn: Project is vulnerable to: PYSEC-2024-250","Warn: Project is vulnerable to: PYSEC-2024-259","Warn: Project is vulnerable to: PYSEC-2017-74","Warn: Project is vulnerable to: PYSEC-2023-299 / GHSA-282v-666c-3fvg","Warn: Project is vulnerable to: GHSA-37mw-44qp-f5jm","Warn: Project is vulnerable to: GHSA-37q5-v5qm-c9v8","Warn: Project is vulnerable to: PYSEC-2023-300 / GHSA-3863-2447-669p","Warn: Project is vulnerable to: GHSA-6rvg-6v2m-4j46","Warn: Project is vulnerable to: GHSA-9356-575x-2w9m","Warn: Project is vulnerable to: GHSA-fpwr-67px-3qhx","Warn: Project is vulnerable to: PYSEC-2024-229 / GHSA-hxxf-235m-72v3","Warn: Project is vulnerable to: GHSA-jjph-296x-mrcr","Warn: Project is vulnerable to: GHSA-phhr-52qp-3mj4","Warn: Project is vulnerable to: GHSA-q2wp-rjmx-x6x9","Warn: Project is vulnerable to: PYSEC-2025-40 / GHSA-qq3j-4f4f-9583","Warn: Project is vulnerable to: PYSEC-2024-227 / GHSA-qxrp-vhvm-j765","Warn: Project is vulnerable to: PYSEC-2023-301 / GHSA-v68g-wm8c-6x7j","Warn: Project is vulnerable to: PYSEC-2024-228 / GHSA-wrfc-pvp9-mr9g"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T10:27:50.421Z","repository_id":57462622,"created_at":"2025-08-18T10:27:50.421Z","updated_at":"2025-08-18T10:27:50.421Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31554110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T10:21:54.569Z","status":"ssl_error","status_checked_at":"2026-04-08T10:21:38.171Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ner","pip","romanian","romanian-bert","transformers"],"created_at":"2025-12-14T15:08:40.950Z","updated_at":"2026-04-08T12:02:17.332Z","avatar_url":"https://github.com/dumitrescustefan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![version](https://img.shields.io/badge/version-1.0.3-green)\n![bert](https://img.shields.io/badge/model-bert--base--romanian--ner-orange)\n# RoNER\n\nRoNER is a Named Entity Recognition model based on a pre-trained [BERT transformer model](https://huggingface.co/dumitrescustefan/bert-base-romanian-ner) trained on [RONECv2](https://github.com/dumitrescustefan/ronec). It is meant to be an easy to use, high-accuracy Python package providing Romanian NER.\n\nRoNER handles _text splitting_, _word-to-subword alignment_, and it works with arbitrarily _long text sequences_ on CPU or GPU.  
\n\nCheck out an [online demo at Huggingface Spaces](https://huggingface.co/spaces/dumitrescustefan/NamedEntityRecognition-Romanian)\n\n## Installation \u0026 usage\n\nInstall with: ``pip install roner``\n\nRun with:\n```python\nimport roner\nner = roner.NER()\n\ninput_texts = [\"George merge cu trenul Cluj - Timișoara de ora 6:20.\", \n               \"Grecia are capitala la Atena.\"]\n\noutput_texts = ner(input_texts)\n\nfor output_text in output_texts:\n  print(f\"Original text: {output_text['text']}\")\n  for word in output_text['words']:\n    print(f\"{word['text']:\u003e20} = {word['tag']}\")\n```\n\n#### RoNER input\n\nRoNER accepts either strings or lists of strings as input. If you pass a single string, it will convert it to a list containing this string.\n\n#### RoNER output\n\nRoNER outputs a list of dictionary objects corresponding to the given input list of strings. A dictionary entry consists of:\n\n```json\n{\n  \"text\": \u003c\u003coriginal text given as input\u003e\u003e,\n  \"input_ids\": \u003c\u003ctoken ids of the original text\u003e\u003e,\n  \"words\": [{\n      \"text\": \u003c\u003ceach word\u003e\u003e,\n      \"tag\": \u003c\u003centity label\u003e\u003e,\n      \"pos\": \u003c\u003cpart of speech of this word\u003e\u003e,\n      \"multi_word_entity\": \u003c\u003cTrue if this word is linked to the previous one\u003e\u003e,\n      \"span_after\": \u003c\u003cspan of text linking this word to the next\u003e\u003e,\n      \"start_char\": \u003c\u003cstart position of this word in the original text\u003e\u003e,\n      \"end_char\": \u003c\u003cend position of this word in the original text\u003e\u003e,\n      \"token_ids\": \u003c\u003clist of subtoken ids as given by the BERT tokenizer\u003e\u003e,\n      \"tag_ids\": \u003c\u003clist of BIO2 tags assigned by the model for each subtoken\u003e\u003e\n    }]\n}\n```\n\nThis information is sufficient to save word-to-subtoken alignment, to have access to the original text as well as having other 
usable info such as the start and end positions for each word.\n\nTo list entities, simply iterate over all the words in the dict, printing the word itself ``word['text']`` and its label ``word['tag']``.\n\n## RoNER properties and considerations\n\n\n#### Constructor options\n\nThe NER constructor has the following properties:\n\n* ``model:str`` Override this if you want to use your own pretrained model. Specify either a HuggingFace model or a folder location. If you use a different tag set than RONECv2, you need to also override the ``bio2tag_list`` option. The default model is ``dumitrescustefan/bert-base-romanian-ner``\n* ``use_gpu:bool`` Set to True if you want to use the GPU (much faster!). Default is enabled; if there is no GPU found, it falls back to CPU.\n* ``batch_size:int`` How many sequences to process in parallel. On an 11GB GPU you can use batch_size = 8. Default is 4. Larger values mean faster processing - increase until you get OOM errors.\n* ``window_size:int`` Model size. BERT uses by default 512. Change if you know what you're doing. RoNER uses this value to compute overlapping windows (will overlap last quarter of the window).\n* ``num_workers:int`` How many workers to use for feeding data to GPU/CPU. Default is 0, meaning use the main process for data loading. Safest option is to leave at 0 to avoid possible errors at forking on different OSes.\n* ``named_persons_only:bool`` Set to True to output only named persons labeled with the class PERSON. This parameter is further explained below. \n* ``verbose:bool`` Set to True to get processing info. Leave it at its default False value for peace and quiet.\n* ``bio2tag_list:list`` Default None, change only if you trained your own model with different ordering of the BIO2 tags.\n\n#### Implicit tokenization of texts\n\nPlease note that RoNER uses Stanza to handle Romanian tokenization into words and part-of-speech tagging. 
On first run, it will download not only the NER transformer model, but also Stanza's Romanian data package.\n\n#### 'PERSON' class handling\n\nAn important aspect that requires clarification is the handling of the ``PERSON`` label. In RONECv2, persons are not only names of persons (proper nouns, e.g. ``George Mihailescu``), but also any common noun or pronoun that refers to a person, such as ``ea``, ``fratele`` or ``doctorul``. For applications that do not need to handle this scenario, please set the ``named_persons_only`` value to ``True`` in RoNER's constructor. \n\nThis uses the part-of-speech tags provided by Stanza and keeps the ``PERSON`` label only for proper nouns.\n\n#### Multi-word entities\n\nSometimes, entities span multiple words. To handle this, RoNER has a special property named ``multi_word_entity``, which, when True, means that the current word is linked to the previous one. Single-word entities will have this property set to False, as will the _first_ word of multi-word entities. This is necessary to distinguish between sequential multi-word entities. \n\n#### Detokenization\n\nOne particular use case for NER is text anonymization, which means replacing entities with their labels. With this in mind, RoNER has a ``detokenize`` function that, applied to the outputs, will recreate the original strings. \n\nTo perform the anonymization, iterate through all the words, and replace the word's text with its label as in ``word['text'] = word['tag']``.\nThen, simply run ``anonymized_texts = ner.detokenize(outputs)``. 
This will preserve spaces, new-lines and other characters.\n\n#### NER accuracy metrics\n\nFinally, because we trained the model on a modified version of RONECv2 (we performed data augmentation on the sentences, used a different training scheme and other train/validation/test splits), we are unable to compare to the standard RONECv2 baseline, as part of the original test set is now included in our training data. We have nonetheless obtained, to our knowledge, SOTA results on Romanian. This repo is meant to be used in production, and not for comparisons to other models.\n\n## BibTeX entry and citation info\n\nPlease consider citing the following [paper](https://arxiv.org/abs/1909.01247) as a thank you to the authors of RONEC, even if it describes v1 of the corpus and you are using a model trained on v2 by the same authors: \n```\nDumitrescu, Stefan Daniel, and Andrei-Marius Avram. \"Introducing RONEC--the Romanian Named Entity Corpus.\" arXiv preprint arXiv:1909.01247 (2019).\n```\nor in .bibtex format:\n```\n@article{dumitrescu2019introducing,\n  title={Introducing RONEC--the Romanian Named Entity Corpus},\n  author={Dumitrescu, Stefan Daniel and Avram, Andrei-Marius},\n  journal={arXiv preprint arXiv:1909.01247},\n  year={2019}\n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdumitrescustefan%2Froner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdumitrescustefan%2Froner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdumitrescustefan%2Froner/lists"}