https://github.com/nicolay-r/bulk-ner
Tiny no-strings framework for quickly binding third-party models to extract entities from cells of long tabular data
- Host: GitHub
- URL: https://github.com/nicolay-r/bulk-ner
- Owner: nicolay-r
- License: mit
- Created: 2024-01-02T12:24:06.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-10T13:14:44.000Z (over 1 year ago)
- Last Synced: 2025-05-25T09:45:45.280Z (about 1 year ago)
- Topics: arekit, bert, bert-model, colab, colab-notebook, deeppavlov, ner, pipelines, spreadsheet, transformer-model, transformers
- Language: Python
- Homepage: https://github.com/nicolay-r/AREkit
- Size: 128 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# bulk-ner 0.24.1


[Colab notebook](https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb)
[Post on X](https://x.com/nicolayr_/status/1842300499011260827)
[PyPI download statistics](https://pypistats.org/packages/bulk-ner)
A no-strings framework for running wrapped AI models as a [Named Entity Recognition (NER)](https://en.wikipedia.org/wiki/Named-entity_recognition) inference service, powered by
[AREkit](https://github.com/nicolay-r/AREkit) and the related [text-processing pipelines](https://github.com/nicolay-r/AREkit/wiki/Pipelines:-Text-Processing).
The key benefits of this tiny framework are as follows:
1. ☑️ Native support for batching;
2. ☑️ Native handling of long input contexts.
# Installation
```bash
pip install bulk-ner==0.24.1
```
# Usage
The example below uses `DeepPavlov==1.3.0` as an adapter for NER models, passed via the `--adapter` parameter:
```bash
python -m bulk_ner.annotate \
--src "test/data/test.tsv" \
--prompt "{text}" \
--batch-size 10 \
--adapter "dynamic:models/dp_130.py:DeepPavlovNER" \
--output "test-annotated.jsonl" \
%% \
--model "ner_ontonotes_bert_mult"
```
You can choose another model via the `--model` parameter.
The list of supported models is available here:
https://docs.deeppavlov.ai/en/master/features/models/NER.html
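
To make the `--prompt` and `--batch-size` parameters more concrete, here is a minimal sketch of how a `{text}` template could be filled from the columns of a TSV file and how the resulting prompts could be grouped into batches. It is an illustration only, not the framework's actual implementation: the helpers `iter_prompts` and `iter_batches`, the sample rows, and the assumption that template fields match TSV column names are all hypothetical.

```python
import csv
import io
from itertools import islice


def iter_prompts(tsv_file, prompt_template):
    """Yield one formatted prompt per TSV row (hypothetical helper).

    Assumes the template fields, e.g. {text}, match the TSV column names.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        yield prompt_template.format(**row)


def iter_batches(items, batch_size):
    """Group any iterable into lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical in-memory stand-in for a file such as test/data/test.tsv.
sample_tsv = io.StringIO(
    "id\ttext\n"
    "1\tAlice flew to Paris.\n"
    "2\tBob works at Acme.\n"
    "3\tCarol lives in Oslo.\n"
)

batches = list(iter_batches(iter_prompts(sample_tsv, "{text}"), batch_size=2))
for batch in batches:
    print(batch)
```

With a batch size of 2, the three sample rows are split into one full batch and one partial batch, so the last batch may be shorter than `--batch-size`.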
## Deploy your model
> **Quick example**: Check out the [default DeepPavlov wrapper implementation](/models/dp_130.py)
All you have to do is implement the `BaseNER` class, which has the following protected method:
* `_forward(sequences)` -- expected to return two lists of the same length:
    * `terms` -- the list of atomic elements of the text (usually words);
    * `labels` -- B-I-O labels, one for each term.
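
The contract above can be sketched with a toy dictionary-based tagger. This is a hedged illustration under several assumptions: the `BaseNER` class below is a local stand-in so that the snippet runs on its own (the real base class and its import path live in the package and are not reproduced here), `GazetteerNER` and its gazetteer are hypothetical, and the sketch assumes that `terms` and `labels` each hold one inner list per input sequence.

```python
class BaseNER:
    """Local stand-in for the framework's base class (assumption, not the real one)."""

    def _forward(self, sequences):
        raise NotImplementedError


class GazetteerNER(BaseNER):
    """Toy wrapper that tags words found in a lookup table (hypothetical)."""

    def __init__(self, gazetteer):
        # Maps a lowercase word to its entity type, e.g. {"york": "GPE"}.
        self._gazetteer = {word.lower(): tag for word, tag in gazetteer.items()}

    def _forward(self, sequences):
        terms, labels = [], []
        for text in sequences:
            words = text.split()
            tags = []
            prev_type = None
            for word in words:
                # Ignore trailing/leading punctuation when looking the word up.
                ent_type = self._gazetteer.get(word.strip(".,;:!?").lower())
                if ent_type is None:
                    tags.append("O")
                elif ent_type == prev_type:
                    # Same type as the previous word: continue the entity.
                    tags.append("I-" + ent_type)
                else:
                    # First word of a new entity.
                    tags.append("B-" + ent_type)
                prev_type = ent_type
            terms.append(words)
            labels.append(tags)
        return terms, labels


model = GazetteerNER({"alice": "PERSON", "new": "GPE", "york": "GPE"})
# Called directly here for demonstration; the framework is expected to call it.
terms, labels = model._forward(["Alice moved to New York."])
for term, label in zip(terms[0], labels[0]):
    print(term, label)
```

A real wrapper would replace the lookup table with a call to the underlying model, as the DeepPavlov wrapper linked above does, while keeping the same return shape.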
## Powered by
* AREkit [[github]](https://github.com/nicolay-r/AREkit)