https://github.com/nicolay-r/bulk-ner

A tiny no-strings framework for quickly binding third-party models to extract entities from the cells of long tabular data

arekit bert bert-model colab colab-notebook deeppavlov ner pipelines spreadsheet transformer-model transformers

Last synced: 12 months ago


Host: GitHub
URL: https://github.com/nicolay-r/bulk-ner
Owner: nicolay-r
License: mit
Created: 2024-01-02T12:24:06.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-03-10T13:14:44.000Z (over 1 year ago)
Last Synced: 2025-05-25T09:45:45.280Z (about 1 year ago)
Topics: arekit, bert, bert-model, colab, colab-notebook, deeppavlov, ner, pipelines, spreadsheet, transformer-model, transformers
Language: Python
Homepage: https://github.com/nicolay-r/AREkit
Size: 128 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


# bulk-ner 0.24.1

![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)

![](https://img.shields.io/badge/AREkit-0.25.0-orange.svg)

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb)

[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1842300499011260827)

[![PyPI downloads](https://img.shields.io/pypi/dm/bulk-ner.svg)](https://pypistats.org/packages/bulk-ner)



    



A no-strings framework for running wrapped AI models as a [Named Entity Recognition (NER)](https://en.wikipedia.org/wiki/Named-entity_recognition) inference service, powered by [AREkit](https://github.com/nicolay-r/AREkit) and the related [text-processing pipelines](https://github.com/nicolay-r/AREkit/wiki/Pipelines:-Text-Processing).

The key benefits of this tiny framework are:

1. ☑️ Native batching support;
2. ☑️ Native handling of long input contexts.

# Installation

```bash
pip install bulk-ner==0.24.1
```

# Usage

The example below uses `DeepPavlov==1.3.0` as the adapter for NER models, passed via the `--adapter` parameter:

```bash
python -m bulk_ner.annotate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --batch-size 10 \
    --adapter "dynamic:models/dp_130.py:DeepPavlovNER" \
    --output "test-annotated.jsonl" \
    %% \
    --model "ner_ontonotes_bert_mult"
```

You can choose another model via the `--model` parameter.

The list of supported models is available here: https://docs.deeppavlov.ai/en/master/features/models/NER.html
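To try the command on your own data, you need a tabular file to pass via `--src`. The sketch below writes a small TSV file using only the Python standard library. It assumes that the `{text}` placeholder in `--prompt` is filled from a column named `text`; the `id` column, the file name, and the sample sentences are hypothetical and only serve as an illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the `text` column is assumed to match
# the `{text}` placeholder passed via `--prompt`.
rows = [
    {"id": "0", "text": "Alice moved to New York in 2019."},
    {"id": "1", "text": "The Louvre is in Paris."},
]

# Write the rows as a tab-separated file with a header line.
path = os.path.join(tempfile.mkdtemp(), "input.tsv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "text"], delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check that it round-trips as expected.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f, delimiter="\t"))

print(path)
```

The resulting path can then be passed as the `--src` value in the command above.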

## Deploy your model

> **Quick example**: Check out the [default DeepPavlov wrapper implementation](/models/dp_130.py)

All you have to do is subclass `BaseNER` and implement the following protected method:

* `_forward(sequences)` -- expected to return two lists of the same length:
    * `terms` -- the atomic elements of the text (usually words);
    * `labels` -- the B-I-O label for each term.
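The contract described above can be illustrated with a minimal sketch. It is self-contained, so `BaseNER` here is a hypothetical stand-in rather than the framework's actual class, and `GazetteerNER` is a toy dictionary-based tagger invented for illustration. The sketch also assumes that `_forward` receives a batch of strings and returns one inner list of terms and one inner list of labels per input; check the default DeepPavlov wrapper for the exact shapes.

```python
from abc import ABC, abstractmethod


class BaseNER(ABC):
    """Hypothetical stand-in for the framework's `BaseNER` (assumption)."""

    @abstractmethod
    def _forward(self, sequences):
        """Return (terms, labels) for a batch of input strings."""


class GazetteerNER(BaseNER):
    """Toy wrapper that tags phrases from a fixed dictionary with B-I-O labels."""

    def __init__(self, phrases):
        # Map each phrase, split into words, to its entity type.
        self._phrases = {tuple(p.split()): t for p, t in phrases.items()}

    def _forward(self, sequences):
        terms, labels = [], []
        for text in sequences:
            words = text.split()
            tags = ["O"] * len(words)
            for phrase, entity_type in self._phrases.items():
                n = len(phrase)
                for start in range(len(words) - n + 1):
                    if tuple(words[start:start + n]) == phrase:
                        # First word opens the entity, the rest continue it.
                        tags[start] = f"B-{entity_type}"
                        for k in range(start + 1, start + n):
                            tags[k] = f"I-{entity_type}"
            terms.append(words)
            labels.append(tags)
        return terms, labels


ner = GazetteerNER({"New York": "LOC", "Alice": "PER"})
terms, labels = ner._forward(["Alice moved to New York"])
print(list(zip(terms[0], labels[0])))
```

A real wrapper would replace the dictionary lookup with a call to the underlying model, keeping the same pairing of each term with its label.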

  

## Powered by

* AREkit [[github]](https://github.com/nicolay-r/AREkit)
