# CO-Funer

This repository shows how to fine-tune Flair models on the [CO-Fun](https://arxiv.org/abs/2403.15322) NER dataset.

## Dataset

The [Company Outsourcing in Fund Prospectuses (CO-Fun) dataset](https://arxiv.org/abs/2403.15322) consists of
948 sentences with 5,969 named entity annotations, including 2,340 Outsourced Services, 2,024 Companies, 1,594 Locations
and 11 Software annotations.

Overall, the following named entities are annotated:

* `Auslagerung` (engl. outsourcing)
* `Unternehmen` (engl. company)
* `Ort` (engl. location)
* `Software`

The CO-Fun NER dataset hosted on the [Hugging Face Hub](https://huggingface.co/datasets/stefan-it/co-funer) is used
for fine-tuning Flair models.
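
For a quick look at the data, the dataset can be loaded with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Load the CO-Fun NER dataset from the Hugging Face Hub
dataset = load_dataset("stefan-it/co-funer")
print(dataset)
```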

## Fine-Tuning

The main fine-tuning is done in [`experiment.py`](experiment.py).

Fine-tuning can be started by calling the `run_experiment()` method and passing a so-called `ExperimentConfiguration`,
which stores all necessary hyper-parameters and fine-tuning options. The interface looks like:

```python
from dataclasses import dataclass


@dataclass
class ExperimentConfiguration:
    batch_size: int
    learning_rate: float
    epoch: int
    context_size: int
    seed: int
    base_model: str
    base_model_short: str
    layers: str = "-1"
    subtoken_pooling: str = "first"
    use_crf: bool = False
    use_tensorboard: bool = True
```
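
For illustration, a fine-tuning run could then be started like this. This is a minimal sketch: the concrete field
values are assumptions chosen to mirror the `bs8-e10-lr5e-05` configurations used in the results section below.

```python
from experiment import ExperimentConfiguration, run_experiment

# Hypothetical configuration, mirroring the bs8-e10-lr5e-05 setup
experiment_configuration = ExperimentConfiguration(
    batch_size=8,
    learning_rate=5e-05,
    epoch=10,
    context_size=0,  # assumption: no additional document context
    seed=1,
    base_model="deepset/gbert-base",
    base_model_short="gbert_base",
)
run_experiment(experiment_configuration)
```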

A hyper-parameter search grid is defined in [`script.py`](script.py). This file is used to start the fine-tuning process.
Additionally, the script uploads the fine-tuned Flair models to the Model Hub. The following environment variables should
be set (a sketch of how they could be consumed follows the table):

| Environment Variable | Description |
| -------------------- | -------------------------------------------------------------------------------------------------------- |
| `CONFIG` | Should point to a configuration file in the `configs` folder, e.g. `configs/german_dbmdz_bert_base.json` |
| `HF_TOKEN` | HF Access Token, which can be found [here](https://huggingface.co/settings/tokens) |
| `HUB_ORG_NAME` | Should point to user-name or organization where the model should be uploaded to |
| `HF_UPLOAD` | If this variable is set, the fine-tuned Flair model will be uploaded to the Model Hub |
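
A minimal sketch of how `script.py` could read these variables, assuming they are consumed via `os.environ`:

```python
import json
import os

# CONFIG points to a JSON file in the configs folder
with open(os.environ["CONFIG"], "r") as f:
    config = json.load(f)

hf_token = os.environ["HF_TOKEN"]
hub_org_name = os.environ["HUB_ORG_NAME"]

# Upload only happens when HF_UPLOAD is set
hf_upload = "HF_UPLOAD" in os.environ
```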

## Hyper-Parameter Search

In this example the following hyper-parameter search grid is used:

* Batch Sizes = `[8, 16]`
* Learning Rates = `[3e-05, 5e-05]`
* Seeds = `[1, 2, 3, 4, 5]`

This means 20 models will be fine-tuned in total (2 x 2 x 5 = 20).
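
The expansion of this grid can be illustrated with a few lines of Python:

```python
from itertools import product

batch_sizes = [8, 16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]

# Cartesian product of all options: 2 x 2 x 5 = 20 fine-tuning runs
configurations = list(product(batch_sizes, learning_rates, seeds))
assert len(configurations) == 20
```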

## Model Upload

After each model is fine-tuned, it is automatically uploaded to the Hugging Face Model Hub. The following files are uploaded (see the sketch after this list):

* `pytorch_model.bin`: Flair internally tracks the best model as `best-model.pt` over all epochs. To be compatible with the Model Hub, `best-model.pt` is automatically renamed to `pytorch_model.bin`
* `training.log`: Flair stores the training log in `training.log`. This file is later needed to parse the best F1-score on the development set
* `./runs`: This folder stores the TensorBoard logs, which enables a nice display of metrics on the Model Hub
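
A minimal sketch of these upload steps using the `huggingface_hub` library. The local output paths and the repository
id are assumptions for illustration only:

```python
from huggingface_hub import HfApi

# Hypothetical repository id and local output paths; assumes you are
# authenticated, e.g. via the HF_TOKEN environment variable
repo_id = "stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-1"
api = HfApi()

# Flair's best checkpoint is renamed for Model Hub compatibility
api.upload_file(
    path_or_fileobj="resources/taggers/ner/best-model.pt",
    path_in_repo="pytorch_model.bin",
    repo_id=repo_id,
)
api.upload_file(
    path_or_fileobj="resources/taggers/ner/training.log",
    path_in_repo="training.log",
    repo_id=repo_id,
)
api.upload_folder(
    folder_path="resources/taggers/ner/runs",
    path_in_repo="runs",
    repo_id=repo_id,
)
```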

## Model Card

Additionally, this repository shows how to automatically generate model cards for all uploaded models, including a
results overview table with links to the models.

The [`Example.ipynb`](Example.ipynb) notebook gives a detailed overview of all necessary steps.
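
For building the results table, the best F1-score on the development set is parsed from each model's `training.log`.
A rough sketch of such a parser (the exact log-line pattern is an assumption and may need adjusting):

```python
import re


def parse_best_f1(log_path: str) -> float:
    """Extract the best dev F1-score from a Flair training.log (sketch)."""
    best_f1 = 0.0
    with open(log_path, "r") as f:
        for line in f:
            # Assumed log-line format, e.g. "F-score (micro avg)  0.9346"
            match = re.search(r"F-score \(micro avg\)\s+([\d.]+)", line)
            if match:
                best_f1 = max(best_f1, float(match.group(1)))
    return best_f1
```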

## Results

We perform experiments for three BERT-based German language models:

* [German BERT](https://huggingface.co/google-bert/bert-base-german-cased)
* [German DBMDZ BERT](https://huggingface.co/dbmdz/bert-base-german-cased)
* [GBERT](https://huggingface.co/deepset/gbert-base)

The following overview table shows the best configuration (batch size, learning rate) for each model on the development
dataset:

| Model Name | Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
| ----------------- |--------------------|--------------|--------------|--------------|--------------|-----------------|---------------------|
| German BERT       | `bs8-e10-lr5e-05`  | [0.9346][1]  | [0.9388][2]  | [0.9301][3]  | [0.9291][4]  | [0.9346][5]     | 0.9334 ± 0.0039     |
| German DBMDZ BERT | `bs8-e10-lr5e-05`  | [0.9378][11] | [0.9280][12] | [0.9383][13] | [0.9374][14] | [0.9364][15]    | 0.9356 ± 0.0043     |
| GBERT             | `bs8-e10-lr5e-05`  | [0.9477][21] | [0.9350][22] | [0.9517][23] | [0.9443][24] | [0.9342][25]    | **0.9426** ± 0.0077 |

It can be seen that GBERT shows the strongest performance, achieving an average F1-score of 94.26% on the development set.
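
The reported averages are the mean and sample standard deviation over the five seeds, e.g. for German BERT on the
development set:

```python
from statistics import mean, stdev

# Development-set F1-scores of German BERT across seeds 1-5
scores = [0.9346, 0.9388, 0.9301, 0.9291, 0.9346]

print(f"{mean(scores):.4f} ± {stdev(scores):.4f}")  # 0.9334 ± 0.0039
```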

Now, we retrieve the F1-score for the best configuration of each model on the test set:

| Model Name | Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
| ----------------- |--------------------|--------------|--------------|--------------|--------------|-----------------|---------------------|
| German BERT | `bs8-e10-lr5e-05` | [0.9141][1] | [0.9159][2] | [0.9121][3] | [0.9062][4] | [0.9105][5] | 0.9118 ± 0.0033 |
| German DBMDZ BERT | `bs8-e10-lr5e-05` | [0.9134][11] | [0.9076][12] | [0.9070][13] | [0.8821][14] | [0.9091][15] | 0.9038 ± 0.0111 |
| GBERT | `bs8-e10-lr5e-05` | [0.9180][21] | [0.9117][22] | [0.9163][23] | [0.9155][24] | [0.9110][25] | **0.9145** ± 0.0027 |

The GBERT model again shows the strongest performance. With 91.45%, our result is close to the reported 92.2% (see
Table 1 in the CO-Fun paper).

[1]: https://hf.co/stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-1
[2]: https://hf.co/stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-2
[3]: https://hf.co/stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-3
[4]: https://hf.co/stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-4
[5]: https://hf.co/stefan-it/flair-co-funer-german_bert_base-bs8-e10-lr5e-05-5
[11]: https://hf.co/stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-1
[12]: https://hf.co/stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-2
[13]: https://hf.co/stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-3
[14]: https://hf.co/stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-4
[15]: https://hf.co/stefan-it/flair-co-funer-german_dbmdz_bert_base-bs8-e10-lr5e-05-5
[21]: https://hf.co/stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-1
[22]: https://hf.co/stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-2
[23]: https://hf.co/stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-3
[24]: https://hf.co/stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-4
[25]: https://hf.co/stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-5

## Release of Fine-tuned Models

All fine-tuned models for this repository are available on the Hugging Face Model Hub, including a working inference
widget that allows performing NER directly in the browser:

![Inference Widget](images/inference-widget.png)

All fine-tuned models can be found [here](https://huggingface.co/models?search=flair-co-funer). The best models can be
found in
[this collection](https://huggingface.co/collections/stefan-it/fine-tuned-co-funer-models-66058539530368090082214f).
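
For local inference, a fine-tuned model can also be loaded directly with Flair. A minimal sketch, where the example
sentence is made up and the label type `ner` is an assumption:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load one of the fine-tuned models from the Model Hub
tagger = SequenceTagger.load("stefan-it/flair-co-funer-gbert_base-bs8-e10-lr5e-05-3")

# Hypothetical example sentence from the fund prospectus domain
sentence = Sentence(
    "Die Verwaltungsgesellschaft hat die Fondsbuchhaltung an die State Street Bank International GmbH in München ausgelagert."
)
tagger.predict(sentence)

for entity in sentence.get_spans("ner"):
    print(entity)
```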

# Changelog

* 28.03.2024: Initial version of this repository.