https://github.com/matteo-stat/transformers-nlp-ner-token-classification
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope they will help others quickly fine-tune and use models in their projects!
- Host: GitHub
- URL: https://github.com/matteo-stat/transformers-nlp-ner-token-classification
- Owner: matteo-stat
- Created: 2024-08-20T17:40:17.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T21:26:56.000Z (9 months ago)
- Last Synced: 2025-01-12T12:42:49.054Z (4 months ago)
- Topics: fine-tuning, huggingface, huggingface-pipelines, huggingface-transformers, inference-optimization, named-entity-recognition, ner, nlp, onnx, onnxruntime, token-classification, transformers
- Language: Python
- Homepage:
- Size: 22.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# transformers-nlp-ner-token-classification
Welcome to the **transformers-nlp-ner-token-classification** repository!
This repo is all about fine-tuning HuggingFace Transformers for token classification (NER, Named Entity Recognition), setting up pipelines, and optimizing models for faster inference. It comes from my experience developing a custom chatbot, where multiple entities could appear in users' messages.
I hope these scripts help you fine-tune and deploy your models with ease!
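As context for what these scripts ultimately produce: a token classification model labels each token with a BIO tag, and entity spans are recovered by grouping consecutive `B-`/`I-` tags. Here is a minimal, dependency-free sketch of that grouping step (the tag names and example sentence are illustrative, not the repo's actual label set):

```python
def group_bio_tags(tokens, tags):
    """Group token-level BIO tags into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new entity, closing any open one
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity
            current_tokens.append(token)
        else:
            # An "O" tag (or an inconsistent I- tag) closes any open entity
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["book", "a", "table", "in", "New", "York", "for", "Anna"]
tags   = ["O", "O", "O", "O", "B-LOC", "I-LOC", "O", "B-PER"]
print(group_bio_tags(tokens, tags))  # [('LOC', 'New York'), ('PER', 'Anna')]
```

HuggingFace pipelines perform this aggregation for you, but seeing it spelled out makes their output easier to interpret.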
## Repository Structure
Here's a quick rundown of what you'll find in this repo:
- **`checkpoints/ner-token-classification/`**: This is where your model checkpoints will be stored during training. Save your progress and pick up where you left off!
- **`data/ner-token-classification/`**: Contains sample data for training, validation, and testing. These samples are here to demonstrate the expected format for token classification problems. Note that entities in samples are anonymized.
- **`models/ner-token-classification/`**: This is where the fine-tuned and optimized models will be saved. After fine-tuning and optimizing, you'll find your models here, ready for action!
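For reference, token classification training data is usually stored as parallel lists of tokens and tags. The exact schema used in `data/ner-token-classification/` may differ, but a typical sample in a common layout looks like this (the field names and sentence below are illustrative assumptions, not taken from the repo's files):

```python
import json

# Illustrative sample in a common token-classification layout
# (the actual schema in data/ner-token-classification/ may differ)
sample = {
    "tokens": ["i", "want", "to", "fly", "from", "Milan", "to", "Rome"],
    "ner_tags": ["O", "O", "O", "O", "O", "B-LOC", "O", "B-LOC"],
}

# Sanity check: exactly one tag per token is the invariant
# every token-classification dataset must satisfy
assert len(sample["tokens"]) == len(sample["ner_tags"])
print(json.dumps(sample, indent=2))
```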
## Scripts
Here's what each script in the repo does:
1. **`01-ner-token-classification-train.py`**
   Fine-tunes a HuggingFace model on a token classification problem. If you're looking to train your model, this script is your starting point.
2. **`02-ner-token-classification-pipeline.py`**
   Builds a pipeline for running inference with your fine-tuned model. This script allows you to run inference on single or multiple samples effortlessly.
3. **`03-ner-token-classification-optimize-model-for-inference.py`**
   Optimizes your model for faster inference on CPU using ONNX Runtime. Perfect for when you're working on a development server with limited GPU memory.
4. **`04-ner-token-classification-pipeline-inference-optmized-model.py`**
   Similar to the `02` script, but specifically for inference with the optimized model (using ONNX Runtime). Get faster predictions using a CPU!

## Requirements and Installation Warnings
Before you dive into the scripts, here are a few important notes about the dependencies and installation process:
### Dependency Files
- **`requirements-without-inference-optimization.txt`**
  Includes dependencies for scripts `01-ner-token-classification-train.py` and `02-ner-token-classification-pipeline.py` (excludes ONNX Runtime dependencies).
- **`requirements-with-inference-optimization.txt`**
  Includes dependencies for all scripts, including ONNX Runtime dependencies for optimization and inference.

### Note for PyTorch and NVIDIA GPUs
If you are using PyTorch with an NVIDIA GPU, make sure you have the correct build of PyTorch installed. Before installing the requirements, install the PyTorch version compatible with your CUDA version (CUDA 12.1 in the example below):
```bash
pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121
```