https://github.com/matteo-stat/transformers-nlp-ner-token-classification
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope they will help others quickly fine-tune and use models in their projects!
- Host: GitHub
- URL: https://github.com/matteo-stat/transformers-nlp-ner-token-classification
- Owner: matteo-stat
- Created: 2024-08-20T17:40:17.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T21:26:56.000Z (9 months ago)
- Last Synced: 2025-01-12T12:42:49.054Z (4 months ago)
- Topics: fine-tuning, huggingface, huggingface-pipelines, huggingface-transformers, inference-optimization, named-entity-recognition, ner, nlp, onnx, onnxruntime, token-classification, transformers
- Language: Python
- Homepage:
- Size: 22.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# transformers-nlp-ner-token-classification
Welcome to the **transformers-nlp-ner-token-classification** repository!
This repo is all about fine-tuning HuggingFace Transformers for token classification (NER, Named Entity Recognition), setting up pipelines, and optimizing models for faster inference. It comes from my experience developing a custom chatbot, where multiple entities could appear in users' messages.
I hope these scripts help you fine-tune and deploy your models with ease!
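As context for what these scripts ultimately produce: a token classification model labels each token with a BIO tag, and entity spans are recovered by grouping consecutive `B-`/`I-` tags. Here is a minimal, dependency-free sketch of that grouping step (the tag names and example sentence are illustrative, not the repo's actual label set):

```python
def group_bio_tags(tokens, tags):
    """Group token-level BIO tags into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new entity, closing any open one
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity
            current_tokens.append(token)
        else:
            # An "O" tag (or an inconsistent I- tag) closes any open entity
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["book", "a", "table", "in", "New", "York", "for", "Anna"]
tags   = ["O", "O", "O", "O", "B-LOC", "I-LOC", "O", "B-PER"]
print(group_bio_tags(tokens, tags))  # [('LOC', 'New York'), ('PER', 'Anna')]
```

HuggingFace pipelines perform this aggregation for you, but seeing it spelled out makes their output easier to interpret.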
## Repository Structure
Here's a quick rundown of what you'll find in this repo:
- **`checkpoints/ner-token-classification/`**: This is where your model checkpoints will be stored during training. Save your progress and pick up where you left off!
- **`data/ner-token-classification/`**: Contains sample data for training, validation, and testing. These samples are here to demonstrate the expected format for token classification problems. Note that entities in samples are anonymized.
- **`models/ner-token-classification/`**: This is where the fine-tuned and optimized models will be saved. After fine-tuning and optimizing, you'll find your models here, ready for action!
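For reference, token classification training data is usually stored as parallel lists of tokens and tags. The exact schema used in `data/ner-token-classification/` may differ, but a typical sample in a common layout looks like this (the field names and sentence below are illustrative assumptions, not taken from the repo's files):

```python
import json

# Illustrative sample in a common token-classification layout
# (the actual schema in data/ner-token-classification/ may differ)
sample = {
    "tokens": ["i", "want", "to", "fly", "from", "Milan", "to", "Rome"],
    "ner_tags": ["O", "O", "O", "O", "O", "B-LOC", "O", "B-LOC"],
}

# Sanity check: exactly one tag per token is the invariant
# every token-classification dataset must satisfy
assert len(sample["tokens"]) == len(sample["ner_tags"])
print(json.dumps(sample, indent=2))
```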
## Scripts
Here's what each script in the repo does:
1. **`01-ner-token-classification-train.py`**
   Fine-tunes a HuggingFace model on a token classification problem. If you're looking to train your model, this script is your starting point.
2. **`02-ner-token-classification-pipeline.py`**
   Builds a pipeline for running inference with your fine-tuned model. This script allows you to run inference on single or multiple samples effortlessly.
3. **`03-ner-token-classification-optimize-model-for-inference.py`**
   Optimizes your model for faster inference on CPU using ONNX Runtime. Perfect for when you're working on a development server with limited GPU memory.
4. **`04-ner-token-classification-pipeline-inference-optmized-model.py`**
   Similar to the `02` script, but specifically for inference with the optimized model (using ONNX Runtime). Get faster predictions using a CPU!

## Requirements and Installation Warnings
Before you dive into the scripts, here are a few important notes about the dependencies and installation process:
### Dependency Files
- **`requirements-without-inference-optimization.txt`**
  Includes dependencies for scripts `01-ner-token-classification-train.py` and `02-ner-token-classification-pipeline.py` (excludes ONNX Runtime dependencies).
- **`requirements-with-inference-optimization.txt`**
  Includes dependencies for all scripts, including ONNX Runtime dependencies for optimization and inference.

### Note for PyTorch and NVIDIA GPUs
If you are using PyTorch with an NVIDIA GPU, make sure you have the correct build of PyTorch installed. Before installing the requirements, install the PyTorch version compatible with your CUDA version (CUDA 12.1 in the example below):
```bash
pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121
```