Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fcakyon/video-transformers
Easiest way of fine-tuning HuggingFace video classification models
- Host: GitHub
- URL: https://github.com/fcakyon/video-transformers
- Owner: fcakyon
- License: mit
- Created: 2022-08-12T13:52:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-20T20:43:24.000Z (over 1 year ago)
- Last Synced: 2024-10-23T06:07:21.551Z (21 days ago)
- Topics: accelerate, classification, deep-learning, evaluate, huggingface, layer, machine-learning, neptune, onnx, onnxruntime, python, pytorch, pytorch-video, tensorboard, transformers, video, video-classification, video-transformer, vision, wandb
- Language: Python
- Homepage:
- Size: 72.3 KB
- Stars: 132
- Watchers: 5
- Forks: 13
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
Easiest way of fine-tuning HuggingFace video classification models.

## Features
`video-transformers` uses:
- 🤗 [accelerate](https://github.com/huggingface/accelerate) for distributed training,
- 🤗 [evaluate](https://github.com/huggingface/evaluate) for evaluation,
- [pytorchvideo](https://github.com/facebookresearch/pytorchvideo) for dataloading
and supports:
- creating and fine-tuning video models using [transformers](https://github.com/huggingface/transformers) and [timm](https://github.com/rwightman/pytorch-image-models) vision models
- experiment tracking with [neptune](https://neptune.ai/), [tensorboard](https://www.tensorflow.org/tensorboard) and other trackers
- exporting fine-tuned models in [ONNX](https://onnx.ai/) format
- pushing fine-tuned models into [HuggingFace Hub](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads)
- loading pretrained models from [HuggingFace Hub](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads)
- automated [Gradio app](https://gradio.app/) and [space](https://huggingface.co/spaces) creation
## Installation
- Install `PyTorch`:
```bash
conda install pytorch=1.11.0 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch
```

- Install `pytorchvideo` and `transformers` from the main branch:
```bash
pip install git+https://github.com/facebookresearch/pytorchvideo.git
pip install git+https://github.com/huggingface/transformers.git
```

- Install `video-transformers`:
```bash
pip install video-transformers
```

## Usage
- Prepare your video classification dataset in the following folder structure (`.avi` and `.mp4` extensions are supported); a quick sanity-check sketch follows the structure below:
```bash
train_root
    label_1
        video_1
        video_2
        ...
    label_2
        video_1
        video_2
        ...
    ...
val_root
    label_1
        video_1
        video_2
        ...
    label_2
        video_1
        video_2
        ...
    ...
```
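
Before training, it can help to verify the layout programmatically. A minimal sketch using only the Python standard library; the `ucf6/train` and `ucf6/val` paths refer to the demo dataset used in the examples below, so substitute your own roots:

```python
from pathlib import Path

# Count the clips found under each label folder of each split.
for split in ("ucf6/train", "ucf6/val"):  # replace with your train_root / val_root
    for label_dir in sorted(Path(split).iterdir()):
        if label_dir.is_dir():
            num_videos = sum(1 for p in label_dir.iterdir() if p.suffix in (".mp4", ".avi"))
            print(f"{split}/{label_dir.name}: {num_videos} videos")
```

- Fine-tune Timesformer (from HuggingFace) video classifier: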
```python
from torch.optim import AdamW
from video_transformers import VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TransformersBackbone("facebook/timesformer-base-finetuned-k400", num_unfrozen_stages=1)
download_ucf6("./")
datamodule = VideoDataModule(
train_root="ucf6/train",
val_root="ucf6/val",
batch_size=4,
num_workers=4,
num_timesteps=8,
preprocess_input_size=224,
preprocess_clip_duration=1,
preprocess_means=backbone.mean,
preprocess_stds=backbone.std,
preprocess_min_short_side=256,
preprocess_max_short_side=320,
preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=backbone.num_features, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head)

optimizer = AdamW(model.parameters(), lr=1e-4)
Trainer = trainer_factory("single_label_classification")
trainer = Trainer(datamodule, model, optimizer=optimizer, max_epochs=8)

trainer.fit()
```
- Fine-tune ConvNeXT (from HuggingFace) + Transformer-based video classifier:
```python
from torch.optim import AdamW
from video_transformers import TimeDistributed, VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.necks import TransformerNeck
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TimeDistributed(TransformersBackbone("facebook/convnext-small-224", num_unfrozen_stages=1))
neck = TransformerNeck(
num_features=backbone.num_features,
num_timesteps=8,
transformer_enc_num_heads=4,
transformer_enc_num_layers=2,
dropout_p=0.1,
)

download_ucf6("./")
datamodule = VideoDataModule(
train_root="ucf6/train",
val_root="ucf6/val",
batch_size=4,
num_workers=4,
num_timesteps=8,
preprocess_input_size=224,
preprocess_clip_duration=1,
preprocess_means=backbone.mean,
preprocess_stds=backbone.std,
preprocess_min_short_side=256,
preprocess_max_short_side=320,
preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=neck.num_features, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head, neck)

optimizer = AdamW(model.parameters(), lr=1e-4)
Trainer = trainer_factory("single_label_classification")
trainer = Trainer(
datamodule,
model,
optimizer=optimizer,
max_epochs=8
)

trainer.fit()
```
- Fine-tune ResNet-18 (from HuggingFace) + GRU-based video classifier:
```python
from video_transformers import TimeDistributed, VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.necks import GRUNeck
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TimeDistributed(TransformersBackbone("microsoft/resnet-18", num_unfrozen_stages=1))
neck = GRUNeck(num_features=backbone.num_features, hidden_size=128, num_layers=2, return_last=True)

download_ucf6("./")
datamodule = VideoDataModule(
train_root="ucf6/train",
val_root="ucf6/val",
batch_size=4,
num_workers=4,
num_timesteps=8,
preprocess_input_size=224,
preprocess_clip_duration=1,
preprocess_means=backbone.mean,
preprocess_stds=backbone.std,
preprocess_min_short_side=256,
preprocess_max_short_side=320,
preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=neck.hidden_size, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head, neck)

Trainer = trainer_factory("single_label_classification")
trainer = Trainer(
datamodule,
model,
max_epochs=8
)

trainer.fit()
```
- Perform prediction for a single file or folder of videos:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained(model_name_or_path)
model.predict(video_or_folder_path="video.mp4")
>> [{'filename': "video.mp4", 'predictions': {'class1': 0.98, 'class2': 0.02}}]
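
# The same call accepts a folder of videos ("videos/" is an illustrative path),
# presumably returning one entry per clip in the same format as above.
model.predict(video_or_folder_path="videos/")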
```

## 🤗 Full HuggingFace Integration
- Push your fine-tuned model to the hub:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_hub('model_name')
```

- Load any pretrained video-transformer model from the hub:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained('account_name/model_name')
```

- Push your model to the HuggingFace Hub with auto-generated model cards:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_hub('account_name/app_name')
```

- (Upcoming feature) Push your model as a Gradio app to a HuggingFace Space:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_space('account_name/app_name')
```

## Multiple tracker support
- Tensorboard tracker is enabled by default.
- To add Neptune, W&B, or other trackers:
```python
from video_transformers.tracking import NeptuneTracker
from accelerate.tracking import WandBTracker

trackers = [
NeptuneTracker(EXPERIMENT_NAME, api_token=NEPTUNE_API_TOKEN, project=NEPTUNE_PROJECT),
WandBTracker(project_name=WANDB_PROJECT)
]

trainer = Trainer(
datamodule,
model,
trackers=trackers
)
```
## ONNX support
- Convert your trained models into ONNX format for deployment:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.to_onnx(quantize=False, opset_version=12, export_dir="runs/exports/", export_filename="model.onnx")
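
# Optional sketch: the exported file can be loaded with onnxruntime to inspect its
# expected inputs. onnxruntime is an assumed extra dependency (not installed by
# video-transformers), and the exact input layout depends on the chosen backbone.
import onnxruntime as ort

session = ort.InferenceSession("runs/exports/model.onnx")
print([(inp.name, inp.shape) for inp in session.get_inputs()])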
```

## 🤗 Gradio support
- Convert your trained model into a Gradio app for deployment:
```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.to_gradio(examples=['video.mp4'], export_dir="runs/exports/", export_filename="app.py")
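# The generated script can presumably be launched like any Gradio app, e.g.
# `python runs/exports/app.py` (path assumed from the arguments above).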
```

## Contributing
Before opening a PR:
- Install required development packages:
```bash
pip install -e ."[dev]"
```

- Reformat with black and isort:
```bash
python -m tests.run_code_style format
```