https://github.com/jina-ai/finetuner

:dart: Task-oriented embedding tuning for BERT, CLIP, etc.

Host: GitHub
URL: https://github.com/jina-ai/finetuner
Owner: jina-ai
License: apache-2.0
Archived: true
Created: 2021-08-11T13:15:43.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2024-03-11T08:05:13.000Z (over 1 year ago)
Last Synced: 2025-01-20T09:46:57.738Z (5 months ago)
Topics: bert, few-shot-learning, fine-tuning, finetuning, jina, metric-learning, negative-sampling, neural-search, openai-clip, pretrained-models, siamese-network, similarity-learning, transfer-learning, triplet-loss
Language: Python
Homepage: https://finetuner.jina.ai
Size: 71.5 MB
Stars: 1,486
Watchers: 28
Forks: 67
Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - jina-ai/finetuner
awesome-production-machine-learning - Finetuner - Finetuner provides an effective way to improve performance on neural search tasks. (Neural Search and Retrieval)

README

Task-oriented finetuning for better embeddings on neural search

Fine-tuning is an effective way to improve performance on [neural search](https://jina.ai/news/what-is-neural-search-and-learn-to-build-a-neural-search-engine/) tasks.
However, setting up and performing fine-tuning can be very time-consuming and resource-intensive.

Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure in the cloud.
With Finetuner, you can easily enhance the performance of pre-trained models,
making them production-ready [without extensive labeling](https://jina.ai/news/fine-tuning-with-low-budget-and-high-expectations/) or expensive hardware.

🎏 **Better embeddings**: Create high-quality embeddings for semantic search, visual similarity search, cross-modal text<->image search, recommendation systems,
clustering, duplicate detection, anomaly detection, or other uses.

⏰ **Low budget, high expectations**: Bring considerable improvements to model performance, making the most out of as little as a few hundred training samples, and finish fine-tuning in as little as an hour.

📈 **Performance promise**: Enhance pre-trained models so that they deliver state-of-the-art performance on
domain-specific applications.

🔱 **Simple yet powerful**: Easy access to 40+ mainstream loss functions, 10+ optimizers, layer pruning, weight
freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.

☁ **All-in-cloud**: Train using our GPU infrastructure, manage runs, experiments, and artifacts on Jina AI Cloud
without worrying about resource availability, complex integration, or infrastructure costs.
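Finetuner runs its losses and miners on Jina AI Cloud, so its own implementation is not reproduced here. To make hard-negative mining combined with triplet loss more concrete, the following is a minimal pure-Python sketch of "batch-hard" triplet loss, assuming cosine distance and a margin of 0.2. The function names are hypothetical and are not part of Finetuner's API.

```python
import math
from typing import Hashable, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def batch_hard_triplet_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    margin: float = 0.2,
) -> float:
    """Batch-hard triplet loss averaged over valid anchors.

    For each anchor, the hardest positive is the farthest item with the same
    label and the hardest negative is the closest item with a different label.
    The per-anchor loss is max(0, d(anchor, pos) - d(anchor, neg) + margin).
    Anchors lacking a positive or a negative in the batch are skipped.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos_dists = [
            cosine_distance(anchor, other)
            for j, (other, other_label) in enumerate(zip(embeddings, labels))
            if j != i and other_label == label
        ]
        neg_dists = [
            cosine_distance(anchor, other)
            for other, other_label in zip(embeddings, labels)
            if other_label != label
        ]
        if not pos_dists or not neg_dists:
            continue
        # Hard mining: worst-case positive and worst-case negative for this anchor.
        losses.append(max(0.0, max(pos_dists) - min(neg_dists) + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # 0.0: the two classes are already separated by more than the margin
    print(batch_hard_triplet_loss(batch, ["cat", "cat", "dog", "dog"]))
    # positive: these labels contradict the geometry, so the loss pushes back
    print(batch_hard_triplet_loss(batch, ["cat", "dog", "cat", "dog"]))
```

Mining only the hardest pair per anchor focuses the gradient on the examples the model currently confuses; real training code would compute this on tensors inside a framework rather than in pure Python.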

## [Documentation](https://finetuner.jina.ai/)

## Pretrained Text Embedding Models

| Name                   | Parameters | Dimension | Hugging Face                                                 |
|------------------------|------------|-----------|--------------------------------------------------------------|
| jina-embedding-t-en-v1 | 14M        | 312       | [link](https://huggingface.co/jinaai/jina-embedding-t-en-v1) |
| jina-embedding-s-en-v1 | 35M        | 512       | [link](https://huggingface.co/jinaai/jina-embedding-s-en-v1) |
| jina-embedding-b-en-v1 | 110M       | 768       | [link](https://huggingface.co/jinaai/jina-embedding-b-en-v1) |
| jina-embedding-l-en-v1 | 330M       | 1024      | [link](https://huggingface.co/jinaai/jina-embedding-l-en-v1) |
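The dimension column directly determines how much memory a flat vector index needs. As a back-of-the-envelope sketch, assuming uncompressed float32 storage (4 bytes per value) and ignoring any index or metadata overhead, the following estimates raw storage per million embeddings for each model in the table. The helper name is hypothetical.

```python
# Output dimensions copied from the table above.
EMBEDDING_DIMS = {
    "jina-embedding-t-en-v1": 312,
    "jina-embedding-s-en-v1": 512,
    "jina-embedding-b-en-v1": 768,
    "jina-embedding-l-en-v1": 1024,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32


def raw_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to hold a dense num_vectors x dim matrix, with no index overhead."""
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    for name, dim in EMBEDDING_DIMS.items():
        gigabytes = raw_index_bytes(1_000_000, dim) / 1e9
        print(f"{name}: {gigabytes:.3f} GB per million vectors")
```

Under these assumptions, one million vectors take about 1.25 GB with the 312-dimensional model and about 4.1 GB with the 1024-dimensional one; this storage-versus-quality trade-off is what dimensionality reduction addresses.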

## Benchmarks

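| Model      | Task                                         | Metric | Pretrained | Finetuned | Delta |
|------------|----------------------------------------------|--------|-----------:|----------:|------:|
| BERT       | Quora Question Answering                     | mRR    |      0.835 |     0.967 | 15.8% |
|            |                                              | Recall |      0.915 |     0.963 |  5.3% |
| ResNet     | Visual similarity search on TLL              | mAP    |      0.110 |     0.196 | 78.2% |
|            |                                              | Recall |      0.249 |     0.460 | 84.7% |
| CLIP       | Deep Fashion text-to-image search            | mRR    |      0.575 |     0.676 | 17.4% |
|            |                                              | Recall |      0.473 |     0.564 | 19.2% |
| M-CLIP     | Cross market product recommendation (German) | mRR    |      0.430 |     0.648 | 50.7% |
|            |                                              | Recall |      0.247 |     0.340 | 37.7% |
| PointNet++ | ModelNet40 3D Mesh Search                    | mRR    |      0.791 |     0.891 | 12.7% |
|            |                                              | Recall |      0.154 |     0.242 | 57.1% |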

_All metrics were evaluated at k = 20 (top-20 results) after training for 5 epochs with the Adam optimizer, using learning rates of 1e-4 for ResNet, 1e-7 for CLIP, 1e-5 for the BERT models, and 5e-4 for PointNet++._
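The benchmark reports mRR, Recall, and mAP at a cutoff of 20, plus a relative Delta. The sketch below assumes the standard textbook definitions of MRR@k and Recall@k; Finetuner's own evaluator may differ in details such as how recall is normalized, and mAP is not covered. All function names are hypothetical. The Delta column is consistent with (finetuned - pretrained) / pretrained: recomputing it from the rounded scores reproduces most rows, with gaps of 0.1 to 0.2 percentage points in a few (for example 17.6% versus the listed 17.4% for CLIP mRR), presumably because the published deltas used unrounded scores.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple

# (ranked result ids, set of relevant ids) for a single query
Run = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 20) -> float:
    """1 / rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 20) -> float:
    """Fraction of the relevant items that appear within the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def mean_over_queries(metric: Callable[..., float], runs: List[Run], k: int = 20) -> float:
    """Average a per-query metric; with reciprocal_rank_at_k this gives MRR@k."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


def relative_delta(pretrained: float, finetuned: float) -> float:
    """Relative improvement of the fine-tuned score over the pretrained one."""
    return (finetuned - pretrained) / pretrained


if __name__ == "__main__":
    runs = [
        (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
        (["d2", "d5", "d4"], {"d2", "d9"}),  # hit at rank 1; d9 never retrieved
    ]
    print(mean_over_queries(reciprocal_rank_at_k, runs))  # 0.75
    print(mean_over_queries(recall_at_k, runs))           # 0.75
    print(f"{relative_delta(0.835, 0.967):.1%}")          # 15.8% (BERT mRR row)
```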

## Install

Make sure you have Python 3.8+ installed. Finetuner can be installed via `pip` by executing:

```bash
pip install -U finetuner
```

If you want to submit a fine-tuning job to the cloud, please use:

```bash
pip install "finetuner[full]"
```

> ⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is `0.4.1`.
> This version is still available for installation via `pip`. See [Finetuner git tags and releases](https://github.com/jina-ai/finetuner/releases).
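Because release 0.5.0 changed where computation happens, it can help to check which side of that boundary an environment is on. The following is a minimal standard-library sketch; the helper names are hypothetical, and the version parsing is deliberately crude (the third-party `packaging` library handles full version specifiers properly). To stay on the last local release, pip's usual pinning syntax applies: `pip install "finetuner==0.4.1"`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

FIRST_CLOUD_RELEASE = (0, 5, 0)  # per the note above: 0.5.0+ runs on Jina AI Cloud


def release_tuple(v: str) -> Tuple[int, int, int]:
    """Parse '0.4.1' -> (0, 4, 1); crude: keeps leading digits of the first three parts."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '1.0'
    return (parts[0], parts[1], parts[2])


def runs_in_cloud(v: str) -> bool:
    """True for releases that perform fine-tuning on Jina AI Cloud (0.5.0 and later)."""
    return release_tuple(v) >= FIRST_CLOUD_RELEASE


if __name__ == "__main__":
    try:
        installed = version("finetuner")
    except PackageNotFoundError:
        print("finetuner is not installed")
    else:
        mode = "Jina AI Cloud" if runs_in_cloud(installed) else "local"
        print(f"finetuner {installed}: {mode} fine-tuning")
```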

## Articles about Finetuner

Check out our published blog posts and tutorials to see Finetuner in action!

- [Fine-tuning with Low Budget and High Expectations](https://jina.ai/news/fine-tuning-with-low-budget-and-high-expectations/)
- [Hype and Hybrids: Search is more than Keywords and Vectors](https://jina.ai/news/hype-and-hybrids-multimodal-search-means-more-than-keywords-and-vectors-2/)
- [Improving Search Quality for Non-English Queries with Fine-tuned Multilingual CLIP Models](https://jina.ai/news/improving-search-quality-non-english-queries-fine-tuned-multilingual-clip-models/)
- [How Much Do We Get by Finetuning CLIP?](https://jina.ai/news/applying-jina-ai-finetuner-to-clip-less-data-smaller-models-higher-performance/)

If you find Jina Embeddings useful in your research, please cite the following paper:

```bibtex
@misc{günther2023jina,
  title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
  author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
  year={2023},
  eprint={2307.11224},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## Support

- Use [Discussions](https://github.com/jina-ai/finetuner/discussions) to talk about your use cases, questions, and
support queries.
- Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
- Join our [Engineering All Hands](https://youtube.com/playlist?list=PL3UBBWOUVhFYRUa_gpYYKBqEAkO4sxmne) meet-up to discuss your use case and learn about new Jina AI features.
  - **When?** The second Tuesday of every month
  - **Where?** Zoom ([see our public events calendar](https://calendar.google.com/calendar/embed?src=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&ctz=Europe%2FBerlin)/[.ical](https://calendar.google.com/calendar/ical/c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com/public/basic.ics)) and [live stream on YouTube](https://youtube.com/c/jina-ai)
- Subscribe to the latest video tutorials on our [YouTube channel](https://youtube.com/c/jina-ai)

## Join Us

Finetuner is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE).

[We are actively hiring](https://jobs.jina.ai) AI engineers and solution engineers to build the next generation of
open-source AI ecosystems.
