https://github.com/ankane/transformers-ruby
State-of-the-art transformers for Ruby
https://github.com/ankane/transformers-ruby
Last synced: 7 months ago
JSON representation
State-of-the-art transformers for Ruby
- Host: GitHub
- URL: https://github.com/ankane/transformers-ruby
- Owner: ankane
- License: apache-2.0
- Created: 2024-08-19T18:41:02.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-12-29T22:43:32.000Z (over 1 year ago)
- Last Synced: 2025-04-14T22:21:43.261Z (about 1 year ago)
- Language: Ruby
- Size: 179 KB
- Stars: 710
- Watchers: 10
- Forks: 11
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# Transformers.rb
:slightly_smiling_face: State-of-the-art [transformers](https://github.com/huggingface/transformers) for Ruby
For fast inference, check out [Informers](https://github.com/ankane/informers) :fire:
[](https://github.com/ankane/transformers-ruby/actions)
## Installation
First, [install Torch.rb](https://github.com/ankane/torch.rb#installation).
Then add this line to your application’s Gemfile:
```ruby
gem "transformers-rb"
```
## Getting Started
- [Models](#models)
- [Pipelines](#pipelines)
## Models
Embedding
- [sentence-transformers/all-MiniLM-L6-v2](#sentence-transformersall-MiniLM-L6-v2)
- [sentence-transformers/multi-qa-MiniLM-L6-cos-v1](#sentence-transformersmulti-qa-MiniLM-L6-cos-v1)
- [sentence-transformers/all-mpnet-base-v2](#sentence-transformersall-mpnet-base-v2)
- [sentence-transformers/paraphrase-MiniLM-L6-v2](#sentence-transformersparaphrase-minilm-l6-v2)
- [mixedbread-ai/mxbai-embed-large-v1](#mixedbread-aimxbai-embed-large-v1)
- [thenlper/gte-small](#thenlpergte-small)
- [intfloat/e5-base-v2](#intfloate5-base-v2)
- [BAAI/bge-base-en-v1.5](#baaibge-base-en-v15)
- [Snowflake/snowflake-arctic-embed-m-v1.5](#snowflakesnowflake-arctic-embed-m-v15)
Sparse embedding
- [opensearch-project/opensearch-neural-sparse-encoding-v1](#opensearch-projectopensearch-neural-sparse-encoding-v1)
Reranking
- [mixedbread-ai/mxbai-rerank-base-v1](#mixedbread-aimxbai-rerank-base-v1)
- [BAAI/bge-reranker-base](#baaibge-reranker-base)
### sentence-transformers/all-MiniLM-L6-v2
[Docs](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)
```
### sentence-transformers/multi-qa-MiniLM-L6-cos-v1
[Docs](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers.pipeline("embedding", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
query_embedding = model.(query)
doc_embeddings = model.(docs)
scores = doc_embeddings.map { |e| e.zip(query_embedding).sum { |d, q| d * q } }
doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
```
### sentence-transformers/all-mpnet-base-v2
[Docs](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers.pipeline("embedding", "sentence-transformers/all-mpnet-base-v2")
embeddings = model.(sentences)
```
### sentence-transformers/paraphrase-MiniLM-L6-v2
[Docs](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2)
```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers.pipeline("embedding", "sentence-transformers/paraphrase-MiniLM-L6-v2")
embeddings = model.(sentences)
```
### mixedbread-ai/mxbai-embed-large-v1
[Docs](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
```ruby
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
"The dog is barking",
"The cat is purring",
query_prefix + "puppy"
]
model = Transformers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.(input)
```
### thenlper/gte-small
[Docs](https://huggingface.co/thenlper/gte-small)
```ruby
sentences = ["That is a happy person", "That is a very happy person"]
model = Transformers.pipeline("embedding", "thenlper/gte-small")
embeddings = model.(sentences)
```
### intfloat/e5-base-v2
[Docs](https://huggingface.co/intfloat/e5-base-v2)
```ruby
doc_prefix = "passage: "
query_prefix = "query: "
input = [
doc_prefix + "Ruby is a programming language created by Matz",
query_prefix + "Ruby creator"
]
model = Transformers.pipeline("embedding", "intfloat/e5-base-v2")
embeddings = model.(input)
```
### BAAI/bge-base-en-v1.5
[Docs](https://huggingface.co/BAAI/bge-base-en-v1.5)
```ruby
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
"The dog is barking",
"The cat is purring",
query_prefix + "puppy"
]
model = Transformers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
embeddings = model.(input)
```
### Snowflake/snowflake-arctic-embed-m-v1.5
[Docs](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5)
```ruby
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
"The dog is barking",
"The cat is purring",
query_prefix + "puppy"
]
model = Transformers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
embeddings = model.(input, pooling: "cls")
```
### opensearch-project/opensearch-neural-sparse-encoding-v1
[Docs](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1)
```ruby
docs = ["The dog is barking", "The cat is purring", "The bear is growling"]
model_id = "opensearch-project/opensearch-neural-sparse-encoding-v1"
model = Transformers::AutoModelForMaskedLM.from_pretrained(model_id)
tokenizer = Transformers::AutoTokenizer.from_pretrained(model_id)
special_token_ids = tokenizer.special_tokens_map.map { |_, token| tokenizer.vocab[token] }
feature = tokenizer.(docs, padding: true, truncation: true, return_tensors: "pt", return_token_type_ids: false)
output = model.(**feature)[0]
values, _ = Torch.max(output * feature[:attention_mask].unsqueeze(-1), dim: 1)
values = Torch.log(1 + Torch.relu(values))
values[0.., special_token_ids] = 0
embeddings = values.to_a
```
### mixedbread-ai/mxbai-rerank-base-v1
[Docs](https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
result = model.(query, docs)
```
### BAAI/bge-reranker-base
[Docs](https://huggingface.co/BAAI/bge-reranker-base)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers.pipeline("reranking", "BAAI/bge-reranker-base")
result = model.(query, docs)
```
## Pipelines
- [Text](#text)
- [Vision](#vision)
### Text
Embedding
```ruby
embed = Transformers.pipeline("embedding")
embed.("We are very happy to show you the 🤗 Transformers library.")
```
Reranking
```ruby
rerank = Informers.pipeline("reranking")
rerank.("Who created Ruby?", ["Matz created Ruby", "Another doc"])
```
Named-entity recognition
```ruby
ner = Transformers.pipeline("ner")
ner.("Ruby is a programming language created by Matz")
```
Sentiment analysis
```ruby
classifier = Transformers.pipeline("sentiment-analysis")
classifier.("We are very happy to show you the 🤗 Transformers library.")
```
Question answering
```ruby
qa = Transformers.pipeline("question-answering")
qa.(question: "Who invented Ruby?", context: "Ruby is a programming language created by Matz")
```
Feature extraction
```ruby
extractor = Transformers.pipeline("feature-extraction")
extractor.("We are very happy to show you the 🤗 Transformers library.")
```
### Vision
Image classification
```ruby
classifier = Transformers.pipeline("image-classification")
classifier.("image.jpg")
```
Image feature extraction
```ruby
extractor = Transformers.pipeline("image-feature-extraction")
extractor.("image.jpg")
```
## API
This library follows the [Transformers Python API](https://huggingface.co/docs/transformers/index). The following model architectures are currently supported:
- BERT
- DeBERTa-v2
- DistilBERT
- MPNet
- ViT
- XLM-RoBERTa
## History
View the [changelog](https://github.com/ankane/transformers-ruby/blob/master/CHANGELOG.md)
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- [Report bugs](https://github.com/ankane/transformers-ruby/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/transformers-ruby/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
```sh
git clone https://github.com/ankane/transformers-ruby.git
cd transformers-ruby
bundle install
bundle exec rake download:files
bundle exec rake test
```