https://github.com/michael-0acf4/anitag2vec

Generate vector embeddings from Danbooru, Sakugabooru, Pixiv, MAL style tags.

cosine-similarity danbooru deepset myanimelist myanimelist-filter pixiv pytorch pytorch-implementation ranking-algorithm sakugabooru set-embedding tag-embedding transformer vector-embeddings

# anitag2vec

anitag2vec is a vector embedding model primarily focused on Danbooru-, Sakugabooru-, Pixiv-, and MAL-style tags.

# Why?

If you have your own local gallery or index of things you like (which, to be fair, you most likely don't), building a recommendation system for it is quite laborious without a fuzzy component.

I mean, sure, you can do tag-based statistics, but you will have to manually group similar tags and somehow also account for spelling variations. With a vector embedding, problem solved! Just pin something you like, then get recommended co**similar** stuff.
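As a minimal sketch of the "pin, then recommend" idea: once every item in a gallery has an embedding, recommending comes down to ranking the other items by cosine similarity to the pinned one. The vectors below are made-up 3-dimensional toys rather than real anitag2vec outputs, and `cosine` and `recommend` are hypothetical helper names, not part of this project.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(pinned, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to `pinned`."""
    scores = {
        name: cosine(gallery[pinned], vec)
        for name, vec in gallery.items()
        if name != pinned
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy embeddings (assumed values, for illustration only).
gallery = {
    "romcom_a": [0.9, 0.1, 0.0],
    "romcom_b": [0.8, 0.2, 0.1],
    "mecha_a": [0.0, 0.1, 0.9],
}

print(recommend("romcom_a", gallery))  # ['romcom_b', 'mecha_a']
```

With real embeddings the ranking logic stays the same; only the vectors change.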

There are many off-the-shelf vector embeddings, but they are primarily designed for general-purpose tasks such as sentence embeddings. While you can still adapt them for other use cases, many models are sensitive to token order and the exact phrasing of inputs, neither of which should matter for a set of tags.
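To make the order-sensitivity point concrete: a tag list is really a set, so shuffling it should not change the embedding. One common way to get that property is to embed each tag separately and pool the results with a symmetric operation such as the mean. The sketch below uses a hypothetical hash-based `toy_tag_vector` as a stand-in for learned per-tag vectors; it only illustrates the property and is not a description of how anitag2vec is implemented.

```python
import hashlib

DIM = 4  # toy dimension, chosen arbitrarily

def toy_tag_vector(tag):
    """Deterministic stand-in for a learned per-tag vector (hypothetical)."""
    digest = hashlib.sha256(tag.lower().encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIM]]

def embed_tag_set(tags):
    """Mean-pool per-tag vectors; the mean ignores the order of the tags."""
    vectors = [toy_tag_vector(t) for t in tags]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

a = embed_tag_set(["Comedy", "TV", "Anime", "Romance"])
b = embed_tag_set(["Romance", "Anime", "TV", "Comedy"])

# Same tags in a different order give the same embedding (up to float rounding).
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))
```

A sentence-embedding model fed the two orderings as text would generally produce two different vectors, which is the mismatch described above.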

# Setup

The model checkpoints are available [HERE](https://huggingface.co/michael-0acf4/anitag2vec), including ONNX ports.

## Python

```bash
pip install torch tokenizers tqdm asciichartpy
```

See the notebook [python/ranked_inference.ipynb](python/ranked_inference.ipynb) for a concrete inference example.

You can also explore the model's capabilities by composing embeddings using `+`, `*`, `-`, `/`.

```bash
python python/interactive.py
```

Here, for example, we look for the entries closest to the expression within [this MAL style dataset](./data/mal_5a250b8b201ace01.json).

![Tag Algebra](misc/tag_algebra.png)
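Conceptually, such an expression is element-wise arithmetic on embedding vectors followed by a nearest-neighbour lookup. The sketch below shows that idea in plain Python with made-up 3-dimensional vectors. The helpers `add`, `sub`, `scale`, and `nearest` are hypothetical names for illustration and do not reflect how `python/interactive.py` parses or evaluates expressions.

```python
import math

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def scale(a, k):
    """Multiply (or, with k < 1, divide) a vector by a scalar."""
    return [x * k for x in a]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, entries):
    """Name of the entry whose vector is most cosine-similar to `query`."""
    return max(entries, key=lambda name: cosine(query, entries[name]))

# Toy embeddings (assumed): axes loosely mean (comedy, romance, action).
tags = {
    "Comedy": [1.0, 0.0, 0.0],
    "Romance": [0.0, 1.0, 0.0],
    "Action": [0.0, 0.0, 1.0],
}
entries = {
    "romcom": [0.7, 0.7, 0.0],
    "action_comedy": [0.7, 0.0, 0.7],
    "pure_action": [0.0, 0.1, 1.0],
}

# Comedy + Romance - Action
q1 = sub(add(tags["Comedy"], tags["Romance"]), tags["Action"])
# Action * 2 + Comedy / 2
q2 = add(scale(tags["Action"], 2), scale(tags["Comedy"], 0.5))

print(nearest(q1, entries))  # romcom
print(nearest(q2, entries))  # pure_action
```

Subtracting a tag pushes the query away from entries that carry it, while scaling changes how much weight each term has in the final direction.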

## Inference in Rust

The Rust implementation relies on the ONNX port of the PyTorch model.

```bash
cargo add anitag2vec
```

```rust
use anitag2vec::{
    downloader::{ModelDownloader, KnownModel},
    model::Anitag2Vec,
    tagtok::TagSet
};

fn main() {
    println!("Downloading models...");
    let model_path = ModelDownloader::from_known(KnownModel::Anitag2VecV1, false).download().unwrap();
    let tokenizer_path = ModelDownloader::from_known(KnownModel::Anitag2VecTokenizerV1, false).download().unwrap();
    println!("Done!");

    let mut anitag2vec = Anitag2Vec::load_from_file_v1(model_path, tokenizer_path).unwrap();
    let example = vec![
        TagSet::new(["transcend", "uma musume", "imageset", "japanese"]),
        TagSet::new(["Comedy", "TV", "Anime", "Romance"]),
    ];
    let emb = anitag2vec.run_inference(example).unwrap();
    println!("{:?}", emb.shape()); // [2, 128]

    // Similar to emb.map(|nd| ..)
    // This representation allows various math operations
    println!("{}", emb.ndarray());

    // or alternatively as a nested Vec
    // println!("{:?}", emb.to_vec());
}
```

# Architecture

See [my blog post](https://blog.afmichael.dev/posts/2026/set-embeddings-and-anitag2vec/), in which I detail the design decisions and how the model works.