https://github.com/michael-0acf4/anitag2vec
Generate vector embeddings from Danbooru, Sakugabooru, Pixiv, MAL style tags.
- Host: GitHub
- URL: https://github.com/michael-0acf4/anitag2vec
- Owner: michael-0acf4
- License: mit
- Created: 2026-03-05T20:10:15.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-06T16:40:33.000Z (about 2 months ago)
- Last Synced: 2026-04-06T18:26:54.178Z (about 2 months ago)
- Topics: cosine-similarity, danbooru, deepset, myanimelist, myanimelist-filter, pixiv, pytorch, pytorch-implementation, ranking-algorithm, sakugabooru, set-embedding, tag-embedding, transformer, vector-embeddings
- Language: Python
- Homepage:
- Size: 22.5 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# anitag2vec
anitag2vec is a vector embedding model for tags in the style of Danbooru, Sakugabooru, Pixiv, MyAnimeList (MAL), and similar sites.
# Why?
If you keep your own local gallery or index of things you like (which, to be fair, you most likely don't), building a recommendation system for it is quite laborious without a fuzzy component.
Sure, you can compute tag-based statistics, but you will have to group similar tags by hand and somehow also account for spelling variations. With a vector embedding, problem solved! Just pin something you like, then get recommended co**similar** stuff.
There are many off-the-shelf vector embeddings, but they are primarily designed for general-purpose tasks such as sentence embeddings. You can still adapt them to other use cases, but many of these models are sensitive to token order and to the exact phrasing of the input, which is a poor fit for tags: a set of tags has no meaningful order.
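To make the order-sensitivity point concrete, here is a minimal sketch in plain Python. It is not the anitag2vec architecture: the per-tag vectors and both pooling functions (`mean_pool`, `position_weighted_pool`) are made up for illustration. It only shows that a pooling step that ignores position yields the same vector for any tag order, while one that depends on position does not.

```python
# Toy per-tag vectors (hypothetical values, NOT produced by anitag2vec).
TOY_TAG_VECTORS = {
    "comedy": [1.0, 0.0, 2.0],
    "romance": [0.0, 3.0, 1.0],
    "tv": [2.0, 1.0, 0.0],
}


def mean_pool(tags):
    """Order-insensitive: average the tag vectors component-wise."""
    vectors = [TOY_TAG_VECTORS[t] for t in tags]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def position_weighted_pool(tags):
    """Order-sensitive: weight each tag vector by its 1-based position."""
    dim = len(next(iter(TOY_TAG_VECTORS.values())))
    pooled = [0.0] * dim
    for position, tag in enumerate(tags, start=1):
        for i, value in enumerate(TOY_TAG_VECTORS[tag]):
            pooled[i] += position * value
    return pooled


tags_a = ["comedy", "romance", "tv"]
tags_b = ["tv", "comedy", "romance"]  # same set, different order

print(mean_pool(tags_a) == mean_pool(tags_b))  # True
print(position_weighted_pool(tags_a) == position_weighted_pool(tags_b))  # False
```

A model built around an order-insensitive step treats `tags_a` and `tags_b` as the same input, which is the behaviour you want for tag sets.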
# Setup
The model checkpoints, including ONNX ports, are available [HERE](https://huggingface.co/michael-0acf4/anitag2vec).
## Python
```bash
pip install torch tokenizers tqdm asciichartpy
```
See the notebook [python/ranked_inference.ipynb](python/ranked_inference.ipynb) for a concrete inference example.
You can also explore the model's capabilities by composing embeddings with the operators `+`, `-`, `*`, and `/`.
```bash
python python/interactive.py
```
For example, you can look for the entries closest to the value of an expression within [this MAL style dataset](./data/mal_5a250b8b201ace01.json).
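As a rough illustration of what composing embeddings and ranking by closeness means, here is a sketch in plain Python. It does not use the model or `python/interactive.py`: the 3-dimensional vectors, the entry names, and the helpers `sub`, `scale`, `cosine_similarity`, and `rank_by_similarity` are all hypothetical. Real embeddings would come from the model instead.

```python
import math

# Hypothetical stand-ins for embeddings; real ones would come from the model.
TOY_EMBEDDINGS = {
    "romcom": [1.0, 1.0, 0.0],
    "action": [0.0, 0.0, 1.0],
    "romance": [0.0, 1.0, 0.0],
    "comedy": [1.0, 0.0, 0.0],
}


def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def scale(a, k):
    """Multiply every component of a vector by the scalar k."""
    return [x * k for x in a]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is the zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, entries):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in entries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Compose a query: "romcom" minus "comedy".
query = sub(TOY_EMBEDDINGS["romcom"], TOY_EMBEDDINGS["comedy"])
ranking = rank_by_similarity(query, TOY_EMBEDDINGS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Scaling the query by a positive constant leaves the order unchanged.
scaled_ranking = rank_by_similarity(scale(query, 2.0), TOY_EMBEDDINGS)
print([name for name, _ in scaled_ranking] == [name for name, _ in ranking])
```

Cosine similarity compares direction and ignores vector length, so multiplying a composed expression by a positive constant does not change the ranking.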

## Inference in Rust
The Rust implementation relies on the ONNX port of the PyTorch model.
```bash
cargo add anitag2vec
```
```rust
use anitag2vec::{
    downloader::{ModelDownloader, KnownModel},
    model::Anitag2Vec,
    tagtok::TagSet,
};

fn main() {
    println!("Downloading models...");
    let model_path = ModelDownloader::from_known(KnownModel::Anitag2VecV1, false)
        .download()
        .unwrap();
    let tokenizer_path = ModelDownloader::from_known(KnownModel::Anitag2VecTokenizerV1, false)
        .download()
        .unwrap();
    println!("Done!");

    let mut anitag2vec = Anitag2Vec::load_from_file_v1(model_path, tokenizer_path).unwrap();

    // Each TagSet is one input; the two sets below are embedded as a batch.
    let example = vec![
        TagSet::new(["transcend", "uma musume", "imageset", "japanese"]),
        TagSet::new(["Comedy", "TV", "Anime", "Romance"]),
    ];
    let emb = anitag2vec.run_inference(example).unwrap();
    println!("{:?}", emb.shape()); // [2, 128]

    // Similar to emb.map(|nd| ..)
    // This representation allows various math operations
    println!("{}", emb.ndarray());

    // or alternatively as a nested Vec<Vec<_>>
    // println!("{:?}", emb.to_vec());
}
```
# Architecture
See [my blog post](https://blog.afmichael.dev/posts/2026/set-embeddings-and-anitag2vec/), in which I detail the design decisions and how the model works.