{"id":13413368,"url":"https://github.com/knights-analytics/hugot","last_synced_at":"2026-05-20T23:06:55.780Z","repository":{"id":221189039,"uuid":"750910601","full_name":"knights-analytics/hugot","owner":"knights-analytics","description":"Onnx transformer pipelines in Golang","archived":false,"fork":false,"pushed_at":"2025-12-19T16:22:58.000Z","size":656,"stargazers_count":523,"open_issues_count":5,"forks_count":38,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-12-20T21:56:45.539Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/knights-analytics.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-01-31T15:09:13.000Z","updated_at":"2025-12-20T21:08:56.000Z","dependencies_parsed_at":"2024-03-17T15:31:04.624Z","dependency_job_id":"33a68943-d8e1-4212-8384-6a1139f3766d","html_url":"https://github.com/knights-analytics/hugot","commit_stats":null,"previous_names":["knights-analytics/hugo","knights-analytics/hugot"],"tags_count":44,"template":false,"template_full_name":null,"purl":"pkg:github/knights-analytics/hugot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knights-analytics%2Fhugot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knights-analytics%2Fhugot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knights-analytics%2Fhugot/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knights-analytics%2Fhugot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/knights-analytics","download_url":"https://codeload.github.com/knights-analytics/hugot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knights-analytics%2Fhugot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28331303,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T00:36:25.062Z","status":"ssl_error","status_checked_at":"2026-01-12T00:36:15.229Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:01:38.746Z","updated_at":"2026-05-20T23:06:55.753Z","avatar_url":"https://github.com/knights-analytics.png","language":"Go","funding_links":[],"categories":["机器学习","Machine Learning","Go"],"sub_categories":["检索及分析资料库","Search and Analytic Databases"],"readme":"# \u003cspan\u003eHugot: ONNX Transformer Pipelines for Go\n\n[![Go Reference](https://pkg.go.dev/badge/github.com/knights-analytics/hugot.svg)](https://pkg.go.dev/github.com/knights-analytics/hugot)\n[![Go Report 
Card](https://goreportcard.com/badge/github.com/knights-analytics/hugot)](https://goreportcard.com/report/github.com/knights-analytics/hugot)\n[![Coverage Status](https://coveralls.io/repos/github/knights-analytics/hugot/badge.svg?branch=main)](https://coveralls.io/github/knights-analytics/hugot?branch=main)\n\n\u003cdiv style=\"text-align:center\"\u003e\n\u003cimg src=\"./hugot.png\" width=\"300\" alt=\"Go Gopher Transformer\"\u003e\n\u003c/div\u003e\n\n## What\n\nTL;DR: AI use-cases such as embeddings, text generation (generative/LLMs), image classification, entity recognition, fine-tuning, and more, natively running in Go!\n\nThe goal of this library is to provide an easy, scalable, and hassle-free way to run transformer pipelines inference and training in golang applications, such as Hugging Face 🤗 transformers pipelines. It is built on the following principles:\n\n1. Hugging Face compatibility: models trained and tested using the python Hugging Face transformer library can be exported to onnx and used with the Hugot pipelines to obtain identical predictions as in the python version.\n2. Hassle-free and performant production use: we exclusively support onnx models. Pytorch transformer models that don't have an onnx version can be easily exported to onnx via [Hugging Face Optimum](https://huggingface.co/docs/optimum/index), and used with the library.\n3. Run on your hardware: this library is for those who want to run transformer models tightly coupled with their go applications, without the performance drawbacks of having to hit a rest API or the hassle of setting up and maintaining e.g. a python RPC service that talks to go.\n4. Simplicity: the Hugot API allows you to easily deploy pipelines without having to write your own inference or training code. It also now includes a pure Go backend for minimal dependencies!\n\nWe support inference on CPU and on all accelerators supported by ONNX Runtime/OpenXLA. 
Note, however, that currently only CPU, TPU, and GPU inference on Nvidia GPUs via CUDA, are tested (see below).\n\nIMPORTANT: The Go backend is designed for simpler workloads, environments that disallow cgo, and for smaller models such as [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It works best with small batches of roughly 32 inputs per call. If you have performance requirements, please move to a C backend such as XLA or ORT (detailed below).\n\nHugot loads and saves models in the ONNX format.\n\n## Why\n\nDeveloping and fine-tuning transformer models with the Hugging Face python library is great, but if your production stack is golang-based being able to reliably deploy and scale the resulting pytorch models can be challenging. This library aims to allow you to just lift-and-shift your python model and use the same Hugging Face pipelines you use for development for inference in a go application.\n\n## For whom\n\nFor the golang developer or ML engineer who wants to run or fine-tune transformer pipelines on their own hardware and tightly coupled with their own application, without having to deal with writing their own inference or training code.\n\n## By whom\n\nHugot is brought to you by the friendly folks at [Knights Analytics](https://knightsanalytics.com), who use Hugot in production to automate ai-powered data curation.\n\n## Implemented pipelines\n\nCurrently, we have implementations for the following transformer pipelines:\n\n- [crossEncoder](https://huggingface.co/cross-encoder)\n- [featureExtraction](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.FeatureExtractionPipeline)\n- [imageClassification](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.ImageClassificationPipeline)\n- [objectDetection](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.ImageClassificationPipeline)\n- 
[questionAnswering](https://huggingface.co/docs/transformers/tasks/question_answering)\n- tabular (classic ML models such as decision trees, random forests etc)\n- [textClassification](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)\n- [textGeneration](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline) (currently ORT only)\n- [tokenClassification](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TokenClassificationPipeline)\n- [zeroShotClassification](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.ZeroShotClassificationPipeline)\n\nImplementations for additional pipelines will follow. We also very gladly accept PRs to expand the set of pipelines! See [here](https://huggingface.co/docs/transformers/en/main_classes/pipelines) for the missing pipelines that can be implemented, and the contributing section below if you want to lend a hand.\n\nHugot can be used both as a library and as a command-line application. See below for usage instructions.\n\n## Installation and usage\n\n### Choosing a backend\n\nHugot supports pluggable backends to perform the tokenization and run the ONNX models. Currently, we support the following backends:\n\n- (default) native go (provided by [GoMLX](https://github.com/gomlx/gomlx))\n- [Onnx Runtime](https://onnxruntime.ai/)\n- [OpenXLA](https://openxla.org/)\n\nOnnx Runtime can also be selected as a backend via the build tag \"-tags ORT\". It does not support training, but it is currently the fastest backend for CPU inference. It supports\nall pipelines, including generative pipelines such as text generation.\n\nOpenXLA can be included at compile time via the build tag \"-tags XLA\". This is required for fine-tuning of e.g. embedding models, and is the only backend that supports TPUs. 
Note that it does not yet support generative pipelines.\n\nCUDA requires a C backend, either OpenXLA or Onnx Runtime.\n\nOnce compiled, Hugot can be instantiated with your backend of choice via calling `NewGoSession()`, `NewXLASession()` or `NewORTSession()` respectively.\n\nYou may combine build tags \"-tags XLA,ORT\" or use \"-tags ALL\" to be able to use all available backends interchangeably.\n\n### Usage\n\nTo use Hugot as a library in your application, you can directly import it and follow the example below.\n\n#### Backends\n\n- if using Onnx Runtime, the libonnxruntime.so file should be obtained from the releases section of this page. If you want to use other architectures than `linux/amd64` you will have to download it from [the ONNX Runtime releases page](https://github.com/microsoft/onnxruntime/releases/), see the [dockerfile](./Dockerfile) as an example. Hugot looks for this file at /usr/lib/libonnxruntime.so by default. A different location can be specified by passing the `WithOnnxLibraryPath()` option to `NewORTSession()`, e.g:\n\n```go\nsession, err := NewORTSession(\n    ctx,\n    options.WithOnnxLibraryPath(\"/path/to/my/lib/directory\"),\n)\n```\n\n- if using XLA, the easiest way is to run \"GOPROXY=direct go run github.com/gomlx/go-xla/cmd/pjrt_installer@latest -plugin=linux -version=v${GOPJRT_VERSION} -path=/usr/local/lib/go-xla\", which will install the XLA backend provided by the [goMLX](https://github.com/gomlx/gomlx) project.\n\n- if using XLA or ORT, you will also need to use the rust-based tokenizer. The tokenizers.a file can be obtained from the releases section of this page (if you want to use alternative architecture from `linux/amd64` you will have to build the tokenizers.a yourself, see [here](https://github.com/daulet/tokenizers)). This file should be at /usr/lib/tokenizers.a so that Hugot can load it. 
Alternatively, you can explicitly specify the path to the folder with the `libtokenizers.a` file using the `CGO_LDFLAGS` env variable, see the [dockerfile](./Dockerfile). The tokenizer is statically linked at build time.\n\nAlternatively, you can also use the [docker image](https://github.com/knights-analytics/hugot/pkgs/container/hugot) which has all the above dependencies already baked in.\n\n- the latest versions of the onnxruntime and gopjrt libraries that are used in our testing can be found in the [bakefile](./docker-bake.hcl).\n\nThe library can be used as follows:\n\n```go\npackage main\n\nimport (\n    \"github.com/knights-analytics/hugot\"\n    \"context\"\n\t\"encoding/json\"\n    \"fmt\"\n)\n\nfunc check(err error) {\n    if err != nil {\n        panic(err.Error())\n    }\n}\n\nfunc main() {\n    // all sessions require context.Context\n    ctx := context.Background()\n    // start a new session\n    session, err := hugot.NewGoSession(ctx)\n\t// For XLA (requires go build tags \"XLA\" or \"ALL\"):\n\t// session, err := hugot.NewXLASession(ctx)\n\t// For ORT (requires go build tags \"ORT\" or \"ALL\"):\n\t// session, err := hugot.NewORTSession(ctx)\n\t// This looks for the libonnxruntime.so library in its default path, e.g. 
/usr/lib\n    // If your libonnxruntime.so is somewhere else, you can explicitly set it by using WithOnnxLibraryPath\n    // session, err := hugot.NewORTSession(ctx, WithOnnxLibraryPath(\"/path/to/my/lib/directory\"))\n\tcheck(err)\n\t\n    // A successfully created hugot session needs to be destroyed when you're done\n    defer func (session *hugot.Session) {\n    err = session.Destroy()\n    check(err)\n    }(session)\n\n    // Let's download an onnx sentiment test classification model in the current directory\n    // note: if you compile your library with build flag NODOWNLOAD, this will exclude the downloader.\n    // Useful in case you just want the core engine (because you already have the models)\n    modelPath, err := hugot.DownloadModel(\"KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english\", \"./models/\", hugot.NewDownloadOptions())\n    check(err)\n\n    // We now create the configuration for the text classification pipeline we want to create.\n    // Options to the pipeline can be set here using the Options field\n    config := hugot.TextClassificationConfig{\n        ModelPath: modelPath,\n        Name:      \"testPipeline\",\n    }\n    // then we create out pipeline.\n    // Note: the pipeline will also be added to the session object, so all pipelines can be destroyed at once\n    sentimentPipeline, err := hugot.NewPipeline(session, config)\n    check(err)\n\n    // we can now use the pipeline for prediction on a batch of strings\n    batch := []string{\"This movie is disgustingly good !\", \"The director tried too much\"}\n    batchResult, err := sentimentPipeline.RunPipeline(ctx, batch)\n    check(err)\n\n    // and do whatever we want with it :)\n    s, err := json.Marshal(batchResult)\n    check(err)\n    fmt.Println(string(s))\n}\n// OUTPUT: {\"ClassificationOutputs\":[[{\"Label\":\"POSITIVE\",\"Score\":0.99031734}],[{\"Label\":\"NEGATIVE\",\"Score\":0.963696}]]}\n```\n\nSee also hugot_test.go for further examples for all 
pipelines.\n\n## Generative models\n\nHugot uses the [Onnx Runtime Generative AI](https://onnxruntime.ai/generative-ai) backend to run generative models.\n\nWe currently support generative models only within the text generation pipeline. Please look at the [ORT tests](hugot_ort_test.go) for an example of its usage.\n\nTo use the experimental Engine support for concurrent requests and inference batching, use the `WithGenerativeEngine()` option when creating a session.\n\n## Hardware acceleration 🚀\n\nHugot now also supports the following accelerator backends for your inference:\n - CUDA (tested on Onnx Runtime and XLA). See below for setup instructions.\n - TPU (XLA only)\n - TensorRT (available in Onnx Runtime only)\n - DirectML (available in Onnx Runtime only)\n - CoreML (available in Onnx Runtime only)\n - OpenVINO (available in Onnx Runtime only)\n\nPlease provide feedback if encountering any issues with the accelerators above!\n\nTo use Hugot with Nvidia gpu acceleration, you need to have the following:\n\n- The Nvidia driver for your graphics card (if running in Docker and WSL2, starting with --gpus all should inherit the drivers from the host OS)\n- ONNX Runtime:\n    - The cuda gpu version of ONNX Runtime on the machine/docker container. You can see how we get that by looking at the [Dockerfile](./Dockerfile). You can also get the ONNX Runtime libraries that we use for testing from the release. Just download the gpu .so libraries and put them in /usr/lib.\n    - The required CUDA libraries installed on your system that are compatible with the ONNX Runtime gpu version you use. See [here](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html). 
For instance, for onnxruntime-gpu 1.24.4, we need CUDA 12.x (any minor version should be compatible) and cuDNN 9.x.\n    - Start a session with the following:\n      ```go\n      ctx := context.Background()\n      opts := []options.WithOption{\n        options.WithCuda(map[string]string{\n          \"device_id\": \"0\",\n        }),\n      }\n      session, err := NewORTSession(ctx, opts...)\n      ```\n- OpenXLA\n    - Install CUDA support via the command `GOPROXY=direct go run github.com/gomlx/go-xla/cmd/pjrt_installer@latest -plugin=cuda13 -version=${JAX_CUDA_VERSION} -path=/usr/local/lib/go-xla`\n    - Start a session with the following:\n      ```go\n      ctx := context.Background()\n      opts := []options.WithOption{\n        options.WithCuda(map[string]string{\n          \"device_id\": \"0\",\n        }),\n      }\n      session, err := NewXLASession(ctx, opts...)\n      ```\n\nFor the ONNX Runtime Cuda libraries, you can install CUDA 12.x by installing the full cuda toolkit, but that's quite a big package. In our testing on awslinux/fedora, we have been able to limit the libraries needed to run Hugot with Nvidia gpu acceleration to just these:\n\n- cuda-cudart-12-9 cuda-nvrtc-12-9 libcublas-12-9 libcurand-12-9 libcufft-12-9 libcudnn9-cuda-12\n\nOn different distros (e.g. Ubuntu), you should be able to install the equivalent packages.\n\n## Training and fine-tuning pipelines \n\nHugot now also supports the training and fine-tuning of transformer pipelines! This functionality requires that you build with XLA enabled as we use gomlx behind the\nscenes for training/fine-tuning: the onnx model will be loaded, converted to xla and trained using [goMLX](https://github.com/gomlx/gomlx), and serialized back to onnx format.\n\nThis is currently supported only for the **FeatureExtractionPipeline**. This can be used to fine-tune the vector embeddings for e.g. semantic textual similarity (for applications like RAG and semantic search). 
In order to fine-tune the feature extraction pipeline for semantic search you will need to collect a training dataset in the following format:\n\n```js\n{\"sentence1\": \"The quick brown fox jumps over the lazy dog\", \"sentence2\": \"A quick brown fox jumps over a lazy dog\", \"score\": 1}\n{\"sentence1\": \"The quick brown fox jumps over the lazy dog\", \"sentence2\": \"A quick brown cow jumps over a lazy caterpillar\", \"score\": 0.5}\n```\n\nSee the [example](testcases/semanticSimilarityTest.jsonl) for a sample dataset.\n\nThe score is assumed to be a float between 0 and 1 that encodes the semantic similarity between the sentences, and by default a cosine similarity loss is used (see [sentence transformers](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss)). However, you can also specify a different loss function from `goMLX` using the `XLATrainingOptions` field in the `TrainingConfig` struct. See [the training tests](./hugot_training_test.go) for examples on how to train or fine-tune feature extraction pipelines.\n\nNote that training on GPU is currently much faster and memory efficient than training on CPU, although optimizations are underway. On CPU, we recommend smaller batch sizes.\n\nSee [the tests](hugot_training_test.go) for an example on how to fine-tune semantic similarity starting with an open source sentence transformers model and a few examples.\n\n## Performance Tuning\n\nFirstly, the throughput depends largely on the size of the input requests. The best batch size is affected by the number of tokens per input, but we find batches of roughly 32 inputs per call to be a good starting point.\n\n### ONNX Runtime\nThe library defaults to ONNX Runtime's default tuning settings. 
These are optimised for latency over throughput, and will attempt to parallelize single-threaded calls to ONNX Runtime over multiple cores.\n\nFor maximum throughput, it is best to call a single shared Hugot pipeline from multiple goroutines (1 per core), using a channel to pass the input data. In this scenario, the following settings will greatly increase inference throughput.\n\n```go\nsession, err := hugot.NewORTSession(\n\tcontext.Background(),\n\thugot.WithInterOpNumThreads(1),\n\thugot.WithIntraOpNumThreads(1),\n\thugot.WithCpuMemArena(false),\n\thugot.WithMemPattern(false),\n)\n```\n\nSetting InterOpNumThreads and IntraOpNumThreads to 1 constrains each goroutine's call to a single core, greatly reducing locking and cache penalties. Disabling CpuMemArena and MemPattern skips pre-allocation of some memory structures, which increases latency but also improves throughput efficiency.\n\n## File Systems\nWe use an [abstract file system](https://github.com/viant/afs) within Hugot. It works out of the box with various OS filesystems. To use object stores such as S3, please import the appropriate plugin from the afsc library, e.g.\n```go\nimport _ \"github.com/viant/afsc/s3\"\n```\n\n## Limitations\n\nApart from the fact that only the aforementioned pipelines are currently implemented, the current limitations are:\n\n- the library is currently only built/tested on amd64-linux.\n\n## Contributing\n\nIf you would like to contribute to Hugot, please see the [contribution guidelines](./contrib.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknights-analytics%2Fhugot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fknights-analytics%2Fhugot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknights-analytics%2Fhugot/lists"}