{"id":13475076,"url":"https://github.com/huggingface/candle","last_synced_at":"2025-05-12T18:11:15.316Z","repository":{"id":176350991,"uuid":"655797848","full_name":"huggingface/candle","owner":"huggingface","description":"Minimalist ML framework for Rust","archived":false,"fork":false,"pushed_at":"2025-04-28T07:19:46.000Z","size":12573,"stargazers_count":17088,"open_issues_count":521,"forks_count":1086,"subscribers_count":169,"default_branch":"main","last_synced_at":"2025-04-28T10:11:22.700Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-19T16:06:31.000Z","updated_at":"2025-04-28T07:19:49.000Z","dependencies_parsed_at":"2023-08-10T09:17:14.334Z","dependency_job_id":"a42a3add-3572-40ff-b339-10fbaaf20d1f","html_url":"https://github.com/huggingface/candle","commit_stats":{"total_commits":2104,"total_committers":163,"mean_commits":12.9079754601227,"dds":0.3018060836501901,"last_synced_commit":"67cab7d6b8279f953b0a8cc5012b135b9743cdc8"},"previous_names":["laurentmazare/candle","candle-rs/candle"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fcandle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fcandle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fcandle/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fcandle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/candle/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252522366,"owners_count":21761720,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T16:01:17.174Z","updated_at":"2025-05-05T15:23:07.699Z","avatar_url":"https://github.com/huggingface.png","language":"Rust","funding_links":[],"categories":["Libraries","Rust","others","其他_机器学习与深度学习","Other Versions of YOLO","Summary","Training","Frameworks","Machine Learning","Library / Framework","Inference Engines","Neural Networks","Tools","Projects","\u003ca name=\"Rust\"\u003e\u003c/a\u003eRust","2. Libraries \u0026 Frameworks","Uncategorized","1. 
Core Frameworks \u0026 Libraries","Inference Servers","Deep Learning Frameworks"],"sub_categories":["Artificial Intelligence","Frameworks for Training","Platform Guides","General-Purpose Machine Learning","Framework","Rust","Uncategorized"],"readme":"# candle\n[![discord server](https://dcbadge.vercel.app/api/server/hugging-face-879548962464493619)](https://discord.gg/hugging-face-879548962464493619)\n[![Latest version](https://img.shields.io/crates/v/candle-core.svg)](https://crates.io/crates/candle-core)\n[![Documentation](https://docs.rs/candle-core/badge.svg)](https://docs.rs/candle-core)\n[![License](https://img.shields.io/github/license/base-org/node?color=blue)](https://github.com/huggingface/candle/blob/main/LICENSE-MIT)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](https://github.com/huggingface/candle/blob/main/LICENSE-APACHE)\n\nCandle is a minimalist ML framework for Rust with a focus on performance (including GPU support) \nand ease of use. 
Try our online demos: \n[whisper](https://huggingface.co/spaces/lmz/candle-whisper),\n[LLaMA2](https://huggingface.co/spaces/lmz/candle-llama2),\n[T5](https://huggingface.co/spaces/radames/Candle-T5-Generation-Wasm),\n[yolo](https://huggingface.co/spaces/lmz/candle-yolo),\n[Segment\nAnything](https://huggingface.co/spaces/radames/candle-segment-anything-wasm).\n\n## Get started\n\nMake sure that you have [`candle-core`](https://github.com/huggingface/candle/tree/main/candle-core) correctly installed as described in [**Installation**](https://huggingface.github.io/candle/guide/installation.html).\n\nLet's see how to run a simple matrix multiplication.\nWrite the following to your `myapp/src/main.rs` file:\n```rust\nuse candle_core::{Device, Tensor};\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let device = Device::Cpu;\n\n    let a = Tensor::randn(0f32, 1., (2, 3), \u0026device)?;\n    let b = Tensor::randn(0f32, 1., (3, 4), \u0026device)?;\n\n    let c = a.matmul(\u0026b)?;\n    println!(\"{c}\");\n    Ok(())\n}\n```\n\n`cargo run` should display a tensor of shape `Tensor[[2, 4], f32]`.\n\n\nHaving installed `candle` with Cuda support, simply define the `device` to be on GPU:\n\n```diff\n- let device = Device::Cpu;\n+ let device = Device::new_cuda(0)?;\n```\n\nFor more advanced examples, please have a look at the following section.\n\n## Check out our examples\n\nThese online demos run entirely in your browser:\n- [yolo](https://huggingface.co/spaces/lmz/candle-yolo): pose estimation and\n  object recognition.\n- [whisper](https://huggingface.co/spaces/lmz/candle-whisper): speech recognition.\n- [LLaMA2](https://huggingface.co/spaces/lmz/candle-llama2): text generation.\n- [T5](https://huggingface.co/spaces/radames/Candle-T5-Generation-Wasm): text generation.\n- [Phi-1.5, and Phi-2](https://huggingface.co/spaces/radames/Candle-Phi-1.5-Wasm): text generation.\n- [Segment Anything 
Model](https://huggingface.co/spaces/radames/candle-segment-anything-wasm): Image segmentation.\n- [BLIP](https://huggingface.co/spaces/radames/Candle-BLIP-Image-Captioning): image captioning.\n\nWe also provide some command-line examples using state-of-the-art models:\n\n- [LLaMA v1, v2, and v3](./candle-examples/examples/llama/): general LLM, includes\n  the SOLAR-10.7B variant.\n- [Falcon](./candle-examples/examples/falcon/): general LLM.\n- [Codegeex4](./candle-examples/examples/codegeex4-9b/): Code completion, code interpreter, web search, function calling, repository-level\n- [GLM4](./candle-examples/examples/glm4/): Open Multilingual Multimodal Chat LMs by THUDM\n- [Gemma v1 and v2](./candle-examples/examples/gemma/): 2b and 7b+/9b general LLMs from Google DeepMind.\n- [RecurrentGemma](./candle-examples/examples/recurrent-gemma/): 2b and 7b\n  Griffin-based models from Google that mix attention with an RNN-like state.\n- [Phi-1, Phi-1.5, Phi-2, and Phi-3](./candle-examples/examples/phi/): 1.3b,\n  2.7b, and 3.8b general LLMs with performance on par with 7b models.\n- [StableLM-3B-4E1T](./candle-examples/examples/stable-lm/): a 3b general LLM\n  pre-trained on 1T tokens of English and code datasets. 
Also supports\n  StableLM-2, a 1.6b LLM trained on 2T tokens, as well as the code variants.\n- [Mamba](./candle-examples/examples/mamba/): an inference only\n  implementation of the Mamba state space model.\n- [Mistral7b-v0.1](./candle-examples/examples/mistral/): a 7b general LLM with\n  better performance than all publicly available 13b models as of 2023-09-28.\n- [Mixtral8x7b-v0.1](./candle-examples/examples/mixtral/): a sparse mixture of\n  experts 8x7b general LLM with better performance than a Llama 2 70B model with\n  much faster inference.\n- [StarCoder](./candle-examples/examples/bigcode/) and\n  [StarCoder2](./candle-examples/examples/starcoder2/): LLM specialized to code generation.\n- [Qwen1.5](./candle-examples/examples/qwen/): Bilingual (English/Chinese) LLMs.\n- [RWKV v5 and v6](./candle-examples/examples/rwkv/): An RNN with transformer level LLM\n  performance.\n- [Replit-code-v1.5](./candle-examples/examples/replit-code/): a 3.3b LLM specialized for code completion.\n- [Yi-6B / Yi-34B](./candle-examples/examples/yi/): two bilingual\n  (English/Chinese) general LLMs with 6b and 34b parameters.\n- [Quantized LLaMA](./candle-examples/examples/quantized/): quantized version of\n  the LLaMA model using the same quantization techniques as\n  [llama.cpp](https://github.com/ggerganov/llama.cpp).\n\n\u003cimg src=\"https://github.com/huggingface/candle/raw/main/candle-examples/examples/quantized/assets/aoc.gif\" width=\"600\"\u003e\n  \n- [Stable Diffusion](./candle-examples/examples/stable-diffusion/): text to\n  image generative model, support for the 1.5, 2.1, SDXL 1.0 and Turbo versions.\n\n\u003cimg src=\"https://github.com/huggingface/candle/raw/main/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg\" width=\"200\"\u003e\n\n- [Wuerstchen](./candle-examples/examples/wuerstchen/): another text to\n  image generative model.\n\n\u003cimg 
src=\"https://github.com/huggingface/candle/raw/main/candle-examples/examples/wuerstchen/assets/cat.jpg\" width=\"200\"\u003e\n\n- [yolo-v3](./candle-examples/examples/yolo-v3/) and\n  [yolo-v8](./candle-examples/examples/yolo-v8/): object detection and pose\n  estimation models.\n\n\u003cimg src=\"https://github.com/huggingface/candle/raw/main/candle-examples/examples/yolo-v8/assets/bike.od.jpg\" width=\"200\"\u003e\u003cimg src=\"https://github.com/huggingface/candle/raw/main/candle-examples/examples/yolo-v8/assets/bike.pose.jpg\" width=\"200\"\u003e\n- [segment-anything](./candle-examples/examples/segment-anything/): image\n  segmentation model with prompt.\n\n\u003cimg src=\"https://github.com/huggingface/candle/raw/main/candle-examples/examples/segment-anything/assets/sam_merged.jpg\" width=\"200\"\u003e\n\n- [SegFormer](./candle-examples/examples/segformer/): transformer based semantic segmentation model.\n- [Whisper](./candle-examples/examples/whisper/): speech recognition model.\n- [EnCodec](./candle-examples/examples/encodec/): high-quality audio compression\n  model using residual vector quantization.\n- [MetaVoice](./candle-examples/examples/metavoice/): foundational model for\n  text-to-speech.\n- [Parler-TTS](./candle-examples/examples/parler-tts/): large text-to-speech\n  model.\n- [T5](./candle-examples/examples/t5), [Bert](./candle-examples/examples/bert/),\n  [JinaBert](./candle-examples/examples/jina-bert/) : useful for sentence embeddings.\n- [DINOv2](./candle-examples/examples/dinov2/): computer vision model trained\n  using self-supervision (can be used for imagenet classification, depth\n  evaluation, segmentation).\n- [VGG](./candle-examples/examples/vgg/),\n  [RepVGG](./candle-examples/examples/repvgg): computer vision models.\n- [BLIP](./candle-examples/examples/blip/): image to text model, can be used to\n  generate captions for an image.\n- [CLIP](./candle-examples/examples/clip/): multi-model vision and language\n  model.\n- 
[TrOCR](./candle-examples/examples/trocr/): a transformer OCR model, with\n  dedicated submodels for hand-writing and printed recognition.\n- [Marian-MT](./candle-examples/examples/marian-mt/): neural machine translation\n  model, generates the translated text from the input text.\n- [Moondream](./candle-examples/examples/moondream/): tiny computer-vision model \n  that can answer real-world questions about images.\n\nRun them using commands like:\n```\ncargo run --example quantized --release\n```\n\nIn order to use **CUDA** add `--features cuda` to the example command line. If\nyou have cuDNN installed, use `--features cudnn` for even more speedups.\n\nThere are also some wasm examples for whisper and\n[llama2.c](https://github.com/karpathy/llama2.c). You can either build them with\n`trunk` or try them online:\n[whisper](https://huggingface.co/spaces/lmz/candle-whisper),\n[llama2](https://huggingface.co/spaces/lmz/candle-llama2),\n[T5](https://huggingface.co/spaces/radames/Candle-T5-Generation-Wasm),\n[Phi-1.5, and Phi-2](https://huggingface.co/spaces/radames/Candle-Phi-1.5-Wasm),\n[Segment Anything Model](https://huggingface.co/spaces/radames/candle-segment-anything-wasm).\n\nFor LLaMA2, run the following command to retrieve the weight files and start a\ntest server:\n```bash\ncd candle-wasm-examples/llama2-c\nwget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/model.bin\nwget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/tokenizer.json\ntrunk serve --release --port 8081\n```\nAnd then head over to\n[http://localhost:8081/](http://localhost:8081/).\n\n\u003c!--- ANCHOR: useful_libraries ---\u003e\n\n## Useful External Resources\n- [`candle-tutorial`](https://github.com/ToluClassics/candle-tutorial): A\n  very detailed tutorial showing how to convert a PyTorch model to Candle.\n- [`candle-lora`](https://github.com/EricLBuehler/candle-lora): Efficient and\n  ergonomic LoRA implementation for Candle. 
`candle-lora` has      \n  out-of-the-box LoRA support for many models from Candle, which can be found\n  [here](https://github.com/EricLBuehler/candle-lora/tree/master/candle-lora-transformers/examples).\n- [`optimisers`](https://github.com/KGrewal1/optimisers): A collection of optimisers\n  including SGD with momentum, AdaGrad, AdaDelta, AdaMax, NAdam, RAdam, and RMSprop.\n- [`candle-vllm`](https://github.com/EricLBuehler/candle-vllm): Efficient platform for inference and\n  serving local LLMs including an OpenAI compatible API server.\n- [`candle-ext`](https://github.com/mokeyish/candle-ext): An extension library to Candle that provides PyTorch functions not currently available in Candle.\n- [`candle-coursera-ml`](https://github.com/vishpat/candle-coursera-ml): Implementation of ML algorithms from Coursera's [Machine Learning Specialization](https://www.coursera.org/specializations/machine-learning-introduction) course.\n- [`kalosm`](https://github.com/floneum/floneum/tree/master/interfaces/kalosm): A multi-modal meta-framework in Rust for interfacing with local pre-trained models with support for controlled generation, custom samplers, in-memory vector databases, audio transcription, and more.\n- [`candle-sampling`](https://github.com/EricLBuehler/candle-sampling): Sampling techniques for Candle.\n- [`gpt-from-scratch-rs`](https://github.com/jeroenvlek/gpt-from-scratch-rs): A port of Andrej Karpathy's _Let's build GPT_ tutorial on YouTube showcasing the Candle API on a toy problem.\n- [`candle-einops`](https://github.com/tomsanbear/candle-einops): A pure rust implementation of the python [einops](https://github.com/arogozhnikov/einops) library.\n- [`atoma-infer`](https://github.com/atoma-network/atoma-infer): A Rust library for fast inference at scale, leveraging FlashAttention2 for efficient attention computation, PagedAttention for efficient KV-cache memory management, and multi-GPU support. 
It is OpenAI api compatible.\n- [`llms-from-scratch-rs`](https://github.com/nerdai/llms-from-scratch-rs): A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.\n\nIf you have an addition to this list, please submit a pull request.\n\n\u003c!--- ANCHOR_END: useful_libraries ---\u003e\n\n\u003c!--- ANCHOR: features ---\u003e\n\n## Features\n\n- Simple syntax, looks and feels like PyTorch.\n    - Model training.\n    - Embed user-defined ops/kernels, such as [flash-attention v2](https://github.com/huggingface/candle/blob/89ba005962495f2bfbda286e185e9c3c7f5300a3/candle-flash-attn/src/lib.rs#L152).\n- Backends.\n    - Optimized CPU backend with optional MKL support for x86 and Accelerate for macs.\n    - CUDA backend for efficiently running on GPUs, multiple GPU distribution via NCCL.\n    - WASM support, run your models in a browser.\n- Included models.\n    - Language Models.\n        - LLaMA v1, v2, and v3 with variants such as SOLAR-10.7B.\n        - Falcon.\n        - StarCoder, StarCoder2.\n        - Phi 1, 1.5, 2, and 3.\n        - Mamba, Minimal Mamba\n        - Gemma v1 2b and 7b+, v2 2b and 9b.\n        - Mistral 7b v0.1.\n        - Mixtral 8x7b v0.1.\n        - StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.\n        - Replit-code-v1.5-3B.\n        - Bert.\n        - Yi-6B and Yi-34B.\n        - Qwen1.5, Qwen1.5 MoE.\n        - RWKV v5 and v6.\n    - Quantized LLMs.\n        - Llama 7b, 13b, 70b, as well as the chat and code variants.\n        - Mistral 7b, and 7b instruct.\n        - Mixtral 8x7b.\n        - Zephyr 7b a and b (Mistral-7b based).\n        - OpenChat 3.5 (Mistral-7b based).\n    - Text to text.\n        - T5 and its variants: FlanT5, UL2, MADLAD400 (translation), CoEdit (Grammar correction).\n        - Marian MT (Machine Translation).\n    - Text to image.\n        - Stable Diffusion v1.5, v2.1, XL v1.0.\n        - Wurstchen v2.\n    - Image to text.\n        - BLIP.\n        - TrOCR.\n    - 
Audio.\n        - Whisper, multi-lingual speech-to-text.\n        - EnCodec, audio compression model.\n        - MetaVoice-1B, text-to-speech model.\n        - Parler-TTS, text-to-speech model.\n    - Computer Vision Models.\n        - DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT,\n          ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.\n        - yolo-v3, yolo-v8.\n        - Segment-Anything Model (SAM).\n        - SegFormer.\n- File formats: load models from safetensors, npz, ggml, or PyTorch files.\n- Serverless (on CPU), small and fast deployments.\n- Quantization support using the llama.cpp quantized types.\n\n\u003c!--- ANCHOR_END: features ---\u003e\n\n## How to use\n\n\u003c!--- ANCHOR: cheatsheet ---\u003e\nCheatsheet:\n\n|            | Using PyTorch                            | Using Candle                                                     |\n|------------|------------------------------------------|------------------------------------------------------------------|\n| Creation   | `torch.Tensor([[1, 2], [3, 4]])`         | `Tensor::new(\u0026[[1f32, 2.], [3., 4.]], \u0026Device::Cpu)?`           |\n| Creation   | `torch.zeros((2, 2))`                    | `Tensor::zeros((2, 2), DType::F32, \u0026Device::Cpu)?`               |\n| Indexing   | `tensor[:, :4]`                          | `tensor.i((.., ..4))?`                                           |\n| Operations | `tensor.view((2, 2))`                    | `tensor.reshape((2, 2))?`                                        |\n| Operations | `a.matmul(b)`                            | `a.matmul(\u0026b)?`                                                  |\n| Arithmetic | `a + b`                                  | `\u0026a + \u0026b`                                                        |\n| Device     | `tensor.to(device=\"cuda\")`               | `tensor.to_device(\u0026Device::new_cuda(0)?)?`                            |\n| Dtype      | 
`tensor.to(dtype=torch.float16)`         | `tensor.to_dtype(DType::F16)?`                                   |\n| Saving     | `torch.save({\"A\": A}, \"model.bin\")`      | `candle::safetensors::save(\u0026HashMap::from([(\"A\", A)]), \"model.safetensors\")?` |\n| Loading    | `weights = torch.load(\"model.bin\")`      | `candle::safetensors::load(\"model.safetensors\", \u0026device)`        |\n\n\u003c!--- ANCHOR_END: cheatsheet ---\u003e\n\n\n## Structure\n\n- [candle-core](./candle-core): Core ops, devices, and `Tensor` struct definition\n- [candle-nn](./candle-nn/): Tools to build real models\n- [candle-examples](./candle-examples/): Examples of using the library in realistic settings\n- [candle-kernels](./candle-kernels/): CUDA custom kernels\n- [candle-datasets](./candle-datasets/): Datasets and data loaders.\n- [candle-transformers](./candle-transformers): transformers-related utilities.\n- [candle-flash-attn](./candle-flash-attn): Flash attention v2 layer.\n- [candle-onnx](./candle-onnx/): ONNX model evaluation.\n\n## FAQ\n\n### Why should I use Candle?\n\n\u003c!--- ANCHOR: goals ---\u003e\n\nCandle's core goal is to *make serverless inference possible*. Full machine learning frameworks like PyTorch\nare very large, which makes creating instances on a cluster slow. Candle allows deployment of lightweight\nbinaries.\n\nSecondly, Candle lets you *remove Python* from production workloads. Python overhead can seriously hurt performance,\nand the [GIL](https://www.backblaze.com/blog/the-python-gil-past-present-and-future/) is a notorious source of headaches.\n\nFinally, Rust is cool! A lot of the HF ecosystem already has Rust crates, like [safetensors](https://github.com/huggingface/safetensors) and [tokenizers](https://github.com/huggingface/tokenizers).\n\n\u003c!--- ANCHOR_END: goals ---\u003e\n\n### Other ML frameworks\n\n- [dfdx](https://github.com/coreylowman/dfdx) is a formidable crate, with shapes being included\n  in types. 
This prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat.\n  However, we found that some features still require nightly, and writing code can be a bit daunting for non rust experts.\n\n  We're leveraging and contributing to other core crates for the runtime so hopefully both crates can benefit from each\n  other.\n\n- [burn](https://github.com/burn-rs/burn) is a general crate that can leverage multiple backends so you can choose the best\n  engine for your workload.\n\n- [tch-rs](https://github.com/LaurentMazare/tch-rs.git) Bindings to the torch library in Rust. Extremely versatile, but they \n  bring in the entire torch library into the runtime. The main contributor of `tch-rs` is also involved in the development\n  of `candle`.\n\n### Common Errors\n\n#### Missing symbols when compiling with the mkl feature.\n\nIf you get some missing symbols when compiling binaries/tests using the mkl\nor accelerate features, e.g. for mkl you get:\n```\n  = note: /usr/bin/ld: (....o): in function `blas::sgemm':\n          .../blas-0.22.0/src/lib.rs:1944: undefined reference to `sgemm_' collect2: error: ld returned 1 exit status\n\n  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified\n  = note: use the `-l` flag to specify native libraries to link\n  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo\n```\nor for accelerate:\n```\nUndefined symbols for architecture arm64:\n            \"_dgemm_\", referenced from:\n                candle_core::accelerate::dgemm::h1b71a038552bcabe in libcandle_core...\n            \"_sgemm_\", referenced from:\n                candle_core::accelerate::sgemm::h2cf21c592cba3c47 in libcandle_core...\n          ld: symbol(s) not found for architecture arm64\n```\n\nThis is likely due to a missing linker flag that was needed to enable the mkl library. 
You\ncan try adding the following for mkl at the top of your binary:\n```rust\nextern crate intel_mkl_src;\n```\nor for accelerate:\n```rust\nextern crate accelerate_src;\n```\n\n#### Cannot run the LLaMA examples: access to source requires login credentials\n\n```\nError: request error: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401\n```\n\nThis is likely because you're not permissioned for the LLaMA-v2 model. To fix\nthis, you have to register on the huggingface-hub, accept the [LLaMA-v2 model\nconditions](https://huggingface.co/meta-llama/Llama-2-7b-hf), and set up your\nauthentication token. See issue\n[#350](https://github.com/huggingface/candle/issues/350) for more details.\n\n#### Missing cute/cutlass headers when compiling flash-attn\n\n```\n  In file included from kernels/flash_fwd_launch_template.h:11:0,\n                   from kernels/flash_fwd_hdim224_fp16_sm80.cu:5:\n  kernels/flash_fwd_kernel.h:8:10: fatal error: cute/algorithm/copy.hpp: No such file or directory\n   #include \u003ccute/algorithm/copy.hpp\u003e\n            ^~~~~~~~~~~~~~~~~~~~~~~~~\n  compilation terminated.\n  Error: nvcc error while compiling:\n```\n[cutlass](https://github.com/NVIDIA/cutlass) is provided as a git submodule so you may want to run the following command to check it in properly.\n```bash\ngit submodule update --init\n```\n\n#### Compiling with flash-attention fails\n\n```\n/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:\n```\n\nThis is a bug in gcc-11 triggered by the Cuda compiler. 
To fix this, install a different, supported gcc version - for example gcc-10, and specify the path to the compiler in the NVCC_CCBIN environment variable.\n```\nenv NVCC_CCBIN=/usr/lib/gcc/x86_64-linux-gnu/10 cargo ...\n```\n\n#### Linking error on windows when running rustdoc or mdbook tests\n\n```\nCouldn't compile the test.\n---- .\\candle-book\\src\\inference\\hub.md - Using_the_hub::Using_in_a_real_model_ (line 50) stdout ----\nerror: linking with `link.exe` failed: exit code: 1181\n//very long chain of linking\n = note: LINK : fatal error LNK1181: cannot open input file 'windows.0.48.5.lib'\n```\n\nMake sure you link all native libraries that might be located outside a project target, e.g., to run mdbook tests, you should run:\n\n```\nmdbook test candle-book -L .\\target\\debug\\deps\\ `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.42.2\\lib `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.48.5\\lib\n```\n\n#### Extremely slow model load time with WSL\n\nThis may be caused by the models being loaded from `/mnt/c`, more details on\n[stackoverflow](https://stackoverflow.com/questions/68972448/why-is-wsl-extremely-slow-when-compared-with-native-windows-npm-yarn-processing).\n\n#### Tracking down errors\n\nYou can set `RUST_BACKTRACE=1` to be provided with backtraces when a candle\nerror is generated.\n\n#### CudaRC error\n\nIf you encounter an error like this one `called `Result::unwrap()` on an `Err` value: LoadLibraryExW { source: Os { code: 126, kind: Uncategorized, message: \"The specified module could not be found.\" } }` on windows. To fix copy and rename these 3 files (make sure they are in path). 
The paths depend on your cuda version.\n`c:\\Windows\\System32\\nvcuda.dll` -\u003e `cuda.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\cublas64_12.dll` -\u003e `cublas.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\curand64_10.dll` -\u003e `curand.dll`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Fcandle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuggingface%2Fcandle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Fcandle/lists"}