ML Model Helper Utilities

https://github.com/mearman/ml-helpers

Last synced: 6 months ago

README

Name | Params | Unit | 💾 | Parameters | File | Quant method | Bits | Size | Max RAM required | Notes | Host | Owner | Repo | Repo URL | Download
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
falcon-7B-Q4_0-ggml | 7 | B | 💾 | | falcon-7B-Q4_0-ggml.bin | q4_0 | 4 | 4.06 GB | | See https://github.com/ggerganov/ggml/pull/231, Falcon LLM Support (https://github.com/ggerganov/llama.cpp/issues/1602), https://github.com/go-skynet/LocalAI/pull/516, and support for falcon model (https://github.com/ggerganov/ggml/issues/217) | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q4_0-ggml.bin
falcon-7B-Q4_1-ggml | 7 | B | 💾 | | falcon-7B-Q4_1-ggml.bin | q4_1 | 4 | 4.51 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q4_1-ggml.bin
falcon-7B-Q5_0-ggml | 7 | B | 💾 | | falcon-7B-Q5_0-ggml.bin | q5_0 | 5 | 4.96 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q5_0-ggml.bin
falcon-7B-Q5_1-ggml | 7 | B | 💾 | | falcon-7B-Q5_1-ggml.bin | q5_1 | 5 | 5.41 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q5_1-ggml.bin
falcon-7B-Q8_0-ggml | 7 | B | 💾 | | falcon-7B-Q8_0-ggml.bin | q8_0 | 8 | 7.67 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q8_0-ggml.bin
ggml-replit-code-v1-3b | 3 | B | 💾 | 3,000,000,000 | ggml-replit-code-v1-3b.bin | fp16 | 16 | 5.20 GB | | GGML (16-bit float) version of the Replit V1-3B code model. Original model: https://huggingface.co/replit/replit-code-v1-3b | huggingface.co | nomic-ai | ggml-replit-code-v1-3b | https://huggingface.co/nomic-ai/ggml-replit-code-v1-3b | https://huggingface.co/nomic-ai/ggml-replit-code-v1-3b/resolve/main/ggml-replit-code-v1-3b.bin
ggml-gpt4all-l13b-snoozy | 13 | B | 💾 | 13,000,000,000 | ggml-gpt4all-l13b-snoozy.bin | | | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin
ggml-gpt4all-j-v1.1-breezy | | | 💾 | | ggml-gpt4all-j-v1.1-breezy.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j-v1.1-breezy.bin
ggml-gpt4all-j-v1.2-jazzy | | | 💾 | | ggml-gpt4all-j-v1.2-jazzy.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j-v1.2-jazzy.bin
ggml-gpt4all-j-v1.3-groovy | | | 💾 | | ggml-gpt4all-j-v1.3-groovy.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
ggml-gpt4all-j | | | 💾 | | ggml-gpt4all-j.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j.bin
ggml-mpt-7b-base | 7 | B | 💾 | 7,000,000,000 | ggml-mpt-7b-base.bin | | | 4.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-mpt-7b-base.bin
ggml-mpt-7b-instruct | 7 | B | 💾 | 7,000,000,000 | ggml-mpt-7b-instruct.bin | | | 4.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-mpt-7b-instruct.bin
ggml-nous-gpt4-vicuna-13b | 13 | B | 💾 | 13,000,000,000 | ggml-nous-gpt4-vicuna-13b.bin | | | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-nous-gpt4-vicuna-13b.bin
ggml-stable-vicuna-13B.q4_2 | 13 | B | 💾 | 13,000,000,000 | ggml-stable-vicuna-13B.q4_2.bin | q4_2 | 4 | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-stable-vicuna-13B.q4_2.bin
ggml-vicuna-13b-1.1-q4_2 | 13 | B | 💾 | 13,000,000,000 | ggml-vicuna-13b-1.1-q4_2.bin | q4_2 | 4 | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-vicuna-13b-1.1-q4_2.bin
ggml-vicuna-7b-1.1-q4_2 | 7 | B | 💾 | 7,000,000,000 | ggml-vicuna-7b-1.1-q4_2.bin | q4_2 | 4 | 3.93 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-vicuna-7b-1.1-q4_2.bin
ggml-wizard-13b-uncensored | 13 | B | 💾 | 13,000,000,000 | ggml-wizard-13b-uncensored.bin | | | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-wizard-13b-uncensored.bin
ggml-wizardLM-7B.q4_2 | 7 | B | 💾 | 7,000,000,000 | ggml-wizardLM-7B.q4_2.bin | q4_2 | 4 | 3.93 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-wizardLM-7B.q4_2.bin
alpaca-lora-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_1.bin
guanaco-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_1.bin
dromedary-lora-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_1.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
alpaca-lora-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_K_M.bin
guanaco-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_K_M.bin
dromedary-lora-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_K_M.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_M.bin
alpaca-lora-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_0.bin
alpaca-lora-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_K_S.bin
guanaco-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_0.bin
guanaco-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_K_S.bin
dromedary-lora-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_0.bin
dromedary-lora-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_K_S.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_0.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_S.bin
alpaca-lora-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_1.bin
guanaco-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_1.bin
dromedary-lora-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_1.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_1.bin
alpaca-lora-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_K_M.bin
guanaco-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_K_M.bin
dromedary-lora-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_K_M.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_M.bin
alpaca-lora-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_0.bin
alpaca-lora-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_K_S.bin
guanaco-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_0.bin
guanaco-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_K_S.bin
dromedary-lora-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_0.bin
dromedary-lora-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_K_S.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_0.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_S.bin
guanaco-33B.ggmlv3.q8_0 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q8_0.bin | q8_0 | 8 | 34.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q8_0.bin
airoboros-33b-gpt4.ggmlv3.q8_0 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q8_0.bin
WizardLM-30B-Uncensored.ggmlv3.q8_0 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q8_0.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
alpaca-lora-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q3_K_L.bin
guanaco-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q3_K_L.bin
dromedary-lora-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q3_K_L.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_L.bin
alpaca-lora-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q3_K_M.bin
guanaco-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q3_K_M.bin
dromedary-lora-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q3_K_M.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_M.bin
alpaca-lora-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q3_K_S.bin
guanaco-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q3_K_S.bin
dromedary-lora-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q3_K_S.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_S.bin
alpaca-lora-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q2_K.bin
guanaco-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q2_K.bin
dromedary-lora-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q2_K.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q2_K.bin
guanaco-33B.ggmlv3.q6_K | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q6_K.bin | q6_K | 6 | 26.70 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q6_K.bin
airoboros-33b-gpt4.ggmlv3.q6_K | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q6_K.bin
wizardlm-30b-uncensored.ggmlv3.q6_K | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q6_K.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q6_K | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q6_K.bin
guanaco-33B.ggmlv3.q5_1 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_1.bin
airoboros-33b-gpt4.ggmlv3.q5_1 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | 26.90 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_1.bin
WizardLM-30B-Uncensored.ggmlv3.q5_1 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | 26.90 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q5_1.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_1 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | 26.90 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_1.bin
airoboros-33b-gpt4.ggmlv3.q5_K_M | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.02 GB | 25.52 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_K_M.bin
wizardlm-30b-uncensored.ggmlv3.q5_K_M | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.02 GB | 25.52 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q5_K_M.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_M | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.02 GB | 25.52 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_M.bin
guanaco-33B.ggmlv3.q5_K_M | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.00 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_K_M.bin
guanaco-33B.ggmlv3.q5_0 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_0.bin | q5_0 | 5 | 22.40 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_0.bin
guanaco-33B.ggmlv3.q5_K_S | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.40 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_K_S.bin
airoboros-33b-gpt4.ggmlv3.q5_0 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_0.bin | q5_0 | 5 | 22.37 GB | 24.87 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_0.bin
airoboros-33b-gpt4.ggmlv3.q5_K_S | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.37 GB | 24.87 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_K_S.bin
WizardLM-30B-Uncensored.ggmlv3.q5_0 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 22.37 GB | 24.87 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q5_0.bin
wizardlm-30b-uncensored.ggmlv3.q5_K_S | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.37 GB | 24.87 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q5_K_S.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_0 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 22.37 GB | 24.87 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_0.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_S | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.37 GB | 24.87 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_S.bin
airoboros-33b-gpt4.ggmlv3.q4_1 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_1.bin | q4_1 | 4 | 20.33 GB | 22.83 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_1.bin
WizardLM-30B-Uncensored.ggmlv3.q4_1 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 20.33 GB | 22.83 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q4_1.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 20.33 GB | 22.83 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin
guanaco-33B.ggmlv3.q4_1 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_1.bin | q4_1 | 4 | 20.30 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_1.bin
guanaco-33B.ggmlv3.q4_K_M | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_K_M.bin
airoboros-33b-gpt4.ggmlv3.q4_K_M | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.57 GB | 22.07 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_K_M.bin
wizardlm-30b-uncensored.ggmlv3.q4_K_M | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.57 GB | 22.07 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q4_K_M.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.57 GB | 22.07 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin
guanaco-33B.ggmlv3.q4_0 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_0.bin
guanaco-33B.ggmlv3.q4_K_S | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_K_S.bin
airoboros-33b-gpt4.ggmlv3.q4_0 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | 20.80 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_0.bin
airoboros-33b-gpt4.ggmlv3.q4_K_S | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | 20.80 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_K_S.bin
WizardLM-30B-Uncensored.ggmlv3.q4_0 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | 20.80 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q4_0.bin
wizardlm-30b-uncensored.ggmlv3.q4_K_S | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | 20.80 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q4_K_S.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | 20.80 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_S | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | 20.80 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_S.bin
guanaco-33B.ggmlv3.q3_K_L | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q3_K_L.bin
airoboros-33b-gpt4.ggmlv3.q3_K_L | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | 19.70 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q3_K_L.bin
wizardlm-30b-uncensored.ggmlv3.q3_K_L | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | 19.70 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q3_K_L.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_L | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | 19.70 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_L.bin
gpt4-x-vicuna-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 16.00 GB | 18.00 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q8_0.bin
airoboros-33b-gpt4.ggmlv3.q3_K_M | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.64 GB | 18.14 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q3_K_M.bin
wizardlm-30b-uncensored.ggmlv3.q3_K_M | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.64 GB | 18.14 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q3_K_M.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_M | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.64 GB | 18.14 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_M.bin
guanaco-33B.ggmlv3.q3_K_M | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q3_K_M.bin
wizard-mega-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 14.60 GB | 17.00 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q8_0.bin
guanaco-33B.ggmlv3.q3_K_S | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 14.00 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q3_K_S.bin
airoboros-33b-gpt4.ggmlv3.q3_K_S | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 13.98 GB | 16.48 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q3_K_S.bin
wizardlm-30b-uncensored.ggmlv3.q3_K_S | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 13.98 GB | 16.48 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q3_K_S.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_S | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 13.98 GB | 16.48 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_S.bin
wizard-vicuna-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q8_0.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin
Manticore-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q8_0.bin
guanaco-33B.ggmlv3.q2_K | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q2_K.bin
airoboros-33b-gpt4.ggmlv3.q2_K | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | 16.10 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q2_K.bin
wizardlm-30b-uncensored.ggmlv3.q2_K | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | 16.10 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q2_K.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | 16.10 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin
mpt7b-instruct.ggmlv3.fp16 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.fp16.bin | fp16 | 16 | 13.30 GB | 16.00 GB | Full 16-bit. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.fp16.bin
wizard-vicuna-13B.ggmlv3.q6_K | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q6_K.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q6_K | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q6_K.bin
Manticore-13B.ggmlv3.q6_K | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q6_K.bin
wizard-vicuna-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_1.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_1.bin
Manticore-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_1.bin
gpt4-x-vicuna-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.00 GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q5_1.bin
wizard-mega-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.25 GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q5_1.bin
wizard-vicuna-13B.ggmlv3.q5_K_M | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_K_M.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin
Manticore-13B.ggmlv3.q5_K_M | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_K_M.bin
wizard-vicuna-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_0.bin
wizard-vicuna-13B.ggmlv3.q5_K_S | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_K_S.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_S | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_S.bin
Manticore-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_0.bin
Manticore-13B.ggmlv3.q5_K_S | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_K_S.bin
gpt4-x-vicuna-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.95 GB | 10.00 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q4_1.bin
gpt4-x-vicuna-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.00 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q5_0.bin
wizard-mega-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.95 GB | 11.00 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q4_1.bin
wizard-mega-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.00 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q5_0.bin
wizard-vicuna-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_1.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
Manticore-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_1.bin
gpt4-x-vicuna-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 8.14 GB | 10.00 GB | 4-bit. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q4_0.bin
wizard-mega-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 8.14 GB | 10.50 GB | 4-bit. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q4_0.bin
wizard-vicuna-13B.ggmlv3.q4_K_M | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_K_M.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_M | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_M.bin
Manticore-13B.ggmlv3.q4_K_M | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_K_M.bin
mpt7b-instruct.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q8_0.bin | q8_0 | 8 | 7.48 GB | 9.70 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q8_0.bin
wizard-vicuna-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_0.bin
wizard-vicuna-13B.ggmlv3.q4_K_S | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_K_S.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin
Manticore-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_0.bin
Manticore-13B.ggmlv3.q4_K_S | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_K_S.bin
llama-7b.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q8_0.bin
wizardLM-7B.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | 9.66 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q8_0.bin
WizardLM-7B-uncensored.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | 9.66 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q8_0.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | 9.66 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q8_0.bin
wizard-vicuna-13B.ggmlv3.q3_K_L | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q3_K_L.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_L | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_L.bin
Manticore-13B.ggmlv3.q3_K_L | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q3_K_L.bin
wizard-vicuna-13B.ggmlv3.q3_K_M | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q3_K_M.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_M | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_M.bin
Manticore-13B.ggmlv3.q3_K_M | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q3_K_M.bin
wizard-vicuna-13B.ggmlv3.q3_K_S | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q3_K_S.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_S | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_S.bin
Manticore-13B.ggmlv3.q3_K_S | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q3_K_S.bin
wizardLM-7B.ggmlv3.q6_K | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q6_K.bin | q6_K | 6 | 5.53 GB | 8.03 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q6_K.bin
WizardLM-7B-uncensored.ggmlv3.q6_K | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 5.53 GB | 8.03 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q6_K.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q6_K | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 5.53 GB | 8.03 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q6_K.bin
wizard-vicuna-13B.ggmlv3.q2_K | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q2_K.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q2_K | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q2_K.bin
Manticore-13B.ggmlv3.q2_K | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q2_K.bin
llama-7b.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q5_1.bin
wizardLM-7B.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | 7.56 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_1.bin
WizardLM-7B-uncensored.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | 7.56 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_1.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | 7.56 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin
mpt7b-instruct.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q4_1.bin | q4_1 | 4 | 4.99 GB | 7.20 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q4_1.bin
mpt7b-instruct.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q5_1.bin | q5_1 | 5 | 4.99 GB | 7.20 GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q5_1.bin
wizardLM-7B.ggmlv3.q5_K_M | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 4.77 GB | 7.27 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_K_M.bin
WizardLM-7B-uncensored.ggmlv3.q5_K_M | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 4.77 GB | 7.27 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_K_M.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_M | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 4.77 GB | 7.27 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_M.bin
llama-7b.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q5_0.bin
wizardLM-7B.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | 7.13 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_0.bin
wizardLM-7B.ggmlv3.q5_K_S | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 4.63 GB | 7.13 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_K_S.bin
WizardLM-7B-uncensored.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | 7.13 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_0.bin
WizardLM-7B-uncensored.ggmlv3.q5_K_S | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 4.63 GB | 7.13 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_S | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 4.63 GB | 7.13 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | 7.13 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_0.bin
mpt7b-instruct.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q5_0.bin | q5_0 | 5 | 4.57 GB | 6.80 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q5_0.bin
llama-7b.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_1.bin
wizardLM-7B.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | 6.71 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_1.bin
WizardLM-7B-uncensored.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | 6.71 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_1.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | 6.71 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin
mpt7b-instruct.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q4_0.bin | q4_0 | 4 | 4.16 GB | 6.20 GB | 4-bit. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q4_K_M | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 4.05 GB | 6.55 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_K_M.bin
WizardLM-7B-uncensored.ggmlv3.q4_K_M | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 4.05 GB | 6.55 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_K_M.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_M | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 4.05 GB | 6.55 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_M.bin
llama-7b.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | 6.29 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q4_K_S | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 3.79 GB | 6.29 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_K_S.bin
WizardLM-7B-uncensored.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | 6.29 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_0.bin
WizardLM-7B-uncensored.ggmlv3.q4_K_S | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 3.79 GB | 6.29 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_S | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 3.79 GB | 6.29 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | 6.29 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q3_K_L | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 3.55 GB | 6.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q3_K_L.bin
WizardLM-7B-uncensored.ggmlv3.q3_K_L | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 3.55 GB | 6.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q3_K_L.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_L | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 3.55 GB | 6.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_L.bin
wizardLM-7B.ggmlv3.q3_K_M | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 3.23 GB | 5.73 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q3_K_M.bin
WizardLM-7B-uncensored.ggmlv3.q3_K_M | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 3.23 GB | 5.73 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q3_K_M.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_M | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 3.23 GB | 5.73 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_M.bin
wizardLM-7B.ggmlv3.q3_K_S | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 2.90 GB | 5.40 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q3_K_S.bin
WizardLM-7B-uncensored.ggmlv3.q3_K_S | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 2.90 GB | 5.40 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q3_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_S | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 2.90 GB | 5.40 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_S.bin
wizardLM-7B.ggmlv3.q2_K | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q2_K.bin | q2_K | 2 | 2.80 GB | 5.30 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q2_K.bin
WizardLM-7B-uncensored.ggmlv3.q2_K | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 2.80 GB | 5.30 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q2_K.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 2.80 GB | 5.30 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin
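
Every row's Download column is a direct `resolve/main` (or gpt4all.io) link, so the files above can be fetched with no extra tooling. Below is a minimal sketch, assuming only the Python standard library, that streams one of the listed files to disk in chunks and applies a rule of thumb visible in the llama-family rows above, where the Max RAM required figure is consistently the file size plus roughly 2.5 GB. The URL is copied from the wizardLM-7B q4_0 row; the chunk size, destination path, and `estimate_max_ram_gb` helper are illustrative choices, not part of any listed model card.

```python
# Minimal sketch (stdlib only): stream a GGML file from the table to disk,
# then sanity-check its size. The URL is taken from the table above; the
# chunk size, destination, and RAM helper are illustrative assumptions.
import urllib.request
from pathlib import Path

URL = ("https://huggingface.co/TheBloke/wizardLM-7B-GGML"
       "/resolve/main/wizardLM-7B.ggmlv3.q4_0.bin")


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> None:
    """Stream `url` to `dest` in 1 MiB chunks so multi-GB model files
    never have to fit in memory. urllib follows the CDN redirect that
    Hugging Face `resolve` URLs return."""
    with urllib.request.urlopen(url) as resp, dest.open("wb") as out:
        while chunk := resp.read(chunk_size):
            out.write(chunk)


def estimate_max_ram_gb(file_size_gb: float, overhead_gb: float = 2.5) -> float:
    """Rule of thumb read off the table: the llama-family rows list
    'Max RAM required' as the file size plus about 2.5 GB."""
    return file_size_gb + overhead_gb


if __name__ == "__main__":
    dest = Path("wizardLM-7B.ggmlv3.q4_0.bin")
    download(URL, dest)
    size_gb = dest.stat().st_size / 2**30
    print(f"{dest}: {size_gb:.2f} GB on disk, "
          f"~{estimate_max_ram_gb(size_gb):.2f} GB RAM to run")
```

For the huggingface.co rows, `hf_hub_download(repo_id=..., filename=...)` from the `huggingface_hub` package is an alternative that adds local caching and resumable downloads; the gpt4all.io rows are plain HTTPS files and work with the same sketch unchanged.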