ML Model Helper Utilities

https://github.com/mearman/ml-helpers

Last synced: 6 months ago

README

Name | Params | Unit | 💾 | Parameters | File | Quant method | Bits | Size | Max RAM required | Notes | Host | Owner | Repo | Repo URL | Download
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
falcon-7B-Q4_0-ggml | 7 | B | 💾 | | falcon-7B-Q4_0-ggml.bin | q4_0 | 4 | 4.06 GB | | See https://github.com/ggerganov/ggml/pull/231, Falcon LLM Support (https://github.com/ggerganov/llama.cpp/issues/1602), https://github.com/go-skynet/LocalAI/pull/516, and support for falcon model (https://github.com/ggerganov/ggml/issues/217) | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q4_0-ggml.bin
falcon-7B-Q4_1-ggml | 7 | B | 💾 | | falcon-7B-Q4_1-ggml.bin | q4_1 | 4 | 4.51 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q4_1-ggml.bin
falcon-7B-Q5_0-ggml | 7 | B | 💾 | | falcon-7B-Q5_0-ggml.bin | q5_0 | 5 | 4.96 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q5_0-ggml.bin
falcon-7B-Q5_1-ggml | 7 | B | 💾 | | falcon-7B-Q5_1-ggml.bin | q5_1 | 5 | 5.41 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q5_1-ggml.bin
falcon-7B-Q8_0-ggml | 7 | B | 💾 | | falcon-7B-Q8_0-ggml.bin | q8_0 | 8 | 7.67 GB | | | huggingface.co | RachidAR | falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml | https://huggingface.co/RachidAR/falcon-7B-ggml/resolve/main/falcon-7B-Q8_0-ggml.bin
ggml-replit-code-v1-3b | 3 | B | 💾 | 3,000,000,000 | ggml-replit-code-v1-3b.bin | fp16 | 16 | 5.20 GB | | GGML (16-bit float) version of the Replit V1-3B code model. Original model: https://huggingface.co/replit/replit-code-v1-3b | huggingface.co | nomic-ai | ggml-replit-code-v1-3b | https://huggingface.co/nomic-ai/ggml-replit-code-v1-3b | https://huggingface.co/nomic-ai/ggml-replit-code-v1-3b/resolve/main/ggml-replit-code-v1-3b.bin
ggml-gpt4all-l13b-snoozy | 13 | B | 💾 | 13,000,000,000 | ggml-gpt4all-l13b-snoozy.bin | | | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin
ggml-gpt4all-j-v1.1-breezy | | | 💾 | | ggml-gpt4all-j-v1.1-breezy.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j-v1.1-breezy.bin
ggml-gpt4all-j-v1.2-jazzy | | | 💾 | | ggml-gpt4all-j-v1.2-jazzy.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j-v1.2-jazzy.bin
ggml-gpt4all-j-v1.3-groovy | | | 💾 | | ggml-gpt4all-j-v1.3-groovy.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
ggml-gpt4all-j | | | 💾 | | ggml-gpt4all-j.bin | | | 3.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-gpt4all-j.bin
ggml-mpt-7b-base | 7 | B | 💾 | 7,000,000,000 | ggml-mpt-7b-base.bin | | | 4.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-mpt-7b-base.bin
ggml-mpt-7b-instruct | 7 | B | 💾 | 7,000,000,000 | ggml-mpt-7b-instruct.bin | | | 4.53 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-mpt-7b-instruct.bin
ggml-nous-gpt4-vicuna-13b | 13 | B | 💾 | 13,000,000,000 | ggml-nous-gpt4-vicuna-13b.bin | | | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-nous-gpt4-vicuna-13b.bin
ggml-stable-vicuna-13B.q4_2 | 13 | B | 💾 | 13,000,000,000 | ggml-stable-vicuna-13B.q4_2.bin | q4_2 | 4 | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-stable-vicuna-13B.q4_2.bin
ggml-vicuna-13b-1.1-q4_2 | 13 | B | 💾 | 13,000,000,000 | ggml-vicuna-13b-1.1-q4_2.bin | q4_2 | 4 | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-vicuna-13b-1.1-q4_2.bin
ggml-vicuna-7b-1.1-q4_2 | 7 | B | 💾 | 7,000,000,000 | ggml-vicuna-7b-1.1-q4_2.bin | q4_2 | 4 | 3.93 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-vicuna-7b-1.1-q4_2.bin
ggml-wizard-13b-uncensored | 13 | B | 💾 | 13,000,000,000 | ggml-wizard-13b-uncensored.bin | | | 7.58 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-wizard-13b-uncensored.bin
ggml-wizardLM-7B.q4_2 | 7 | B | 💾 | 7,000,000,000 | ggml-wizardLM-7B.q4_2.bin | q4_2 | 4 | 3.93 GB | | | gpt4all.io | | | | https://gpt4all.io/models/ggml-wizardLM-7B.q4_2.bin
alpaca-lora-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_1.bin
guanaco-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_1.bin
dromedary-lora-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_1.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin | q5_1 | 5 | 48.97 GB | 51.47 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
alpaca-lora-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_K_M.bin
guanaco-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_K_M.bin
dromedary-lora-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_K_M.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_M | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 46.20 GB | 48.70 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_M.bin
alpaca-lora-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_0.bin
alpaca-lora-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q5_K_S.bin
guanaco-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_0.bin
guanaco-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q5_K_S.bin
dromedary-lora-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_0.bin
dromedary-lora-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q5_K_S.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_0 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_0.bin | q5_0 | 5 | 44.89 GB | 47.39 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_0.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_S | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 44.89 GB | 47.39 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_K_S.bin
alpaca-lora-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_1.bin
guanaco-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_1.bin
dromedary-lora-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_1.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_1 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_1.bin | q4_1 | 4 | 40.81 GB | 43.31 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_1.bin
alpaca-lora-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_K_M.bin
guanaco-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_K_M.bin
dromedary-lora-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_K_M.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_M | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 39.28 GB | 41.78 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_M.bin
alpaca-lora-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_0.bin
alpaca-lora-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q4_K_S.bin
guanaco-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_0.bin
guanaco-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q4_K_S.bin
dromedary-lora-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_0.bin
dromedary-lora-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q4_K_S.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_0 | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_0.bin | q4_0 | 4 | 36.73 GB | 39.23 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_0.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_S | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 36.73 GB | 39.23 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q4_K_S.bin
guanaco-33B.ggmlv3.q8_0 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q8_0.bin | q8_0 | 8 | 34.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q8_0.bin
airoboros-33b-gpt4.ggmlv3.q8_0 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q8_0.bin
WizardLM-30B-Uncensored.ggmlv3.q8_0 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q8_0.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
alpaca-lora-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q3_K_L.bin
guanaco-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q3_K_L.bin
dromedary-lora-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q3_K_L.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_L | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 34.55 GB | 37.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_L.bin
alpaca-lora-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q3_K_M.bin
guanaco-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q3_K_M.bin
dromedary-lora-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q3_K_M.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_M | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 31.40 GB | 33.90 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_M.bin
alpaca-lora-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q3_K_S.bin
guanaco-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q3_K_S.bin
dromedary-lora-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q3_K_S.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_S | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 28.06 GB | 30.56 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q3_K_S.bin
alpaca-lora-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | alpaca-lora-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML | https://huggingface.co/TheBloke/alpaca-lora-65B-GGML/resolve/main/alpaca-lora-65B.ggmlv3.q2_K.bin
guanaco-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | guanaco-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML | https://huggingface.co/TheBloke/guanaco-65B-GGML/resolve/main/guanaco-65B.ggmlv3.q2_K.bin
dromedary-lora-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | dromedary-lora-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML | https://huggingface.co/TheBloke/dromedary-65B-lora-GGML/resolve/main/dromedary-lora-65B.ggmlv3.q2_K.bin
gpt4-alpaca-lora_mlp-65B.ggmlv3.q2_K | 65 | B | 💾 | 65,000,000,000 | gpt4-alpaca-lora_mlp-65B.ggmlv3.q2_K.bin | q2_K | 2 | 27.33 GB | 29.83 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML | https://huggingface.co/TheBloke/gpt4-alpaca-lora_mlp-65B-GGML/resolve/main/gpt4-alpaca-lora_mlp-65B.ggmlv3.q2_K.bin
guanaco-33B.ggmlv3.q6_K | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q6_K.bin | q6_K | 6 | 26.70 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q6_K.bin
airoboros-33b-gpt4.ggmlv3.q6_K | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q6_K.bin
wizardlm-30b-uncensored.ggmlv3.q6_K | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q6_K.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q6_K | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q6_K.bin
guanaco-33B.ggmlv3.q5_1 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_1.bin
airoboros-33b-gpt4.ggmlv3.q5_1 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | 26.90 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_1.bin
WizardLM-30B-Uncensored.ggmlv3.q5_1 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | 26.90 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q5_1.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_1 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 24.40 GB | 26.90 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_1.bin
airoboros-33b-gpt4.ggmlv3.q5_K_M | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.02 GB | 25.52 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_K_M.bin
wizardlm-30b-uncensored.ggmlv3.q5_K_M | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.02 GB | 25.52 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q5_K_M.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_M | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.02 GB | 25.52 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_M.bin
guanaco-33B.ggmlv3.q5_K_M | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 23.00 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_K_M.bin
guanaco-33B.ggmlv3.q5_0 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_0.bin | q5_0 | 5 | 22.40 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_0.bin
guanaco-33B.ggmlv3.q5_K_S | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.40 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q5_K_S.bin
airoboros-33b-gpt4.ggmlv3.q5_0 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_0.bin | q5_0 | 5 | 22.37 GB | 24.87 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_0.bin
airoboros-33b-gpt4.ggmlv3.q5_K_S | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.37 GB | 24.87 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q5_K_S.bin
WizardLM-30B-Uncensored.ggmlv3.q5_0 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 22.37 GB | 24.87 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q5_0.bin
wizardlm-30b-uncensored.ggmlv3.q5_K_S | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.37 GB | 24.87 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q5_K_S.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_0 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 22.37 GB | 24.87 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_0.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_S | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 22.37 GB | 24.87 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q5_K_S.bin
airoboros-33b-gpt4.ggmlv3.q4_1 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_1.bin | q4_1 | 4 | 20.33 GB | 22.83 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_1.bin
WizardLM-30B-Uncensored.ggmlv3.q4_1 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 20.33 GB | 22.83 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q4_1.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 20.33 GB | 22.83 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin
guanaco-33B.ggmlv3.q4_1 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_1.bin | q4_1 | 4 | 20.30 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_1.bin
guanaco-33B.ggmlv3.q4_K_M | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_K_M.bin
airoboros-33b-gpt4.ggmlv3.q4_K_M | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.57 GB | 22.07 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_K_M.bin
wizardlm-30b-uncensored.ggmlv3.q4_K_M | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.57 GB | 22.07 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q4_K_M.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 19.57 GB | 22.07 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin
guanaco-33B.ggmlv3.q4_0 | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_0.bin
guanaco-33B.ggmlv3.q4_K_S | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q4_K_S.bin
airoboros-33b-gpt4.ggmlv3.q4_0 | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | 20.80 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_0.bin
airoboros-33b-gpt4.ggmlv3.q4_K_S | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | 20.80 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q4_K_S.bin
WizardLM-30B-Uncensored.ggmlv3.q4_0 | 30 | B | 💾 | 30,000,000,000 | WizardLM-30B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | 20.80 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/WizardLM-30B-Uncensored.ggmlv3.q4_0.bin
wizardlm-30b-uncensored.ggmlv3.q4_K_S | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | 20.80 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q4_K_S.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0 | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 18.30 GB | 20.80 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_S | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 18.30 GB | 20.80 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_S.bin
guanaco-33B.ggmlv3.q3_K_L | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q3_K_L.bin
airoboros-33b-gpt4.ggmlv3.q3_K_L | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | 19.70 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q3_K_L.bin
wizardlm-30b-uncensored.ggmlv3.q3_K_L | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | 19.70 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q3_K_L.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_L | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 17.20 GB | 19.70 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_L.bin
gpt4-x-vicuna-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 16.00 GB | 18.00 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q8_0.bin
airoboros-33b-gpt4.ggmlv3.q3_K_M | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.64 GB | 18.14 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q3_K_M.bin
wizardlm-30b-uncensored.ggmlv3.q3_K_M | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.64 GB | 18.14 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q3_K_M.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_M | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.64 GB | 18.14 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_M.bin
guanaco-33B.ggmlv3.q3_K_M | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 15.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q3_K_M.bin
wizard-mega-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 14.60 GB | 17.00 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q8_0.bin
guanaco-33B.ggmlv3.q3_K_S | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 14.00 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q3_K_S.bin
airoboros-33b-gpt4.ggmlv3.q3_K_S | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 13.98 GB | 16.48 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q3_K_S.bin
wizardlm-30b-uncensored.ggmlv3.q3_K_S | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 13.98 GB | 16.48 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q3_K_S.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_S | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 13.98 GB | 16.48 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q3_K_S.bin
wizard-vicuna-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q8_0.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin
Manticore-13B.ggmlv3.q8_0 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q8_0.bin
guanaco-33B.ggmlv3.q2_K | 33 | B | 💾 | 33,000,000,000 | guanaco-33B.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | | | huggingface.co | TheBloke | guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML | https://huggingface.co/TheBloke/guanaco-33B-GGML/resolve/main/guanaco-33B.ggmlv3.q2_K.bin
airoboros-33b-gpt4.ggmlv3.q2_K | 33 | B | 💾 | 33,000,000,000 | airoboros-33b-gpt4.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | 16.10 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML | https://huggingface.co/TheBloke/airoboros-33b-gpt4-GGML/resolve/main/airoboros-33b-gpt4.ggmlv3.q2_K.bin
wizardlm-30b-uncensored.ggmlv3.q2_K | 30 | B | 💾 | 30,000,000,000 | wizardlm-30b-uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | 16.10 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-30B-Uncensored-GGML/resolve/main/wizardlm-30b-uncensored.ggmlv3.q2_K.bin
Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K | 30 | B | 💾 | 30,000,000,000 | Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 13.60 GB | 16.10 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin
mpt7b-instruct.ggmlv3.fp16 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.fp16.bin | fp16 | 16 | 13.30 GB | 16.00 GB | Full 16-bit. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.fp16.bin
wizard-vicuna-13B.ggmlv3.q6_K | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q6_K.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q6_K | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q6_K.bin
Manticore-13B.ggmlv3.q6_K | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q6_K.bin
wizard-vicuna-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_1.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_1.bin
Manticore-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_1.bin
gpt4-x-vicuna-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.00 GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q5_1.bin
wizard-mega-13B.ggmlv3.q5_1 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.25 GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q5_1.bin
wizard-vicuna-13B.ggmlv3.q5_K_M | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_K_M.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin
Manticore-13B.ggmlv3.q5_K_M | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_K_M.bin
wizard-vicuna-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_0.bin
wizard-vicuna-13B.ggmlv3.q5_K_S | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q5_K_S.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_S | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_S.bin
Manticore-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_0.bin
Manticore-13B.ggmlv3.q5_K_S | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q5_K_S.bin
gpt4-x-vicuna-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.95 GB | 10.00 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q4_1.bin
gpt4-x-vicuna-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.00 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q5_0.bin
wizard-mega-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.95 GB | 11.00 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q4_1.bin
wizard-mega-13B.ggmlv3.q5_0 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.00 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q5_0.bin
wizard-vicuna-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_1.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
Manticore-13B.ggmlv3.q4_1 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_1.bin
gpt4-x-vicuna-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | gpt4-x-vicuna-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 8.14 GB | 10.00 GB | 4-bit. | huggingface.co | TheBloke | gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML | https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/resolve/main/gpt4-x-vicuna-13B.ggmlv3.q4_0.bin
wizard-mega-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | wizard-mega-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 8.14 GB | 10.50 GB | 4-bit. | huggingface.co | TheBloke | wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML | https://huggingface.co/TheBloke/wizard-mega-13B-GGML/resolve/main/wizard-mega-13B.ggmlv3.q4_0.bin
wizard-vicuna-13B.ggmlv3.q4_K_M | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_K_M.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_M | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_M.bin
Manticore-13B.ggmlv3.q4_K_M | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_K_M.bin
mpt7b-instruct.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q8_0.bin | q8_0 | 8 | 7.48 GB | 9.70 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q8_0.bin
wizard-vicuna-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_0.bin
wizard-vicuna-13B.ggmlv3.q4_K_S | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q4_K_S.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin
Manticore-13B.ggmlv3.q4_0 | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_0.bin
Manticore-13B.ggmlv3.q4_K_S | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q4_K_S.bin
llama-7b.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q8_0.bin
wizardLM-7B.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | 9.66 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q8_0.bin
WizardLM-7B-uncensored.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | 9.66 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q8_0.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q8_0 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q8_0.bin | q8_0 | 8 | 7.16 GB | 9.66 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q8_0.bin
wizard-vicuna-13B.ggmlv3.q3_K_L | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q3_K_L.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_L | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_L.bin
Manticore-13B.ggmlv3.q3_K_L | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q3_K_L.bin
wizard-vicuna-13B.ggmlv3.q3_K_M | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q3_K_M.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_M | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_M.bin
Manticore-13B.ggmlv3.q3_K_M | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q3_K_M.bin
wizard-vicuna-13B.ggmlv3.q3_K_S | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q3_K_S.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_S | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q3_K_S.bin
Manticore-13B.ggmlv3.q3_K_S | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q3_K_S.bin
wizardLM-7B.ggmlv3.q6_K | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q6_K.bin | q6_K | 6 | 5.53 GB | 8.03 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q6_K.bin
WizardLM-7B-uncensored.ggmlv3.q6_K | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 5.53 GB | 8.03 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q6_K.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q6_K | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q6_K.bin | q6_K | 6 | 5.53 GB | 8.03 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q6_K.bin
wizard-vicuna-13B.ggmlv3.q2_K | 13 | B | 💾 | 13,000,000,000 | wizard-vicuna-13B.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML | https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/resolve/main/wizard-vicuna-13B.ggmlv3.q2_K.bin
Wizard-Vicuna-13B-Uncensored.ggmlv3.q2_K | 13 | B | 💾 | 13,000,000,000 | Wizard-Vicuna-13B-Uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q2_K.bin
Manticore-13B.ggmlv3.q2_K | 13 | B | 💾 | 13,000,000,000 | Manticore-13B.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML | https://huggingface.co/TheBloke/Manticore-13B-GGML/resolve/main/Manticore-13B.ggmlv3.q2_K.bin
llama-7b.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q5_1.bin
wizardLM-7B.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | 7.56 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_1.bin
WizardLM-7B-uncensored.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | 7.56 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_1.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin | q5_1 | 5 | 5.06 GB | 7.56 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin
mpt7b-instruct.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q4_1.bin | q4_1 | 4 | 4.99 GB | 7.20 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q4_1.bin
mpt7b-instruct.ggmlv3.q5_1 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q5_1.bin | q5_1 | 5 | 4.99 GB | 7.20 GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q5_1.bin
wizardLM-7B.ggmlv3.q5_K_M | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 4.77 GB | 7.27 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_K_M.bin
WizardLM-7B-uncensored.ggmlv3.q5_K_M | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 4.77 GB | 7.27 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_K_M.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_M | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 4.77 GB | 7.27 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_M.bin
llama-7b.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q5_0.bin
wizardLM-7B.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | 7.13 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_0.bin
wizardLM-7B.ggmlv3.q5_K_S | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 4.63 GB | 7.13 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q5_K_S.bin
WizardLM-7B-uncensored.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | 7.13 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_0.bin
WizardLM-7B-uncensored.ggmlv3.q5_K_S | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 4.63 GB | 7.13 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q5_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_S | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 4.63 GB | 7.13 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_0.bin | q5_0 | 5 | 4.63 GB | 7.13 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_0.bin
mpt7b-instruct.ggmlv3.q5_0 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q5_0.bin | q5_0 | 5 | 4.57 GB | 6.80 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q5_0.bin
llama-7b.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_1.bin
wizardLM-7B.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | 6.71 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_1.bin
WizardLM-7B-uncensored.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | 6.71 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_1.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin | q4_1 | 4 | 4.21 GB | 6.71 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin
mpt7b-instruct.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | mpt7b-instruct.ggmlv3.q4_0.bin | q4_0 | 4 | 4.16 GB | 6.20 GB | 4-bit. | huggingface.co | TheBloke | MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML | https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/resolve/main/mpt7b-instruct.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q4_K_M | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 4.05 GB | 6.55 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_K_M.bin
WizardLM-7B-uncensored.ggmlv3.q4_K_M | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 4.05 GB | 6.55 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_K_M.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_M | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 4.05 GB | 6.55 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_M.bin
llama-7b.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | llama-7b.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | | | huggingface.co | TheBloke | LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML | https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | 6.29 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q4_K_S | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 3.79 GB | 6.29 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_K_S.bin
WizardLM-7B-uncensored.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | 6.29 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_0.bin
WizardLM-7B-uncensored.ggmlv3.q4_K_S | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 3.79 GB | 6.29 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_S | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 3.79 GB | 6.29 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0 | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin | q4_0 | 4 | 3.79 GB | 6.29 GB | Original llama.cpp quant method, 4-bit. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
wizardLM-7B.ggmlv3.q3_K_L | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 3.55 GB | 6.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q3_K_L.bin
WizardLM-7B-uncensored.ggmlv3.q3_K_L | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 3.55 GB | 6.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q3_K_L.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_L | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 3.55 GB | 6.05 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_L.bin
wizardLM-7B.ggmlv3.q3_K_M | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 3.23 GB | 5.73 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q3_K_M.bin
WizardLM-7B-uncensored.ggmlv3.q3_K_M | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 3.23 GB | 5.73 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q3_K_M.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_M | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 3.23 GB | 5.73 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_M.bin
wizardLM-7B.ggmlv3.q3_K_S | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 2.90 GB | 5.40 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q3_K_S.bin
WizardLM-7B-uncensored.ggmlv3.q3_K_S | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 2.90 GB | 5.40 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q3_K_S.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_S | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 2.90 GB | 5.40 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q3_K_S.bin
wizardLM-7B.ggmlv3.q2_K | 7 | B | 💾 | 7,000,000,000 | wizardLM-7B.ggmlv3.q2_K.bin | q2_K | 2 | 2.80 GB | 5.30 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML | https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q2_K.bin
WizardLM-7B-uncensored.ggmlv3.q2_K | 7 | B | 💾 | 7,000,000,000 | WizardLM-7B-uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 2.80 GB | 5.30 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML | https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q2_K.bin
Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K | 7 | B | 💾 | 7,000,000,000 | Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin | q2_K | 2 | 2.80 GB | 5.30 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | huggingface.co | TheBloke | Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML | https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin
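
Every row's Download column is a direct `resolve/main` (or gpt4all.io) link, so the files above can be fetched with no extra tooling. Below is a minimal sketch, assuming only the Python standard library, that streams one of the listed files to disk in chunks and applies a rule of thumb visible in the llama-family rows above, where the Max RAM required figure is consistently the file size plus roughly 2.5 GB. The URL is copied from the wizardLM-7B q4_0 row; the chunk size, destination path, and `estimate_max_ram_gb` helper are illustrative choices, not part of any listed model card.

```python
# Minimal sketch (stdlib only): stream a GGML file from the table to disk,
# then sanity-check its size. The URL is taken from the table above; the
# chunk size, destination, and RAM helper are illustrative assumptions.
import urllib.request
from pathlib import Path

URL = ("https://huggingface.co/TheBloke/wizardLM-7B-GGML"
       "/resolve/main/wizardLM-7B.ggmlv3.q4_0.bin")


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> None:
    """Stream `url` to `dest` in 1 MiB chunks so multi-GB model files
    never have to fit in memory. urllib follows the CDN redirect that
    Hugging Face `resolve` URLs return."""
    with urllib.request.urlopen(url) as resp, dest.open("wb") as out:
        while chunk := resp.read(chunk_size):
            out.write(chunk)


def estimate_max_ram_gb(file_size_gb: float, overhead_gb: float = 2.5) -> float:
    """Rule of thumb read off the table: the llama-family rows list
    'Max RAM required' as the file size plus about 2.5 GB."""
    return file_size_gb + overhead_gb


if __name__ == "__main__":
    dest = Path("wizardLM-7B.ggmlv3.q4_0.bin")
    download(URL, dest)
    size_gb = dest.stat().st_size / 2**30
    print(f"{dest}: {size_gb:.2f} GB on disk, "
          f"~{estimate_max_ram_gb(size_gb):.2f} GB RAM to run")
```

For the huggingface.co rows, `hf_hub_download(repo_id=..., filename=...)` from the `huggingface_hub` package is an alternative that adds local caching and resumable downloads; the gpt4all.io rows are plain HTTPS files and work with the same sketch unchanged.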