{"id":49332620,"url":"https://github.com/countzero/windows_manage_large_language_models","last_synced_at":"2026-04-26T23:03:35.718Z","repository":{"id":209639843,"uuid":"724510368","full_name":"countzero/windows_manage_large_language_models","owner":"countzero","description":"PowerShell automation to download large language models (LLMs) from Git repositories and quantize them with llama.cpp into the GGUF format.","archived":false,"fork":false,"pushed_at":"2026-03-17T05:49:23.000Z","size":55,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-17T20:31:42.707Z","etag":null,"topics":["gguf","git","large-language-models","lfs","llama-cpp","powershell","quantization","windows"],"latest_commit_sha":null,"homepage":"","language":"PowerShell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/countzero.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-11-28T08:21:07.000Z","updated_at":"2026-03-17T05:49:27.000Z","dependencies_parsed_at":"2023-12-15T22:27:33.379Z","dependency_job_id":"49322d35-0582-4f1f-b64e-a51ae6cb0a25","html_url":"https://github.com/countzero/windows_manage_large_language_models","commit_stats":null,"previous_names":["countzero/windows_manage_large_language_models"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/countzero/windows_manage_large_language_models","repository_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/countzero%2Fwindows_manage_large_language_models","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/countzero%2Fwindows_manage_large_language_models/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/countzero%2Fwindows_manage_large_language_models/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/countzero%2Fwindows_manage_large_language_models/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/countzero","download_url":"https://codeload.github.com/countzero/windows_manage_large_language_models/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/countzero%2Fwindows_manage_large_language_models/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32315712,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T21:09:39.134Z","status":"ssl_error","status_checked_at":"2026-04-26T21:09:21.240Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gguf","git","large-language-models","lfs","llama-cpp","powershell","quantization","windows"],"created_at":"2026-04-26T23:03:35.153Z","updated_at":"2026-04-26T23:03:35.711Z","avatar_url":"https://github.com/countzero.png","language":"PowerShell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
Windows Manage Large Language Models\n\nPowerShell automation to download large language models (LLMs) via Git and quantize them with llama.cpp to the `GGUF` format.\n\nThink batch quantization like https://huggingface.co/TheBloke does it, but on your local machine :wink:\n\n## Features\n\n- Easy configuration via one `.env` file\n- Automates the synchronization of Git repositories containing large files (LFS)\n- Only fetches one LFS object at a time\n- Displays a progress indicator on downloading LFS objects\n- Automates the quantization from the source models\n- Handles the intermediate files during quantization to reduce disk usage\n- Improves quantization speed by separating read from write loads\n\n## Installation\n\n### Prerequisites\n\nUse https://github.com/countzero/windows_llama.cpp to compile a specific version of the [llama.cpp](https://github.com/ggerganov/llama.cpp) project on your machine. This also makes training data available.\n\n\n### Clone the repository from GitHub\n\nClone the repository to a nice place on your machine via:\n\n```PowerShell\ngit clone git@github.com:countzero/windows_manage_large_language_models.git\n```\n\n### Create a .env file\n\nCreate the following `.env` file in the project directory. 
Make sure to change the `LLAMA_CPP_DIRECTORY` value.\n\n```Env\n# Path to the llama.cpp project that contains the\n# required conversion and quantization programs.\nLLAMA_CPP_DIRECTORY=C:\\windows_llama.cpp\\vendor\\llama.cpp\n\n# Path to the training data for computing the importance matrix.\nTRAINING_DATA=C:\\windows_llama.cpp\\vendor\\bartowski1182\\calibration_datav5.txt\n\n# This can be used to significantly reduce the time to compute the\n# importance matrix without increasing the final perplexity.\n# We are using 20 chunks (~10k tokens) from the training data.\n# @see https://github.com/ggerganov/llama.cpp/discussions/5263\nTRAINING_DATA_CHUNKS=20\n\n# Path to the Git repositories containing the models.\nSOURCE_DIRECTORY=.\\source\n\n# Path to the quantized models in GGUF format.\nTARGET_DIRECTORY=.\\gguf\n\n# Path to the cache directory for intermediate files.\n#\n# Hint: Ideally this should be located on a different\n# physical drive to improve the quantization speed.\nCACHE_DIRECTORY=.\\cache\n\n# Path to the directory for importance matrix files.\nIMPORTANCE_MATRIX_DIRECTORY=.\\imatrix\n\n#\n# Comma separated list of multimodal projector types.\n#\n# For models with vision capability a \"mmproj\" file will be\n# generated and placed next to the quantized models.\n#\n# Common types for the mmproj files:\n#\n#     F32  : Use float32 for older hardware\n#     BF16 : Use bfloat16 for current hardware (recommended)\n#     F16  : Use float16 for older hardware under VRAM constraints\n#\nMULTIMODAL_PROJECTOR_TYPES=BF16\n\n#\n# Comma separated list of quantization types.\n#\n# Possible llama.cpp quantization types:\n#\n#      2  or  Q4_0    :  4.34G, +0.4685 ppl @ Llama-3-8B\n#      3  or  Q4_1    :  4.78G, +0.4511 ppl @ Llama-3-8B\n#      8  or  Q5_0    :  5.21G, +0.1316 ppl @ Llama-3-8B\n#      9  or  Q5_1    :  5.65G, +0.1062 ppl @ Llama-3-8B\n#     19  or  IQ2_XXS :  2.06 bpw quantization\n#     20  or  IQ2_XS  :  2.31 bpw quantization\n#     28  or  IQ2_S   
:  2.5  bpw quantization\n#     29  or  IQ2_M   :  2.7  bpw quantization\n#     24  or  IQ1_S   :  1.56 bpw quantization\n#     31  or  IQ1_M   :  1.75 bpw quantization\n#     36  or  TQ1_0   :  1.69 bpw ternarization\n#     37  or  TQ2_0   :  2.06 bpw ternarization\n#     10  or  Q2_K    :  2.96G, +3.5199 ppl @ Llama-3-8B\n#     21  or  Q2_K_S  :  2.96G, +3.1836 ppl @ Llama-3-8B\n#     23  or  IQ3_XXS :  3.06 bpw quantization\n#     26  or  IQ3_S   :  3.44 bpw quantization\n#     27  or  IQ3_M   :  3.66 bpw quantization mix\n#     12  or  Q3_K    : alias for Q3_K_M\n#     22  or  IQ3_XS  :  3.3 bpw quantization\n#     11  or  Q3_K_S  :  3.41G, +1.6321 ppl @ Llama-3-8B\n#     12  or  Q3_K_M  :  3.74G, +0.6569 ppl @ Llama-3-8B\n#     13  or  Q3_K_L  :  4.03G, +0.5562 ppl @ Llama-3-8B\n#     25  or  IQ4_NL  :  4.50 bpw non-linear quantization\n#     30  or  IQ4_XS  :  4.25 bpw non-linear quantization\n#     15  or  Q4_K    : alias for Q4_K_M\n#     14  or  Q4_K_S  :  4.37G, +0.2689 ppl @ Llama-3-8B\n#     15  or  Q4_K_M  :  4.58G, +0.1754 ppl @ Llama-3-8B\n#     17  or  Q5_K    : alias for Q5_K_M\n#     16  or  Q5_K_S  :  5.21G, +0.1049 ppl @ Llama-3-8B\n#     17  or  Q5_K_M  :  5.33G, +0.0569 ppl @ Llama-3-8B\n#     18  or  Q6_K    :  6.14G, +0.0217 ppl @ Llama-3-8B\n#      7  or  Q8_0    :  7.96G, +0.0026 ppl @ Llama-3-8B\n#      1  or  F16     : 14.00G, +0.0020 ppl @ Mistral-7B\n#     32  or  BF16    : 14.00G, -0.0050 ppl @ Mistral-7B\n#      0  or  F32     : 26.00G              @ 7B\n#             COPY    : only copy tensors, no quantizing\n#\n# Hint: A very good quantization with minimal quality loss is\n# Q5_K_M. 
Quantization below 4-bit causes measurable quality\n# loss; try to avoid going too low and use IQ4_XS as a minimum.\n# @see https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity\n#\nQUANTIZATION_TYPES=Q5_K_M,IQ4_XS\n```\n\n\u003e [!NOTE]\n\u003e All i-quants (`IQ*`) and the small k-quants (`Q2_K` and `Q2_K_S`) require an [importance matrix](https://github.com/ggerganov/llama.cpp/tree/master/examples/imatrix). Since an importance matrix also improves the quality of larger quantization types, this script always computes one for each model and uses it for the quantization.\n\n## Usage\n\n### 1. Clone a model\n\nClone a Git repository containing an LLM into the `SOURCE_DIRECTORY` without checking out any files or downloading any large files (LFS).\n\n```PowerShell\ngit -C \"./source\" clone --no-checkout https://huggingface.co/openchat/openchat-3.6-8b-20240522\n```\n\n### 2. Download model sources\n\nDownload all files across all Git repositories that are inside the `SOURCE_DIRECTORY`.\n\n```PowerShell\n./download_model_sources.ps1\n```\n\n**Hint:** This can also be used to update existing sources from the remote repositories.\n\n### 3. Quantize model weights\n\nQuantize all model weights that are inside the `SOURCE_DIRECTORY` into the `TARGET_DIRECTORY` to create a specific `GGUF` file for each entry in `QUANTIZATION_TYPES`.\n\n```PowerShell\n./quantize_weights_for_llama.cpp.ps1\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcountzero%2Fwindows_manage_large_language_models","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcountzero%2Fwindows_manage_large_language_models","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcountzero%2Fwindows_manage_large_language_models/lists"}