{"id":13908201,"url":"https://github.com/armbues/SiLLM","last_synced_at":"2025-07-18T07:30:39.856Z","repository":{"id":233032815,"uuid":"742061877","full_name":"armbues/SiLLM","owner":"armbues","description":"SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.","archived":false,"fork":false,"pushed_at":"2025-06-16T08:17:15.000Z","size":633,"stargazers_count":272,"open_issues_count":3,"forks_count":27,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-06-16T09:35:56.950Z","etag":null,"topics":["apple-silicon","dpo","large-language-models","llm","llm-inference","llm-training","lora","mlx"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/armbues.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-11T17:24:31.000Z","updated_at":"2025-06-16T08:17:18.000Z","dependencies_parsed_at":"2024-04-15T16:29:17.920Z","dependency_job_id":"d16af70e-9d74-4eaf-93eb-4709efc52de4","html_url":"https://github.com/armbues/SiLLM","commit_stats":null,"previous_names":["armbues/sillm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/armbues/SiLLM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/armbues%2FSiLLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/armbues%2FSiLLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/armbues%2FSiLLM/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/armbues%2FSiLLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/armbues","download_url":"https://codeload.github.com/armbues/SiLLM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/armbues%2FSiLLM/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265548841,"owners_count":23786295,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","dpo","large-language-models","llm","llm-inference","llm-training","lora","mlx"],"created_at":"2024-08-06T23:02:32.782Z","updated_at":"2025-07-18T07:30:39.847Z","avatar_url":"https://github.com/armbues.png","language":"Python","funding_links":[],"categories":["Python","HarmonyOS","Libraries and Tools","LLM \u0026 Inference"],"sub_categories":["Windows Manager","2024"],"readme":"![sillm](https://github.com/armbues/SiLLM/assets/4117144/859002e9-d209-480b-adb2-7276cd360cbe)\n\n# SiLLM - Silicon LLM Training \u0026 Inference Toolkit\nSiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the [MLX](https://github.com/ml-explore/mlx/) framework. 
Building upon the foundation provided by [MLX Examples](https://github.com/ml-explore/mlx-examples), this project introduces additional features specifically designed to enhance LLM operations with MLX in a streamlined package.\n\n- **LLM Loading**: load LLMs for chat and training in different formats (Huggingface, Torch, GGUF, MLX)\n- **LoRA Training**: train LLMs using *Low-rank Adaptation*\n- **DPO Training**: train LLMs with *Direct Preference Optimization*\n- **Experimental Features**: speculative decoding, beam search, logit distillation, ...\n\n## Features\n\n- Web app for a seamless chat experience running on local hardware\n- API server with OpenAI compatible chat endpoints\n- Model architecture support for all major model types\n- Conversation templates for all major model types\n- Loss functions for DPO: sigmoid, hinge, IPO, DPOP\n- Training loss plots using matplotlib\n- Perplexity calculation\n\n## Experimental\nOne of the main goals of SiLLM is to enable experimentation with the inner workings of large language models and make new techniques accessible to a wider audience running on Apple Silicon hardware.\n\n- Speculative Decoding\n- Beam search\n- Training using logit distillation\n- Logit filters\n- Control vectors and feature ablation\n\n## Installation\n\nUsing pip:\n``` sh\npip install sillm-mlx\n```\n\n## Usage\n\n### Chat web application\nThe web app uses [Chainlit](https://github.com/Chainlit/chainlit) to provide a frontend for conversational AI running locally on Apple Silicon hardware.\n\nhttps://github.com/armbues/SiLLM/assets/4117144/ab537795-5020-4241-aa89-3b19b9de263b\n\nTo use the web app, clone the repository and start the app using chainlit:\n``` sh\ngit clone https://github.com/armbues/SiLLM.git\ncd SiLLM/app\npip install -r requirements.txt\npython -m chainlit run app.py -w\n```\nSet the environment variables `SILLM_MODEL_DIR` and `SILLM_ADAPTER_DIR` to load local models/adapters.\n\n### Command-line interface (CLI) scripts\nRun 
the CLI scripts with the argument `-h` to see all available arguments.\n\n#### Chat\nSimple command-line interface for chatting with an LLM in the terminal.\n``` sh\npython -m sillm.chat /path/to/model\n```\nRunning sillm.chat in the terminal with Gemma-2B-it on a MacBook Air M2 with 16 GB memory:\n\nhttps://github.com/armbues/SiLLM/assets/4117144/42e2d0f8-3bd8-44ca-9f78-8c4a885b8939\n\n#### Server\nRun an API server with basic functionality compatible with OpenAI chat endpoints.\n``` sh\npython -m sillm.server /path/to/model --port 8000\n```\n\n#### LoRA Fine-tuning\nFine-tune a model with low-rank adaptation (LoRA).\n``` sh\npython -m sillm.lora /path/to/model -d /path/to/dataset -o /output/adapters\n```\n\n#### DPO Fine-tuning\nFine-tune a model with LoRA and direct preference optimization (DPO).\n``` sh\npython -m sillm.dpo /path/to/model -d /path/to/dataset -o /output/adapters\n```\n\n#### Conversion\nConvert a model while merging adapters or quantizing the weights.\n\nExample of merging an adapter into a model:\n``` sh\npython -m sillm.convert /path/to/input/model /path/to/output/model -a /path/to/adapters\n```\n\n#### Quantization\nQuantize a model serially (without loading it entirely into memory):\n``` sh\npython -m sillm.quantize /path/to/input/model /path/to/output/model --bits 4\n```\n\n### Python\nMinimal example of loading a model with SiLLM and generating a text completion:\n``` python\nimport sillm\n\nmodel = sillm.load(\"/path/to/model\")\nfor s, _ in model.generate(\"On a beautiful Sunday morning,\"):\n    print(s, flush=True, end=\"\")\n```\n\n### Examples\n\nThe repository [SiLLM-examples](https://github.com/armbues/SiLLM-examples) contains Python code examples for using the SiLLM framework for training and running LLMs.\n\n#### LoRA Fine-tuning\nLoRA training [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) with the Nvidia [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) 
dataset.\n\n#### DPO Fine-tuning\nDPO training [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) with the [DPO Mix 7K](https://huggingface.co/datasets/argilla/dpo-mix-7k) dataset. The training consists of a supervised fine-tuning (SFT) stage followed by direct preference optimization (DPO).\n\n#### MMLU Benchmark\nImplementation of the \"Massive Multitask Language Understanding\" benchmark using the [MMLU](https://huggingface.co/datasets/cais/mmlu) dataset.\n\n#### Perplexity\nCalculating perplexity scores for a sample [dataset](https://huggingface.co/datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3) of entry paragraphs from Wikipedia articles.\n\n## Model Support\nSiLLM generally supports loading LLMs of major open-weights model architectures/families, including: *Llama 2/3*, *Mistral*, *Mixtral*, *Gemma*, *Phi*, *Qwen*.\n\n## Roadmap\n\n- Fine-tuning with GRPO\n\n## License\nThis project uses the [MIT License](LICENSE).\n\n## Acknowledgments\nBig thanks to the Apple MLX team for implementing and maintaining the [MLX](https://github.com/ml-explore/mlx/) framework that makes it possible to unlock the power of Apple Silicon and run/train LLMs on MacBooks and other Apple devices. Thank you to all the contributors of the [MLX Examples](https://github.com/ml-explore/mlx-examples) project and developers sharing model implementations online.\nLast but not least, thank you to the larger community sharing open-weights models, fine-tunes, and datasets - without you all the generative AI progress would happen behind locked doors!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farmbues%2FSiLLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farmbues%2FSiLLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farmbues%2FSiLLM/lists"}