Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/mlc-ai/mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
https://github.com/mlc-ai/mlc-llm

language-model llm machine-learning-compilation tvm

Last synced: about 1 month ago
JSON representation

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Host: GitHub
URL: https://github.com/mlc-ai/mlc-llm
Owner: mlc-ai
License: apache-2.0
Created: 2023-04-29T01:59:25.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-04-13T15:08:13.000Z (2 months ago)
Last Synced: 2024-04-13T17:35:12.643Z (2 months ago)
Topics: language-model, llm, machine-learning-compilation, tvm
Language: Python
Homepage: https://llm.mlc.ai/docs
Size: 26.2 MB
Stars: 16,620
Watchers: 160
Forks: 1,275
Open Issues: 200
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-production-machine-learning - MLC LLM - ai/mlc-llm.svg?style=social) - MLC LLM is a universal solution that allows any language models to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases. (Industry Strength NLP)
awesome-edge-computing - MLC LLM
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-stars - mlc-ai/mlc-llm
awesome-llamas - mlc-llm - Running LLaMA2 on iOS devices natively using GPU acceleration, see [example](https://twitter.com/bohanhou1998/status/1681682445937295360) (Libraries)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (llm)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (llm)
awesome-stars-coconut - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-ai - MLC LLM - ai/mlc-llm?style=social) - 代表了一种新的思路，serverless，允许在手机、电脑等终端上直接运行 LLM (LLM)
awesome-stars - mlc-ai/mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. (Python)
my-awesome - mlc-ai/mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. (Python)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
AiTreasureBox - mlc-ai/mlc-llm - 06-12_17504_1](https://img.shields.io/github/stars/mlc-ai/mlc-llm.svg) |Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.| (Repos)
awesome-starts - mlc-ai/mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. (Python)
awesome-from-stars - mlc-ai/mlc-llm
my-awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-llm-and-aigc - MLC LLM - ai/mlc-llm?style=social"/> : Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. [mlc.ai/mlc-llm](https://mlc.ai/mlc-llm/) (Summary)
awesome-stars - mlc-llm - ai | 17456 | (Python)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (llm)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
awesome-local-llms - mlc-llm
awesome-stars - mlc-ai/mlc-llm - `★17584` Universal LLM Deployment Engine with ML Compilation (Python)
awesome-stars - mlc-ai/mlc-llm - `★17166` Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. (Python)
Awesome-LLM-Productization - mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. (Models and Tools / LLM Deployment)
awesome-cuda-tensorrt-fpga - MLC LLM - ai/mlc-llm?style=social"/> : Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. [mlc.ai/mlc-llm](https://mlc.ai/mlc-llm/) (Frameworks)
awesome-stars - mlc-llm - ai | 17556 | (Python)
awesome-stars - mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation (Python)
StarryDivineSky - mlc-ai/mlc-llm

README

        [discord-url]: https://discord.gg/9Xpy2HGBuD

# MLC LLM

[Documentation](https://llm.mlc.ai/docs) | [Blog](https://blog.mlc.ai/) | [Discord][discord-url]

**M**achine **L**earning **C**ompilation for **L**arge **L**anguage **M**odels (MLC LLM) is a high-performance universal deployment solution that allows native deployment of any large language models with native APIs with compiler acceleration. The mission of this project is to enable everyone to develop, optimize and deploy AI models natively on everyone's devices with ML compilation techniques.

**Universal deployment.** MLC LLM supports the following platforms and hardware:

  

    

       

      AMD GPU

      NVIDIA GPU

      Apple GPU

      Intel GPU

    

  

  

    

      Linux / Win

      ✅ Vulkan, ROCm

      ✅ Vulkan, CUDA

      N/A

      ✅ Vulkan

    

    

      macOS

      ✅ Metal (dGPU)

      N/A

      ✅ Metal

      ✅ Metal (iGPU)

    

    

      Web Browser

      ✅ WebGPU and WASM 

    

    

      iOS / iPadOS

      ✅ Metal on Apple A-series GPU

    

    

      Android

      ✅ OpenCL on Adreno GPU

      ✅ OpenCL on Mali GPU

    

  

## Quick Start

We introduce the quick start examples of chat CLI, Python API and REST server here to use MLC LLM.

We use 4-bit quantized 8B Llama-3 model for demonstration purpose.

The pre-quantized Llama-3 weights is available at https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC.

You can also try out unquantized Llama-3 model by replacing `q4f16_1` to `q0f16` in the examples below.

Please visit our [documentation](https://llm.mlc.ai/docs/index.html) for detailed quick start and introduction.

### Installation

MLC LLM is available via [pip](https://llm.mlc.ai/docs/install/mlc_llm.html#install-mlc-packages).

It is always recommended to install it in an isolated conda virtual environment.

To verify the installation, activate your virtual environment, run

```bash

python -c "import mlc_llm; print(mlc_llm.__path__)"

```

You are expected to see the installation path of MLC LLM Python package.

### Chat CLI

We can try out the chat CLI in MLC LLM with 4-bit quantized 8B Llama-3 model.

```bash

mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

```

It may take 1-2 minutes for the first time running this command.

After waiting, this command launch a chat interface where you can enter your prompt and chat with the model.

```

You can use the following special commands:

/help               print the special commands

/exit               quit the cli

/stats              print out the latest stats (token/sec)

/reset              restart a fresh chat

/set [overrides]    override settings in the generation config. For example,

                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`

                      Note: Separate stop words in the `stop` option with commas (,).

Multi-line input: Use escape+enter to start a new line.

user: What's the meaning of life

assistant:

What a profound and intriguing question! While there's no one definitive answer, I'd be happy to help you explore some perspectives on the meaning of life.

The concept of the meaning of life has been debated and...

```

### Python API

We can run the Llama-3 model with the chat completion Python API of MLC LLM.

You can save the code below into a Python file and run it.

```python

from mlc_llm import MLCEngine

# Create engine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"

engine = MLCEngine(model)

# Run chat completion in OpenAI API.

for response in engine.chat.completions.create(

    messages=[{"role": "user", "content": "What is the meaning of life?"}],

    model=model,

    stream=True,

):

    for choice in response.choices:

        print(choice.delta.content, end="", flush=True)

print("\n")

engine.terminate()

```

**The Python API of `mlc_llm.MLCEngine` fully aligns with OpenAI API**.

You can use MLCEngine in the same way of using

[OpenAI's Python package](https://github.com/openai/openai-python?tab=readme-ov-file#usage)

for both synchronous and asynchronous generation.

If you would like to do concurrent asynchronous generation, you can use `mlc_llm.AsyncMLCEngine` instead.

### REST Server

We can launch a REST server to serve the 4-bit quantized Llama-3 model for OpenAI chat completion requests.

The server has fully OpenAI API completeness.

```bash

mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

```

The server is hooked at `http://127.0.0.1:8000` by default, and you can use `--host` and `--port`

to set a different host and port.

When the server is ready (showing `INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)`),

we can open a new shell and send a cURL request via the following command:

```bash

curl -X POST \

  -H "Content-Type: application/json" \

  -d '{

        "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",

        "messages": [

            {"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}

        ]

  }' \

  http://127.0.0.1:8000/v1/chat/completions

```

## Universal Deployment APIs

MLC LLM provides multiple sets of APIs across platforms and environments. These include

* [Python API](https://llm.mlc.ai/docs/deploy/python_engine.html)

* [OpenAI-compatible Rest-API](https://llm.mlc.ai/docs/deploy/rest.html)

* [C++ API](https://llm.mlc.ai/docs/deploy/cli.html)

* [JavaScript API](https://llm.mlc.ai/docs/deploy/javascript.html) and [Web LLM](https://github.com/mlc-ai/web-llm)

* [Swift API for iOS App](https://llm.mlc.ai/docs/deploy/ios.html)

* [Java API and Android App](https://llm.mlc.ai/docs/deploy/android.html)

## Citation

Please consider citing our project if you find it useful:

```bibtex

@software{mlc-llm,

    author = {MLC team},

    title = {{MLC-LLM}},

    url = {https://github.com/mlc-ai/mlc-llm},

    year = {2023}

}

```

The underlying techniques of MLC LLM include:

  References (Click to expand)

  ```bibtex

  @inproceedings{tensorir,

      author = {Feng, Siyuan and Hou, Bohan and Jin, Hongyi and Lin, Wuwei and Shao, Junru and Lai, Ruihang and Ye, Zihao and Zheng, Lianmin and Yu, Cody Hao and Yu, Yong and Chen, Tianqi},

      title = {TensorIR: An Abstraction for Automatic Tensorized Program Optimization},

      year = {2023},

      isbn = {9781450399166},

      publisher = {Association for Computing Machinery},

      address = {New York, NY, USA},

      url = {https://doi.org/10.1145/3575693.3576933},

      doi = {10.1145/3575693.3576933},

      booktitle = {Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},

      pages = {804–817},

      numpages = {14},

      keywords = {Tensor Computation, Machine Learning Compiler, Deep Neural Network},

      location = {Vancouver, BC, Canada},

      series = {ASPLOS 2023}

  }

  @inproceedings{metaschedule,

      author = {Shao, Junru and Zhou, Xiyou and Feng, Siyuan and Hou, Bohan and Lai, Ruihang and Jin, Hongyi and Lin, Wuwei and Masuda, Masahiro and Yu, Cody Hao and Chen, Tianqi},

      booktitle = {Advances in Neural Information Processing Systems},

      editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},

      pages = {35783--35796},

      publisher = {Curran Associates, Inc.},

      title = {Tensor Program Optimization with Probabilistic Programs},

      url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/e894eafae43e68b4c8dfdacf742bcbf3-Paper-Conference.pdf},

      volume = {35},

      year = {2022}

  }

  @inproceedings{tvm,

      author = {Tianqi Chen and Thierry Moreau and Ziheng Jiang and Lianmin Zheng and Eddie Yan and Haichen Shen and Meghan Cowan and Leyuan Wang and Yuwei Hu and Luis Ceze and Carlos Guestrin and Arvind Krishnamurthy},

      title = {{TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning},

      booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)},

      year = {2018},

      isbn = {978-1-939133-08-3},

      address = {Carlsbad, CA},

      pages = {578--594},

      url = {https://www.usenix.org/conference/osdi18/presentation/chen},

      publisher = {USENIX Association},

      month = oct,

  }

  ```

## Links

- You might want to check out our online public [Machine Learning Compilation course](https://mlc.ai) for a systematic

walkthrough of our approaches.

- [WebLLM](https://webllm.mlc.ai/) is a companion project using MLC LLM's WebGPU and WebAssembly backend.

- [WebStableDiffusion](https://websd.mlc.ai/) is a companion project for diffusion models with the WebGPU backend.