https://github.com/kyegomez/exa

Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve.
inference-engine llama2 llama2-7b llamacpp llamas llm-inference llms opensource

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Exa
Boost your GPU's LLM performance by 300% on everyday GPU hardware, as validated by renowned developers, in just 5 minutes of setup and with no additional hardware costs.

-----

## Principles
- Radical Simplicity (use super-powerful LLMs with as few lines of code as possible)
- Ultra-Optimized Performance (high-performance code that extracts all the power from these LLMs)
- Fluidity & Shapelessness (plug and play, and re-architect as you please)

---

## 📦 Install 📦
```bash
$ pip3 install exxa
```
-----

## Usage

## 🎉 Features 🎉

- **World-Class Quantization**: Get the most out of your models with top-tier performance and preserved accuracy! 🏋️‍♂️

- **Automated PEFT**: Simplify your workflow! Let our toolkit handle the optimizations. 🛠️

- **LoRA Configuration**: Dive into the potential of flexible LoRA configurations, a game-changer for performance! 🌌

- **Seamless Integration**: Built to work with popular models like LLaMA, Falcon, and more! 🤖

----

## 💌 Feedback & Contributions 💌

We're excited about the journey ahead and would love to have you with us! For feedback, suggestions, or contributions, feel free to open an issue or a pull request. Let's shape the future of fine-tuning together! 🌱

[Check out our project board for our current backlog and features we're implementing](https://github.com/users/kyegomez/projects/8/views/2)

# License
MIT

# Todo

- Set up utility logger classes for metric logging with useful metadata such as tokens inferred per second, latency, and memory consumption
- Add CUDA C++ extensions for radically optimized classes for high-performance quantization and inference on the edge
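
As a rough illustration of the metric-logging item above, here is a minimal sketch of what such a helper could look like. Nothing in it is part of Exa's API: `InferenceMetrics`, `measure`, and `fake_generate` are hypothetical names, and `tracemalloc` only tracks Python heap allocations, so GPU memory would need a different probe.

```python
import logging
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("metrics")


@dataclass
class InferenceMetrics:
    """Hypothetical container for one generation call (not an Exa class)."""
    tokens: int
    latency_s: float
    peak_memory_bytes: int

    @property
    def tokens_per_second(self) -> float:
        # Guard against a zero-duration measurement.
        return self.tokens / self.latency_s if self.latency_s > 0 else 0.0


def measure(generate: Callable[[str], List[str]], prompt: str) -> InferenceMetrics:
    """Time a generation callable and record peak Python heap usage."""
    tracemalloc.start()
    start = time.perf_counter()
    tokens = generate(prompt)
    latency = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return InferenceMetrics(tokens=len(tokens), latency_s=latency, peak_memory_bytes=peak)


def fake_generate(prompt: str) -> List[str]:
    """Stand-in for a real model call: sleeps briefly and echoes the words."""
    time.sleep(0.01)
    return prompt.split()


metrics = measure(fake_generate, "the quick brown fox")
logger.info(
    "tokens=%d latency=%.4fs tokens/s=%.1f peak_mem=%dB",
    metrics.tokens,
    metrics.latency_s,
    metrics.tokens_per_second,
    metrics.peak_memory_bytes,
)
```

Passing the generation function in as a callable keeps the timing code independent of any particular model backend.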