https://github.com/kyegomez/exa
Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve.
- Host: GitHub
- URL: https://github.com/kyegomez/exa
- Owner: kyegomez
- License: MIT
- Created: 2023-09-05T13:16:25.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-11T09:03:27.000Z (7 months ago)
- Last Synced: 2025-05-11T01:01:51.603Z (21 days ago)
- Topics: inference-engine, llama2, llama2-7b, llamacpp, llamas, llm-inference, llms, opensource
- Language: Python
- Homepage: https://exa.apac.ai
- Size: 2.44 MB
- Stars: 26
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Join the Discord community](https://discord.gg/qUtxnK2NMf)
# Exa
Boost your LLM performance by up to 300% on everyday GPU hardware, as validated by experienced developers, in just 5 minutes of setup and with no additional hardware costs.

---
## Principles
- Radical Simplicity (use super-powerful LLMs in as few lines of code as possible)
- Ultra-Optimized Performance (high-performance code that extracts all the power from these LLMs)
- Fluidity & Shapelessness (plug in, play, and re-architect as you please)

---
## 📦 Install 📦
```bash
$ pip3 install exxa
```
---

## Usage
## 🎉 Features 🎉
- **World-Class Quantization**: Get the most out of your models with top-tier performance and preserved accuracy! 🏋️‍♂️
- **Automated PEFT**: Simplify your workflow! Let our toolkit handle the optimizations. 🛠️
- **LoRA Configuration**: Dive into the potential of flexible LoRA configurations, a game-changer for performance! 🌌
- **Seamless Integration**: Designed to work seamlessly with popular models like LLaMA, Falcon, and more! 🤖
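The quantization feature above rests on a simple idea: map floating-point weights onto a small integer grid via a scale and zero-point, then dequantize with only a bounded rounding error. The toy sketch below illustrates 8-bit affine quantization in plain Python; it is an illustration of the technique, not Exa's actual implementation, and the function names are ours.

```python
# Toy 8-bit affine quantization (illustrative, not Exa's implementation).

def quantize(values, num_bits=8):
    """Map floats onto the integer grid [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0          # fall back to 1.0 if all values equal
    zero_point = round(-lo / scale)          # integer that represents 0.0
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, -0.3, 0.0, 0.7, 2.5]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# Per-element error is bounded by half the quantization step (scale / 2).
max_err = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
```

Real quantizers work per-tensor or per-channel over large weight matrices, but the scale/zero-point arithmetic is the same.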
---
## 💌 Feedback & Contributions 💌
We're excited about the journey ahead and would love to have you with us! For feedback, suggestions, or contributions, feel free to open an issue or a pull request. Let's shape the future of fine-tuning together! 🌱
[Check out our project board for our current backlog and features we're implementing](https://github.com/users/kyegomez/projects/8/views/2)
# License
MIT

# Todo
- Set up utils logger classes for metric logging, with useful metadata such as tokens inferred per second, latency, and memory consumption
- Add CUDA C++ extensions for radically optimized classes enabling high-performance quantization and inference on the edge
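The metrics-logger todo above can be sketched with the standard library alone: wrap an inference call, time it with `time.perf_counter`, and capture peak memory with `tracemalloc`. The class and method names below are hypothetical, not an existing Exa API.

```python
import time
import tracemalloc

class InferenceMetricsLogger:
    """Minimal sketch of the logger described in the todo: records
    latency, tokens per second, and peak memory for each call.
    Names are illustrative, not part of Exa today."""

    def __init__(self):
        self.records = []

    def measure(self, generate, prompt):
        """Run `generate(prompt)` (which should return a list of tokens)
        and append a metrics record for the call."""
        tracemalloc.start()
        start = time.perf_counter()
        tokens = generate(prompt)
        latency = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        self.records.append({
            "latency_s": latency,
            "tokens_per_s": len(tokens) / latency if latency else float("inf"),
            "peak_mem_bytes": peak,
        })
        return tokens

# Usage with a stand-in generator (a real one would call the model):
logger = InferenceMetricsLogger()
out = logger.measure(lambda p: p.split(), "hello world from exa")
```

A production version would also log GPU memory (e.g. via the CUDA runtime), but the wrapping pattern stays the same.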