![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)
![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge&logo=PyTorch&logoColor=white)
![Jupyter Notebook](https://img.shields.io/badge/jupyter-%23FA0F00.svg?style=for-the-badge&logo=jupyter&logoColor=white)

[![Follow me on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/follow-me-on-HF-md.svg)](https://huggingface.co/ChavyvAkvar)

# Homunculus Project - Experimental Custom Transformer Architecture
By [Habibullah Akbar](https://chavyv.vercel.app).

Key features:
- Seamless integration with a vision encoder, along with selective RoPE for each image and text embedding sequence.
- Internal iteration, enabling deeper abstraction while keeping the same parameter count.
- GeGLU activation function, inspired by the [Gemma 2 models](https://blog.google/technology/developers/google-gemma-2/) (see the sketch after this list).
- Custom KV caching, ensuring that each internal iteration has an independent KV cache.
- BPE tokenizer based on the KBBI (Kamus Besar Bahasa Indonesia).
- Grouped Query Attention.
- PyTorch Lightning implementation.
- DeepSpeed ZeRO-3 integration, automatically offloading memory overflow to CPU and NVMe.
- Example fine-tuning scripts with LoRA adapters, with and without quantization.
- BitNet implementation.
- Flash Attention implementation.
- Speech encoder.
- 2D and 3D RoPE.
- Diffusion Transformer for image detokenization.
- Influential-token extraction from attention heatmaps.
- Jupyter notebook examples for both training and fine-tuning.
- Dual license: open source for individuals, paid license for commercial use.
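
For illustration, here is a minimal sketch of a GeGLU feed-forward block in plain PyTorch, in the spirit of the Gemma-style gated activation referenced above. The module and dimension names (`GeGLUFeedForward`, `d_model`, `d_ff`) are placeholders for this sketch, not the exact layers used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """Feed-forward block with a GeGLU gate: GELU(x W_gate) * (x W_up), then W_down."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The GELU-activated gate modulates the linear "up" projection elementwise.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

# Example usage (hypothetical dimensions):
# ffn = GeGLUFeedForward(d_model=512, d_ff=2048)
# y = ffn(torch.randn(2, 16, 512))  # -> shape (2, 16, 512)
```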

![Internal latent loop (9)](https://github.com/user-attachments/assets/fe74e8b8-2f74-4b20-9f36-6f61c6946f2a)

The iterable Transformer model, where the model can *rethink* its internal cognitive process guided by an internal confidence score, akin to a slow-thinking mechanism.
Here is a simple explanation of how it works (a rough sketch follows the list):
- An adjustable parameter controls the internal looping; its default value is 1.
- When the loss value is high, additional iterations are triggered, with the maximum number of iterations set to 10.
- An independent layer is trained to output a confidence score, supervised by the loss value from the main training process.
- At inference time, the model outputs both the next token and a confidence score, and the confidence score determines how many iterations are needed for the current inference.
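
The sketch below illustrates the idea with a confidence-gated loop around a standard transformer block. It is only an approximation of the mechanism described above: `IterativeBlock`, the sigmoid confidence head, the mean pooling over the sequence, and the 0.9 threshold are all hypothetical choices, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class IterativeBlock(nn.Module):
    """Re-applies a transformer block until a learned confidence score passes a
    threshold, up to `max_iters` passes (the default is a single pass)."""

    def __init__(self, block: nn.Module, d_model: int, max_iters: int = 10,
                 default_iters: int = 1, threshold: float = 0.9):
        super().__init__()
        self.block = block
        self.confidence_head = nn.Linear(d_model, 1)  # trained against the main loss
        self.max_iters = max_iters
        self.default_iters = default_iters
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        confidence = x.new_zeros(())
        for i in range(self.max_iters):
            x = self.block(x)  # every pass reuses the same parameters
            confidence = torch.sigmoid(self.confidence_head(x.mean(dim=1))).mean()
            # Always run the default number of passes, then keep iterating only
            # while the model reports low confidence in its own hidden state.
            if i + 1 >= self.default_iters and confidence >= self.threshold:
                break
        return x, confidence

# Example usage (hypothetical configuration):
# block = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
# wrapper = IterativeBlock(block, d_model=256)
# hidden, confidence = wrapper(torch.randn(2, 16, 256))
```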

YouTube progress documentation playlist:
- First short brief (27 July 2024): [https://youtu.be/NjK1BJyhrlI](https://youtu.be/NjK1BJyhrlI)

Soon:
- Short-term memory injection.
- [SageAttention](https://github.com/thu-ml/SageAttention) implementation.
- Speech generation integration.
- [Discrete Latent Representation](https://arxiv.org/abs/2312.01203).
- [Grokfast](https://arxiv.org/abs/2405.20233).
- Mamba2 block (?).
- Kolmogorov Arnold Network (KAN).
- Mixture of Experts block.
- Fast object detection integration, possibly YOLO or RT-DETR.
- OCR model integration.
- [MInference](https://github.com/microsoft/MInference).
- Pre-trained model integration, possibly Gemma 2, since it uses the same activation function.
- Citations for all of the papers used as references or inspiration.

> LICENSE UPDATE:
> ***This software is dual-licensed under the terms of the GNU Affero General Public License (AGPL) and a commercial license. For commercial use, please contact Habibullah Akbar at akbar2habibullah@gmail.com to obtain a commercial license. Commercial use is defined as any use of the software for financial gain, including, but not limited to, selling, licensing, or distributing the software as part of a product or service.***