{"id":23642213,"url":"https://github.com/agora-lab-ai/omegavit","last_synced_at":"2025-08-31T18:32:47.891Z","repository":{"id":268830009,"uuid":"905595387","full_name":"Agora-Lab-AI/OmegaViT","owner":"Agora-Lab-AI","description":"OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space modeling, and mixture of experts to achieve superior performance across various computer vision tasks. ","archived":false,"fork":false,"pushed_at":"2024-12-19T06:39:25.000Z","size":0,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-19T06:46:54.281Z","etag":null,"topics":["agora","agoralab","ai","ml","open-ai","ssm","transformer","vit"],"latest_commit_sha":null,"homepage":"https://agoralab.xyz","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Agora-Lab-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["kyegomez"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2024-12-19T06:33:14.000Z","updated_at":"2024-12-19T06:39:29.000Z","dependencies_parsed_at":"2024-12-19T06:46:58.382Z","dependency_job_id":"862235f5-5bdb-4789-a2b0-a6bec95d3130","html_url":"https://github.com/Agora-Lab-AI/OmegaViT","commit_stats":null,"previous_names":["agora-lab-ai/omegavit"],"tags_count":0,"template":false,"template_full_name":"kyegomez/Python-Package-Template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FOmegaViT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FOmegaViT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FOmegaViT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FOmegaViT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Agora-Lab-AI","download_url":"https://codeload.github.com/Agora-Lab-AI/OmegaViT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231615478,"owners_count":18400983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agora","agoralab","ai","ml","open-ai","ssm","transformer","vit"],"created_at":"2024-12-28T10:48:36.377Z","updated_at":"2024-12-28T10:48:36.885Z","avatar_url":"https://github.com/Agora-Lab-AI.png","language":"Python","funding_links":["https://github.com/sponsors/kyegomez"],"categories":[],"sub_categories":[],"readme":"# OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts\n\n[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge\u0026logo=youtube\u0026logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge\u0026logo=linkedin\u0026logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge\u0026logo=x\u0026logoColor=white)](https://x.com/kyegomezb)\n\n\n\n\n[![PyPI version](https://badge.fury.io/py/omegavit.svg)](https://badge.fury.io/py/omegavit)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Build Status](https://github.com/Agora-Lab-AI/OmegaViT/workflows/build/badge.svg)](https://github.com/Agora-Lab-AI/OmegaViT/actions)\n[![Documentation Status](https://readthedocs.org/projects/omegavit/badge/?version=latest)](https://omegavit.readthedocs.io/en/latest/?badge=latest)\n\nOmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space modeling, and mixture of experts to achieve superior performance across various computer vision tasks. The model can process images of any resolution while maintaining computational efficiency.\n\n## Key Features\n\n- **Flexible Resolution Processing**: Handles arbitrary input image sizes through adaptive patch embedding\n- **Multi-Query Attention (MQA)**: Reduces computational complexity while maintaining model expressiveness\n- **Rotary Embeddings**: Enables better modeling of relative positions and spatial relationships\n- **State Space Models (SSM)**: Integrates efficient sequence modeling every third layer\n- **Mixture of Experts (MoE)**: Implements conditional computation for enhanced model capacity\n- **Comprehensive Logging**: Built-in loguru integration for detailed execution tracking\n- **Shape-Aware Design**: Continuous tensor shape tracking for reliable processing\n\n## Architecture\n\n```mermaid\nflowchart TB\n    subgraph Input\n        img[Input Image]\n    end\n    \n    subgraph PatchEmbed[Flexible Patch Embedding]\n        conv[Convolution]\n        norm1[LayerNorm]\n        conv --\u003e norm1\n    end\n    \n    subgraph TransformerBlocks[Transformer Blocks x12]\n        subgraph Block1[Block n]\n            direction TB\n            mqa[Multi-Query Attention]\n            ln1[LayerNorm]\n            moe1[Mixture of Experts]\n            ln2[LayerNorm]\n            ln1 --\u003e mqa --\u003e ln2 --\u003e moe1\n        end\n        \n        subgraph Block2[Block n+1]\n            direction TB\n            mqa2[Multi-Query Attention]\n            ln3[LayerNorm]\n            moe2[Mixture of Experts]\n            ln4[LayerNorm]\n            ln3 --\u003e mqa2 --\u003e ln4 --\u003e moe2\n        end\n        \n        subgraph Block3[Block n+2 SSM]\n            direction TB\n            ssm[State Space Model]\n            ln5[LayerNorm]\n            moe3[Mixture of Experts]\n            ln6[LayerNorm]\n            ln5 --\u003e ssm --\u003e ln6 --\u003e moe3\n        end\n    end\n    \n    subgraph Output\n        gap[Global Average Pooling]\n        classifier[Classification Head]\n    end\n    \n    img --\u003e PatchEmbed --\u003e TransformerBlocks --\u003e gap --\u003e classifier\n```\n\n## Multi-Query Attention Detail\n\n```mermaid\nflowchart LR\n    input[Input Features]\n    \n    subgraph MQA[Multi-Query Attention]\n        direction TB\n        q[Q Linear]\n        k[K Linear]\n        v[V Linear]\n        rotary[Rotary Embeddings]\n        attn[Attention Weights]\n        \n        input --\u003e q \u0026 k \u0026 v\n        q \u0026 k --\u003e rotary\n        rotary --\u003e attn\n        attn --\u003e v\n    end\n    \n    MQA --\u003e output[Output Features]\n\n```\n\n## Installation\n\n```bash\npip install omegavit\n```\n\n## Quick Start\n\n```python\nimport sys\nfrom omegavit.main import create_advanced_vit, train_step\nimport torch\nfrom loguru import logger\n\ndef main():\n    \"\"\"Main training function.\"\"\"\n    logger.info(\"Starting training setup\")\n\n    # Setup\n    device = torch.device(\n        \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    )\n    model = create_advanced_vit().to(device)\n    optimizer = torch.optim.AdamW(\n        model.parameters(), lr=1e-4, weight_decay=0.05\n    )\n\n    # Example input for testing\n    batch_size = 8\n    example_input = torch.randn(batch_size, 3, 224, 224).to(device)\n    example_labels = torch.randint(0, 1000, (batch_size,)).to(device)\n\n    logger.info(\"Running forward pass with example input\")\n    output = model(example_input)\n    logger.info(f\"Output shape: {output.shape}\")\n\n    # Example training step\n    loss = train_step(\n        model, optimizer, (example_input, example_labels), device\n    )\n    logger.info(f\"Example training step loss: {loss:.4f}\")\n\n\nif __name__ == \"__main__\":\n    # Configure logger\n    logger.remove()\n    logger.add(\n        \"advanced_vit.log\",\n        rotation=\"500 MB\",\n        level=\"DEBUG\",\n        format=\"{time:YYYY-MM-DD HH:mm:ss} | {level} | {message}\",\n    )\n    logger.add(sys.stdout, level=\"INFO\")\n\n    main()\n\n```\n\n## Model Configurations\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| hidden_size | 768 | Dimension of transformer layers |\n| num_attention_heads | 12 | Number of attention heads |\n| num_experts | 8 | Number of expert networks in MoE |\n| expert_capacity | 32 | Tokens per expert in MoE |\n| num_layers | 12 | Number of transformer blocks |\n| patch_size | 16 | Size of image patches |\n| ssm_state_size | 16 | Hidden state size in SSM |\n\n## Performance\n\n*Note: Benchmarks coming soon*\n\n## Citation\n\nIf you use OmegaViT in your research, please cite:\n\n```bibtex\n@article{omegavit2024,\n  title={OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts},\n  author={Agora Lab},\n  journal={arXiv preprint arXiv:XXXX.XXXXX},\n  year={2024}\n}\n```\n\n## Contributing\n\nWe welcome contributions! Please see our [contributing guidelines](CONTRIBUTING.md) for details.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\nSpecial thanks to the Agora Lab AI team and the open-source community for their valuable contributions and feedback.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagora-lab-ai%2Fomegavit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagora-lab-ai%2Fomegavit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagora-lab-ai%2Fomegavit/lists"}