https://github.com/kyegomez/kosmosg
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"
attention-is-all-you-need attention-mechanism attention-mechanisms computer-vision multimodal multimodal-learning
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/kyegomez/kosmosg
- Owner: kyegomez
- License: mit
- Created: 2023-10-08T02:10:42.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-07T03:32:43.000Z (3 months ago)
- Last Synced: 2024-11-01T20:20:12.030Z (about 2 months ago)
- Topics: attention-is-all-you-need, attention-mechanism, attention-mechanisms, computer-vision, multimodal, multimodal-learning
- Language: Python
- Homepage: https://discord.gg/qUtxnK2NMf
- Size: 2.79 MB
- Stars: 14
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# KosmosG
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"

## Installation
`pip install kosmosg`

## Usage
```python
import torch
from kosmosg.main import KosmosG

# usage
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

model = KosmosG()
output = model(img, text)
print(output)
```

## Architecture
`text, image => KosmosG => text tokens with multi modality understanding`

## License
MIT

## Todo
- Create Aligner in pytorch
- Create Diffusion module
- Integrate these pieces
- Create a training script
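
As a possible starting point for the first Todo item, here is a minimal sketch of an aligner in plain PyTorch. It is not code from this repository. It assumes the aligner's job is to map the multimodal model's output embeddings into a fixed-length sequence in the embedding space that a diffusion model's text conditioning expects, and that it can be trained with a simple MSE loss against target embeddings. The class name `AlignerSketch`, every dimension, and the choice of learned queries with cross-attention are all hypothetical.

```python
import torch
from torch import nn


class AlignerSketch(nn.Module):
    """Hypothetical aligner: learned queries cross-attend to source embeddings."""

    def __init__(self, src_dim=512, tgt_dim=768, num_queries=77, depth=2, heads=8):
        super().__init__()
        # Project source embeddings to the target width so attention dims match.
        self.in_proj = nn.Linear(src_dim, tgt_dim)
        # One learned query per output position (assumed fixed output length).
        self.queries = nn.Parameter(torch.randn(num_queries, tgt_dim) * 0.02)
        layer = nn.TransformerDecoderLayer(
            d_model=tgt_dim,
            nhead=heads,
            dim_feedforward=4 * tgt_dim,
            batch_first=True,
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(tgt_dim)

    def forward(self, src):
        # src: (batch, src_len, src_dim) -> (batch, num_queries, tgt_dim)
        memory = self.in_proj(src)
        queries = self.queries.unsqueeze(0).expand(src.size(0), -1, -1)
        return self.norm(self.decoder(queries, memory))


# Small dimensions so the example runs quickly; real sizes are not specified here.
aligner = AlignerSketch(src_dim=64, tgt_dim=128, num_queries=16, depth=1, heads=4)
src = torch.randn(2, 32, 64)       # stand-in for multimodal model outputs
target = torch.randn(2, 16, 128)   # stand-in for target text-encoder embeddings
aligned = aligner(src)
loss = nn.functional.mse_loss(aligned, target)
print(aligned.shape, loss.item())
```

The learned queries give a fixed output length regardless of the input sequence length, which is convenient if the downstream diffusion module expects a fixed number of conditioning tokens. Whether that matches the design intended for this project is an open question.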