https://github.com/kyegomez/kosmosg

My implementation of KosmosG, the model introduced in "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models".

Host: GitHub
URL: https://github.com/kyegomez/kosmosg
Owner: kyegomez
License: mit
Created: 2023-10-08T02:10:42.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-10-07T03:32:43.000Z (3 months ago)
Last Synced: 2024-11-01T20:20:12.030Z (about 2 months ago)
Topics: attention-is-all-you-need, attention-mechanism, attention-mechanisms, computer-vision, multimodal, multimodal-learning
Language: Python
Homepage: https://discord.gg/qUtxnK2NMf
Size: 2.79 MB
Stars: 14
Watchers: 3
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.md
- Funding: .github/FUNDING.yml
- License: LICENSE


[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# KosmosG

My implementation of KosmosG, the model introduced in "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models".

## Installation

`pip install kosmosg`

## Usage

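```python
import torch

from kosmosg.main import KosmosG

# Dummy image batch: (batch, channels, height, width)
img = torch.randn(1, 3, 256, 256)

# Dummy text token IDs: (batch, sequence_length), IDs drawn from [0, 20000)
text = torch.randint(0, 20000, (1, 1024))

model = KosmosG()

# Forward pass on the image and text inputs
output = model(img, text)

print(output)
```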

## Architecture

`text, image => KosmosG => text tokens with multimodal understanding`
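
The one-line diagram above names only the inputs and the output. As a rough illustration of that interface, and not of this repository's actual `KosmosG` class, here is a minimal PyTorch sketch of a decoder-style model that embeds image patches and text tokens into one sequence and predicts text-token logits. The class name `ToyMultimodalLM`, the patch size, width, depth and head count are hypothetical; the 20,000-token vocabulary only mirrors the `torch.randint(0, 20000, ...)` call in the Usage example. The sketch leaves out the image-generation half of KOSMOS-G (the aligner and diffusion module listed under Todo).

```python
import torch
import torch.nn as nn


class ToyMultimodalLM(nn.Module):
    """Hypothetical sketch of `text, image => model => text tokens`.

    Not the kosmosg package's KosmosG class; every size below is an assumption.
    """

    def __init__(self, vocab_size=20000, dim=256, patch_size=32, image_size=256,
                 depth=2, heads=4, max_text_len=1024):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Cut the image into non-overlapping patches and project each one to `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # Look up an embedding vector for every text token ID.
        self.token_embed = nn.Embedding(vocab_size, dim)
        # Learned positions for the combined [image patches, text tokens] sequence.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + max_text_len, dim))
        layer = nn.TransformerEncoderLayer(
            dim, heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, depth)
        # Project hidden states back to vocabulary logits.
        self.to_logits = nn.Linear(dim, vocab_size)

    def forward(self, img, text):
        # img: (batch, 3, H, W) -> (batch, num_patches, dim)
        img_tokens = self.patch_embed(img).flatten(2).transpose(1, 2)
        # text: (batch, seq_len) -> (batch, seq_len, dim)
        txt_tokens = self.token_embed(text)
        x = torch.cat([img_tokens, txt_tokens], dim=1)
        x = x + self.pos_embed[:, : x.shape[1]]
        # Causal mask: -inf above the diagonal, so position i sees only positions <= i.
        n = x.shape[1]
        mask = torch.triu(torch.full((n, n), float("-inf"), device=x.device), diagonal=1)
        x = self.transformer(x, mask=mask)
        # Keep only the text positions: (batch, seq_len, vocab_size)
        return self.to_logits(x[:, img_tokens.shape[1]:])


# Usage with the same tensor shapes as the README example.
model = ToyMultimodalLM().eval()
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))
with torch.no_grad():
    logits = model(img, text)
print(logits.shape)  # torch.Size([1, 1024, 20000])
```

Placing the image patches before the text and applying a single causal mask over the whole sequence is a simplification chosen to keep the sketch short; the real model may interleave modalities or mask them differently.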

## License

MIT

## Todo

- Create the Aligner in PyTorch

- Create the diffusion module

- Integrate these pieces

- Create a training script