{"id":22166928,"url":"https://github.com/origamidream/gato","last_synced_at":"2025-04-06T22:10:18.563Z","repository":{"id":39493798,"uuid":"506975245","full_name":"OrigamiDream/gato","owner":"OrigamiDream","description":"Unofficial Gato: A Generalist Agent","archived":false,"fork":false,"pushed_at":"2024-01-14T03:40:54.000Z","size":80,"stargazers_count":211,"open_issues_count":2,"forks_count":30,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-03-30T21:14:31.256Z","etag":null,"topics":["gato","generalist-agent","neural-network","tensorflow","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OrigamiDream.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-24T10:50:08.000Z","updated_at":"2025-03-26T14:59:45.000Z","dependencies_parsed_at":"2024-12-02T05:38:33.191Z","dependency_job_id":null,"html_url":"https://github.com/OrigamiDream/gato","commit_stats":{"total_commits":37,"total_committers":1,"mean_commits":37.0,"dds":0.0,"last_synced_commit":"0498c52def6042b911dd2b4505f0a7d3f0af8c6d"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OrigamiDream%2Fgato","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OrigamiDream%2Fgato/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OrigamiDream%2Fgato/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/OrigamiDream%2Fgato/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OrigamiDream","download_url":"https://codeload.github.com/OrigamiDream/gato/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247557767,"owners_count":20958047,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gato","generalist-agent","neural-network","tensorflow","transformer"],"created_at":"2024-12-02T05:26:07.111Z","updated_at":"2025-04-06T22:10:18.532Z","avatar_url":"https://github.com/OrigamiDream.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eUnofficial Gato: A Generalist Agent\u003c/h1\u003e\n\n[[Deepmind Publication]](https://www.deepmind.com/publications/a-generalist-agent)\n[[arXiv Paper]](https://arxiv.org/pdf/2205.06175.pdf)\n\nThis repository contains an imitation of Deepmind's Gato architecture in TensorFlow.\n\nSince Deepmind only mentions parts of the architecture in its paper, we still don't know much about the model.\u003cbr\u003e\nHowever, I believe the paper is enough to imitate the architecture, and I'm trying to do that with the open source community's help.\n\nCurrently, the repository provides the following components:\n- Gato (via [`Gato`](https://github.com/OrigamiDream/gato/blob/main/gato/models/__init__.py#L12))\n- Transformer (via [`Transformer`](https://github.com/OrigamiDream/gato/blob/main/gato/models/__init__.py#L61))\n- Patch Position Encodings (via 
[`PatchPositionEncoding`](https://github.com/OrigamiDream/gato/blob/main/gato/models/embedding.py#L38))\n- Embedding Function (via [`ResidualEmbedding`](https://github.com/OrigamiDream/gato/blob/main/gato/models/embedding.py#L139))\n- Local Observation Position Encodings (via [`LocalPositionEncoding`](https://github.com/OrigamiDream/gato/blob/main/gato/models/embedding.py#L199))\n- Tokenizing Continuous Values (via [`ContinuousValueTokenizer`](https://github.com/OrigamiDream/gato/blob/main/gato/models/tokenizers.py#L30))\n- Shared Embedding (via [`DiscreteEmbedding`](https://github.com/OrigamiDream/gato/blob/main/gato/models/embedding.py#L237))\n\nAction tokens are still a mystery in the paper; I need your help.\n\nHowever, the repository still lacks the following:\n- Datasets (most important, Issue: [#1](https://github.com/OrigamiDream/gato/issues/1), [ThomasRochefortB/torch-gato](https://github.com/ThomasRochefortB/torch-gato/blob/main/datasets/README.md))\n- \u003cs\u003ePre-trained tokenizers\u003c/s\u003e (No longer required because of E2E model)\n- Training strategy (E2E, WIP)\n\nStill, you can explore the basic architecture of Gato based on the paper.\n\n### Usage\n```bash\n$ pip install gato-tf\n```\n```python\nimport tensorflow as tf\nfrom gato import Gato, GatoConfig\n\n# Create model instance\nconfig = GatoConfig.small()\ngato = Gato(config)\n\n# Fake inputs for Gato\ninput_dim = config.input_dim\ninput_ids = tf.concat([\n  # ...\n  # observation 1\n  tf.random.uniform((1, 1, input_dim)),  # image patch 0\n  tf.random.uniform((1, 1, input_dim)),  # image patch 1\n  tf.random.uniform((1, 1, input_dim)),  # image patch 2\n  # ...\n  tf.random.uniform((1, 1, input_dim)),  # image patch 19\n  tf.fill((1, 1, input_dim), value=0.25),  # continuous value\n  tf.fill((1, 1, input_dim), value=624.0),  # discrete (actions, texts)\n\n  # observation 2\n  tf.random.uniform((1, 1, input_dim)),  # image patch 0\n  tf.random.uniform((1, 1, input_dim)),  
# image patch 1\n  tf.random.uniform((1, 1, input_dim)),  # image patch 2\n  # ...\n  tf.random.uniform((1, 1, input_dim)),  # image patch 19\n  tf.fill((1, 1, input_dim), value=0.12),  # continuous value\n  tf.fill((1, 1, input_dim), value=295.0)  # discrete (actions, texts)\n  # ...\n], axis=1)\nencoding = tf.constant([\n  # 0 - image patch embedding\n  # 1 - continuous value embedding\n  # 2 - discrete embedding (actions, texts)\n  [0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 2]\n])\nrow_pos = (\n  tf.constant([[0.00, 0.25, 0.50, 0.75, 0, 0, 0.00, 0.25, 0.50, 0.75, 0, 0]]),  # pos_from\n  tf.constant([[0.25, 0.50, 0.75, 1.00, 0, 0, 0.25, 0.50, 0.75, 1.00, 0, 0]])   # pos_to\n)\ncol_pos = (\n  tf.constant([[0.00, 0.00, 0.00, 0.80, 0, 0, 0.00, 0.00, 0.00, 0.80, 0, 0]]),  # pos_from\n  tf.constant([[0.20, 0.20, 0.20, 1.00, 0, 0, 0.20, 0.20, 0.20, 1.00, 0, 0]])   # pos_to\n)\nobs = (\n  tf.constant([[ 0,  1,  2, 19, 20, 21,  0,  1,  2, 19, 20, 21]]),  # obs token\n  tf.constant([[ 1,  1,  1,  1,  1,  0,  1,  1,  1,  1,  1,  0]])   # obs token masking (for action tokens)\n)\nhidden_states = gato((input_ids, (encoding, row_pos, col_pos), obs))\n```\n### Dataset and Model Architecture\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://user-images.githubusercontent.com/5837620/215323793-7f7bcfdb-d8be-40d3-8e58-a053511f95d5.png\"\u003e\n  \u003cimg alt=\"gato dataset and model architecture\" src=\"https://user-images.githubusercontent.com/5837620/215323795-3a433516-f5ca-4272-9999-3df87ae521ba.png\"\u003e\n\u003c/picture\u003e\n\n## Paper Reviews\n\n### Full Episode Sequence\n\n\u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://user-images.githubusercontent.com/5837620/175756389-31d183c9-054e-4829-93a6-df79781ca212.png\"\u003e\n    \u003cimg alt=\"gato dataset architecture\" 
src=\"https://user-images.githubusercontent.com/5837620/175756409-75605dbc-7756-4509-ba93-c0ad08eea309.png\"\u003e\n\u003c/picture\u003e\n\n### Architecture Variants\n\n\u003e Appendix C.1. Transformer Hyperparameters\n\nIn the paper, Deepmind tested Gato with 3 architecture variants: `1.18B`, `364M`, and `79M`.\u003cbr\u003e\nI have named them `large()`, `baseline()` and `small()` respectively in `GatoConfig`.\n\n| Hyperparameters          | Large(1.18B) | Baseline(364M) | Small(79M) |\n|--------------------------|--------------|----------------|------------|\n| Transformer blocks       | 24           | 12             | 8          |\n| Attention heads          | 16           | 12             | 24         |\n| Layer width              | 2048         | 1536           | 768        |\n| Feedforward hidden size  | 8192         | 6144           | 3072       |\n| Key/value size           | 128          | 128            | 32         |\n\n\n### Residual Embedding\n\n\u003e Appendix C.2. Embedding Function\n\nThe paper does not mention how many residual blocks must be stacked for token embeddings.\u003cbr\u003e\nTherefore, I have kept this number configurable in `GatoConfig`.\n\nHowever many residual layers there are, full pre-activation is key.\n\nThe blocks consist of:\n- Version 2 ResNet architecture (based on ResNet50V2)\n- GroupNorm (instead of LayerNorm)\n- GeLU (instead of ReLU)\n\n### Position Encodings\n\n\u003e Appendix C.3. 
Position Encodings\n\n#### Patch Position Encodings\n\nLike [Vision Transformer (ViT)](https://github.com/google-research/vision_transformer) by Google, Gato takes the input images as raster-ordered 16x16 patches.\u003cbr\u003e\nUnlike the Vision Transformer model, however, Gato divides its patch encoding strategy into two phases: training and evaluation.\n\nFor high-performance computation in TensorFlow, I have used the following expressions.\n\n$C$ and $R$ denote column-wise and row-wise, and $F$ and $T$ denote `from` and `to` respectively.\n\n$$\n\\begin{align}\n  v^R_F \u0026= \\begin{bmatrix}\n    0 \u0026 32 \u0026 64 \u0026 96\n  \\end{bmatrix} \\\\\n  v^R_T \u0026= \\begin{bmatrix}\n    32 \u0026 64 \u0026 96 \u0026 128\n  \\end{bmatrix} \\\\\n  v^C_F \u0026= \\begin{bmatrix}\n    0 \u0026 26 \u0026 51 \u0026 77 \u0026 102\n  \\end{bmatrix} \\\\\n  v^C_T \u0026= \\begin{bmatrix}\n    26 \u0026 51 \u0026 77 \u0026 102 \u0026 128\n  \\end{bmatrix} \\\\\n  \\\\\n  P_R \u0026= \\begin{cases}\n    \\mathsf{if} \\ \\mathsf{training} \u0026 v^R_F + \\mathsf{uniform}(v^R_T - v^R_F) \\\\\n    \\mathsf{otherwise} \u0026 \\mathsf{round}(\\frac{v^R_F + v^R_T}{2})\n  \\end{cases} \\\\\n  P_C \u0026= \\begin{cases}\n    \\mathsf{if} \\ \\mathsf{training} \u0026 v^C_F + \\mathsf{uniform}(v^C_T - v^C_F) \\\\\n    \\mathsf{otherwise} \u0026 \\mathsf{round}(\\frac{v^C_F + v^C_T}{2})\n  \\end{cases} \\\\\n  \\\\\n  E^R_P \u0026= P_R \\cdot 1^{\\mathsf{T}}_C \\\\\n  E^C_P \u0026= 1^{\\mathsf{T}}_R \\cdot P_C \\\\\n  \\\\\n  \\therefore E \u0026= E_I + E^R_P + E^C_P\n\\end{align}\n$$\n\n#### Local Observation Position Encodings\n\nBy the definition in Appendix B, text tokens, image patch tokens, and discrete \u0026 continuous values are observation tokens.\u003cbr\u003e\nWhen Gato receives those values, they must be encoded with their own (local) time steps.\n\n## Requirements\n\n```bash\npip install \"tensorflow\u003e=2.11.0\"\n```\n\n## Contributing\n\nThis repository is still a work in 
progress.\u003cbr\u003e\nCurrently, no downloads and no executables are provided.\n\nContributors who can help are very welcome.\n\n## License\nLicensed under the [MIT license](https://github.com/OrigamiDream/gato/blob/main/LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forigamidream%2Fgato","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forigamidream%2Fgato","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forigamidream%2Fgato/lists"}