{"id":13561338,"url":"https://github.com/srush/Transformer-Puzzles","last_synced_at":"2025-04-03T17:30:37.531Z","repository":{"id":159685142,"uuid":"634782120","full_name":"srush/Transformer-Puzzles","owner":"srush","description":"Puzzles for exploring transformers","archived":false,"fork":false,"pushed_at":"2023-05-04T18:55:23.000Z","size":151,"stargazers_count":335,"open_issues_count":1,"forks_count":28,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-30T21:13:09.909Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/srush.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-01T07:09:28.000Z","updated_at":"2025-03-22T16:57:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"badfea42-2959-4862-8e29-b9d5d5689820","html_url":"https://github.com/srush/Transformer-Puzzles","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srush%2FTransformer-Puzzles","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srush%2FTransformer-Puzzles/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srush%2FTransformer-Puzzles/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srush%2FTransformer-Puzzles/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/srush","download_url":"https://codeload.github.com/srush/Transformer-Puzzles/ta
r.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247046754,"owners_count":20874715,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:00:55.056Z","updated_at":"2025-04-03T17:30:37.276Z","avatar_url":"https://github.com/srush.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# Transformer Puzzles\n\n\u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/srush/Transformer-Puzzles/blob/main/TransformerPuzzlers.ipynb\"\u003e\n  \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\n\u003c/a\u003e\n\n\n\n\u003c!-- #region id=\"e9e822cb\" --\u003e\nThis notebook is a collection of short coding puzzles based on the internals of the Transformer. The puzzles are written in Python and can be done in this notebook. After completing these you will have a much better intuitive sense of how a Transformer can compute certain logical operations. \n\nThese puzzles are based on [Thinking Like Transformers](https://arxiv.org/pdf/2106.06981.pdf) by Gail Weiss, Yoav Goldberg, and Eran Yahav, and are derived from this [blog post](https://srush.github.io/raspy/).\n\u003c!-- #endregion --\u003e\n\n![image](https://user-images.githubusercontent.com/35882/235678934-44c83052-9743-4de7-a46c-49a517923da1.png)\n\n\n\u003c!-- #region id=\"8e962052\" --\u003e\n## Goal\n\n**Can we produce a Transformer that does basic elementary school addition?**\n\ni.e. 
given a string \"19492+23919\" can we produce the correct output? \n\u003c!-- #endregion --\u003e\n\n\u003c!-- #region id=\"d332140b\" --\u003e\n## Rules\n\nEach exercise consists of a function with an argument `seq` and output `seq`. Like a Transformer, we cannot change the sequence length. Operations need to act on the entire sequence in parallel. There is a global `indices` which tells us the position in the sequence. If we want to do something different on certain positions we can use `where` like in NumPy or PyTorch. To run the seq we need to give it an initial input. \n\u003c!-- #endregion --\u003e\n\n\n```python colab={\"base_uri\": \"https://localhost:8080/\", \"height\": 96} id=\"1b28dc98\" outputId=\"f1ac1157-3db8-40c0-dbb2-7d9bad8943a0\"\ndef even_vals(seq=tokens):\n    \"Keep even positions, set odd positions to -1\"\n    x = indices % 2\n    # Note that all operations broadcast so you can use scalars.\n    return where(x == 0, seq, -1)\nseq = even_vals()\n\n# Give the initial input tokens\nseq.input([0,1,2,3,4])\n```\n\n\u003c!-- #region id=\"9dc23f88\" --\u003e\nThe main operation you can use is \"attention\". You do this by defining a selector which forms a matrix based on `key` and `query`.\n\u003c!-- #endregion --\u003e\n\n```python colab={\"base_uri\": \"https://localhost:8080/\", \"height\": 176} id=\"e2ee0ff8\" outputId=\"a61ac19c-2550-4f3c-d653-50c323cdfd59\"\nbefore = key(indices) \u003c query(indices)\nbefore\n```\n\n\u003c!-- #region id=\"a4de0a14\" --\u003e\nWe can combine selectors with logical operations.\n\u003c!-- #endregion --\u003e\n\n```python colab={\"base_uri\": \"https://localhost:8080/\", \"height\": 201} id=\"c315ba6d\" outputId=\"270d50fa-649c-438b-8606-d3d078478162\"\nbefore_or_same = before | (key(indices) == query(indices))\nbefore_or_same\n```\n\n\u003c!-- #region id=\"00bc66a3\" --\u003e\nOnce you have a selector, you can apply \"attention\" to sum over the grey positions. 
For example, to compute the cumulative sum we run the following function. \n\u003c!-- #endregion --\u003e\n\n```python colab={\"base_uri\": \"https://localhost:8080/\", \"height\": 326} id=\"e79c8c8b\" outputId=\"44db7f90-502d-497c-c5ba-4062c09f0a9a\"\ndef cumsum(seq=tokens):\n    return before_or_same.value(seq)\nseq = cumsum()\nseq.input([0, 1, 2, 3, 4])\n```\n\nGood luck!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrush%2FTransformer-Puzzles","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrush%2FTransformer-Puzzles","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrush%2FTransformer-Puzzles/lists"}