{"id":13738385,"url":"https://github.com/antofuller/configaformers","last_synced_at":"2025-05-08T16:33:33.584Z","repository":{"id":200634263,"uuid":"400295480","full_name":"antofuller/configaformers","owner":"antofuller","description":"A python library for highly configurable transformers - easing model architecture search and experimentation.","archived":true,"fork":false,"pushed_at":"2021-11-30T02:49:33.000Z","size":1848,"stargazers_count":49,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-15T07:34:18.780Z","etag":null,"topics":["artificial-intelligence","deep-learning","pytorch","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antofuller.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-08-26T20:19:50.000Z","updated_at":"2024-10-05T14:52:50.000Z","dependencies_parsed_at":"2023-10-17T11:41:31.509Z","dependency_job_id":null,"html_url":"https://github.com/antofuller/configaformers","commit_stats":null,"previous_names":["antofuller/configaformers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antofuller%2Fconfigaformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antofuller%2Fconfigaformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antofuller%2Fconfigaformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antofuller%2Fconfigaformers/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/antofuller","download_url":"https://codeload.github.com/antofuller/configaformers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253105504,"owners_count":21855044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","pytorch","transformers"],"created_at":"2024-08-03T03:02:20.794Z","updated_at":"2025-05-08T16:33:33.220Z","avatar_url":"https://github.com/antofuller.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# configaformers\nA python library for highly configurable transformers - easing model architecture search and experimentation.\n\nSpecial thanks to lucidrains (https://github.com/lucidrains) and Kharr.\n\n## Notable Features\nThe main purpose of this library is to allow users to quickly construct transformers by editing config files. We will also provide prebuilt configurations to common or promising model architectures.\n\nAnother feature is our model compiler. When a model is initialized it will print out (on your console) all modules, shapes, input and output names. 
It also performs shape checking, which helps catch errors before any data is run through the model.\n\n## Setup\nRequirements: PyTorch and einops\n```bash\ngit clone https://github.com/antofuller/configaformers.git\ncd configaformers\n```\n\n## Usage\nA quick demo that configures a 768-wide, 12-layer transformer with a language modeling head.\n\nImport the library and create the token embedding block:\n\n```python\nfrom model_builder import ConfigaFormer\nfrom prebuilt_blocks import get_transformer_block\n\nmodel_dim = 768\nnum_heads = 12\nvocab_size = 50257\n\n# Token embedding block\nemb = [{'type': 'embedding',\n        'output_dim': model_dim,\n        'num_classes': vocab_size}]\n```\n\nUse our prebuilt transformer block:\n\n```python\nt_block = get_transformer_block(num_heads=num_heads, dim=model_dim)\n```\n\nCreate the language modeling head:\n\n```python\nto_logits = [{'type': 'linear',\n              'output_dim': vocab_size,\n              'output_name': 'logits'}]\n```\n\nCreate blocks, initialize input shapes, and initialize the model:\n\n```python\nmy_blocks = [{\"config\": emb,\n              \"repeat\": 1},\n             {\"config\": t_block,\n              \"repeat\": 12},\n             {\"config\": to_logits,\n              \"repeat\": 1},\n             ]\n\ninput_streams = {'emb_ids': ['B', 'L_in'], 'attn_offset': ['B', num_heads, 'L_in', 'L_in'],}\n\nmodel = ConfigaFormer(blocks=my_blocks, input_shapes=input_streams).cuda()\n```\n\nThis will print out the transformer config:\n\n```bash\nBlock #1, 1x\nembedding -\u003e Input(s): emb_ids (BSZ, L_in) - Output(s): x (BSZ, L_in, 768)\n\n\nBlock #2, 12x\nmake_stream -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)\nnorm -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)\nlinear -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): queries (BSZ, L_in, 768)\nlinear -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): keys (BSZ, L_in, 768)\nlinear -\u003e Input(s): x (BSZ, L_in, 768) - 
Output(s): values (BSZ, L_in, 768)\nmake_heads -\u003e Input(s): queries (BSZ, L_in, 768) - Output(s): queries (BSZ, 12, L_in, 64)\nmake_heads -\u003e Input(s): keys (BSZ, L_in, 768) - Output(s): keys (BSZ, 12, L_in, 64)\nmake_heads -\u003e Input(s): values (BSZ, L_in, 768) - Output(s): values (BSZ, 12, L_in, 64)\nmha_dots -\u003e Input(s): queries (BSZ, 12, L_in, 64), keys (BSZ, 12, L_in, 64) - Output(s): attn_dots (BSZ, 12, L_in, L_in)\nmerge_streams -\u003e Input(s): attn_dots (BSZ, 12, L_in, L_in), attn_offset (B, 12, L_in, L_in) - Output(s): attn_dots (BSZ, 12, L_in, L_in)\nmha_sum -\u003e Input(s): values (BSZ, 12, L_in, 64), attn_dots (BSZ, 12, L_in, L_in) - Output(s): x (BSZ, 12, L_in, 64)\nmerge_heads -\u003e Input(s): x (BSZ, 12, L_in, 64) - Output(s): x (BSZ, L_in, 768)\nlinear -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)\nmerge_streams -\u003e Input(s): x (BSZ, L_in, 768), residual (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)\nmake_stream -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)\nnorm -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)\nlinear -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 3072)\nactivation -\u003e Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 3072)\nlinear -\u003e Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 768)\nmerge_streams -\u003e Input(s): x (BSZ, L_in, 768), residual (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)\n\n\nBlock #3, 1x\nlinear -\u003e Input(s): x (BSZ, L_in, 768) - Output(s): logits (BSZ, L_in, 50257)\n```\n\nBefore running, we need to get the attention offset (in this case, ALiBi with a causal mask):\n\n```python\nfrom utils import get_alibi\n\nattn_offset = get_alibi(num_heads=12, max_length=1024)\n```\n\nNow we can use the model:\n\n```python\n# Prepare attention offset by repeating it over the batch dimension\nattn_offset = attn_offset.repeat(bsz, 1, 1, 1)\n\ninput_data = {'emb_ids': 
batch_ids.view(bsz, 1024).cuda(),\n              'attn_offset': attn_offset.cuda()}\n\nlogits = model(input_data)['logits'].view(bsz, 1024, 50257)\n```\n\n## Features on the way...\n1. Revamp rearrange module\n2. Product-Key memories\n3. Create more prebuilt blocks\n4. Improve attention offsets and masking\n5. Experiment with Triton for speed-up","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantofuller%2Fconfigaformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantofuller%2Fconfigaformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantofuller%2Fconfigaformers/lists"}