{"id":15521299,"url":"https://github.com/kengz/torcharc","last_synced_at":"2025-10-28T01:13:29.183Z","repository":{"id":40773520,"uuid":"271695702","full_name":"kengz/torcharc","owner":"kengz","description":"TorchArc: Build PyTorch models by specifying architectures.","archived":false,"fork":false,"pushed_at":"2025-01-12T06:05:36.000Z","size":1461,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-27T16:49:50.950Z","etag":null,"topics":["deep-learning","model-generation","neural-network","neural-network-architectures","pytorch","torch","yaml"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kengz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-12T03:02:11.000Z","updated_at":"2025-01-12T06:05:10.000Z","dependencies_parsed_at":"2024-12-22T06:23:03.388Z","dependency_job_id":"35105e14-84f6-4096-95a9-c9203100983b","html_url":"https://github.com/kengz/torcharc","commit_stats":{"total_commits":48,"total_committers":1,"mean_commits":48.0,"dds":0.0,"last_synced_commit":"be00f9fd89bdcace1b4c94646585bdeec30f1e32"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/kengz/torcharc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Ftorcharc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Ftorcharc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Ftorch
arc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Ftorcharc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kengz","download_url":"https://codeload.github.com/kengz/torcharc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Ftorcharc/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260322791,"owners_count":22991895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","model-generation","neural-network","neural-network-architectures","pytorch","torch","yaml"],"created_at":"2024-10-02T10:33:39.495Z","updated_at":"2025-10-28T01:13:24.142Z","avatar_url":"https://github.com/kengz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TorchArc\n\n![Test](https://github.com/github/docs/actions/workflows/test.yml/badge.svg)\n\nBuild PyTorch models by specifying architectures.\n\n## Installation\n\nBring [your own PyTorch](https://docs.astral.sh/uv/guides/integration/pytorch/), then install this package:\n\n```bash\npip install torcharc\n```\n\n## Usage\n\n1. specify model architecture in a YAML spec file, e.g. at `spec_filepath = \"./example/spec/basic/mlp.yaml\"`\n2. `import torcharc`.\n   1. (optional) if you have custom torch.nn.Module, e.g. `MyModule`, register it with `torcharc.register_nn(MyModule)`\n3. 
build with: `model = torcharc.build(spec_filepath)`\n\nThe returned model is a PyTorch `nn.Module`, fully-compatible with `torch.compile`, and mostly compatible with PyTorch JIT script and trace.\n\nSee more examples below, then see how it works at the end.\n\n---\n\n### Example: build model from spec file\n\n```python\nfrom pathlib import Path\n\nimport torch\nimport yaml\n\nimport torcharc\n\n\nfilepath = Path(\".\") / \"torcharc\" / \"example\" / \"spec\" / \"basic\" / \"mlp.yaml\"\n\n# The following are equivalent:\n\n# 1. build from YAML spec file\nmodel = torcharc.build(filepath)\n\n# 2. build from dictionary\nwith filepath.open(\"r\") as f:\n    spec_dict = yaml.safe_load(f)\nmodel = torcharc.build(spec_dict)\n\n# 3. use the underlying Pydantic validator to build the model\nspec = torcharc.Spec(**spec_dict)\nmodel = spec.build()\n```\n\n---\n\n### Example: basic MLP\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/basic/mlp.yaml](torcharc/example/spec/basic/mlp.yaml)\n\n```yaml\nmodules:\n  mlp:\n    Sequential:\n      - Linear:\n          in_features: 128\n          out_features: 64\n      - ReLU:\n      - Linear:\n          in_features: 64\n          out_features: 10\n\ngraph:\n  input: x\n  modules:\n    mlp: [x]\n  output: mlp\n```\n\n\u003c/details\u003e\n\n```python\nmodel = torcharc.build(torcharc.SPEC_DIR / \"basic\" / \"mlp.yaml\")\nassert isinstance(model, torch.nn.Module)\n\n# Run the model and check the output shape\nx = torch.randn(4, 128)\ny = model(x)\nassert y.shape == (4, 10)\n\n# Test compatibility with compile, script and trace\ncompiled_model = torch.compile(model)\nassert compiled_model(x).shape == y.shape\nscripted_model = torch.jit.script(model)\nassert scripted_model(x).shape == y.shape\ntraced_model = torch.jit.trace(model, (x))\nassert traced_model(x).shape == y.shape\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (mlp): 
Sequential(\n    (0): Linear(in_features=128, out_features=64, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=64, out_features=10, bias=True)\n  )\n)\n```\n\n![](images/mlp.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: MLP (Lazy)\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/basic/mlp_lazy.yaml](torcharc/example/spec/basic/mlp_lazy.yaml)\n\n```yaml\n# NOTE lazy modules is recommended for ease https://pytorch.org/docs/stable/generated/torch.nn.LazyLinear.html\nmodules:\n  mlp:\n    Sequential:\n      - LazyLinear:\n          out_features: 64\n      - ReLU:\n      - LazyLinear:\n          out_features: 10\n\ngraph:\n  input: x\n  modules:\n    mlp: [x]\n  output: mlp\n```\n\n\u003c/details\u003e\n\n```python\n# PyTorch Lazy layers will infer the input size from the first forward pass. This is recommended as it greatly simplifies the model definition.\nmodel = torcharc.build(torcharc.SPEC_DIR / \"basic\" / \"mlp_lazy.yaml\")\nmodel  # shows LazyLinear before first forward pass\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel (before forward pass)\u003c/summary\u003e\n\n```\nGraphModule(\n  (mlp): Sequential(\n    (0): LazyLinear(in_features=0, out_features=64, bias=True)\n    (1): ReLU()\n    (2): LazyLinear(in_features=0, out_features=10, bias=True)\n  )\n)\n```\n\n\u003c/details\u003e\n\n```python\n# Run the model and check the output shape\nx = torch.randn(4, 128)\ny = model(x)\nassert y.shape == (4, 10)\n\n# Because it is lazy - wait till first forward pass to run compile, script or trace\ncompiled_model = torch.compile(model)\n\nmodel  # shows Linear after first forward pass\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel (after forward pass)\u003c/summary\u003e\n\n```\nGraphModule(\n  (mlp): Sequential(\n    (0): Linear(in_features=128, out_features=64, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=64, out_features=10, bias=True)\n  
)\n)\n```\n\n![](images/mlp.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: MNIST conv\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/mnist/conv.yaml](torcharc/example/spec/mnist/conv.yaml)\n\n```yaml\n# MNIST Conv2d model example from PyTorch https://github.com/pytorch/examples/blob/main/mnist/main.py\nmodules:\n  conv:\n    Sequential:\n      - LazyConv2d:\n          out_channels: 32\n          kernel_size: 3\n      - ReLU:\n      - LazyConv2d:\n          out_channels: 64\n          kernel_size: 3\n      - ReLU:\n      - MaxPool2d:\n          kernel_size: 2\n      - Dropout:\n          p: 0.25\n      - Flatten:\n      - LazyLinear:\n          out_features: 128\n      - ReLU:\n      - Dropout:\n          p: 0.5\n      - LazyLinear:\n          out_features: 10\n      - LogSoftmax:\n          dim: 1\n\ngraph:\n  input: image\n  modules:\n    conv: [image]\n  output: conv\n```\n\n\u003c/details\u003e\n\n```python\nmodel = torcharc.build(torcharc.SPEC_DIR / \"mnist\" / \"conv.yaml\")\n\n# Run the model and check the output shape\nx = torch.randn(4, 1, 28, 28)\ny = model(x)\nassert y.shape == (4, 10)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (conv): Sequential(\n    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))\n    (1): ReLU()\n    (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))\n    (3): ReLU()\n    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n    (5): Dropout(p=0.25, inplace=False)\n    (6): Flatten(start_dim=1, end_dim=-1)\n    (7): Linear(in_features=9216, out_features=128, bias=True)\n    (8): ReLU()\n    (9): Dropout(p=0.5, inplace=False)\n    (10): Linear(in_features=128, out_features=10, bias=True)\n    (11): LogSoftmax(dim=1)\n  )\n)\n```\n\n![](images/conv.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: MLP (Compact)\n\nUse compact spec that expands into Sequential spec - this is 
useful for architecture search.\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/compact/mlp.yaml](torcharc/example/spec/compact/mlp.yaml)\n\n```yaml\n# modules:\n#   mlp:\n#     Sequential:\n#       - LazyLinear:\n#           out_features: 64\n#       - ReLU:\n#       - LazyLinear:\n#           out_features: 64\n#       - ReLU:\n#       - LazyLinear:\n#           out_features: 32\n#       - ReLU:\n#       - LazyLinear:\n#           out_features: 16\n#       - ReLU:\n\n# the above can be written compactly as follows\n\nmodules:\n  mlp:\n    compact:\n      layer:\n        type: LazyLinear\n        keys: [out_features]\n        args: [64, 64, 32, 16]\n      postlayer:\n        - ReLU:\n\ngraph:\n  input: x\n  modules:\n    mlp: [x]\n  output: mlp\n```\n\n\u003c/details\u003e\n\n```python\nmodel = torcharc.build(torcharc.SPEC_DIR / \"compact\" / \"mlp.yaml\")\n\n# Run the model and check the output shape\nx = torch.randn(4, 128)\ny = model(x)\nassert y.shape == (4, 16)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (mlp): Sequential(\n    (0): Linear(in_features=128, out_features=64, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=64, out_features=64, bias=True)\n    (3): ReLU()\n    (4): Linear(in_features=64, out_features=32, bias=True)\n    (5): ReLU()\n    (6): Linear(in_features=32, out_features=16, bias=True)\n    (7): ReLU()\n  )\n)\n```\n\n![](images/mlp_compact.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: Conv (Compact)\n\nUse compact spec that expands into Sequential spec - this is useful for architecture search.\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/compact/conv.yaml](torcharc/example/spec/compact/conv.yaml)\n\n```yaml\n# modules:\n#   conv:\n#     Sequential:\n#       - LazyBatchNorm2d:\n#       - LazyConv2d:\n#           out_channels: 16\n#           kernel_size: 
2\n#       - ReLU:\n#       - Dropout:\n#           p: 0.1\n#       - LazyBatchNorm2d:\n#       - LazyConv2d:\n#           out_channels: 32\n#           kernel_size: 3\n#       - ReLU:\n#       - Dropout:\n#           p: 0.1\n#       - LazyBatchNorm2d:\n#       - LazyConv2d:\n#           out_channels: 64\n#           kernel_size: 4\n#       - ReLU:\n#       - Dropout:\n#           p: 0.1\n#   classifier:\n#     Sequential:\n#       - Flatten:\n#       - LazyLinear:\n#           out_features: 10\n\n# the above can be written compactly as follows\n\nmodules:\n  conv:\n    compact:\n      prelayer:\n        - LazyBatchNorm2d:\n      layer:\n        type: LazyConv2d\n        keys: [out_channels, kernel_size]\n        args: [[16, 2], [32, 3], [64, 4]]\n      postlayer:\n        - ReLU:\n        - Dropout:\n            p: 0.1\n  classifier:\n    Sequential:\n      - Flatten:\n      - LazyLinear:\n          out_features: 10\n\ngraph:\n  input: image\n  modules:\n    conv: [image]\n    classifier: [conv]\n  output: classifier\n```\n\n\u003c/details\u003e\n\n```python\nmodel = torcharc.build(torcharc.SPEC_DIR / \"compact\" / \"conv_classifier.yaml\")\n\n# Run the model and check the output shape\nx = torch.randn(4, 1, 28, 28)\ny = model(x)\nassert y.shape == (4, 10)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (conv): Sequential(\n    (0): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n    (1): Conv2d(1, 16, kernel_size=(2, 2), stride=(1, 1))\n    (2): ReLU()\n    (3): Dropout(p=0.1, inplace=False)\n    (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n    (5): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))\n    (6): ReLU()\n    (7): Dropout(p=0.1, inplace=False)\n    (8): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n    (9): Conv2d(32, 64, kernel_size=(4, 4), stride=(1, 1))\n    (10): ReLU()\n    (11): 
Dropout(p=0.1, inplace=False)\n  )\n  (classifier): Sequential(\n    (0): Flatten(start_dim=1, end_dim=-1)\n    (1): Linear(in_features=30976, out_features=10, bias=True)\n  )\n)\n```\n\n![](images/conv_classifier_compact.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: Reuse syntax: Stereo Conv\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/basic/stereo_conv_reuse.yaml](torcharc/example/spec/basic/stereo_conv_reuse.yaml)\n\n```yaml\n# stereoscopic vision conv with shared (reused) conv\nmodules:\n  # single conv shared for left and right\n  conv:\n    Sequential:\n      - LazyConv2d:\n          out_channels: 32\n          kernel_size: 3\n      - ReLU:\n      - LazyConv2d:\n          out_channels: 64\n          kernel_size: 3\n      - ReLU:\n      - MaxPool2d:\n          kernel_size: 2\n      - Dropout:\n          p: 0.25\n      - Flatten:\n\n  # separate identical mlp for left and right for processing\n  left_mlp: \u0026left_mlp\n    Sequential:\n      - LazyLinear:\n          out_features: 256\n      - ReLU:\n      - LazyLinear:\n          out_features: 10\n      - ReLU:\n  right_mlp:\n    \u003c\u003c: *left_mlp\n\ngraph:\n  input: [left_image, right_image]\n  modules:\n    # reuse syntax: \u003cmodule\u003e~\u003csuffix\u003e\n    conv~left: [left_image]\n    conv~right: [right_image]\n    left_mlp: [conv~left]\n    right_mlp: [conv~right]\n  output: [left_mlp, right_mlp]\n```\n\n\u003c/details\u003e\n\n```python\n# To use the same module for many passes, specify with the reuse syntax in graph spec: \u003cmodule\u003e~\u003csuffix\u003e.\n# E.g. 
stereoscopic model with `left` and `right` inputs over a shared `conv` module: `conv~left` and `conv~right`.\nmodel = torcharc.build(torcharc.SPEC_DIR / \"basic\" / \"stereo_conv_reuse.yaml\")\n\n# Run the model and check the output shape\nleft_image = right_image = torch.randn(4, 3, 32, 32)\nleft, right = model(left_image=left_image, right_image=right_image)\nassert left.shape == (4, 10)\nassert right.shape == (4, 10)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (conv): Sequential(\n    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))\n    (1): ReLU()\n    (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))\n    (3): ReLU()\n    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n    (5): Dropout(p=0.25, inplace=False)\n    (6): Flatten(start_dim=1, end_dim=-1)\n  )\n  (left_mlp): Sequential(\n    (0): Linear(in_features=12544, out_features=256, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=256, out_features=10, bias=True)\n    (3): ReLU()\n  )\n  (right_mlp): Sequential(\n    (0): Linear(in_features=12544, out_features=256, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=256, out_features=10, bias=True)\n    (3): ReLU()\n  )\n)\n```\n\n![](images/stereo_conv_reuse.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: transformer\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/transformer/transformer.yaml](torcharc/example/spec/transformer/transformer.yaml)\n\n```yaml\nmodules:\n  src_embed:\n    LazyLinear:\n      out_features: \u0026embed_dim 128\n  tgt_embed:\n    LazyLinear:\n      out_features: *embed_dim\n  transformer:\n    Transformer:\n      d_model: *embed_dim\n      nhead: 8\n      num_encoder_layers: 2\n      num_decoder_layers: 2\n      dim_feedforward: 1024\n      dropout: 0.1\n      batch_first: true\n  mlp:\n    LazyLinear:\n      out_features: 10\n\ngraph:\n  input: [src_x, 
tgt_x]\n  modules:\n    src_embed: [src_x]\n    tgt_embed: [tgt_x]\n    transformer:\n      src: src_embed\n      tgt: tgt_embed\n    mlp: [transformer]\n  output: mlp\n```\n\n\u003c/details\u003e\n\n```python\nmodel = torcharc.build(torcharc.SPEC_DIR / \"transformer\" / \"transformer.yaml\")\n\n# Run the model and check the output shape\nsrc_x = torch.randn(4, 10, 64)\ntgt_x = torch.randn(4, 20, 64)\ny = model(src_x=src_x, tgt_x=tgt_x)\nassert y.shape == (4, 20, 10)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (src_embed): Linear(in_features=64, out_features=128, bias=True)\n  (tgt_embed): Linear(in_features=64, out_features=128, bias=True)\n  (transformer): Transformer(\n    (encoder): TransformerEncoder(\n      (layers): ModuleList(\n        (0-1): 2 x TransformerEncoderLayer(\n          (self_attn): MultiheadAttention(\n            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)\n          )\n          (linear1): Linear(in_features=128, out_features=1024, bias=True)\n          (dropout): Dropout(p=0.1, inplace=False)\n          (linear2): Linear(in_features=1024, out_features=128, bias=True)\n          (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n          (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n          (dropout1): Dropout(p=0.1, inplace=False)\n          (dropout2): Dropout(p=0.1, inplace=False)\n        )\n      )\n      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n    )\n    (decoder): TransformerDecoder(\n      (layers): ModuleList(\n        (0-1): 2 x TransformerDecoderLayer(\n          (self_attn): MultiheadAttention(\n...\n      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n    )\n  )\n  (mlp): Linear(in_features=128, out_features=10, bias=True)\n)\n```\n\n\u003c/details\u003e\n\n---\n\n### Example: Get modules: Attention\n\n\u003e See more in 
[torcharc/module/get.py](torcharc/module/get.py)\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/transformer/attn.yaml](torcharc/example/spec/transformer/attn.yaml)\n\n```yaml\nmodules:\n  src_embed:\n    LazyLinear:\n      out_features: \u0026embed_dim 64\n  tgt_embed:\n    LazyLinear:\n      out_features: *embed_dim\n  attn:\n    MultiheadAttention:\n      embed_dim: *embed_dim\n      num_heads: 4\n      batch_first: True\n  attn_output:\n    # attn output is tuple; get the first element to pass to mlp\n    Get:\n      key: 0\n  mlp:\n    LazyLinear:\n      out_features: 10\n\ngraph:\n  input: [src_x, tgt_x]\n  modules:\n    src_embed: [src_x]\n    tgt_embed: [tgt_x]\n    attn:\n      query: src_embed\n      key: tgt_embed\n      value: tgt_embed\n    attn_output: [attn]\n    mlp: [attn_output]\n  output: mlp\n```\n\n\u003c/details\u003e\n\n```python\n# Some cases need \"Get\" module, e.g. get first output from MultiheadAttention\nmodel = torcharc.build(torcharc.SPEC_DIR / \"transformer\" / \"attn.yaml\")\n\n# Run the model and check the output shape\nsrc_x = torch.rand(4, 10, 64)  # (batch_size, seq_len, embed_dim)\ntgt_x = torch.rand(4, 20, 64)  # (batch_size, seq_len, embed_dim)\n# Get module gets the first output from MultiheadAttention\ny = model(src_x=src_x, tgt_x=tgt_x)\nassert y.shape == (4, 10, 10)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (src_embed): Linear(in_features=64, out_features=64, bias=True)\n  (tgt_embed): Linear(in_features=64, out_features=64, bias=True)\n  (attn): MultiheadAttention(\n    (out_proj): NonDynamicallyQuantizableLinear(in_features=64, out_features=64, bias=True)\n  )\n  (attn_output): Get()\n  (mlp): Linear(in_features=64, out_features=10, bias=True)\n)\n```\n\n![](images/attn.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: Merge modules: DLRM\n\n\u003e See more in 
[torcharc/module/merge.py](torcharc/module/merge.py)\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/advanced/dlrm_sum.yaml](torcharc/example/spec/advanced/dlrm_sum.yaml)\n\n```yaml\n# basic DLRM architecture ref: https://github.com/facebookresearch/dlrm and ref: https://catalog.ngc.nvidia.com/orgs/nvidia/resources/dlrm_for_pytorch\nmodules:\n  # NOTE dense mlp and each embedding needs to be the same size\n  dense_mlp:\n    Sequential:\n      - LazyLinear:\n          out_features: 512\n      - ReLU:\n      - LazyLinear:\n          out_features: 256\n      - ReLU:\n      - LazyLinear:\n          out_features: \u0026feat_dim 128\n      - ReLU:\n\n  cat_embed_0:\n    Embedding:\n      num_embeddings: 1000\n      embedding_dim: *feat_dim\n  cat_embed_1:\n    Embedding:\n      num_embeddings: 1000\n      embedding_dim: *feat_dim\n  cat_embed_2:\n    Embedding:\n      num_embeddings: 1000\n      embedding_dim: *feat_dim\n\n  # pairwise interactions (original mentions sum, pairwise dot, cat) - but modern day this can be anything, e.g. 
self-attention\n  merge:\n    MergeSum:\n\n  # final classifier for probability of click\n  classifier:\n    Sequential:\n      - LazyLinear:\n          out_features: 256\n      - ReLU:\n      - LazyLinear:\n          out_features: 256\n      - ReLU:\n      - LazyLinear:\n          out_features: 1\n      - Sigmoid:\n\ngraph:\n  input: [dense, cat_0, cat_1, cat_2]\n  modules:\n    dense_mlp: [dense]\n    cat_embed_0: [cat_0]\n    cat_embed_1: [cat_1]\n    cat_embed_2: [cat_2]\n    merge: [[dense_mlp, cat_embed_0, cat_embed_1, cat_embed_2]]\n    classifier: [merge]\n  output: classifier\n```\n\n\u003c/details\u003e\n\n```python\n# For more complex models, merge modules can be used to combine multiple inputs into a single output.\nmodel = torcharc.build(torcharc.SPEC_DIR / \"advanced\" / \"dlrm_sum.yaml\")\n\n# Run the model and check the output shape\ndense = torch.randn(4, 256)\ncat_0 = torch.randint(0, 1000, (4,))\ncat_1 = torch.randint(0, 1000, (4,))\ncat_2 = torch.randint(0, 1000, (4,))\ny = model(dense, cat_0, cat_1, cat_2)\nassert y.shape == (4, 1)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (dense_mlp): Sequential(\n    (0): Linear(in_features=256, out_features=512, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=512, out_features=256, bias=True)\n    (3): ReLU()\n    (4): Linear(in_features=256, out_features=128, bias=True)\n    (5): ReLU()\n  )\n  (cat_embed_0): Embedding(1000, 128)\n  (cat_embed_1): Embedding(1000, 128)\n  (cat_embed_2): Embedding(1000, 128)\n  (merge): MergeSum()\n  (classifier): Sequential(\n    (0): Linear(in_features=128, out_features=256, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=256, out_features=256, bias=True)\n    (3): ReLU()\n    (4): Linear(in_features=256, out_features=1, bias=True)\n    (5): Sigmoid()\n  )\n)\n```\n\n![](images/dlrm.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: Merge modules: 
FiLM\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/merge/film.yaml](torcharc/example/spec/merge/film.yaml)\n\n```yaml\n# FiLM https://distill.pub/2018/feature-wise-transformations/\nmodules:\n  feat:\n    LazyLinear:\n      out_features: \u0026feat_dim 10\n  cond:\n    LazyLinear:\n      out_features: \u0026cond_dim 4\n  merge_0_1:\n    MergeFiLM:\n      feature_dim: *feat_dim\n      conditioner_dim: *cond_dim\n  tail:\n    Linear:\n      in_features: *feat_dim\n      out_features: 1\n\ngraph:\n  input: [x_0, x_1]\n  modules:\n    feat: [x_0]\n    cond: [x_1]\n    merge_0_1:\n      feature: feat\n      conditioner: cond\n    tail: [merge_0_1]\n  output: tail\n```\n\n\u003c/details\u003e\n\n```python\n# Custom MergeFiLM module for Feature-wise Linear Modulation https://distill.pub/2018/feature-wise-transformations/\nmodel = torcharc.build(torcharc.SPEC_DIR / \"advanced\" / \"film_image_state.yaml\")\n\n# Run the model and check the output shape\nimage = torch.randn(4, 3, 32, 32)\nstate = torch.randn(4, 4)\ny = model(image=image, state=state)\nassert y.shape == (4, 10)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (conv_0): Sequential(\n    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))\n    (1): ReLU()\n  )\n  (gyro): Linear(in_features=4, out_features=10, bias=True)\n  (film_0): MergeFiLM(\n    (gamma): Linear(in_features=10, out_features=64, bias=True)\n    (beta): Linear(in_features=10, out_features=64, bias=True)\n  )\n  (conv_1): Sequential(\n    (0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1))\n    (1): ReLU()\n    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n    (3): Dropout(p=0.25, inplace=False)\n  )\n  (film_1): MergeFiLM(\n    (gamma): Linear(in_features=10, out_features=32, bias=True)\n    (beta): Linear(in_features=10, out_features=32, bias=True)\n  )\n  (flatten): Sequential(\n    (0): 
Flatten(start_dim=1, end_dim=-1)\n    (1): Linear(in_features=6272, out_features=256, bias=True)\n  )\n  (classifier): Sequential(\n...\n    (1): ReLU()\n    (2): Linear(in_features=64, out_features=10, bias=True)\n    (3): LogSoftmax(dim=1)\n  )\n)\n```\n\n![](images/film.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: Fork modules: Chunk\n\n\u003e See more in [torcharc/module/fork.py](torcharc/module/fork.py)\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/fork/chunk.yaml](torcharc/example/spec/fork/chunk.yaml)\n\n```yaml\nmodules:\n  head:\n    Linear:\n      in_features: 32\n      out_features: 10\n  fork_0_1:\n    ForkChunk:\n      chunks: 2\n  tail_0:\n    Get:\n      key: 0\n  tail_1:\n    Get:\n      key: 1\n\ngraph:\n  input: x\n  modules:\n    head: [x]\n    fork_0_1: [head]\n    tail_0: [fork_0_1]\n    tail_1: [fork_0_1]\n  output: [tail_0, tail_1]\n```\n\n\u003c/details\u003e\n\n```python\n# Fork module can be used to split the input into multiple outputs.\nmodel = torcharc.build(torcharc.SPEC_DIR / \"fork\" / \"chunk.yaml\")\n\n# Run the model and check the output shape\nx = torch.randn(4, 32)\ny = model(x)\nassert len(y) == 2  # tail_0, tail_1\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (head): Linear(in_features=32, out_features=10, bias=True)\n  (fork_0_1): ForkChunk()\n  (tail_0): Get()\n  (tail_1): Get()\n)\n```\n\n![](images/chunk.png)\n\n\u003c/details\u003e\n\n---\n\n### Example: reduction op functional modules: Reduce\n\n\u003e See more in [torcharc/module/fn.py](torcharc/module/fn.py)\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/fn/reduce_mean.yaml](torcharc/example/spec/fn/reduce_mean.yaml)\n\n```yaml\nmodules:\n  text_token_embed:\n    Embedding:\n      num_embeddings: 10000\n      embedding_dim: \u0026embed_dim 128\n  text_tfmr:\n    Transformer:\n      d_model: 
*embed_dim\n      nhead: 8\n      num_encoder_layers: 2\n      num_decoder_layers: 2\n      dropout: 0.1\n      batch_first: true\n  text_embed:\n    # sentence embedding has shape [batch_size, seq_len, embed_dim],\n    # one way to reduce it to a single sentence embedding is by pooling, e.g. reduce with mean over seq\n    Reduce:\n      name: mean\n      dim: 1\n\ngraph:\n  input: [text]\n  modules:\n    text_token_embed: [text]\n    text_tfmr:\n      src: text_token_embed\n      tgt: text_token_embed\n    text_embed: [text_tfmr]\n  output: text_embed\n```\n\n\u003c/details\u003e\n\n```python\n# Sometimes we need to reduce the input tensor, e.g. pooling token embeddings into a single sentence embedding\nmodel = torcharc.build(torcharc.SPEC_DIR / \"fn\" / \"reduce_mean.yaml\")\n\n# Run the model and check the output shape\ntext = torch.randint(0, 1000, (4, 10))\ny = model(text)\nassert y.shape == (4, 128)  # reduced embedding\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (text_token_embed): Embedding(10000, 128)\n  (text_tfmr): Transformer(\n    (encoder): TransformerEncoder(\n      (layers): ModuleList(\n        (0-1): 2 x TransformerEncoderLayer(\n          (self_attn): MultiheadAttention(\n            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)\n          )\n          (linear1): Linear(in_features=128, out_features=2048, bias=True)\n          (dropout): Dropout(p=0.1, inplace=False)\n          (linear2): Linear(in_features=2048, out_features=128, bias=True)\n          (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n          (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n          (dropout1): Dropout(p=0.1, inplace=False)\n          (dropout2): Dropout(p=0.1, inplace=False)\n        )\n      )\n      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n    )\n    (decoder): TransformerDecoder(\n      (layers): 
ModuleList(\n        (0-1): 2 x TransformerDecoderLayer(\n          (self_attn): MultiheadAttention(\n            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)\n...\n      (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)\n    )\n  )\n  (text_embed): Reduce()\n)\n```\n\n\u003c/details\u003e\n\n---\n\n### Example: general functional modules: TorchFn\n\n\u003e See more in [torcharc/module/fn.py](torcharc/module/fn.py)\n\n\u003cdetails\u003e\u003csummary\u003espec file\u003c/summary\u003e\n\nFile: [torcharc/example/spec/fn/fn_topk.yaml](torcharc/example/spec/fn/fn_topk.yaml)\n\n```yaml\nmodules:\n  mlp:\n    Sequential:\n      - Flatten:\n      - LazyLinear:\n          out_features: 512\n      - ReLU:\n      - LazyLinear:\n          out_features: 512\n      - ReLU:\n      - LazyLinear:\n          out_features: 10\n      # use generic torch function with caveat of incompatible with JIT script\n      - TorchFn:\n          name: topk\n          k: 3\n\ngraph:\n  input: x\n  modules:\n    mlp: [x]\n  output: mlp\n```\n\n\u003c/details\u003e\n\n```python\n# While TorchArc uses module-based design, not all torch functions are available as modules. 
Use TorchFn to wrap any torch function into a module - with the caveat that it does not work with JIT script (but does work with torch compile and JIT trace).\nmodel = torcharc.build(torcharc.SPEC_DIR / \"fn\" / \"fn_topk.yaml\")\n\n# Run the model and check the output shape\nx = torch.randn(4, 128)\ny = model(x)  # topk output with k=3\nassert y.indices.shape == (4, 3)\nassert y.values.shape == (4, 3)\n\nmodel\n```\n\n\u003cdetails\u003e\u003csummary\u003emodel\u003c/summary\u003e\n\n```\nGraphModule(\n  (mlp): Sequential(\n    (0): Flatten(start_dim=1, end_dim=-1)\n    (1): Linear(in_features=128, out_features=512, bias=True)\n    (2): ReLU()\n    (3): Linear(in_features=512, out_features=512, bias=True)\n    (4): ReLU()\n    (5): Linear(in_features=512, out_features=10, bias=True)\n    (6): TorchFn()\n  )\n)\n```\n\n\u003c/details\u003e\n\n---\n\n### Example: more\n\nSee more examples:\n\n- demo notebook from above [torcharc/example/notebook/demo.py](torcharc/example/notebook/demo.py)\n- Lightning MNIST example usage [torcharc/example/notebook/lightning_mnist.py](torcharc/example/notebook/lightning_mnist.py)\n- spec files [torcharc/example/spec/](torcharc/example/spec/)\n- unit tests [test/example/spec/](test/example/spec/)\n\n## How does it work\n\nTorchArc uses `fx.GraphModule` to build a model from a spec file:\n\n1. Spec is defined via Pydantic [torcharc/validator/](torcharc/validator/). This defines:\n   - `modules`: the `torch.nn.Module`s as the main compute graph nodes\n   - `graph`: the graph structure, i.e. how the modules are connected\n2. 
Build the model using `torch.fx.GraphModule(modules, graph)`\n\nSee more in the Pydantic spec definition:\n\n- [torcharc/validator/spec.py](torcharc/validator/spec.py): the top-level spec used by torcharc\n- [torcharc/validator/modules.py](torcharc/validator/modules.py): the module spec where `torch.nn.Module`s are defined\n- [torcharc/validator/graph.py](torcharc/validator/graph.py): the graph spec where the graph structure is defined\n\n### Guiding principles\n\nThe design of TorchArc is guided by the following principles:\n\n1. simple: the module spec is straightforward:\n   1. it is simply a torch.nn.Module class name with kwargs.\n   1. it includes official torch.nn.Modules, Sequential, and custom-defined modules registered via `torcharc.register_nn`\n   1. it uses modules only; torcharc is not meant to replace custom functions that are best written in PyTorch code\n1. expressive: it can be used to build both simple and advanced model architectures easily\n1. portable: it returns a torch.nn.Module that can be used anywhere; it is not a framework.\n1. performant: PyTorch-native, fully compatible with `torch.compile`, and mostly compatible with torch JIT script and trace\n1. parametrizable: data-based model-building unlocks fast experimentation, e.g. by building logic for hyperparameter / architecture search\n\n## Development\n\n### Setup\n\n[Install uv](https://docs.astral.sh/uv/getting-started/installation/) for dependency management if you haven't already. Then run:\n\n```bash\n# setup virtualenv\nuv sync\n```\n\n### Unit Tests\n\n```bash\nuv run pytest\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkengz%2Ftorcharc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkengz%2Ftorcharc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkengz%2Ftorcharc/lists"}