{"id":15639298,"url":"https://github.com/graykode/modelsummary","last_synced_at":"2025-08-22T03:32:32.691Z","repository":{"id":57442661,"uuid":"173745720","full_name":"graykode/modelsummary","owner":"graykode","description":"All Model summary in PyTorch similar to `model.summary()` in Keras","archived":false,"fork":false,"pushed_at":"2019-04-24T09:01:32.000Z","size":30,"stargazers_count":87,"open_issues_count":3,"forks_count":26,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-12-11T04:32:15.987Z","etag":null,"topics":["keras","python","pytorch","summary"],"latest_commit_sha":null,"homepage":"https://www.reddit.com/r/MachineLearning/comments/ax83h7/p_modelsummary_pip_project_in_pytorch/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graykode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-04T13:01:36.000Z","updated_at":"2024-10-14T23:56:16.000Z","dependencies_parsed_at":"2022-09-26T17:21:17.853Z","dependency_job_id":null,"html_url":"https://github.com/graykode/modelsummary","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmodelsummary","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmodelsummary/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmodelsummary/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmodelsummary/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graykode","download_ur
l":"https://codeload.github.com/graykode/modelsummary/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230554330,"owners_count":18244234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["keras","python","pytorch","summary"],"created_at":"2024-10-03T11:25:19.142Z","updated_at":"2024-12-20T08:09:05.315Z","avatar_url":"https://github.com/graykode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## modelsummary (PyTorch Model summary)\n\n\u003e Keras style model.summary() in PyTorch, [torchsummary](https://github.com/sksq96/pytorch-summary)\n\nThis is a PyTorch library for visualizing model summaries, an improved version of [torchsummary](https://github.com/sksq96/pytorch-summary) and [torchsummaryX](https://github.com/nmhkahn/torchsummaryX). It was inspired by [torchsummary](https://github.com/sksq96/pytorch-summary), and I referred to its code when writing it. 
**It works with any number of input tensors!** \n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom modelsummary import summary\n\nclass Net(nn.Module):\n    def __init__(self):\n        super(Net, self).__init__()\n        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)\n        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)\n        self.conv2_drop = nn.Dropout2d()\n        self.fc1 = nn.Linear(320, 50)\n        self.fc2 = nn.Linear(50, 10)\n\n    def forward(self, x):\n        x = F.relu(F.max_pool2d(self.conv1(x), 2))\n        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n        x = x.view(-1, 320)\n        x = F.relu(self.fc1(x))\n        x = F.dropout(x, training=self.training)\n        x = self.fc2(x)\n        return F.log_softmax(x, dim=1)\n\n# show input shape\nsummary(Net(), torch.zeros((1, 1, 28, 28)), show_input=True)\n\n# show output shape\nsummary(Net(), torch.zeros((1, 1, 28, 28)), show_input=False)\n```\n\n```\n-----------------------------------------------------------------------\n             Layer (type)                Input Shape         Param #\n=======================================================================\n                 Conv2d-1            [-1, 1, 28, 28]             260\n                 Conv2d-2           [-1, 10, 12, 12]           5,020\n              Dropout2d-3             [-1, 20, 8, 8]               0\n                 Linear-4                  [-1, 320]          16,050\n                 Linear-5                   [-1, 50]             510\n=======================================================================\nTotal params: 21,840\nTrainable params: 21,840\nNon-trainable params: 0\n-----------------------------------------------------------------------\n\n-----------------------------------------------------------------------\n             Layer (type)               Output Shape         Param #\n=======================================================================\n 
                Conv2d-1           [-1, 10, 24, 24]             260\n                 Conv2d-2             [-1, 20, 8, 8]           5,020\n              Dropout2d-3             [-1, 20, 8, 8]               0\n                 Linear-4                   [-1, 50]          16,050\n                 Linear-5                   [-1, 10]             510\n=======================================================================\nTotal params: 21,840\nTrainable params: 21,840\nNon-trainable params: 0\n-----------------------------------------------------------------------\n```\n\n\n\n## Quick Start \n\nInstall `modelsummary` with pip:\n\n`pip install modelsummary`, then `from modelsummary import summary`\n\nUse the library as shown below. For more detail, see the example code.\n\n```\nfrom modelsummary import summary\n\nmodel = your_model_name()\n\n# show input shape\nsummary(model, (input tensor you want), show_input=True)\n\n# show output shape\nsummary(model, (input tensor you want), show_input=False)\n\n# show hierarchical structure\nsummary(model, (input tensor you want), show_hierarchical=True)\n```\n\n\n\nThe `summary` function has the following signature: `def summary(model, *inputs, batch_size=-1, show_input=True, show_hierarchical=False)`\n\n#### Options\n\n- model : your model instance\n- *inputs : one or more input tensors, passed as positional arguments (note the asterisk)\n- batch_size : batch dimension shown in the shapes; `-1` is equivalent to `None` (unspecified)\n- show_input : show input shapes; **if this parameter is False, output shapes are shown instead**, **default : True**\n- show_hierarchical : show the hierarchical model structure, **default : False**\n\n\n\n## Result\n\nExample run using the Transformer model from the [Attention Is All You Need paper (2017)](https://arxiv.org/abs/1706.03762)\n\n1) showing input shape\n\n```\n# show input shape\nsummary(model, enc_inputs, dec_inputs, show_input=True)\n\n-----------------------------------------------------------------------\n             Layer (type)                Input Shape         Param 
#\n=======================================================================\n                Encoder-1                    [-1, 5]               0\n              Embedding-2                    [-1, 5]           3,072\n              Embedding-3                    [-1, 5]           3,072\n           EncoderLayer-4               [-1, 5, 512]               0\n     MultiHeadAttention-5               [-1, 5, 512]               0\n                 Linear-6               [-1, 5, 512]         262,656\n                 Linear-7               [-1, 5, 512]         262,656\n                 Linear-8               [-1, 5, 512]         262,656\n  PoswiseFeedForwardNet-9               [-1, 5, 512]               0\n                Conv1d-10               [-1, 512, 5]       1,050,624\n                Conv1d-11              [-1, 2048, 5]       1,049,088\n          EncoderLayer-12               [-1, 5, 512]               0\n    MultiHeadAttention-13               [-1, 5, 512]               0\n                Linear-14               [-1, 5, 512]         262,656\n                Linear-15               [-1, 5, 512]         262,656\n                Linear-16               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-17               [-1, 5, 512]               0\n                Conv1d-18               [-1, 512, 5]       1,050,624\n                Conv1d-19              [-1, 2048, 5]       1,049,088\n          EncoderLayer-20               [-1, 5, 512]               0\n    MultiHeadAttention-21               [-1, 5, 512]               0\n                Linear-22               [-1, 5, 512]         262,656\n                Linear-23               [-1, 5, 512]         262,656\n                Linear-24               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-25               [-1, 5, 512]               0\n                Conv1d-26               [-1, 512, 5]       1,050,624\n                Conv1d-27              [-1, 2048, 5]       1,049,088\n          EncoderLayer-28         
      [-1, 5, 512]               0\n    MultiHeadAttention-29               [-1, 5, 512]               0\n                Linear-30               [-1, 5, 512]         262,656\n                Linear-31               [-1, 5, 512]         262,656\n                Linear-32               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-33               [-1, 5, 512]               0\n                Conv1d-34               [-1, 512, 5]       1,050,624\n                Conv1d-35              [-1, 2048, 5]       1,049,088\n          EncoderLayer-36               [-1, 5, 512]               0\n    MultiHeadAttention-37               [-1, 5, 512]               0\n                Linear-38               [-1, 5, 512]         262,656\n                Linear-39               [-1, 5, 512]         262,656\n                Linear-40               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-41               [-1, 5, 512]               0\n                Conv1d-42               [-1, 512, 5]       1,050,624\n                Conv1d-43              [-1, 2048, 5]       1,049,088\n          EncoderLayer-44               [-1, 5, 512]               0\n    MultiHeadAttention-45               [-1, 5, 512]               0\n                Linear-46               [-1, 5, 512]         262,656\n                Linear-47               [-1, 5, 512]         262,656\n                Linear-48               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-49               [-1, 5, 512]               0\n                Conv1d-50               [-1, 512, 5]       1,050,624\n                Conv1d-51              [-1, 2048, 5]       1,049,088\n               Decoder-52                    [-1, 5]               0\n             Embedding-53                    [-1, 5]           3,584\n             Embedding-54                    [-1, 5]           3,072\n          DecoderLayer-55               [-1, 5, 512]               0\n    MultiHeadAttention-56               [-1, 5, 512]               0\n    
            Linear-57               [-1, 5, 512]         262,656\n                Linear-58               [-1, 5, 512]         262,656\n                Linear-59               [-1, 5, 512]         262,656\n    MultiHeadAttention-60               [-1, 5, 512]               0\n                Linear-61               [-1, 5, 512]         262,656\n                Linear-62               [-1, 5, 512]         262,656\n                Linear-63               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-64               [-1, 5, 512]               0\n                Conv1d-65               [-1, 512, 5]       1,050,624\n                Conv1d-66              [-1, 2048, 5]       1,049,088\n          DecoderLayer-67               [-1, 5, 512]               0\n    MultiHeadAttention-68               [-1, 5, 512]               0\n                Linear-69               [-1, 5, 512]         262,656\n                Linear-70               [-1, 5, 512]         262,656\n                Linear-71               [-1, 5, 512]         262,656\n    MultiHeadAttention-72               [-1, 5, 512]               0\n                Linear-73               [-1, 5, 512]         262,656\n                Linear-74               [-1, 5, 512]         262,656\n                Linear-75               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-76               [-1, 5, 512]               0\n                Conv1d-77               [-1, 512, 5]       1,050,624\n                Conv1d-78              [-1, 2048, 5]       1,049,088\n          DecoderLayer-79               [-1, 5, 512]               0\n    MultiHeadAttention-80               [-1, 5, 512]               0\n                Linear-81               [-1, 5, 512]         262,656\n                Linear-82               [-1, 5, 512]         262,656\n                Linear-83               [-1, 5, 512]         262,656\n    MultiHeadAttention-84               [-1, 5, 512]               0\n                Linear-85               
[-1, 5, 512]         262,656\n                Linear-86               [-1, 5, 512]         262,656\n                Linear-87               [-1, 5, 512]         262,656\n PoswiseFeedForwardNet-88               [-1, 5, 512]               0\n                Conv1d-89               [-1, 512, 5]       1,050,624\n                Conv1d-90              [-1, 2048, 5]       1,049,088\n          DecoderLayer-91               [-1, 5, 512]               0\n    MultiHeadAttention-92               [-1, 5, 512]               0\n                Linear-93               [-1, 5, 512]         262,656\n                Linear-94               [-1, 5, 512]         262,656\n                Linear-95               [-1, 5, 512]         262,656\n    MultiHeadAttention-96               [-1, 5, 512]               0\n                Linear-97               [-1, 5, 512]         262,656\n                Linear-98               [-1, 5, 512]         262,656\n                Linear-99               [-1, 5, 512]         262,656\nPoswiseFeedForwardNet-100               [-1, 5, 512]               0\n               Conv1d-101               [-1, 512, 5]       1,050,624\n               Conv1d-102              [-1, 2048, 5]       1,049,088\n         DecoderLayer-103               [-1, 5, 512]               0\n   MultiHeadAttention-104               [-1, 5, 512]               0\n               Linear-105               [-1, 5, 512]         262,656\n               Linear-106               [-1, 5, 512]         262,656\n               Linear-107               [-1, 5, 512]         262,656\n   MultiHeadAttention-108               [-1, 5, 512]               0\n               Linear-109               [-1, 5, 512]         262,656\n               Linear-110               [-1, 5, 512]         262,656\n               Linear-111               [-1, 5, 512]         262,656\nPoswiseFeedForwardNet-112               [-1, 5, 512]               0\n               Conv1d-113               [-1, 512, 5]       1,050,624\n          
     Conv1d-114              [-1, 2048, 5]       1,049,088\n         DecoderLayer-115               [-1, 5, 512]               0\n   MultiHeadAttention-116               [-1, 5, 512]               0\n               Linear-117               [-1, 5, 512]         262,656\n               Linear-118               [-1, 5, 512]         262,656\n               Linear-119               [-1, 5, 512]         262,656\n   MultiHeadAttention-120               [-1, 5, 512]               0\n               Linear-121               [-1, 5, 512]         262,656\n               Linear-122               [-1, 5, 512]         262,656\n               Linear-123               [-1, 5, 512]         262,656\nPoswiseFeedForwardNet-124               [-1, 5, 512]               0\n               Conv1d-125               [-1, 512, 5]       1,050,624\n               Conv1d-126              [-1, 2048, 5]       1,049,088\n               Linear-127               [-1, 5, 512]           3,584\n=======================================================================\nTotal params: 39,396,352\nTrainable params: 39,390,208\nNon-trainable params: 6,144\n```\n\n2) showing output shape\n\n```\n# show output shape\nsummary(model, enc_inputs, dec_inputs, show_input=False)\n\n-----------------------------------------------------------------------\n             Layer (type)               Output Shape         Param #\n=======================================================================\n              Embedding-1               [-1, 5, 512]           3,072\n              Embedding-2               [-1, 5, 512]           3,072\n                 Linear-3               [-1, 5, 512]         262,656\n                 Linear-4               [-1, 5, 512]         262,656\n                 Linear-5               [-1, 5, 512]         262,656\n     MultiHeadAttention-6              [-1, 8, 5, 5]               0\n                 Conv1d-7              [-1, 2048, 5]       1,050,624\n                 Conv1d-8               [-1, 
512, 5]       1,049,088\n  PoswiseFeedForwardNet-9               [-1, 5, 512]               0\n          EncoderLayer-10              [-1, 8, 5, 5]               0\n                Linear-11               [-1, 5, 512]         262,656\n                Linear-12               [-1, 5, 512]         262,656\n                Linear-13               [-1, 5, 512]         262,656\n    MultiHeadAttention-14              [-1, 8, 5, 5]               0\n                Conv1d-15              [-1, 2048, 5]       1,050,624\n                Conv1d-16               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-17               [-1, 5, 512]               0\n          EncoderLayer-18              [-1, 8, 5, 5]               0\n                Linear-19               [-1, 5, 512]         262,656\n                Linear-20               [-1, 5, 512]         262,656\n                Linear-21               [-1, 5, 512]         262,656\n    MultiHeadAttention-22              [-1, 8, 5, 5]               0\n                Conv1d-23              [-1, 2048, 5]       1,050,624\n                Conv1d-24               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-25               [-1, 5, 512]               0\n          EncoderLayer-26              [-1, 8, 5, 5]               0\n                Linear-27               [-1, 5, 512]         262,656\n                Linear-28               [-1, 5, 512]         262,656\n                Linear-29               [-1, 5, 512]         262,656\n    MultiHeadAttention-30              [-1, 8, 5, 5]               0\n                Conv1d-31              [-1, 2048, 5]       1,050,624\n                Conv1d-32               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-33               [-1, 5, 512]               0\n          EncoderLayer-34              [-1, 8, 5, 5]               0\n                Linear-35               [-1, 5, 512]         262,656\n                Linear-36               [-1, 5, 512]         262,656\n               
 Linear-37               [-1, 5, 512]         262,656\n    MultiHeadAttention-38              [-1, 8, 5, 5]               0\n                Conv1d-39              [-1, 2048, 5]       1,050,624\n                Conv1d-40               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-41               [-1, 5, 512]               0\n          EncoderLayer-42              [-1, 8, 5, 5]               0\n                Linear-43               [-1, 5, 512]         262,656\n                Linear-44               [-1, 5, 512]         262,656\n                Linear-45               [-1, 5, 512]         262,656\n    MultiHeadAttention-46              [-1, 8, 5, 5]               0\n                Conv1d-47              [-1, 2048, 5]       1,050,624\n                Conv1d-48               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-49               [-1, 5, 512]               0\n          EncoderLayer-50              [-1, 8, 5, 5]               0\n               Encoder-51              [-1, 8, 5, 5]               0\n             Embedding-52               [-1, 5, 512]           3,584\n             Embedding-53               [-1, 5, 512]           3,072\n                Linear-54               [-1, 5, 512]         262,656\n                Linear-55               [-1, 5, 512]         262,656\n                Linear-56               [-1, 5, 512]         262,656\n    MultiHeadAttention-57              [-1, 8, 5, 5]               0\n                Linear-58               [-1, 5, 512]         262,656\n                Linear-59               [-1, 5, 512]         262,656\n                Linear-60               [-1, 5, 512]         262,656\n    MultiHeadAttention-61              [-1, 8, 5, 5]               0\n                Conv1d-62              [-1, 2048, 5]       1,050,624\n                Conv1d-63               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-64               [-1, 5, 512]               0\n          DecoderLayer-65              [-1, 8, 5, 5]   
            0\n                Linear-66               [-1, 5, 512]         262,656\n                Linear-67               [-1, 5, 512]         262,656\n                Linear-68               [-1, 5, 512]         262,656\n    MultiHeadAttention-69              [-1, 8, 5, 5]               0\n                Linear-70               [-1, 5, 512]         262,656\n                Linear-71               [-1, 5, 512]         262,656\n                Linear-72               [-1, 5, 512]         262,656\n    MultiHeadAttention-73              [-1, 8, 5, 5]               0\n                Conv1d-74              [-1, 2048, 5]       1,050,624\n                Conv1d-75               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-76               [-1, 5, 512]               0\n          DecoderLayer-77              [-1, 8, 5, 5]               0\n                Linear-78               [-1, 5, 512]         262,656\n                Linear-79               [-1, 5, 512]         262,656\n                Linear-80               [-1, 5, 512]         262,656\n    MultiHeadAttention-81              [-1, 8, 5, 5]               0\n                Linear-82               [-1, 5, 512]         262,656\n                Linear-83               [-1, 5, 512]         262,656\n                Linear-84               [-1, 5, 512]         262,656\n    MultiHeadAttention-85              [-1, 8, 5, 5]               0\n                Conv1d-86              [-1, 2048, 5]       1,050,624\n                Conv1d-87               [-1, 512, 5]       1,049,088\n PoswiseFeedForwardNet-88               [-1, 5, 512]               0\n          DecoderLayer-89              [-1, 8, 5, 5]               0\n                Linear-90               [-1, 5, 512]         262,656\n                Linear-91               [-1, 5, 512]         262,656\n                Linear-92               [-1, 5, 512]         262,656\n    MultiHeadAttention-93              [-1, 8, 5, 5]               0\n                
Linear-94               [-1, 5, 512]         262,656\n                Linear-95               [-1, 5, 512]         262,656\n                Linear-96               [-1, 5, 512]         262,656\n    MultiHeadAttention-97              [-1, 8, 5, 5]               0\n                Conv1d-98              [-1, 2048, 5]       1,050,624\n                Conv1d-99               [-1, 512, 5]       1,049,088\nPoswiseFeedForwardNet-100               [-1, 5, 512]               0\n         DecoderLayer-101              [-1, 8, 5, 5]               0\n               Linear-102               [-1, 5, 512]         262,656\n               Linear-103               [-1, 5, 512]         262,656\n               Linear-104               [-1, 5, 512]         262,656\n   MultiHeadAttention-105              [-1, 8, 5, 5]               0\n               Linear-106               [-1, 5, 512]         262,656\n               Linear-107               [-1, 5, 512]         262,656\n               Linear-108               [-1, 5, 512]         262,656\n   MultiHeadAttention-109              [-1, 8, 5, 5]               0\n               Conv1d-110              [-1, 2048, 5]       1,050,624\n               Conv1d-111               [-1, 512, 5]       1,049,088\nPoswiseFeedForwardNet-112               [-1, 5, 512]               0\n         DecoderLayer-113              [-1, 8, 5, 5]               0\n               Linear-114               [-1, 5, 512]         262,656\n               Linear-115               [-1, 5, 512]         262,656\n               Linear-116               [-1, 5, 512]         262,656\n   MultiHeadAttention-117              [-1, 8, 5, 5]               0\n               Linear-118               [-1, 5, 512]         262,656\n               Linear-119               [-1, 5, 512]         262,656\n               Linear-120               [-1, 5, 512]         262,656\n   MultiHeadAttention-121              [-1, 8, 5, 5]               0\n               Conv1d-122              [-1, 2048, 5]    
   1,050,624\n               Conv1d-123               [-1, 512, 5]       1,049,088\nPoswiseFeedForwardNet-124               [-1, 5, 512]               0\n         DecoderLayer-125              [-1, 8, 5, 5]               0\n              Decoder-126              [-1, 8, 5, 5]               0\n               Linear-127                 [-1, 5, 7]           3,584\n=======================================================================\nTotal params: 39,396,352\nTrainable params: 39,390,208\nNon-trainable params: 6,144\n-----------------------------------------------------------------------\n```\n\n3) showing hierarchical summary\n\n```\nTransformer(\n  (encoder): Encoder(\n    (src_emb): Embedding(6, 512), 3,072 params\n    (pos_emb): Embedding(6, 512), 3,072 params\n    (layers): ModuleList(\n      (0): EncoderLayer(\n        (enc_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 2,887,680 params\n      (1): EncoderLayer(\n        (enc_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 
1,049,088 params\n        ), 2,099,712 params\n      ), 2,887,680 params\n      (2): EncoderLayer(\n        (enc_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 2,887,680 params\n      (3): EncoderLayer(\n        (enc_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 2,887,680 params\n      (4): EncoderLayer(\n        (enc_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 2,887,680 params\n      (5): EncoderLayer(\n 
       (enc_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 2,887,680 params\n    ), 17,326,080 params\n  ), 17,332,224 params\n  (decoder): Decoder(\n    (tgt_emb): Embedding(7, 512), 3,584 params\n    (pos_emb): Embedding(6, 512), 3,072 params\n    (layers): ModuleList(\n      (0): DecoderLayer(\n        (dec_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (dec_enc_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 3,675,648 params\n      (1): DecoderLayer(\n        (dec_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 
262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (dec_enc_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 3,675,648 params\n      (2): DecoderLayer(\n        (dec_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (dec_enc_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 3,675,648 params\n      (3): DecoderLayer(\n        (dec_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, 
out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (dec_enc_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 3,675,648 params\n      (4): DecoderLayer(\n        (dec_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (dec_enc_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 3,675,648 params\n      (5): DecoderLayer(\n        (dec_self_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 
params\n        (dec_enc_attn): MultiHeadAttention(\n          (W_Q): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_K): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n          (W_V): Linear(in_features=512, out_features=512, bias=True), 262,656 params\n        ), 787,968 params\n        (pos_ffn): PoswiseFeedForwardNet(\n          (conv1): Conv1d(512, 2048, kernel_size=(1,), stride=(1,)), 1,050,624 params\n          (conv2): Conv1d(2048, 512, kernel_size=(1,), stride=(1,)), 1,049,088 params\n        ), 2,099,712 params\n      ), 3,675,648 params\n    ), 22,053,888 params\n  ), 22,060,544 params\n  (projection): Linear(in_features=512, out_features=7, bias=False), 3,584 params\n), 39,396,352 params\n\n```\n\n\n\n## Reference\n\n```python\ncode_reference = { 'https://github.com/pytorch/pytorch/issues/2001',\n\t\t\t\t'https://gist.github.com/HTLife/b6640af9d6e7d765411f8aa9aa94b837',\n\t\t\t\t'https://github.com/sksq96/pytorch-summary',\n\t\t\t\t'Inspired by https://github.com/sksq96/pytorch-summary'}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fmodelsummary","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraykode%2Fmodelsummary","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fmodelsummary/lists"}