{"id":20007186,"url":"https://github.com/cheind/mingru","last_synced_at":"2025-09-20T01:32:45.838Z","repository":{"id":261403767,"uuid":"884192795","full_name":"cheind/mingru","owner":"cheind","description":"Torch MinGRU implementation based on \"Were RNNs All We Needed?\"","archived":false,"fork":false,"pushed_at":"2024-12-05T09:42:10.000Z","size":585,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-05T10:29:46.993Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cheind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-06T10:13:08.000Z","updated_at":"2024-12-05T09:42:16.000Z","dependencies_parsed_at":"2024-11-06T11:47:42.490Z","dependency_job_id":"c11fb30c-28ac-43ff-bd61-0dc42b12ea71","html_url":"https://github.com/cheind/mingru","commit_stats":null,"previous_names":["cheind/mingru"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheind%2Fmingru","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheind%2Fmingru/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheind%2Fmingru/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheind%2Fmingru/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cheind","download_url":"https://codeload.github.com/cheind/mingru/tar.gz
/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233612498,"owners_count":18702591,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T06:14:58.634Z","updated_at":"2025-09-20T01:32:45.463Z","avatar_url":"https://github.com/cheind.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# torch-mingru\nPyTorch (convolutional) MinGRU implementation based on \n\n\u003e Feng, Leo, et al. \"Were RNNs All We Needed?\" (2024).\n\nConvolutional MinGRU based on\n\n\u003e Heindl, Christoph et al. \"Convolutional MinGRU\" (2024).\n\n## Features\nIn alignment with torch recurrent modules, **mingru** provides the following core modules\n - `mingru.MinGRUCell` single layer MinGRU\n - `mingru.MinGRU` multi-layer stacked MinGRU \n - `mingru.MinConv2dGRUCell` single layer convolutional MinGRU\n - `mingru.MinConv2dGRU` multi-layer stacked convolutional MinGRU\n\nEach module supports the following features (if applicable to type)\n - **Parallel**: Efficient log-space parallel evaluation support plus sequential support for testing. Automatically dispatches to the most efficient implementation.\n - **Multilayer**: Stack multiple MinGRU layers via `hidden_sizes=` arguments. When `len(hidden_sizes)\u003e1`, the output hidden states of layer $i$ are passed as inputs to $i+1$. 
Varying hidden sizes are supported.\n - **Dropout**: Via the `dropout=` parameter; when \u003e 0, the inputs of each layer are affected, except for the last layer.\n - **Residual**: Residual connections between outputs of MinGRU layers via the `residual=` argument.\n - **Bias**: Biases in linear layers can be enabled and disabled via the `bias=` argument.\n - **Bidirectional**: Bi-directional processing can be enabled by wrapping RNNs via `mingru.Bidirectional`.\n - **Normalization**: LayerNorm and GroupNorm between stacked MinGRUs via the `norm=` argument.\n - **Scripting**: MinGRU is compatible with `torch.jit.script`.\n - **Compatibility**: The interface of `mingru.*` is mostly compatible with that of `torch.nn.GRU/GRUCell`, except that sequence-first arguments are not supported and bi-directional processing is provided by the `mingru.Bidirectional` wrapper. Cells in **mingru** also support sequence arguments to benefit from parallel computation.\n\n## Installation\n\n```shell\n# Install directly from github\npip install git+https://github.com/cheind/mingru.git\n```\n\n## Usage\n\n### MinGRU\n\nThe following snippet demonstrates a multi-layer stacked MinGRU.\n\n```python\nimport torch\nimport mingru\n\n# Instantiate\nB, input_size, hidden_sizes, S = 10, 3, [32, 64], 128\nrnn = mingru.MinGRU(\n    input_size=input_size,\n    hidden_sizes=hidden_sizes,\n    dropout=0.0,\n    residual=True,\n).eval()\n\n# Invoke for input x with sequence length S and batch-size B\n# This will implicitly assume a 'zero' hidden state\n# for each layer.\nx = torch.randn(B, S, input_size)\nout, h = rnn(x)\nassert out.shape == (B, S, 64)\nassert h[0].shape == (B, 1, 32)\nassert h[1].shape == (B, 1, 64)\n\n# Invoke with initial/previous hidden states.\nh = rnn.init_hidden_state(x)\nout, h = rnn(torch.randn(B, S, input_size), h=h)\n\n# Sequential prediction pattern\nh = rnn.init_hidden_state(x)\nout_seq = []\nfor i in range(x.shape[1]):\n    out, h = rnn(x[:, i : i + 1], h=h)\n    out_seq.append(out)\nout_seq = 
torch.cat(out_seq, 1)\nassert out_seq.shape == (B, S, 64)\n\n# Parallel prediction pattern\nout_par, h = rnn(x, rnn.init_hidden_state(x))\nassert torch.allclose(out_seq, out_par, atol=1e-4)\n```\n\n### MinConv2dGRU\n\nThe following sample demonstrates convolutional multi-layer stacked MinGRUs.\n\n\n```python\nimport torch\nimport mingru\n\nB, S = 5, 10\ninput_size = 3\nhidden_sizes = [16, 32, 64]\nkernel_sizes = [3, 3, 3]\npadding = 1\nstride = 2\n\nrnn = mingru.MinConv2dGRU(\n    input_size=input_size,\n    hidden_sizes=hidden_sizes,\n    kernel_sizes=kernel_sizes,\n    paddings=padding,\n    strides=stride,\n    dropout=0.0,\n    residual=True,\n).eval()\n\n# Invoke for input x with sequence length S and batch-size B\n# This will implicitly assume a 'zero' hidden state\n# for each layer.\nx = torch.randn(B, S, input_size, 64, 64)\nout, h = rnn(x)\nassert out.shape == (B, S, 64, 8, 8)\nassert h[0].shape == (B, 1, 16, 32, 32)\nassert h[1].shape == (B, 1, 32, 16, 16)\nassert h[2].shape == (B, 1, 64, 8, 8)\n\n# Invoke with initial/previous hidden states.\nh = rnn.init_hidden_state(x)\nout, h = rnn(x, h=h)\n\n# Sequential prediction pattern\nh = rnn.init_hidden_state(x)\nout_seq = []\nfor i in range(x.shape[1]):\n    out, h = rnn(x[:, i : i + 1], h=h)\n    out_seq.append(out)\nout_seq = torch.cat(out_seq, 1)\nassert out_seq.shape == (B, S, 64, 8, 8)\n\n# Parallel prediction pattern\nout_par, h = rnn(x, rnn.init_hidden_state(x))\nassert torch.allclose(out_seq, out_par, atol=1e-4)\n```\n\n### Examples\n\n#### Selective Copying\nFor a more complete example, see [examples/selective_copying.py](./examples/selective_copying.py), which attempts to learn to selectively pick specific tokens in order from a generated sequence.\n\n```shell\npython -m examples.selective_copying\n    ...\n    Step [1941/2000], Loss: 0.0002, Accuracy: 99.61%\n    Step [1961/2000], Loss: 0.0002, Accuracy: 100.00%\n    Step [1981/2000], Loss: 0.0002, Accuracy: 99.61%\n    Validation Accuracy: 
100.00%\n```\n\nBy default, the example is configured for a small use case (sequence length 64, vocab size 6, memorize 4), but you can switch to a much larger test by adapting the `cfg` dict at the end of the file.\n\nThe task is based on\n\u003e Gu, Albert, and Tri Dao. \"Mamba: Linear-time sequence modeling with selective state spaces.\" (2023).\n\n#### Video Classification\nTrains a video classification network using convolutional MinGRUs from scratch using UCF101 train/test splits. Mimics the\n(first) architecture of \n\n\u003e Ballas, Nicolas, Li Yao, Chris Pal, and Aaron Courville. \"Delving deeper into convolutional networks for learning video representations.\" (2015).\n\nOn fold 1, this achieves a top-1 accuracy of 95% on validation and 78% on test, which replicates the results from the paper. The architecture uses a VGG16 backbone trained on ImageNet. One can expect better test results when pre-training is done on larger video action datasets.\n\nFirst, register these environment variables\n\n```shell\n# Set path to UCF dataset and annotations\nexport UCF101_PATH=/path/to/UCF/dir\nexport UCF101_ANNPATH=/path/to/ann/dir\n```\n\n##### Train\n\n```shell\npython -m examples.video_classification train -f 1\n    ...\n    2024-12-01 07:53:26,868: Epoch 7, Step 75961, Loss: 0.0042, Accuracy: 100.00%\n    2024-12-01 07:53:43,763: Epoch 7, Step 75981, Loss: 0.1159, Accuracy: 93.75%\n    2024-12-01 07:54:05,992: Epoch 7, Step 76000, Validation Accuracy: 99.50%, Validation Loss: 0.00\n```\n\n##### Test\n\nThe test protocol follows the paper: it uses 25 clips from each video and performs average/majority voting.\n\n```shell\npython -m examples.video_classification test -f 1 tmp/video_classifier_best.pt\n    ...\n    2024-12-01 08:19:27,585: Acc: 0.7048961511382305\n    2024-12-01 08:19:27,762: Acc: 0.7047927727099405\n    2024-12-01 08:19:27,799: Test accuracy 0.70\n```\n\n#### Generative Predictive Text \n\nTrains and samples from a GPT2-like model, but uses stacked MinGRUs instead of 
transformers. Adapted from \n[nanoGPT](https://github.com/karpathy/nanoGPT).\n\n##### Train\nThe dataset is currently restricted to a single text file. We use [Tiny-Shakespeare](https://huggingface.co/datasets/Trelis/tiny-shakespeare).\n\n```shell\npython -m examples.nlp train tmp/tinyshakespeare.txt\n```\n\n##### Sample\n```shell\npython -m examples.nlp sample --num-tokens 512 tmp/tinyshakespeare.nlp_best.pt\n\n    ISABELLA:\n    One of my sister must confess come,\n    And two spain under mine honour humbly out:\n    Yea, you'll be made in wicked Pompe. What, ho!\n    This is a gallful device shall rise.\n    I do beseech you, gentle my lord,\n    And bring him well, and nothing but my life,\n    But your beauty knows stands with your beauty,\n    In your mistress and your brother come.\n    ...\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcheind%2Fmingru","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcheind%2Fmingru","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcheind%2Fmingru/lists"}