{"id":13639179,"url":"https://github.com/tkocisky/oxnn","last_synced_at":"2025-04-19T22:31:43.399Z","repository":{"id":30405877,"uuid":"33958665","full_name":"tkocisky/oxnn","owner":"tkocisky","description":null,"archived":false,"fork":false,"pushed_at":"2016-04-23T19:02:24.000Z","size":71,"stargazers_count":128,"open_issues_count":0,"forks_count":15,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-08-03T01:14:08.328Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tkocisky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-14T21:44:30.000Z","updated_at":"2023-10-06T08:29:47.000Z","dependencies_parsed_at":"2022-09-08T09:02:47.622Z","dependency_job_id":null,"html_url":"https://github.com/tkocisky/oxnn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tkocisky%2Foxnn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tkocisky%2Foxnn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tkocisky%2Foxnn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tkocisky%2Foxnn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tkocisky","download_url":"https://codeload.github.com/tkocisky/oxnn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223810287,"owners_count":17206729,"icon_url":"https://github.com/github.png","version":nu
ll,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T01:00:58.421Z","updated_at":"2024-11-09T09:30:40.977Z","avatar_url":"https://github.com/tkocisky.png","language":"Lua","funding_links":[],"categories":["Software Package"],"sub_categories":[],"readme":"# oxnn -- Oxford NN Library\n\nThis library contains extensions to the Torch nn and cunn libraries. This is\ndevelopment code and is not fully tested.\n\n## Highlights\n\n  * RNNs\n    * [oxnn.SequenceOfWords](rnn/SequenceOfWords.lua) - Deep RNN class for\n      sentences. It can handle batches where sentence lengths vary across and\n      within batches, with output and loss appropriately masked.\n    * Optimized LSTM cell [oxnn.ModelUtil.LSTM12cl](util/ModelUtil.lua),\n      [oxnn.LSTM12Part2](nn/LSTM12Part2.lua).\n    * [oxnn.RecurrentPropagator](rnn/RecurrentPropagator.lua) - Module for\n      executing custom computations graphs, which is useful for RNNs. It handles\n      cloning and weight sharing of modules that are used multiple times.  
Each\n      batch can have a different computation graph.\n  * NN\n    * [oxnn.LinearBlockDiagonal](nn/LinearBlockDiagonal.lua)\n    * [oxnn.LinearCAddInplace](nn/LinearCAddInplace.lua)\n    * [oxnn.LogSoftMaxInplace](nn/LogSoftMaxInplace.lua)\n    * [oxnn.NarrowTable](nn/NarrowTable.lua) - multiple narrows on one tensor.\n    * [oxnn.NoAccGradParameters](nn/NoAccGradParameters.lua) - do not train\n      parameters of a module.\n    * [oxnn.VecsToVecs](nn/VecsToVecs.lua) - maps vectors to vectors, given and\n      returned as a table or a tensor of batch of vectors for each time step.\n  * Text\n    * [oxnn.Vocabulary](text/Vocabulary.lua)\n    * [oxnn.TextUtil](text/TextUtil.lua)\n\nDocumentation can be found at the top of the files, and we provide some\n[examples](examples/).\n\n## License\n\nWe release this code under the BSD license (see the [LICENSE](LICENSE) file).\nSome of the files are modifications of files from nn or cunn. File\n[util/cloneManyTimes.lua](util/cloneManyTimes.lua) has a separate license.\n\n## Installation\n\nClone this code into your `$HOME` directory and run the `./build.sh` command.\n\nFor your .bashrc:\n```bash\n# substitute $HOME with the path to the cloned repository\nexport LUA_PATH=\"$HOME/?/init.lua;$HOME/?.lua;$LUA_PATH\"\nexport LUA_CPATH=\"$HOME/oxnn/?.so;$HOME/?.so;$LUA_CPATH\"\n```\n\nTo test the installation, try running the tests and the examples.\n\n\n## Short examples\n\n### oxnn.SequenceOfWords\n\nTwo-layer LSTM:\n```lua\nlstm = oxnn.SequenceOfWords{\n   lookuptable = nn.Sequential():add(nn.LookupTable(10, 128))\n                                :add(nn.SplitTable(2)),\n   recurrent = { oxnn.ModelUtil.LSTMCell12cl(128, true),  -- layer 1\n                 oxnn.ModelUtil.LSTMCell12cl(128, true) },-- layer 2\n   output =\n            nn.Sequential()\n               :add(nn.Linear(128, 10))\n               :add(oxnn.LogSoftMaxInplace(true,true)),\n   loss = 'nllloss',\n   layers = 2\n}\n\npad = 10\n-- batch of 2 sequences of 
lengths 4 and 3\ninput = { { { torch.zeros(2,128), torch.zeros(2,128) }, -- initial state layer 1\n            { torch.zeros(2,128), torch.zeros(2,128) } }, -- initial state layer 2\n          torch.Tensor{ { 1, 7, 9, 8   },   -- sentence 1\n                        { 2, 3, 5, pad } }, -- sentence 2\n          { 4, 3 }  -- sentence lengths\n        }\n\n\nprint(lstm:forward(input))\n{\n  1 :\n    {\n      1 :\n        {\n          1 : DoubleTensor - size: 2x128   -- last recurrent state for layer 1\n          2 : DoubleTensor - size: 2x128   -- not including the padding step\n        }\n      2 :\n        {\n          1 : DoubleTensor - size: 2x128   -- last recurrent state for layer 2\n          2 : DoubleTensor - size: 2x128   -- not including the padding step\n        }\n    }\n  2 : 2.3162038392024                      -- NLL/(3+2)  (per output token);\n                                           -- as loss gradOutput pass 0\n}\n```\n\nFor more advanced examples, see the `examples/` directory.\n\n### oxnn.RecurrentPropagator\n\nWe will implement a one-layer LSTM that takes batches of sequences. 
Sequences\nwithin a batch need to have the same lengths, but can change across batches.\n\n```lua\nlocal cuda = false\nrequire 'oxnn'\nif cuda then oxnn.InitCuda() end\n\n-- We implement a simple LSTM\n--\n-- Predict  B       C\n--          ^       ^\n--          |       |\n-- init -\u003e Cell -\u003e Cell -\u003e Cell -\u003e representation\n--          ^       ^       ^\n--          |       |       |\n--          A       B       C\n\nlookuptable = nn.Sequential():add(nn.LookupTable(10, 128))\n                             :add(nn.SplitTable(2))\nrecurrent = oxnn.ModelUtil.LSTMCell12cl(128, true)\noutput = nn.Sequential():add(nn.Linear(128, 10))\n                        :add(oxnn.LogSoftMaxInplace(true,true))\ncriterion = oxnn.CriterionTable(nn.ClassNLLCriterion())\ntargets = nn.Sequential():add(nn.SplitTable(2))\n\nrp = oxnn.RecurrentPropagator()\nrp, mod_lt = rp:add(lookuptable)  -- modifies rp inplace\nrp, mod_rec = rp:add(recurrent)\nrp, mod_out = rp:add(output)\nrp, mod_crit = rp:add(criterion)\nrp, mod_targ = rp:add(targets)\n\nrp._cg = function(batch, type)\n   -- This function creates a computation graph for the given batch (i.e.\n   -- input); it is called for each input. The modules are not executed during\n   -- the run of this function.  type is the last type we used to type the\n   -- RecurrentPropagator with.\n\n   local edges = {}                                 -- the edges of the CG\n   local r = oxnn.RPUtil.CGUtil(edges)              -- helper functions\n\n   -- We will assume the batch has the same format as above example, but for one\n   -- layer.\n   local len = batch[2]:size(2)\n\n   -- The outputs are stored in virtual stacks. We create a stack. 
The string\n   -- argument shows only when debugging.\n   -- (For debugging set RecurrentPropagator.debug to 1 or 3.)\n   --\n   -- When storing we need to specify the index, where inputs[1] would be the\n   -- top of the stack.\n   local inputs = r.S('inputs')\n   -- Add the first edge to the edges table.\n   r.E { r.i(2), mod_lt, inputs[{len,1}] }\n   -- Output of the lookuptable is a table with input for each time step.\n   -- inputs[{len,1}] is equivalent to {inputs[len],inputs[len-1]...inputs[1]}.\n   -- (Note that we can store only at the top of the stack, however, when we are\n   -- storing multiple values, we need to index them with contiguous decreasing\n   -- indices up to 1. This is so that on the backward pass we can reverse the\n   -- graph.)\n   --\n   -- Inputs from the batch are accessed similarly. The elements of the input\n   -- table correspond to r.i(1), r.i(2),... Each input is regarded also as a\n   -- stack and needs to be indexed, e.g. r.i(1)[2]; except in a special case\n   -- when batch[j] is a tensor, then we can use simply r.i(j), as we did above.\n\n   local rec_state = r.S('rec_state')\n   r.E { r.i(1)[1], nn.Identity(), rec_state[1] }  -- initial LSTM state\n   -- As a module in an edge we usually use the module string returned when\n   -- adding a module to the RecurrentPropagator (here mod_lt,...); we can\n   -- also use a new module, but this module is created anew for each batch, and\n   -- it's particularly bad if it allocates memory. This is not an issue with\n   -- nn.Identity().\n\n   local outputs = r.S('outputs')\n   for i=len,1,-1 do\n      r.E { { rec_state[1], inputs[i] },\n            mod_rec,\n            { rec_state[1], outputs[1] } }\n      -- Each time mod_rec is used, a clone that shares the parameters of the\n      -- original is used (and reused for subsequent batches).\n      -- We take the appropriate input and store the output on the top of the\n      -- stack. 
Note that rec_state[1] on both input and output is a table of\n      -- two elements: h and c (hidden layers of the LSTM).\n   end\n\n   local after_out = r.S('after_out')\n   for i=len,2,-1 do\n      r.E { outputs[i], mod_out, after_out[1] }\n   end\n\n   local targets_all = r.S('targets_all')\n   -- To demonstrate r.Split\n   r.E { r.i(2), mod_targ, targets_all[1] }\n   local targets = r.Split(targets_all[1], len)\n   -- this is equivalent to\n   -- r.E { targets_all[1], nn.Identity(), targets[{len,1}] }\n   -- where targets is a new stack. We could have \"saved\" to multiple stack\n   -- places directly.\n\n   local losses = r.S('loss')\n   -- The target of the output of the first time step (after_out[len-1]) is the\n   -- input of the second time step (targets[len-1]), and so forth.\n   for i=len-1,1,-1 do\n      r.E { { after_out[i], targets[i] },\n            mod_crit,\n            losses[1] }\n   end\n\n   -- Some of the computed values are not used and expect a gradient flowing\n   -- back so we put zero loss on them. 
The only module allowed without a\n   -- gradient flowing back is oxnn.CriterionTable.\n   r.E { {outputs[1],nil}, oxnn.CriterionTable(oxnn.ZeroLoss()), r.S('0')[1] }\n   r.E { {rec_state[1],nil}, oxnn.CriterionTable(oxnn.ZeroLoss()), r.S('0')[1] }\n   r.E { {targets[len],nil}, oxnn.CriterionTable(oxnn.ZeroLoss()), r.S('0')[1] }\n   -- Ideally, we would add the above modules to the RecurrentPropagator since\n   -- they allocate memory; this way they do it for each batch.\n\n   local lengths = {}\n   for i=1,batch[2]:size(1) do table.insert(lengths, len) end\n   r.E { {losses[{len-1,1}], nil},  -- table of table of losses since\n                                    -- CriterionTable expects a table.\n         oxnn.CriterionTable(oxnn.SumLosses(true, lengths)),  -- sum the losses\n                                                              -- and average\n         r.S('final output')[1] }\n   -- output from RecurrentPropagator is the output of the last edge/module.\n\n   return edges\nend\n\n-- batch of 2 sequences of lengths 4 and 4\ninput = { { { torch.zeros(2,128), torch.zeros(2,128) } },\n          torch.Tensor{ { 1, 7, 9, 8 },   -- sentence 1\n                        { 2, 3, 5, 6 } }, -- sentence 2\n          { 4, 4 }  -- sentence lengths\n        }\n\nif cuda then\n   rp:cuda()\n   input[1][1][1] = input[1][1][1]:cuda()\n   input[1][1][2] = input[1][1][2]:cuda()\n   input[2] = input[2]:cuda()\nend\n\nprint(rp:forward(input))\nprint(rp:backward(input, 0))  -- 0 since output is only a number and the loss\n                              -- does not require a gradient.\n```\n\nFor a more advanced example, see the implementation of\n[oxnn.SequenceOfWords](rnn/SequenceOfWords.lua).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftkocisky%2Foxnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftkocisky%2Foxnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftkocisky%2Foxnn/lists"}