https://github.com/hughperkins/rnn-notes
Notes on how element-research rnn works
- Host: GitHub
- URL: https://github.com/hughperkins/rnn-notes
- Owner: hughperkins
- License: bsd-2-clause
- Created: 2016-01-06T00:56:11.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-01-06T03:27:28.000Z (almost 9 years ago)
- Last Synced: 2023-03-10T19:25:58.647Z (over 1 year ago)
- Size: 5.86 KB
- Stars: 17
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# RNN Notes
Notes on how [rnn](https://github.com/element-research/rnn) works. Mostly derived from the rnn documentation, supplemented a bit by reading the source code. Colah's blog post on LSTMs is also really good: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
## Training process flow
### In absence of rnn
- we have a network, in this case an nn.Sequential
  - contains an nn.Linear and an nn.LogSoftMax
- we train on each pair by doing the following (a complete sketch follows at the end of this section):
  - net:forward(input)
  - (get gradOutput from the criterion; the criterion works the same for backwardOnline and non-rnn usage)
  - net:backward(input, gradOutput)
  - net:updateParameters(learningRate)
- net:forward will do:
```
Sequential[Module]:forward (notation: lowest-level class [class containing method]:method name)
=> Sequential[Sequential]:updateOutput
=> Linear[Linear]:updateOutput (calcs output)
=> LogSoftMax[LogSoftMax]:updateOutput (calcs output)
```
- net:backward will do:
```
Sequential[Module]:backward
=> Sequential[Sequential]:updateGradInput
=> LogSoftMax[LogSoftMax]:updateGradInput (calc gradInput)
=> Linear[Linear]:updateGradInput (calc gradInput)
=> Sequential[Sequential]:accGradParameters
=> Linear[Linear]:accGradParameters (calc gradWeight, gradBias)
```
- net:updateParameters(learningRate) will do:
```
Sequential[Container]:updateParameters
=> Linear[Module]:updateParameters (updates weights, bias)
```
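Putting those four steps together, here is a minimal self-contained sketch of one such training step. The sizes, the learning rate, and the use of nn.ClassNLLCriterion are illustrative assumptions, not taken from the notes above.

```lua
require 'nn'

-- hypothetical sizes and learning rate, for illustration only
local inputSize, nClass, learningRate = 10, 3, 0.1

local net = nn.Sequential()
   :add(nn.Linear(inputSize, nClass))
   :add(nn.LogSoftMax())
local criterion = nn.ClassNLLCriterion()

local input = torch.randn(inputSize)
local target = 2   -- class index in 1..nClass

-- the four steps listed above
local output = net:forward(input)
criterion:forward(output, target)                      -- loss value (not needed for the update itself)
local gradOutput = criterion:backward(output, target)  -- gradOutput fed to net:backward
net:zeroGradParameters()                               -- backward accumulates into gradWeight/gradBias
net:backward(input, gradOutput)
net:updateParameters(learningRate)
```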

### Using backwardOnline
- training looks like:
```
net:training()
net:forget()
net:backwardOnline()
for s=1,seqLength do
   outputs[s] = net:forward(inputs[s])
end
for s=seqLength,1,-1 do
   -- (get gradOutputs[s] from the criterion, based on outputs[s] and targets[s];
   -- same as in the non-rnn case, so it is elided here and filled in just below)
   net:backward(inputs[s], gradOutputs[s])
end
net:updateParameters(learningRate)
```
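The criterion step elided in the loop above could look like the sketch below, assuming an nn.ClassNLLCriterion and per-step targets[s]; the :clone() is there because a criterion reuses its gradInput tensor between calls.

```lua
-- sketch of the elided criterion step (assumes criterion = nn.ClassNLLCriterion())
for s=seqLength,1,-1 do
   criterion:forward(outputs[s], targets[s])
   -- clone: the criterion reuses its gradInput tensor on every backward call
   gradOutputs[s] = criterion:backward(outputs[s], targets[s]):clone()
   net:backward(inputs[s], gradOutputs[s])
end
```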

## Class hierarchies

### nn
```
Sequential => Container => Module
Linear => Module
LogSoftMax => Module

Sequential:
  updateOutput              modules:updateOutput
  updateGradInput           modules:updateGradInput
  accUpdateGradParameters   modules:accUpdateGradParameters
  accGradParameters         modules:accGradParameters
  backward                  modules:backward

Container:
  zeroGradParameters        modules:zeroGradParameters
  updateParameters          modules:updateParameters
  parameters                concatenate modules:parameters
  training                  modules:training
  evaluate                  modules:evaluate
  applyToModules

Module:
  updateOutput              return self.output
  updateGradInput           return self.gradInput
  accGradParameters         nothing
  accUpdateGradParameters   self:accGradParameters
  forward                   self:updateOutput
  backward                  self:updateGradInput, self:accGradParameters
  backwardUpdate            self:updateGradInput, self:accUpdateGradParameters
  zeroGradParameters        zeros parameters()
  updateParameters          adds -learningRate * self.parameters()[2] to parameters()[1]
  training                  self.train = true
  evaluate                  self.train = false
  clone                     clone, via serialize to memory file
  flatten                   flattens all parameters from self and children into single storage
  getParameters

Linear:
  updateOutput              calc output
  updateGradInput           calc gradInput
  accGradParameters         calc gradWeight, gradBias

LogSoftMax:
  updateOutput
  updateGradInput
```
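To see the Container/Module split above in action, the sketch below (arbitrary sizes) compares parameters(), which concatenates the children's parameter lists, with getParameters(), which uses flatten to move everything into a single storage; it also shows that forward is just a thin wrapper around updateOutput.

```lua
require 'nn'

local net = nn.Sequential()
   :add(nn.Linear(4, 3))   -- arbitrary sizes
   :add(nn.LogSoftMax())

-- Container:parameters concatenates the children's parameter lists
local params, gradParams = net:parameters()
print(#params)                 -- 2: the Linear weight and bias (LogSoftMax has no parameters)

-- Module:getParameters uses flatten to put everything into one storage
local flatParams, flatGradParams = net:getParameters()
print(flatParams:nElement())   -- 4*3 + 3 = 15

-- Module:forward simply calls self:updateOutput
local x = torch.randn(4)
print(net:forward(x))
print(net:updateOutput(x))     -- same values: both set net.output
```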

### rnn

Adds some methods to existing `nn` classes:
```
Module:
  forget                modules:forget
  remember              modules:remember
  backwardThroughTime   modules:backwardThroughTime
  backwardOnline        modules:backwardOnline
  maxBPTTstep           modules:maxBPTTstep
  stepClone             return self:sharedClone()
```
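A quick illustration of a couple of these additions on an nn.LSTM (arbitrary sizes; lstm.step just peeks at the internal step counter that updateOutput increments, as described further down):

```lua
require 'rnn'

local lstm = nn.LSTM(10, 20)   -- arbitrary input/output sizes

-- each forward call is one time-step; the hidden state carries over between calls
local h1 = lstm:forward(torch.randn(10))
local h2 = lstm:forward(torch.randn(10))
print(lstm.step)        -- internal step counter, incremented on every forward

lstm:backwardOnline()   -- switch to the one-backward-call-per-step scheme shown earlier
lstm:forget()           -- wipe the stored step state and reset the step counter
```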

New classes:
```
Sequencer => AbstractSequencer => Container => Module
Recursor => AbstractRecurrent => Container => Module
LSTM => AbstractRecurrent => Container => Module
```
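Of these, Sequencer is the usual entry point: it decorates a per-step module (internally it uses a Recursor) and runs the whole time loop itself. A minimal sketch with arbitrary sizes, using nn.SequencerCriterion to apply a criterion at each step:

```lua
require 'rnn'

local inputSize, hiddenSize, nClass, seqLength = 10, 20, 5, 4   -- arbitrary

local step = nn.Sequential()
   :add(nn.LSTM(inputSize, hiddenSize))
   :add(nn.Linear(hiddenSize, nClass))
   :add(nn.LogSoftMax())
local net = nn.Sequencer(step)
local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())

-- Sequencer consumes a table of per-step inputs and returns a table of per-step outputs
local inputs, targets = {}, {}
for s=1,seqLength do
   inputs[s] = torch.randn(inputSize)
   targets[s] = torch.random(nClass)
end

local outputs = net:forward(inputs)
criterion:forward(outputs, targets)
local gradOutputs = criterion:backward(outputs, targets)
net:backward(inputs, gradOutputs)
net:updateParameters(0.1)
net:forget()   -- reset stored state before the next sequence
```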
```
Recursor:
  __init(module)                       self.recurrentModule = module; self.modules = {module}
  updateOutput                         getStepModule (clones self.recurrentModule),
                                       stores input, stores output, increments step
  backwardThroughTime
  updateGradInputThroughTime
  accUpdateGradParametersThroughTime
  sharedClone                          return self
  backwardOnline                       AbstractRecurrent.backwardOnline
  forget

AbstractRecurrent:
  getStepModule(step)                  calls self.recurrentModule:stepClone(), stores in self.sharedClones
                                       - stepClone is in `Module`
                                       - for LSTM, is a nop, returns self
  maskZero
  updateGradInput                      self:updateGradInputThroughTime(self.updateGradInputStep, 1),
                                       decrements self.updateGradInputStep
  accGradParameters
  backwardThroughTime                  nop
  updateGradInputThroughTime           nop
  accGradParametersThroughTime         nop
  accUpdateGradParametersThroughTime   nop
  backwardUpdateThroughTime(lr)
  updateParameters(lr)
  recycle(offset)
  forget(offset)
  includingSharedClones(f)
  type(type)
  training
  evaluate
  backwardOnline

LSTM:
  __init                               self.recurrentModule = self:buildModel()
  buildModel
  updateOutput                         self.outputs[step], self.cells[step] = self.recurrentModule:updateOutput(
                                          input, self.outputs[step-1], self.cells[step-1])
  backwardThroughTime
  updateGradInputThroughTime(step)     self.gradInputs[maxSteps-step], self.gradPrevOutput, self.gradCells[step-2] =
                                          self:getStepModule(step):updateGradInput(
                                             {self.inputs[step-1], self.outputs[step-2], self.cells[step-2]},
                                             {self.gradOutputs[step-1], self.gradCells[step-1]})
  accGradParametersThroughTime
  accUpdateGradParameters
```
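Tying the two halves of these notes together, the sketch below runs the backwardOnline training loop from the first section with a Recursor-wrapped step module instead of a plain nn.Sequential. The sizes, learning rate, and nn.ClassNLLCriterion are illustrative assumptions.

```lua
require 'rnn'

local inputSize, hiddenSize, nClass = 10, 20, 5   -- arbitrary
local seqLength, learningRate = 4, 0.1

-- Recursor applies the wrapped module one time-step per forward call;
-- the LSTM inside keeps its own per-step state (its stepClone is a nop, see above)
local net = nn.Recursor(nn.Sequential()
   :add(nn.LSTM(inputSize, hiddenSize))
   :add(nn.Linear(hiddenSize, nClass))
   :add(nn.LogSoftMax()))
local criterion = nn.ClassNLLCriterion()

local inputs, targets, outputs, gradOutputs = {}, {}, {}, {}
for s=1,seqLength do
   inputs[s] = torch.randn(inputSize)
   targets[s] = torch.random(nClass)
end

net:training()
net:forget()
net:backwardOnline()
for s=1,seqLength do
   outputs[s] = net:forward(inputs[s])
end
for s=seqLength,1,-1 do
   criterion:forward(outputs[s], targets[s])
   gradOutputs[s] = criterion:backward(outputs[s], targets[s]):clone()
   net:backward(inputs[s], gradOutputs[s])
end
net:updateParameters(learningRate)
net:forget()
```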