https://github.com/hal3/macarico

learning to search in pytorch

![maçarico](resources/macarico.png?raw=true "maçarico")

# maçarico

An implementation of the imperative learning to search framework [1]
in pytorch, compatible with automatic differentiation, for deep
learning-based structured prediction and reinforcement learning.

[1] http://hal3.name/docs/daume16compiler.pdf

The basic structure is:

    macarico/
        base.py         defines the abstract classes used for maçarico,
                        such as Env, Policy, Features, Learner, Attention

        annealing.py    tools for annealing, useful for e.g. DAgger

        util.py         basic utility functions

        tasks/          example tasks, such as: sequence_labeler,
                        dependency_parser, sequence2sequence, etc. all of
                        these define an Env that can be run

        features/       contains example static features and dynamic features
            sequence.py     defines two types of static features: RNNFeatures
                            (obtained by running an RNN over the input) and
                            BOWFeatures (simple bag of words). also defines
                            useful attention models over sequences.

            actor.py        defines two types of dynamic features: TransitionRNN
                            (which is an actor that has an RNN-like hidden state),
                            and TransitionBOW (which has no hidden state and
                            instead just conditions on the previous actions)

        policies/       currently only implements a linear policy

        lts/            various learning to search algorithms, such as:
                        maximum_likelihood, dagger, reinforce,
                        aggrevate and LOLS

    tests/
        run_tests.sh    run all (or some) of the tests, compare the outputs
                        to previous versions to make sure you didn't botch
                        anything. (*please* run this before pushing changes.)

        test_util.py    some utilities for running tests, such as train/eval
                        loops, printing outputs, etc.

        nlp_data.py     generate or load data for various natural language
                        processing tasks. (requires external data.)

        test_X.py       various tests for different parts of maçarico. if you
                        develop something new, please create a test!

        output/         outputs from previous runs

# To create a new task

Take a look at existing tasks.

Create a new mytask.py file in macarico/tasks that defines:

1. an `Example` class, which contains labeled examples for your
task. This class must define a `mk_env` function that returns an
environment (`Env`) particular to this task.

2. an `Env` class, which defines how your environment works. It must
provide `run_episode` and `loss` at the minimum. For some learning
algorithms, it must provide `rewind` (mostly for efficiency) and/or
`reference`.

3. if none of the existing attention models make sense for your task
(this is the case for e.g. `DependencyParser`), define your own
`Attention` mechanism.

4. make a test case in tests/test_mytask.py that tests it.
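
As a rough illustration of items 1 and 2, here is a toy "copy the input" task written without importing maçarico. The names `CopyExample` and `CopyEnv`, and the convention that a policy is called as `policy(env)` and returns an action, are assumptions made for this sketch; a real task would derive from the `Env` base class in `base.py` and follow its actual signatures.

```python
# Toy stand-ins, NOT macarico's real base classes. The class names and the
# policy(env) calling convention are assumptions made for illustration.

class CopyEnv(object):
    """Environment for a toy task: output each input token unchanged."""

    def __init__(self, example):
        self.example = example
        self.T = len(example.tokens)   # maximum time step
        self.t = 0                     # current time step
        self.output = []

    def run_episode(self, policy):
        # Ask the policy for one action per time step and record it.
        self.output = []
        for t in range(self.T):
            self.t = t
            self.output.append(policy(self))
        return self.output

    def loss(self):
        # Hamming loss: number of positions where the output is wrong.
        truth = self.example.labels
        return sum(1 for p, y in zip(self.output, truth) if p != y)

    def reference(self):
        # The correct action for the current time step.
        return self.example.labels[self.t]


class CopyExample(object):
    """A labeled example; mk_env builds the environment for it."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.labels = list(tokens)

    def mk_env(self):
        return CopyEnv(self)


# Run the reference as a policy, then a constant policy, on one example.
env = CopyExample([3, 1, 2]).mk_env()
expert_out = env.run_episode(lambda state: state.reference())
expert_loss = env.loss()

env = CopyExample([3, 1, 2]).mk_env()
constant_out = env.run_episode(lambda state: 1)
constant_loss = env.loss()
```

A test in `tests/test_mytask.py` could then check that the reference policy reaches zero loss, as the last lines do here.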

# To create new features

There are two types of features: static features (things that can be
precomputed on the input before the environment starts running) and
dynamic features (things that depend on the status of the
environment).

For static features (like `RNNFeatures`), create a class that derives
from `macarico.Features` (and probably also from `nn.Module` if it has
any of its own parameters). At a minimum, this must define its
dimensionality and give a name to itself (called the `field`). This
`field` can then be referenced either by other features or by
`Attention` modules. The class should define a `_forward` method that
computes the static features; its result will be cached automatically
for you. It should return a tensor of dimension (M,dim), where M is
arbitrary (but which must be compatible with `Attention`) and where
dim is the pre-declared dimensionality.

For dynamic features (like `TransitionRNN`), create a class as before.
However, instead of defining the static `_forward` function, you must
define your own dynamic `forward` function. This can peek at
`state.t` and `state.T` to get the current and maximum time step. It
should return features /just/ for the current timestep, `state.t`.
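
The difference between the two kinds of features can be sketched in plain Python. Everything below is hypothetical: `OneHotFeatures`, `PrevActionFeatures` and `State` are stand-ins, lists replace tensors, and caching the result on the state under the `field` name is only one plausible way to get the automatic caching described above.

```python
class State(object):
    """Minimal stand-in for an environment state (hypothetical)."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.T = len(tokens)   # maximum time step
        self.t = 0             # current time step
        self.actions = []      # actions taken so far


class OneHotFeatures(object):
    """Static features: one row per token, computed once per state."""
    def __init__(self, n_types, field='tokens_feats'):
        self.dim = n_types
        self.field = field
        self.n_computed = 0    # counts how often _forward really runs

    def _forward(self, state):
        self.n_computed += 1
        return [[1.0 if i == tok else 0.0 for i in range(self.dim)]
                for tok in state.tokens]          # shape (M, dim)

    def forward(self, state):
        # Cache the result on the state under the field name (assumed scheme).
        if not hasattr(state, self.field):
            setattr(state, self.field, self._forward(state))
        return getattr(state, self.field)


class PrevActionFeatures(object):
    """Dynamic features: depend on state.t, so they are never cached."""
    def __init__(self, n_actions):
        self.dim = n_actions

    def forward(self, state):
        vec = [0.0] * self.dim
        if state.t > 0:
            vec[state.actions[state.t - 1]] = 1.0
        return vec                                 # features for step t only


state = State([2, 0, 1])
static = OneHotFeatures(n_types=3)
dynamic = PrevActionFeatures(n_actions=2)
per_step = []
for t in range(state.T):
    state.t = t
    row = static.forward(state)[t]      # served from the cache after t == 0
    per_step.append((row, dynamic.forward(state)))
    state.actions.append(t % 2)         # pretend the policy chose t % 2
```

After the loop, `static.n_computed` is 1 even though `forward` was called three times, while the dynamic features were recomputed at every step.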

# To create new attention mechanisms

An `Attention` mechanism tells a dynamic model where to look to access
its features. There are two types: hard attention and soft attention.

A hard attention mechanism defines its field (which features it is
attending to) and its arity (how many feature vectors it attends
to at any given time). Then, at runtime, given a `state`, it must
return the indices into the corresponding fields based on the state,
where the number of indices is exactly equal to its arity.

A soft attention mechanism still defines its field but declares its
arity to be `None`. This means that instead of returning an /index/
into its input, it must return a /distribution/ over its input as a
torch Variable tensor.
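
The two contracts can be compared side by side in a dependency-free sketch. The class names, the `attend` helper and the dot-product scoring are all assumptions chosen for illustration (lists stand in for tensors); they are not maçarico's implementations.

```python
import math


class State(object):
    """Hypothetical state holding precomputed features and a position."""
    def __init__(self, feats, n):
        self.tokens_feats = feats   # list of M feature vectors
        self.n = n                  # current position


class AttendAtSketch(object):
    """Hard attention: arity 1, returns exactly one index into the field."""
    arity = 1

    def __init__(self, field='tokens_feats'):
        self.field = field

    def __call__(self, state):
        return [state.n]


class SoftmaxAttentionSketch(object):
    """Soft attention: arity None, returns a distribution over the field."""
    arity = None

    def __init__(self, field='tokens_feats'):
        self.field = field

    def __call__(self, state):
        feats = getattr(state, self.field)
        query = feats[state.n]
        # Dot-product score of every row against the current row.
        scores = [sum(q * f for q, f in zip(query, row)) for row in feats]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def attend(attention, state):
    """Turn an attention result into a list of feature vectors."""
    feats = getattr(state, attention.field)
    result = attention(state)
    if attention.arity is None:
        # Soft: a single vector, the expectation under the distribution.
        dim = len(feats[0])
        return [[sum(p * row[j] for p, row in zip(result, feats))
                 for j in range(dim)]]
    # Hard: one vector per returned index.
    return [feats[i] for i in result]


state = State([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n=1)
hard = attend(AttendAtSketch(), state)
soft_dist = SoftmaxAttentionSketch()(state)
soft = attend(SoftmaxAttentionSketch(), state)
```

The hard mechanism yields as many vectors as its arity; the soft one yields a weighting over all M rows, which the caller then has to combine.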

# To create new learners

The most basic type of `Learner` basically behaves like a `Policy`,
but additionally provides an `update` function that, for instance,
does backprop.

Perhaps the simplest example is `MaximumLikelihood`, which just
behaves according to a reference policy, but accumulates an objective
function that's the sum of individual predictions. At `update` time,
it runs backprop on this objective.

One *very important thing* in Learners is that even if they do not
use the return value of their underlying policy, they *must* call the
underlying policy every time they run. Why? Because the underlying
policy may accumulate state (as in the case of `TransitionRNN`) and if
it is "skipped" the policy will become very confused because it will
have missed some input.
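
A small sketch of this rule, using hypothetical stand-ins rather than maçarico classes: `CountingPolicy` keeps internal state across calls, and `ReferenceFollower` acts according to the reference while still calling the policy at every step. The mistake count is only a placeholder for a real differentiable objective.

```python
class CountingPolicy(object):
    """Policy with internal state: remembers how many steps it has seen."""
    def __init__(self):
        self.steps_seen = 0

    def __call__(self, state):
        self.steps_seen += 1
        return 0                      # this toy policy always predicts 0


class ReferenceFollower(object):
    """Learner that acts like the reference but still runs the policy."""
    def __init__(self, reference, policy):
        self.reference = reference
        self.policy = policy
        self.objective = 0

    def __call__(self, state):
        # Always call the policy so its internal state stays in sync,
        # even though the action actually taken comes from the reference.
        predicted = self.policy(state)
        truth = self.reference(state)
        self.objective += int(predicted != truth)
        return truth

    def update(self, loss=None):
        # A real learner would backprop here; this toy ignores `loss`,
        # returns the accumulated objective and resets it.
        total, self.objective = self.objective, 0
        return total


labels = [0, 1, 0, 1]
policy = CountingPolicy()
learner = ReferenceFollower(lambda t: labels[t], policy)
actions = [learner(t) for t in range(len(labels))]   # state is just t here
objective = learner.update()
```

If `ReferenceFollower` returned `truth` without calling `self.policy`, `policy.steps_seen` would stay at 0 and a stateful actor would be out of step with the environment.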

A slightly different example is `Reinforce`, which implements the
reinforce RL algorithm. This Learner does not explicitly accumulate an
objective that it then backprops on; instead it uses the fact that
stochastic choices can be backpropped through automatically using
torch's `.reinforce` function.
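
Note that `.reinforce` belongs to early PyTorch releases; later versions express the same estimator through `torch.distributions` and `log_prob`. To show what the estimator computes without depending on either API, the sketch below works out the score-function gradient by hand for a softmax policy over three actions. The function names, learning rate and reward function are illustrative assumptions and are not taken from maçarico.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    # Draw an action index from a categorical distribution.
    r = rng.random()
    acc = 0.0
    for action, p in enumerate(probs):
        acc += p
        if r < acc:
            return action
    return len(probs) - 1


def reinforce_step(logits, reward_fn, rng, lr=0.1, baseline=0.0):
    """One stochastic gradient ascent step on expected reward."""
    probs = softmax(logits)
    action = sample(probs, rng)
    advantage = reward_fn(action) - baseline
    # For a softmax policy: d log pi(action) / d logit_k = 1[k == action] - pi(k)
    grad_logp = [(1.0 if k == action else 0.0) - p
                 for k, p in enumerate(probs)]
    return [x + lr * advantage * g for x, g in zip(logits, grad_logp)]


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(500):
    # Only action 2 is rewarded.
    logits = reinforce_step(logits, lambda a: 1.0 if a == 2 else 0.0, rng)
final_probs = softmax(logits)
```

No objective is accumulated explicitly: each sampled action contributes its own gradient term, scaled by the reward it received.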

# Understanding how everything fits together

Because we have designed maçarico to be as modular as possible, there
are some places where the different pieces need to "talk" to each
other.

Let's take `test_sequence_labeler.py` as an example. In `test_demo`,
we have code that looks like:

    data = [Example(x, y, n_labels) for x, y in ...]

This constructs `sequence_labeler.Example` data structures and calls
them data. If you look at the `Example` data structure, you find it
has two main components: `tokens` and `labels`, corresponding to `x`
and `y` respectively, above.

Next, we build some *static* features:

    features = RNNFeatures(n_types,
                           input_field='tokens',
                           output_field='tokens_feats')

This constructs a biLSTM over the inputs. Where does it look? It looks
in `tokens` because that's the specified input field. And it stores
the features generated by the biLSTM in `tokens_feats`. You can
therefore think of `features` as something that maps from `tokens` to
`tokens_feats`. (Note: those two arguments are the defaults and could
have been left off for convenience, but here we're trying to make
everything explicit.)

Next, we need an actor. The actor is the thing that takes a state of
the world and produces a feature representation. (This feature
representation will later be consumed by the `Policy` to predict an
action.) However, the actor needs to *attend* somewhere when making
predictions. In this case, when the environment (the sequence labeler)
is predicting the label of the `n`th word, the actor should look at
that word! This can be done with the `AttendAt` attention mechanism.

    attention = AttendAt(field='tokens_feats',
                         get_position=lambda state: state.n)

This constructs an attention mechanism that essentially returns
`tokens_feats[state.n]` when the environment state is on word
`n`. Note that this hinges on the fact that we *know* that the
environment stores "current position" in `state.n`. (Again, these
arguments are the defaults and could be left off.)

Next, we can construct the actor. The actor is itself an RNN (not
bidirectional this time), which uses the biLSTM features we built
above, together with the simple attention mechanism.

    actor = TransitionRNN([features],
                          [attention],
                          n_labels)

Finally, we can construct the policy. In this case, it's just a linear
function that maps from the `actor`'s feature representation to one of
`n_labels` actions:

    policy = LinearPolicy(actor, n_labels)

Tracing back through this: the policy maps from a state feature
representation to an action. This mapping is done by
`LinearPolicy`. But where does the state feature representation come
from? It comes from the `actor`, which, when labeling word `n` asks
its attention model(s) what features to use. In this case, the
attention tells it to look at `tokens_feats[n]`, where
`tokens_feats[n]` is the output of the biLSTM.
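
The same chain can be traced in a few lines of plain Python. This is only a toy: ordinary functions stand in for `RNNFeatures`, `AttendAt`, `TransitionRNN` and `LinearPolicy`, the actor has no hidden state, and the weight matrix is made up for the example.

```python
def features(tokens):
    # Static features: one vector per token (stand-in for the biLSTM).
    return [[float(tok), 1.0] for tok in tokens]


def attention(state):
    # Hard attention: look at the current position.
    return state['n']


def actor(state):
    # State representation = the attended feature vector.
    return state['tokens_feats'][attention(state)]


# One row of weights per action (hypothetical values).
WEIGHTS = [[1.0, -1.5],
           [-1.0, 1.5]]


def policy(state):
    # Linear scores over the actor's representation, then argmax.
    rep = actor(state)
    scores = [sum(w * x for w, x in zip(row, rep)) for row in WEIGHTS]
    return scores.index(max(scores))


tokens = [0, 1, 2, 3]
state = {'tokens_feats': features(tokens), 'n': 0}
predictions = []
for n in range(len(tokens)):
    state['n'] = n
    predictions.append(policy(state))
```

Each call to `policy` goes through `actor`, which goes through `attention`, which reads the precomputed `tokens_feats`: the same order of dependencies as in the real objects.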

We will now train this model with DAgger. In order to do this, we need
to anneal the degree to which rollin is done according to the
reference policy versus the learned policy:

    p_rollin_ref = stochastic(ExponentialAnnealing(0.99))
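
To make the schedule concrete, here is a minimal sketch of what such an annealed coin flip might look like. It assumes the schedule's value is `alpha ** t` after `t` calls to `step()` and that the stochastic wrapper returns `True` with that probability; the class names are hypothetical and the real classes in `annealing.py` may differ.

```python
import random


class ExponentialAnnealingSketch(object):
    """Value alpha ** t after t calls to step() (assumed schedule)."""
    def __init__(self, alpha):
        self.alpha = alpha
        self.t = 0

    def step(self):
        self.t += 1

    def value(self):
        return self.alpha ** self.t


class StochasticSketch(object):
    """Wraps a schedule; calling it flips a coin with the current value."""
    def __init__(self, schedule, rng=None):
        self.schedule = schedule
        self.rng = rng or random.Random(0)

    def step(self):
        self.schedule.step()

    def __call__(self):
        # True means "roll in with the reference policy on this step".
        return self.rng.random() < self.schedule.value()


p_rollin_ref = StochasticSketch(ExponentialAnnealingSketch(0.99))
first = p_rollin_ref()              # probability is 1.0 before any step
for _ in range(100):
    p_rollin_ref.step()
p_after_100 = p_rollin_ref.schedule.value()
```

Under these assumptions the reference is always used at the start, and after 100 calls to `step()` it is used with probability `0.99 ** 100`, a little over one third.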

Next, we construct an optimizer. This is exactly as you would do in
pytorch, extracting parameters from the `policy`:

    optimizer = torch.optim.Adam(policy.parameters(), lr=0.01)

And now we can train:

    for epoch in range(5):
        # train on each example, one at a time
        for ex in data:
            optimizer.zero_grad()
            # `ref` is the reference policy for the task (not defined in this excerpt)
            learner = DAgger(ref, policy, p_rollin_ref)
            env = ex.mk_env()
            env.run_episode(learner)
            learner.update(env.loss())
            optimizer.step()
            p_rollin_ref.step()

        # now make some predictions
        for ex in data:
            env = ex.mk_env()
            out = env.run_episode(policy)
            print('prediction = %s' % out)