{"id":13717500,"url":"https://github.com/hal3/macarico","last_synced_at":"2025-05-07T07:31:38.172Z","repository":{"id":66760946,"uuid":"87583629","full_name":"hal3/macarico","owner":"hal3","description":"learning to search in pytorch","archived":false,"fork":false,"pushed_at":"2020-02-18T15:05:23.000Z","size":36770,"stargazers_count":111,"open_issues_count":19,"forks_count":12,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-14T05:34:24.163Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hal3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-04-07T20:17:24.000Z","updated_at":"2024-01-04T16:13:01.000Z","dependencies_parsed_at":"2023-02-24T22:15:14.304Z","dependency_job_id":null,"html_url":"https://github.com/hal3/macarico","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hal3%2Fmacarico","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hal3%2Fmacarico/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hal3%2Fmacarico/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hal3%2Fmacarico/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hal3","download_url":"https://codeload.github.com/hal3/macarico/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252833579,"owners_count":21811214,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T00:01:23.177Z","updated_at":"2025-05-07T07:31:37.646Z","avatar_url":"https://github.com/hal3.png","language":"Python","readme":"![maçarico](resources/macarico.png?raw=true \"maçarico\")\n\n# maçarico\n\nAn implementation of the imperative learning to search framework [1]\nin pytorch, compatible with automatic differentiation, for deep\nlearning-based structured prediction and reinforcement learning.\n\n[1] http://hal3.name/docs/daume16compiler.pdf\n\nThe basic structure is:\n\n    macarico/\n        base.py          defines the abstract classes used for maçarico,\n                         such as Env, Policy, Features, Learner, Attention\n\n        annealing.py     tools for annealing, useful for eg DAgger\n\n        util.py          basic utility functions\n\n        tasks/           example tasks, such as: sequence_labeler,\n                         dependency_parser, sequence2sequence, etc. 
# To create new features

There are two types of features: static features (things that can be
precomputed on the input before the environment starts running) and
dynamic features (things that depend on the status of the
environment).

For static features (like `RNNFeatures`), create a class that derives
from `macarico.Features` (and probably also from `nn.Module` if it has
any of its own parameters). At a minimum, this must define its
dimensionality and give a name to itself (called the `field`). This
`field` can then be referenced either by other features or by
`Attention` modules. It should also define a `_forward` method that
computes the static features, whose output will be cached
automatically for you. It should return a tensor of dimension
(M,dim), where M is arbitrary (but which must be compatible with
`Attention`) and where dim is the pre-declared dimensionality.

For dynamic features (like `TransitionRNN`), create a class as before.
However, instead of defining the static `_forward` function, you must
define your own dynamic `forward` function. This can peek at
`state.t` and `state.T` to get the current and maximum time step. It
should return features *just* for the current timestep, `state.t`.
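As a rough illustration of that recipe, a simple embedding-based
static feature might look like the sketch below. The requirement to
declare a `field` and a dimensionality and to implement `_forward`
returning an (M,dim) tensor comes from the text above; the exact
`macarico.Features` constructor signature and the
`state.example.tokens` access are assumptions (see
`macarico/features/sequence.py` for real examples such as
`RNNFeatures` and `BOWFeatures`).

    import torch
    import torch.nn as nn
    import macarico

    class EmbeddingFeatures(macarico.Features, nn.Module):
        def __init__(self, n_types, d_emb=50):
            nn.Module.__init__(self)
            # declare our output field name and dimensionality;
            # the (field, dim) constructor arguments are an assumption
            macarico.Features.__init__(self, 'embed_feats', d_emb)
            self.embeddings = nn.Embedding(n_types, d_emb)

        def _forward(self, state):
            # one embedding per input token: an (M, d_emb) tensor,
            # where M is the input length; cached automatically
            tokens = torch.LongTensor(state.example.tokens)
            return self.embeddings(tokens)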
# To create new attention mechanisms

An `Attention` mechanism tells a dynamic model where to look to access
its features. There are two types: hard attention and soft attention.

A hard attention mechanism defines its field (which features it is
attending to) and its arity (how many feature vectors it attends
to at any given time). Then, at runtime, given a `state`, it must
return the indices into the corresponding fields based on the state,
where the number of indices is exactly equal to its arity.

A soft attention mechanism still defines its field but declares its
arity to be `None`. This means that instead of returning an *index*
into its input, it must return a *distribution* over its input as a
torch Variable tensor.

# To create new learners

The most basic type of `Learner` basically behaves like a `Policy`,
but additionally provides an `update` function that, for instance,
does backprop.

Perhaps the simplest example is `MaximumLikelihood`, which just
behaves according to a reference policy, but accumulates an objective
function that's the sum of individual predictions. At `update` time,
it runs backprop on this objective.

One *very important thing* about Learners is that even if they do not
use the return value of their underlying policy, they *must* call the
underlying policy every time they run. Why? Because the underlying
policy may accumulate state (as in the case of `TransitionRNN`), and if
it is "skipped" the policy will become very confused because it will
have missed some input.

A slightly different example is `Reinforce`, which implements the
REINFORCE RL algorithm. This Learner does not explicitly accumulate an
objective that it then backprops on; instead it uses the fact that
stochastic choices can be backpropped through automatically using
torch's `.reinforce` function.
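To illustrate, here is a sketch of a `MaximumLikelihood`-style learner
that obeys the rule above: it acts according to the reference, but
still queries the learned policy at every step and accumulates a
differentiable objective. The `predict_costs` and `loss_of` method
names on the policy are made up for this sketch; only the general
shape (a `Learner` behaves like a `Policy` and adds `update`) comes
from the text.

    import macarico

    class MyMaxLikelihood(macarico.Learner):
        def __init__(self, reference, policy):
            self.reference = reference
            self.policy = policy
            self.objective = 0.0

        def __call__(self, state):
            a_ref = self.reference(state)
            # crucially, query the underlying policy on *every* step,
            # even though we act on the reference's choice; a stateful
            # policy (e.g. built on TransitionRNN) must see all input.
            # predict_costs / loss_of are hypothetical method names.
            costs = self.policy.predict_costs(state)
            self.objective += self.policy.loss_of(costs, a_ref)
            return a_ref

        def update(self, _loss):
            # backprop through the accumulated per-step objective
            self.objective.backward()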
# Understanding how everything fits together

Because we have designed maçarico to be as modular as possible, there
are some places where the different pieces need to "talk" to each
other.

Let's take `test_sequence_labeler.py` as an example. In `test_demo`,
we have code that looks like:

    data = [Example(x, y, n_labels) for x, y in ...]

This constructs `sequence_labeler.Example` data structures and calls
them `data`. If you look at the `Example` data structure, you find it
has two main components: `tokens` and `labels`, corresponding to `x`
and `y` respectively, above.

Next, we build some *static* features:

    features = RNNFeatures(n_types,
                           input_field  = 'tokens',
                           output_field = 'tokens_feats')

This constructs a biLSTM over the inputs. Where does it look? It looks
in `tokens` because that's the specified input field. And it stores
the features generated by the biLSTM in `tokens_feats`. You can
therefore think of `features` as something that maps from `tokens` to
`tokens_feats`. (Note: those two arguments are the default and could
have been left off for convenience, but here we're trying to make
everything explicit.)

Next, we need an actor. The actor is the thing that takes a state of
the world and produces a feature representation. (This feature
representation will later be consumed by the `Policy` to predict an
action.) However, the actor needs to *attend* somewhere when making
predictions. In this case, when the environment (the sequence labeler)
is predicting the label of the `n`th word, the actor should look at
that word! This can be done with the `AttendAt` attention mechanism.

    attention = AttendAt(field='tokens_feats',
                         get_position=lambda state: state.n)

This constructs an attention mechanism that essentially returns
`tokens_feats[state.n]` when the environment state is on word
`n`. Note that this hinges on the fact that we *know* that the
environment stores "current position" in `state.n`. (Again, these
arguments are the default and could be left off.)

Next, we can construct the actor. The actor is itself an RNN (not
bidirectional this time), which uses the biLSTM features we built
above, together with the simple attention mechanism.

    actor = TransitionRNN([features],
                          [attention],
                          n_labels)

Finally, we can construct the policy. In this case, it's just a linear
function that maps from the `actor`'s feature representation to one of
`n_labels` actions:

    policy = LinearPolicy(actor, n_labels)

Tracing back through this: the policy maps from a state feature
representation to an action. This mapping is done by
`LinearPolicy`. But where does the state feature representation come
from? It comes from the `actor`, which, when labeling word `n`, asks
its attention model(s) what features to use. In this case, the
attention tells it to look at `tokens_feats[n]`, where
`tokens_feats[n]` is the output of the biLSTM.

We will now train this model with DAgger. In order to do this, we need
to anneal the degree to which rollin is done according to the
reference policy versus the learned policy:

    p_rollin_ref = stochastic(ExponentialAnnealing(0.99))

Next, we construct an optimizer. This is exactly like you would do in
pytorch, extracting parameters from the `policy`:

    optimizer = torch.optim.Adam(policy.parameters(), lr=0.01)

And now we can train:

    for epoch in range(5):
        # train on each example, one at a time
        for ex in data:
            optimizer.zero_grad()
            learner = DAgger(ref, policy, p_rollin_ref)
            env = ex.mk_env()
            env.run_episode(learner)
            learner.update(env.loss())
            optimizer.step()
            p_rollin_ref.step()

        # now make some predictions
        for ex in data:
            env = ex.mk_env()
            out = env.run_episode(policy)
            print('prediction = %s' % out)
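Finally, since every `Env` provides `loss`, measuring performance
after training uses exactly the same pattern as the prediction loop
above; the averaging here is our own addition, not code from the
tests:

    # evaluate the learned policy: run each episode greedily and
    # average the per-example loss reported by the environment
    total = 0.0
    for ex in data:
        env = ex.mk_env()
        env.run_episode(policy)
        total += env.loss()
    print('average loss = %g' % (total / len(data)))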