{"id":28781046,"url":"https://github.com/workday/upshot-montague","last_synced_at":"2025-09-19T17:54:46.288Z","repository":{"id":71172075,"uuid":"54428561","full_name":"Workday/upshot-montague","owner":"Workday","description":"Montague is a little CCG semantic parsing library for Scala.","archived":false,"fork":false,"pushed_at":"2022-08-06T23:13:01.000Z","size":1210,"stargazers_count":59,"open_issues_count":1,"forks_count":8,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-06-17T18:52:02.078Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Workday.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2016-03-21T22:51:14.000Z","updated_at":"2024-03-03T21:02:03.000Z","dependencies_parsed_at":"2023-03-18T04:15:28.929Z","dependency_job_id":null,"html_url":"https://github.com/Workday/upshot-montague","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Workday/upshot-montague","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Workday%2Fupshot-montague","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Workday%2Fupshot-montague/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Workday%2Fupshot-montague/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Workday%2Fupshot-montague/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Workday","download_url":"htt
ps://codeload.github.com/Workday/upshot-montague/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Workday%2Fupshot-montague/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275980911,"owners_count":25564137,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-19T02:00:09.700Z","response_time":108,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-17T18:39:42.762Z","updated_at":"2025-09-19T17:54:46.281Z","avatar_url":"https://github.com/Workday.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"montague\n========\n\n`montague` is a little CCG semantic parsing library for Scala.\n\nYou can build on this code to translate English-into-SQL,\nEnglish-into-API commands, etc. To do so, you need to build a lexicon\nfor your specific application. 
We include a few sample lexicons,\nas described in the [Getting Started](https://github.com/workday/upshot-montague/#getting-started)\nsection of this README.\n\nThe code currently implements boolean (non-probabilistic) CCG\nparsing, using a [CKY](https://en.wikipedia.org/wiki/CYK_algorithm)-based\nparse search strategy.\n\n![An example syntactic parse tree](https://github.com/Workday/upshot-montague/blob/master/doc/example.png?raw=true \"An example syntactic parse tree\")\n![An example semantic parse tree](https://github.com/Workday/upshot-montague/blob/master/doc/example_math.png?raw=true \"An example semantic parse tree\")\n\nAuthors\n-------\n\n* [Thomas Kim](https://twitter.com/tksfz)\n* [Joseph Turian](http://joseph.turian.com)\n* [Alex Nisnevich](http://alex.nisnevich.com/portfolio)\n\nNote that the repo history doesn't accurately reflect authorship,\nbecause much of the code was ported from the original UPSHOT repo.\n\nBackground\n----------\n\nAt [UPSHOT](http://blogs.workday.com/workday-acquires-upshot/)\n(acquired by Workday), we built a semantic parser that translated\nEnglish into SQL, and — later — English into SOQL (the Salesforce\nquery language). This was packaged in a mobile application with the\nfollowing architecture:\n\n![UPSHOT architecture](https://github.com/Workday/upshot-montague/blob/master/doc/upshot-architecture.png?raw=true \"UPSHOT architecture\")\n\nThis package extracts and open-sources the core CCG-based semantic\nparser component of UPSHOT, in a form that is general-purpose and\nself-contained. We hope that other people find it useful. We\nplan to clean up the SQL-generation code and release that too. 
If\nyou have more requests, please email us.\n\nJoseph Turian and Alex Nisnevich gave a [talk at Strata 2016](http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47360) introducing `montague` (video [here](https://www.youtube.com/watch?v=lnV2JnNBM1I\u0026feature=youtu.be)).\n\nIntroduction\n------------\n\n\u003e \"Oh, get ahold of yourself. Nobody's proposing that we parse English.\"\n\u003e — Larry Wall in `\u003c199709032332.QAA21669@wall.org\u003e`\n\n`montague` takes its name from [Montague\nsemantics](https://en.wikipedia.org/wiki/Montague_grammar), the\nidea that human language can be expressed through formal logic and\nlambda-calculus. The process of inferring this formal representation\nfrom natural language is called \"semantic parsing\". Specifically,\n`montague` implements [Combinatory Categorial Grammar\n(CCG)](https://en.wikipedia.org/wiki/Combinatory_categorial_grammar), a\nparticular grammar formalism that has become popular recently for\nsemantic parsing.\n\n### Lambda-calculus? Combinatory grammar? Huh??\n\nLet's break it down.\nHere's an example of a definition in `montague`:\n```scala\n  (\"plus\" -\u003e ((N\\N)/N, λ {y: Int =\u003e λ {x: Int =\u003e x + y}}))\n```\n\nHere's what it means:\n- There's a term called _\"plus\"_.\n- It has the syntactic category `(N\\N)/N`. This means that it's something that\n  attaches to a noun (`N`) after it to form an `N\\N`, which is a thing that attaches\n  to a noun before it to form another noun. In other words, \"plus\" must be between\n  two nouns, and the end result of `(Noun) plus (Noun)` syntactically is just another noun.\n  So far, so good!\n- It has the semantic definition `λ {y: Int =\u003e λ {x: Int =\u003e x + y}}`. In other words,\n  it's a function that takes an integer and returns a function that takes another integer and adds\n  the first integer to it. 
Uncurrying it (because in Montague semantics, all functions must\n  be curried) simply yields `λ {x: Int, y: Int =\u003e x + y}`. Well, that's pretty straightforward.\n\nLooking through the code of the examples below, you'll notice that not all\ndefinitions look quite like this. Some\nof them have multiple synonyms for one definition, or multiple definitions\nfor a single term (in that case, we say that the term is _ambiguous_).\nSome of them don't operate on single terms at all, but on _matchers_ (functions that try\nto find certain kinds of strings). Some of them don't have semantic definitions\nat all and only describe syntactic categories. But the general idea for all of these\nis the same.\n\nThe way you use `montague` is by defining your own _lexicon_ of terms with syntactic\nand semantic definitions. And the semantic parser does the rest.\n\nGetting Started\n---------------\n\n`montague` comes with a few simple examples demonstrating some applications of its semantic parsing features.\n\n### English-to-calculator arithmetic\n\n(See [`ArithmeticParser`](https://ghe.megaleo.com/upshot/montague/blob/master/src/main/scala/example/ArithmeticParser.scala).)\n\nIn this example, English is parsed into a semantic form, which is\nthen realized as arithmetic operations.\n\n```sh\n\u003e sbt \"runMain example.ArithmeticParser (3 + 5) * 2\"\nInput: (3 + 5) * 2\nOutput: 16\n```\n\nOur current implementation treats all grammatical rule applications\nas either possible (true) or impossible (false).  For this reason,\nthe parser cannot currently discriminate between rules of different\nprecedence:\n\n```sh\n\u003e sbt \"runMain example.ArithmeticParser 3 + 5 * 2\"\nInput: 3 + 5 * 2\nOutput: Ambiguous(13, 16)\n```\n\nBesides ambiguity arising from the inability to discriminate between\ndifferent rule applications, we might want intentionally to encode\nambiguity into our language. 
For example, the `+/-` operation is\nintentionally ambiguous, and has multiple valid semantic\ninterpretations:\n\n```sh\n\u003e sbt \"runMain example.ArithmeticParser (3 +/- 5) * 2\"\nInput: (3 +/- 5) * 2\nOutput: Ambiguous(16, -4)\n```\n\nWe ignore all unrecognized tokens by adding an `Else` clause in the\nlexicon, which matches all tokens that wouldn't match otherwise, and in\nthis case produces semantically null parses:\n\n```\n\u003e sbt \"runMain example.ArithmeticParser Could you please tell me, what is 100 + 100 ?\"\nInput: Could you please tell me, what is 100 + 100 ?\nOutput: 200\n```\n\n### English-to-syntactic structure\n\n(See [`CcgBankParser`](https://ghe.megaleo.com/upshot/montague/blob/master/src/main/scala/example/CcgBankParser.scala).)\n\nUsing the CCGBank lexicon, we parse English sentences into syntax dependency trees.\n\nIf you don't have the CCGBank lexicon, you can use an older version of it that we downloaded from\n[Julia Hockenmaier's site](http://juliahmr.cs.illinois.edu/). 
The lexicon is located at\n`data/lexicon.wsj02-21.gz`.\n\nYou can then parse sentences using the old CCGBank lexicon as follows:\n\n```sh\n\u003e sbt \"runMain example.OldCcgBankParser Thom and Alex and Joseph are writing a parser\"\nInput: Thom and Alex and Joseph are writing a parser\nOutput:\n  are\n    writing\n      a\n        parser\n    and\n      joseph\n      and\n        alex\n        thom\n```\n\nIf you do have the CCGBank lexicon and would like to use it, put\n`CCGbank.00-24.lexicon` into subdirectory `data/`, and invoke the\nparser as follows:\n\n```sh\n\u003e sbt \"runMain example.CcgBankParser Thom and Alex and Joseph are writing a parser\"\n```\n\n### English-to-information storage and retrieval\n\n(See [`InformationStore`](https://ghe.megaleo.com/upshot/montague/blob/master/src/main/scala/example/InformationStore.scala).)\n\n`InformationStore` uses [`SemanticRepl`](https://ghe.megaleo.com/upshot/montague/blob/master/src/main/scala/parser/SemanticRepl.scala) to implement a very basic information storage and retrieval system, by parsing statements into `Define` constructs and queries into `Query` constructs, then executing them accordingly.\n\n**Important:** This example uses the older-style CCGBank lexicon; see [the above example](#english-to-syntactic-structure) for download instructions.\n\nAn example interactive session with `InformationStore`:\n```sh\n\u003e sbt \"runMain example.InformationStore\"\n\u003e\u003e Joseph is a programmer\nOk\n\u003e\u003e Joseph is pretty weird\nOk\n\u003e\u003e Who is Joseph?\n{a(programmer), pretty(weird)}\n\u003e\u003e Who are Joseph and Ted?\nI don't know\n\u003e\u003e Joseph and Ted are Alex's bandmates\nOk\n\u003e\u003e Who are Joseph and Ted?\nalex's(bandmates)\n```\n\nLibrary overview\n----------------\n\n### `SemanticParser`\n\n`SemanticParser` is the main entry point into _montague_. 
To instantiate a `SemanticParser`, you need a syntactic scheme (`CcgCat` for our purposes) and a lexicon, stored in a `ParserDict`.\n\nOnce you've instantiated a `SemanticParser`, `.parse(text, tokenizer)` yields a `SemanticParseResult`, which you can unpack to find the parse tree and resulting semantic representation (if the parse succeeded). See `SemanticParser.main` for an example of how to extract results.\n\n##### Lexicons\n\nTo build up a lexicon, you can create a new `ParserDict()` (or load syntactic entries from a CCGbank lexicon, if you have one, with `ParserDict.fromCcgBankLexicon`) and add entries to it with the `+` operator.\n\nA lexicon entry looks like `(matcher -\u003e meaning)`, where\n- `matcher` can be\n    1. a term (String),\n    2. a Seq of terms,\n    3. an instance of `TokenMatcher[T]` (a `String =\u003e Seq[T]` function), or\n    4. `Else`, which matches any otherwise un-matched token; and\n- `meaning` can be\n    1. a syntactic category,\n    2. a `(`syntactic category`,` semantic representation`)` pair,\n    3. a Seq of either of the above _(in which case the meaning of the term is ambiguous)_, or\n    4. _(only if the `matcher` is a `TokenMatcher` or `Else`)_, a function of the matched object that produces a Seq of syntactic categories or `(`syntactic category`,` semantic representation`)` pairs\n\n##### `SemanticRepl`\n\n`SemanticRepl` is a wrapper on top of `SemanticParser` that's useful for making REPLs that repeatedly read input, parse it into an \"action\", and pattern-match that action to perform some operation against an internal state. For an example of `SemanticRepl` at work, see the [`InformationStore`](https://ghe.megaleo.com/upshot/montague#english-to-information-storage-and-retrieval) example above.\n\n### Syntactic Categories\n\n_montague_ supports arbitrary syntactic schemes, but the only one built-in is `CcgCat`, representing CCG categories. 
Here are the categories available in the `ccg` package:\n\n- _Terminal_ syntactic categories are ones that can appear at the top of the parse tree and cannot consume adjacent terms. Built-in terminals are `S` (\"sentence\"), `N` (\"noun\"), `NP` (\"noun phrase\"), and `PP` (\"prepositional phrase\"), but others are easy to add, depending on your application.\n- _Non-terminal_ syntactic categories are ones that can (and must) consume adjacent terms. A parse cannot succeed if there is a non-terminal at the top of the parse tree. Types of non-terminal categories are:\n   - _Forward application_: `A/B` consumes a `B` in front of it to become an `A`.\n   - _Backward application_: `A\\B` consumes a `B` behind it to become an `A`.\n   - _Bidirectional application_: `A|B` consumes an adjacent `B` to become an `A`.\n   - _Identity categories_: `X|X`, `X/X`, and `X\\X` are special cases of the above -- they can consume a term of _any category_ to become that category.\n   - `Conj`, the _conjunction_ category, is a short-hand for `(X\\X)/X`.\n     _(Exercise: Why is this called the \"conjunction\" category?)_\n- Additionally, a category assigned to a term may have a probability attached to it: `Cat % prob`. For example, if the term _apple_ has categories `N % 0.9` and `(NP/N) % 0.1`, that means that it's 9 times more likely to be a noun than an adjective, and the parser will score potential parses accordingly. Probabilities default to `1.0` if unspecified.\n- A category may also have a label: `Cat(\"label\")`. `A/B(\"somelabel\")` can consume a `B(\"somelabel\")` but not a regular `B`.\n\n### Semantic Representations\n\nA `SemanticState` is generally one of two things (in each case, `LF` corresponds to the type of the objects we're dealing with -- in the examples below, `LF = Int`):\n\n- A _form_ `Form[LF]` represents a semantic state that is \"complete\" (i.e. 
doesn't consume any arguments) -- for example, `Form(4)` represents the number four.\n- A _lambda_ `λ {x: LF =\u003e \u003csemantic state\u003e}` represents a semantic state that must consume arguments -- for example, `λ {x: Int =\u003e Form(x + 5)}` represents the function that adds 5 to any integer. Implicit conversions allow us to write this more concisely as `λ {x: Int =\u003e x + 5}`. Multi-argument functions are represented by currying (e.g. `λ {y: Int =\u003e λ {x: Int =\u003e x + y}}`).\n\nThere are a few other possible `SemanticState`s that generally shouldn't be specified directly in the lexicon, but can appear in parse results:\n- `Nonsense` represents a parse with no valid semantic outputs.\n- `Ambiguous(Set[SemanticState])` represents a parse with more than one valid semantic output.\n- `Ignored` represents a parse that ignored semantics entirely (i.e. you didn't specify semantic representations in the lexicon).\n\nExercises for the reader\n------------------------\n_(In rough order of difficulty.)_\n\n### Parsing\n\n1. While all of our examples involve CCG parsing, _montague_ supports alternative syntactic schemes. Create your own syntactic scheme (that is, a type hierarchy that inherits from `SyntacticLabel`), and parse something with it.\n1. **Composition.** _montague_'s CCG implementation currently supports only one of the three CCG combinators: the _application_ combinator (`X/Y Y -\u003e X`, `Y X\\Y -\u003e X`). Extend it to also support the [_composition_ combinator](https://en.wikipedia.org/wiki/Combinatory_categorial_grammar#Composition_Combinators) (`X/Y Y/Z -\u003e X/Z`).\n1. **Type-raising.** As above, but for the [_type-raising_ combinator](https://en.wikipedia.org/wiki/Combinatory_categorial_grammar#Type-raising_Combinators) (`X -\u003e T/(T\\X)`).\n1. 
**Probabilistic parsing.** The parser currently supports boolean parsing: a parse of the input\nstring is either possible (true) or impossible (false).\n_(This corresponds to implementing the boolean semiring parser of\n[Goodman, 1999](http://www.aclweb.org/anthology/J99-4004) -- see\nFigure 5.)_\nExtend the code so that it supports\nprobabilistic or weighted parsing. _(The existing probabilistic\nimplementation of English parsing, based upon multiplying out lexicon\nweights, is a hack that includes the token weights within the CCG\ncategory.)_\n1. **Speed improvements.** The parser implements a [CKY](https://en.wikipedia.org/wiki/CYK_algorithm) search strategy, which is bottom-up.\nIf the parser had weights implemented, we could parse faster using\nagenda-based parsing: You use an agenda to order nodes by some\npriority. For example, the priority can be the cumulative probability\nof applying the rules (best-first parsing). Alternately, instead\nof agenda-based parsing, beam pruning could be used to reduce the\nsize of the search space. In this case, only the top *k* weighted\nnodes are kept in any parse cell.\n1. **† Fuzzy matching.** What if a user enters a phrase that doesn't parse successfully, but adding or removing one word (or perhaps correcting a misspelling) would fix it? Create a _\"Did You Mean?\"_ feature that implements this efficiently.\n\n### Applications\n\n1. Add more features to the `ArithmeticParser` example. For example, improve the tokenizer to correctly handle infix expressions without spaces (e.g. `(1+2)*3`), or add more operations.\n1. Add more features to the `InformationStore` example. For example, add other types of relations, or support more kinds of expressions.\n1. **IFTTT.** Parse English phrases like _\"When (this happens), (do this)\"_ into semantic forms corresponding to [IFTTT](https://ifttt.com/) API calls. Try building a REPL on top of `SemanticRepl` for communicating with IFTTT via natural language.\n1. 
**Slack bot.** Similar to the above, but do something cool with a [Slack bot](https://api.slack.com/bot-users) instead.\n1. **† Game semantics.** Come up with a semantic scheme for representing rule descriptions for a simple card game (think _Magic_, _Hearthstone_, etc., but simplify!) For example, a card may say something like _\"Whenever your opponent loses life, draw a card\"_. Then write a parser for it.\n1. **† English to Freebase.** Parse English phrases into [Freebase](https://www.freebase.com/) queries. For example (borrowing an example from [SEMPRE](http://www-nlp.stanford.edu/software/sempre/)), _\"Which college did Obama go to?\"_ → `(and (Type University) (Education BarackObama))` → _\"Occidental College, Columbia University\"_. (_Hint_: You'll have to generate most of the lexicon programmatically using Freebase as well.)\n1. **† English to SQL.** Parse English questions (such as _\"How many customers in Europe made a purchase last month?\"_) into SQL statements. Assume that you have all relevant table structure information available. (_Hints_: Generate an abstract structure for the full parse first, and then worry about generating SQL out of it. Much of the lexicon will have to be generated from table and column names, as well as entries for categorical columns.  There will be a *lot* of ambiguous definitions.)\n\nRelated work\n------------\n\n* [SEMPRE](http://www-nlp.stanford.edu/software/sempre/) is a toolkit\nfor training semantic parsers.\n* [Cornell Semantic Parsing Framework](https://bitbucket.org/yoavartzi/spf)\nis an open source research software package. 
It includes a semantic\nparsing algorithm, a flexible meaning representation language and\nlearning algorithms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworkday%2Fupshot-montague","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworkday%2Fupshot-montague","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworkday%2Fupshot-montague/lists"}