{"id":13936809,"url":"https://github.com/ayoungprogrammer/Lango","last_synced_at":"2025-07-19T22:32:28.613Z","repository":{"id":62575069,"uuid":"57098794","full_name":"ayoungprogrammer/Lango","owner":"ayoungprogrammer","description":"Language Lego","archived":false,"fork":false,"pushed_at":"2019-11-04T02:52:11.000Z","size":72,"stargazers_count":142,"open_issues_count":1,"forks_count":15,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-05-03T06:22:02.804Z","etag":null,"topics":["nlp","parse-trees","stanford-corenlp","stanford-parser"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ayoungprogrammer.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-26T04:51:49.000Z","updated_at":"2024-01-04T16:04:27.000Z","dependencies_parsed_at":"2022-11-03T18:48:02.128Z","dependency_job_id":null,"html_url":"https://github.com/ayoungprogrammer/Lango","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayoungprogrammer%2FLango","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayoungprogrammer%2FLango/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayoungprogrammer%2FLango/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayoungprogrammer%2FLango/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ayoungprogrammer","download_url":"https://codeload.github.com/ayoungprogrammer/Lango/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226693897,"owners_count":17667757,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","parse-trees","stanford-corenlp","stanford-parser"],"created_at":"2024-08-07T23:03:01.259Z","updated_at":"2024-11-27T05:30:33.533Z","avatar_url":"https://github.com/ayoungprogrammer.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Lango\n\n[![Gitter](https://badges.gitter.im/lango-nlp/Lobby.svg)](https://gitter.im/lango-nlp/Lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n\nLango is a natural language processing library for working with the building blocks of language. It includes tools for:\n\n* matching [constituent parse trees](https://en.wikipedia.org/wiki/Parse_tree#Constituency-based_parse_trees). \n* modeling conversations (TODO)\n\nNeed help? 
Ask me for help on [Gitter](https://gitter.im/lango-nlp/Lobby)\n\n## Installation\n\n### Install package with pip\n\n```\npip install lango\n```\n\n### Download Stanford CoreNLP\n\nMake sure you have Java installed for the Stanford CoreNLP to work.\n\n[Download Stanford CoreNLP](http://stanfordnlp.github.io/CoreNLP/#download)\n\nExtract to any folder\n\n### Run the Stanford CoreNLP server\n\nRun the following command in the folder where you extracted Stanford CoreNLP\n```\njava -mx4g -cp \"*\" edu.stanford.nlp.pipeline.StanfordCoreNLPServer\n```\n\n## Docs\n\n- [Blog Post](http://blog.ayoungprogrammer.com/2016/07/natural-language-understanding-by.html/)\n- [Read the docs](http://lango.readthedocs.io/en/latest/)\n- [Examples](http://github.com/ayoungprogrammer/lango/tree/master/examples)\n\n## Matching\n\nMatching is done by comparing a set of rules against a parse tree. You\ncan see parse trees for sentences from examples/parser_input.py. \n\nThe set of rules is recursive and can match multiple parts of the parse tree.\n\nRules can be broken down into smaller parts:\n- Tag\n- Token\n- Token Tree\n- Rules\n\n### Tag\n\nA tag is a POS (part of speech) tag to match. A list of POS tags used by the Stanford Parser can be found [here](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).\n\n```\nFormat:\ntag = string\n\nExample:\n'NP'\n'VP'\n'PP'\n```\n\n### Token\n\nA token is a string consisting of a tag and optional modifiers/labels for matching. We specify a match_label to name the subtree matched by the tag. We can specify opts to control how the string is extracted from a tree. We can specify eq to match the tree against a string.\n\n```\nExample string:\nThe red car\n\nopts:\n-o Get object by removing \"a\", \"the\", etc. (Ex. red car)\n-r Get raw string (Ex. 
The red car)\n```\n\n```\nFormat: (only tag is required)\ntoken = tag:match_label-opts=eq\n\nExample: \n'VP'\n'NP:subject-o'\n'NP:np'\n'VP=run'\n'VP:action=run'\n```\n\n### Token Tree\n\nA token tree is a recursive tree of tokens. The tree matches the structure of a parse tree.\n\n```\nFormat:\ntoken_tree = ( token token_tree token_tree ... )\n\nExamples: \n'( NP ( DT ) ( NP:subject-o ) )'\n'( NP )'\n'( PP ( TO=to ) ( NP:object-o ) )'\n```\n\n### Rules\n\nRules are a dictionary of token trees to dictionaries of matching labels to a \nnested set of rules. \n\n```\nFormat:\nrules = {token_tree: {match_label: rules}}\n\nExample: \n{\n    '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': {\n        'np': {\n            '( NP:subject-o )': {}\n        },\n        'pp': {\n            '( PP ( TO=to ) ( NP:to_object-o ) )': {},\n            '( PP ( IN=from ) ( NP:from_object-o ) )': {},\n        }\n    },\n}\n```\n\nWhen matching a rule to a parse tree, the token tree is first matched. Then, all\nmatching tags are matched to nested rules corresponding to their matching label.\n\nAll nested match labels must have a subrule match or the rules will not match.\n\nThe first rule to match is returned so the order of match is based on key \nordering (use OrderedDict if order matters). Once a rule is matched, it calls\nthe callback function with the context as arguments.\n\n### Example\n\nSuppose we have the sentence \"Sam ran to his house\" and we wanted to match the\nsubject (\"Sam\"), the object (\"his house\") and the action (\"ran\"). \n\nSample parse tree for \"Sam ran to his house\" from the Stanford Parser. 
\n\n```\n(S\n  (NP \n    (NNP Sam)\n    )\n  (VP\n    (VBD ran)\n      (PP \n        (TO to)\n        (NP\n          (PRP$ his)\n          (NN house)\n          )\n        )\n    )\n  )\n```\n\nSimplified image of tree:\n\n![tree](/docs/_static/img/sent_tree.png)\n\n```\nMatching:\nParse Tree: \n(S (NP (NNP Sam) ) (VP (VBD ran) (PP (TO to) (NP (PRP$ his) (NN house)))))\n\nMatched token tree: '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )'\nMatched context: \n  np: (NP (NNP Sam))\n  action-o: 'ran'\n  pp: (PP (TO to) (NP (PRP$ his) (NN house)))\n```\n\nRule for '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )':\n\n![tree](/docs/_static/img/rule_tree_1.png)\n\nMatching 'NP' matches the whole NP tree and converts to a word:\n\n```\nMatched token tree for np: '( NP:subject-o )'\nMatched context:\n  subject-o: 'Sam'\n```\n\nMatching 'PP' requires matching the nested rules:\n\n```\nMatch token tree for pp: '( PP ( TO=to ) ( NP:to_object-o ) )'\nMatch context:\n  to_object-o: 'his house'\n\nMatch token tree for pp: '( PP ( IN=from ) ( NP:from_object-o ) )'\nNo match found\n```\nPP of the sample sentence:\n\n![tree](/docs/_static/img/sent_tree_pp.png)\n\nNested PP rules:\n\n![tree](/docs/_static/img/rule_tree_2.png)\n![tree](/docs/_static/img/rule_tree_3.png)\n\nOnly the first rule matches for 'PP'.\n\nNow that we have a match for all nested rules, we can return the context:\n```\nReturned context:\n  action: 'ran'\n  subject: 'sam'\n  to_object: 'his house'\n```\n\nFull code:\n\n```python\nfrom lango.parser import StanfordServerParser\nfrom lango.matcher import match_rules\n\nparser = StanfordServerParser()\n\nrules = {\n  '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': {\n    'np': {\n        '( NP:subject-o )': {}\n    },\n    'pp': {\n        '( PP ( TO=to ) ( NP:to_object-o ) )': {},\n        '( PP ( IN=from ) ( NP:from_object-o ) )': {}\n    }\n  }\n}\n\ndef fun(subject, action, to_object=None, from_object=None):\n    print(\"%s,%s,%s,%s\" % (subject, action, 
to_object, from_object))\n\ntree = parser.parse('Sam ran to his house')\nmatch_rules(tree, rules, fun)\n# output should be: sam,ran,his house,None\n\ntree = parser.parse('Billy walked from his apartment')\nmatch_rules(tree, rules, fun)\n# output should be: billy,walked,None,his apartment\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fayoungprogrammer%2FLango","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fayoungprogrammer%2FLango","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fayoungprogrammer%2FLango/lists"}