{"id":15293310,"url":"https://github.com/famished-tiger/rley","last_synced_at":"2025-04-05T13:08:35.266Z","repository":{"id":22985400,"uuid":"26335731","full_name":"famished-tiger/Rley","owner":"famished-tiger","description":"An Earley parser written in Ruby","archived":false,"fork":false,"pushed_at":"2025-03-18T19:33:04.000Z","size":1436,"stargazers_count":38,"open_issues_count":2,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-05T13:08:21.734Z","etag":null,"topics":["earley-parser","natural-language-processing","nlp","parser","ruby","rubynlp"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/famished-tiger.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-11-07T20:01:30.000Z","updated_at":"2025-03-18T19:33:08.000Z","dependencies_parsed_at":"2025-02-24T20:25:07.263Z","dependency_job_id":"493bcd0b-5187-43a4-a9ba-12655428e78b","html_url":"https://github.com/famished-tiger/Rley","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/famished-tiger%2FRley","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/famished-tiger%2FRley/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/famished-tiger%2FRley/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/famished-tiger%2FRley/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/famished-tiger","download_url":"https://codeload.github.com/famished-tiger/Rley/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247339158,"owners_count":20923014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["earley-parser","natural-language-processing","nlp","parser","ruby","rubynlp"],"created_at":"2024-09-30T16:46:10.707Z","updated_at":"2025-04-05T13:08:35.229Z","avatar_url":"https://github.com/famished-tiger.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"[Rley](https://github.com/famished-tiger/Rley)\n====\n[![Linux Build Status](https://img.shields.io/travis/famished-tiger/Rley/master.svg?label=Linux%20build)](https://travis-ci.org/famished-tiger/Rley)\n[![Build status](https://ci.appveyor.com/api/projects/status/l5adgcbfo128rvo9?svg=true)](https://ci.appveyor.com/project/famished-tiger/rley)\n[![Coverage Status](https://img.shields.io/coveralls/famished-tiger/Rley.svg)](https://coveralls.io/r/famished-tiger/Rley?branch=master)\n[![Gem Version](https://badge.fury.io/rb/rley.svg)](http://badge.fury.io/rb/rley)\n[![Inline docs](http://inch-ci.org/github/famished-tiger/Rley.svg?branch=master)](http://inch-ci.org/github/famished-tiger/Rley)\n[![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/famished-tiger/SRL-Ruby/blob/master/LICENSE.txt)\n\nA Ruby library for constructing general parsers for _any_ context-free language.  
\n\nWhat is Rley?\n-------------\n__Rley__ uses the [Earley](http://en.wikipedia.org/wiki/Earley_parser)\nalgorithm, a general parsing algorithm that can handle any context-free\ngrammar. Earley parsers can literally swallow anything that can be described\nby a context-free grammar. That's why Earley parsers find their place in so\nmany __NLP__ (_Natural Language Processing_) libraries/toolkits.  \n\nIn addition, __Rley__ goes beyond most Earley parser implementations by providing\nsupport for ambiguous parses. Indeed, it delivers the results of a parse as a\n_Shared Packed Parse Forest_ (SPPF). An SPPF is a data structure that efficiently\nencodes all the possible parse trees that result from an ambiguous\ngrammar.  \n\nAs another distinctive mark, __Rley__ is also the first Ruby implementation of a\nparsing library based on the new [Grammar Flow Graph](#grammar-flow-graph) approach.\n\n### What can it do?\nMaybe parsing algorithms and internal implementation details are of lesser\ninterest to you, and the better question to ask is \"what can Rley really do?\".  \n\nIn a nutshell:  \n* Rley can parse context-free languages that other well-known libraries cannot\nhandle.  \n* Rley has built-in support for the ambiguous grammars that typically occur in NLP.\n\nIn short, the foundations of Rley are strong enough to be useful in a wide\nrange of applications such as:  \n* computer languages (e.g. 
[Simple Regex Language](https://github.com/famished-tiger/SRL-Ruby)),  \n* artificial intelligence, and  \n* Natural Language Processing.\n\n### Features\n* Simple API for context-free grammar definition,\n* Allows ambiguous grammars,\n* Generates shared packed parse forests,\n* Accepts left-recursive rules/productions,\n* Provides syntax error detection and reporting.\n\n\n### Compatibility\nRley supports the following Ruby implementations:\n- MRI 3.2\n- MRI 3.3\n- MRI 3.4\n- JRuby 9.1+  \n\n---\n\nGetting Started\n---------------\n\n### Installation\nInstalling the latest stable version is simple:\n\n    $ gem install rley\n\n\n## A whirlwind tour of Rley\nThe purpose of this section is to show how to create a parser for a minimalist\nsubset of English.\nThe tour is organized as follows:  \n1. [Creating facade object of Rley library](#creating-facade-object-of-rley-library)   \n2. [Defining the language grammar](#defining-the-language-grammar)  \n3. [Creating a lexicon](#creating-a-lexicon)  \n4. [Creating a tokenizer](#creating-a-tokenizer)    \n5. [Parsing some input](#parsing-some-input)  \n6. 
[Generating the parse tree](#generating-the-parse-tree)\n\nThe complete source code of the example used in this tour can be found in the\n[examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)\ndirectory.\n\n\n### Creating facade object of Rley library\n```ruby\n    require 'rley' # Load Rley library\n\n    # Let's create a facade object called 'engine'\n    # It provides a unified, higher-level interface\n    engine = Rley::Engine.new\n```\n\n\n### Defining the language grammar\nThis grammar for a subset of English is based on an example from the NLTK book.\n\n```ruby\n    engine.build_grammar do\n      # Terminal symbols (= word categories in lexicon)\n      add_terminals('Noun', 'Proper-Noun', 'Verb')\n      add_terminals('Determiner', 'Preposition')\n\n      # Here we define the productions (= grammar rules)\n      rule 'S' =\u003e 'NP VP'\n      rule 'NP' =\u003e 'Proper-Noun'\n      rule 'NP' =\u003e 'Determiner Noun'\n      rule 'NP' =\u003e 'Determiner Noun PP'\n      rule 'VP' =\u003e 'Verb NP'\n      rule 'VP' =\u003e 'Verb NP PP'\n      rule 'PP' =\u003e 'Preposition NP'\n    end\n```\n\n### Creating a lexicon\n\n```ruby\n    # To simplify things, the lexicon is implemented as a Hash with pairs of the form:\n    # word =\u003e terminal symbol name\n    Lexicon = {\n      'man' =\u003e 'Noun',\n      'dog' =\u003e 'Noun',\n      'cat' =\u003e 'Noun',\n      'telescope' =\u003e 'Noun',\n      'park' =\u003e 'Noun',\n      'saw' =\u003e 'Verb',\n      'ate' =\u003e 'Verb',\n      'walked' =\u003e 'Verb',\n      'John' =\u003e 'Proper-Noun',\n      'Mary' =\u003e 'Proper-Noun',\n      'Bob' =\u003e 'Proper-Noun',\n      'a' =\u003e 'Determiner',\n      'an' =\u003e 'Determiner',\n      'the' =\u003e 'Determiner',\n      'my' =\u003e 'Determiner',\n      'in' =\u003e 'Preposition',\n      'on' =\u003e 'Preposition',\n      'by' =\u003e 'Preposition',\n      'with' =\u003e 'Preposition'\n    }.freeze\n```\n\n\n### 
Creating a tokenizer\n```ruby\n    require 'strscan' # StringScanner is part of Ruby's standard library\n\n    # A tokenizer reads the input string and converts it into a sequence of tokens.\n    # Remark: Rley doesn't provide tokenizer functionality.\n    # Highly simplified tokenizer implementation\n    def tokenizer(aTextToParse)\n      scanner = StringScanner.new(aTextToParse)\n      tokens = []\n\n      loop do\n        scanner.skip(/\\s+/) # Skip any leading whitespace\n        curr_pos = scanner.pos\n        word = scanner.scan(/\\S+/)\n        break unless word # End of input reached\n\n        term_name = Lexicon[word]\n        raise StandardError, \"Word '#{word}' not found in lexicon\" if term_name.nil?\n\n        # Token position: line 1, column = 1-based character offset of the word\n        pos = Rley::Lexical::Position.new(1, curr_pos + 1)\n        tokens \u003c\u003c Rley::Lexical::Token.new(word, term_name, pos)\n      end\n\n      tokens\n    end\n```\n\nMore ambitious NLP applications will likely rely on a Part-of-Speech tagger instead of\ncreating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech gems:  \n* [engtagger](https://rubygems.org/gems/engtagger)\n* [rbtagger](https://rubygems.org/gems/rbtagger)\n\n\n### Parsing some input\n```ruby\n    input_to_parse = 'John saw Mary with a telescope'\n    # Convert input text into a sequence of token objects...\n    tokens = tokenizer(input_to_parse)\n    result = engine.parse(tokens)\n\n    puts \"Parsing successful? #{result.success?}\" # =\u003e Parsing successful? true\n```\n\nAt this stage, we're done with parsing. What we need next is a convenient way\nto exploit the parse result. As it is, the `result` variable in the last code snippet\nabove is a data structure (\"Earley item sets\") that depends heavily on the intricate details\nof Earley's parsing algorithm. 
Obviously, it contains all the data needed to exploit\nthe parse results, but it is rather low-level and inconvenient from a programming viewpoint.\nTherefore, __Rley__ provides two convenient data structures out of the box for\nrepresenting the parse outcome:\n- Parse tree (best suited when the parse is unambiguous)\n- Parse forest (a more sophisticated data structure that copes with ambiguity)\n\nFor our whirlwind tour, we will opt for parse trees.\n\n### Generating the parse tree\n\n```ruby\n    ptree = engine.convert(result)\n```\n\nOK. Now that we have the parse tree, what can we do with it?\nOne option is to manipulate the parse tree and its nodes directly. For instance,\none could write code to customize and transform the parse tree. This approach gives\nmost of the flexibility needed for advanced applications. The other, more common\noption is to use an `Rley::ParseTreeVisitor` instance.\nSuch a visitor walks over the parse tree nodes and generates visit events that\nare dispatched to subscribed event listeners. All this may sound complicated at first,\nbut the following code snippets show otherwise.\n\nLet's do it by:  \n- Creating a parse tree visitor  \n- Using one of the built-in visit subscribers specifically created to render the\n parse tree in a given output format.  \n\n#### Creating a parse tree visitor  \nGood news: creating a parse tree visitor for the parse tree `ptree` is just\na one-liner:\n\n```ruby\n    # Let's create a parse tree visitor\n    visitor = engine.ptree_visitor(ptree)\n```\n\n#### Visiting the parse tree\n\nUnsurprisingly, to start the parse tree visit, one calls the `#start` method:\n\n```ruby\n    visitor.start\n```\n\nIf you try the above line, nothing visible will happen, and for a good reason:\nno object was specified as a visit event subscriber. 
As a convenience, __Rley__\nbundles a number of [formatter classes](https://github.com/famished-tiger/Rley/tree/master/lib/rley/formatter)\nthat were designed to listen to visit events and then render the parse tree\nin a specific format. To begin with, we'll use the simple\n`Rley::Formatter::Debug` formatter class. Its sole purpose is to print out the name\nof each visit event.\n\nRemove the line with the call to the `#start` method and replace it with these two\nstatements:\n```ruby\n    # Let's create a formatter (i.e. visit event listener)\n    renderer = Rley::Formatter::Debug.new($stdout)\n\n    # Subscribe the formatter to the visitor's events and launch the visit\n    renderer.render(visitor)\n```\n\nThese two lines will generate the following output:\n```\nbefore_ptree\n  before_non_terminal\n    before_subnodes\n      before_non_terminal\n        before_subnodes\n          before_terminal\n          after_terminal\n        after_subnodes\n      after_non_terminal\n      before_non_terminal\n        before_subnodes\n          before_terminal\n          after_terminal\n          before_non_terminal\n            before_subnodes\n              before_terminal\n              after_terminal\n            after_subnodes\n          after_non_terminal\n          before_non_terminal\n            before_subnodes\n              before_terminal\n              after_terminal\n              before_non_terminal\n                before_subnodes\n                  before_terminal\n                  after_terminal\n                  before_terminal\n                  after_terminal\n                after_subnodes\n              after_non_terminal\n            after_subnodes\n          after_non_terminal\n        after_subnodes\n      after_non_terminal\n    after_subnodes\n  after_non_terminal\nafter_ptree\n```\n\nAt least something is visible: these are the parse tree visit events.\nNote that the indentation of event names depends on the nesting level of\nthe tree node being 
visited.\n\nNot really impressive? So let's use another formatter...\n\n#### Visualizing the parse tree structure\nIf one replaces the previous formatter with an instance of\n`Rley::Formatter::Asciitree`, the output shows the parse tree structure.\n\n```ruby\n    # Let's create a formatter that will render the parse tree with characters\n    renderer = Rley::Formatter::Asciitree.new($stdout)\n\n    # Subscribe the formatter to the visitor's events and launch the visit\n    renderer.render(visitor)\n```\n\nThe output looks like this:\n```\nS\n+-- NP\n|   +-- Proper-Noun: 'John'\n+-- VP\n    +-- Verb: 'saw'\n    +-- NP\n    |   +-- Proper-Noun: 'Mary'\n    +-- PP\n        +-- Preposition: 'with'\n        +-- NP\n            +-- Determiner: 'a'\n            +-- Noun: 'telescope'\n```\n\nIf you prefer a graphical representation, replace the last formatter\nwith yet another one:\n\n```ruby\n    # Let's create a formatter that will render the parse tree in labelled bracket notation\n    renderer = Rley::Formatter::BracketNotation.new($stdout)\n\n    # Subscribe the formatter to the visitor's events and launch the visit\n    renderer.render(visitor)\n```\n\nThis results in the strange-looking output:\n```\n[S [NP [Proper-Noun John]][VP [Verb saw][NP [Proper-Noun Mary]][PP [Preposition with][NP [Determiner a][Noun telescope]]]]]\n```\n\nThis labelled bracket notation is recognized by many NLP tools.\nThe next diagram was created by copy-pasting the output above into the online tool\n[RSyntaxTree](http://yohasebe.com/rsyntaxtree/).\nBy the way, this tool is also available as a Ruby gem, [rsyntaxtree](https://rubygems.org/gems/rsyntaxtree).\n\n![Sample parse tree diagram](www/sample_parse_tree.png)\n\n\n## Error reporting\n__Rley__ is a non-violent parser, that is, it won't raise an exception when it\ndetects a syntax error. Instead, the parse result will be marked as\nnon-successful. 
The parse error can then be identified by calling the\n`GFGParsing#failure_reason` method. This method returns an error reason object\nthat can be used to produce an error message.  \n\nConsider the example from the [Parsing some input](#parsing-some-input) section\nabove, and introduce an error by deleting the verb `saw` from the sentence to parse.  \n\n```ruby\n    # Verb has been removed from the sentence on next line\n    input_to_parse = 'John Mary with a telescope'\n    # Convert input text into a sequence of token objects...\n    tokens = tokenizer(input_to_parse)\n    result = engine.parse(tokens)\n\n    puts \"Parsing successful? #{result.success?}\" # =\u003e Parsing successful? false\n```\n\nAs expected, the parse now fails.  \nTo get an error message, one just needs to retrieve the error reason and\nask it to generate a message.  \n```ruby\n    # Show error message if parse fails...\n    puts result.failure_reason.message unless result.success?\n```\n\nRe-running the example with the error results in the following message:\n```\n  Syntax error at or near token line 1, column 6 \u003e\u003e\u003eMary\u003c\u003c\u003c\n  Expected one 'Verb', found a 'Proper-Noun' instead.\n```\n\nThe standard __Rley__ message not only reports the location of\nthe mistake, it also provides a hint by stating what it expected.\n\nLet's experiment again with the original sentence, but without the word\n`telescope`.\n\n```ruby\n    # Last word has been removed from the sentence on next line\n    input_to_parse = 'John saw Mary with a '\n    # Convert input text into a sequence of token objects...\n    tokens = tokenizer(input_to_parse)\n    result = engine.parse(tokens)\n\n    puts \"Parsing successful? #{result.success?}\" # =\u003e Parsing successful? false\n    unless result.success?\n      puts result.failure_reason.message\n      exit(1)\n    end\n```\n\nThis time, the following output is displayed:\n```\n  Parsing successful? 
false\n  Premature end of input after 'a' at position line 1, column 20\n  Expected one 'Noun'.\n```\n\nAgain, the resulting error message is user-friendly.  \n\n\n## Examples\n\nThe project source directory contains several example scripts that demonstrate\nhow grammars are constructed and used.\n\n\n## Other similar Ruby projects\n__Rley__ isn't the only implementation of the Earley parsing algorithm in Ruby.  \nHere are a few others:  \n- [Kanocc gem](https://rubygems.org/gems/kanocc) -- Advertised as a Ruby-based parsing and translation framework.  \n  Although the gem dates from 2009, the author still maintains it in a public repository on [GitHub](https://github.com/surlykke/Kanocc).  \n  The grammar symbols (tokens and non-terminals) must be represented as (sub)classes.\n  Grammar rules are methods of the non-terminal classes. A rule can have a code block argument\n  that specifies the semantic action when that rule is applied.  \n- [lc1 project](https://github.com/kp0v/lc1) -- Advertised as a combination of Earley and Viterbi algorithms for [Probabilistic] Context-Free Grammars.  \n  Aimed at parsing Brazilian Portuguese.  \n- [earley project](https://github.com/joshingly/earley) -- An Earley parser (grammar rules are specified in JSON format).  \n  The code doesn't seem to be maintained: the latest commit dates from Nov. 2011.  \n- [linguist project](https://github.com/davidkellis/linguist) -- Advertised as a library for parsing context-free languages.  \n  It is a recognizer, not a parser. In other words, it can only tell whether a given input\n  conforms to the grammar rules or not. As such, it cannot build parse trees.  \n  The code doesn't seem to be maintained: the latest commit dates from Oct. 
2011.\n\n## Other interesting Ruby resources\nThe extensive resource list not to miss: [Awesome NLP with Ruby](https://github.com/arbox/nlp-with-ruby),\nactively curated by Andrei Beliankou (aka arbox).\n\n## Thanks to:\n* Professor Keshav Pingali, one of the creators of the Grammar Flow Graph parsing approach, for his encouraging e-mail exchange.\n* [Arjun Menon](https://github.com/arjunmenon) for his NLP example that uses the `engtagger` gem.\n* [Gui Heurich](https://github.com/GuiHeurich) for spotting a mistake in the code sample in the `README` file.\n\n## Grammar Flow Graph\nSince the Grammar Flow Graph parsing approach is quite new, it has not yet found its way into\nstandard parser textbooks. Here are a few references (with links) to papers on GFG:  \n- K. Pingali, G. Bilardi. [Parsing with Pictures](http://apps.cs.utexas.edu/tech_reports/reports/tr/TR-2102.pdf).\n- K. Pingali, G. Bilardi. [A Graphical Model for Context-Free Grammar Parsing](https://link.springer.com/chapter/10.1007/978-3-662-46663-6_1).\n  In: International Conference on Compiler Construction, Springer Berlin Heidelberg, 2015, pp. 3-27.  \n- M. Fulbright. [An Evaluation of Two Approaches to Parsing](http://apps.cs.utexas.edu/tech_reports/reports/tr/TR-2199.pdf).  \n\n\nCopyright\n---------\nCopyright (c) 2014-2022, Dimitri Geshef.  \n__Rley__ is released under the MIT License; see [LICENSE.txt](https://github.com/famished-tiger/Rley/blob/master/LICENSE.txt) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffamished-tiger%2Frley","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffamished-tiger%2Frley","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffamished-tiger%2Frley/lists"}