Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vadyushkins/kotgll
Simple and fast Kotlin implementation of GLL parsing algorithm
https://github.com/vadyushkins/kotgll
cfpq-algo context-free-grammar gll gll-parser graph-algorithms graph-theory kotlin
Last synced: 3 months ago
JSON representation
Simple and fast Kotlin implementation of GLL parsing algorithm
- Host: GitHub
- URL: https://github.com/vadyushkins/kotgll
- Owner: vadyushkins
- License: mit
- Created: 2023-03-22T14:58:18.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-22T02:47:13.000Z (8 months ago)
- Last Synced: 2024-10-13T17:05:51.762Z (3 months ago)
- Topics: cfpq-algo, context-free-grammar, gll, gll-parser, graph-algorithms, graph-theory, kotlin
- Language: Kotlin
- Homepage:
- Size: 193 KB
- Stars: 5
- Watchers: 3
- Forks: 5
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# kotgll
[![Build](https://github.com/vadyushkins/kotgll/actions/workflows/build.yml/badge.svg?branch=main)](https://github.com/vadyushkins/kotgll/actions/workflows/build.yml)
[![Test](https://github.com/vadyushkins/kotgll/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/vadyushkins/kotgll/actions/workflows/test.yml)* [About](https://github.com/vadyushkins/kotgll#about)
* [Usage](https://github.com/vadyushkins/kotgll#usage)
* [From sources](https://github.com/vadyushkins/kotgll#from-sources)
* [Using JAR](https://github.com/vadyushkins/kotgll#using-jar)
* [CFG Format Example](https://github.com/vadyushkins/kotgll#cfg-format-example)
* [RSM Format Example](https://github.com/vadyushkins/kotgll#rsm-format-example)
* [Performance](https://github.com/vadyushkins/kotgll#performance)
* [Graphs](https://github.com/vadyushkins/kotgll#graphs)
* [Grammars](https://github.com/vadyushkins/kotgll#grammars)
* [Results](https://github.com/vadyushkins/kotgll#results)
* [More results](https://github.com/vadyushkins/kotgll#more-results)> Note: project under heavy development!
## About
**kotgll** is an open-source project for implementing the GLL algorithm and its modifications in Kotlin## Usage
### Command Line Interface
```text
Usage: kotgll options_list
Options:
--input -> Input format (always required) { Value should be one of [string, graph] }
--grammar -> Grammar format (always required) { Value should be one of [cfg, rsm] }
--sppf [ON] -> Sppf mode { Value should be one of [on, off] }
--inputPath -> Path to input txt file (always required) { String }
--grammarPath -> Path to grammar txt file (always required) { String }
--outputPath -> Path to output txt file (always required) { String }
--help, -h -> Usage info
```### From sources
#### Step 1. Clone repository
`git clone https://github.com/vadyushkins/kotgll.git`
or
`git clone [email protected]:vadyushkins/kotgll.git`
or
`gh repo clone vadyushkins/kotgll`
#### Step 2. Go to the folder
`cd kotgll`
#### Step 3. Run the help command
`gradle run --args="--help"`
You will see the ["Options list"](https://github.com/vadyushkins/kotgll#command-line-interface) message.
#### Example
```text
gradle run --args="--input graph --grammar rsm --sppf off --inputPath src/test/resources/cli/TestGraphReadWriteCSV/dyck.csv --grammarPath src/test/resources/cli/TestRSMReadWriteTXT/dyck.txt --outputPath ./result.txt"
```### Using JAR
#### Step 1. Download the latest JAR
```text
curl -L -O https://github.com/vadyushkins/kotgll/releases/download/v1.0.0/kotgll-1.0.0.jar
```#### Step 2. Run JAR with Java
```text
java -jar kotgll-1.0.0.jar --input graph --grammar rsm --sppf off --inputPath src/test/resources/cli/TestGraphReadWriteCSV/dyck.csv --grammarPath src/test/resources/cli/TestRSMReadWriteTXT/dyck.txt --outputPath ./result.txt
```
### CFG Format Example```text
StartNonterminal("S")
Nonterminal("S") -> Terminal("subClassOf_r") Nonterminal("S") Terminal("subClassOf")
Nonterminal("S") -> Terminal("subClassOf_r") Terminal("subClassOf")
Nonterminal("S") -> Terminal("type_r") Nonterminal("S") Terminal("type")
Nonterminal("S") -> Terminal("type_r") Terminal("type")
```### RSM Format Example
```text
StartState(id=0,nonterminal=Nonterminal("S"),isStart=true,isFinal=false)
State(id=0,nonterminal=Nonterminal("S"),isStart=true,isFinal=false)
State(id=1,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=4,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=3,nonterminal=Nonterminal("S"),isStart=false,isFinal=true)
State(id=2,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=6,nonterminal=Nonterminal("S"),isStart=false,isFinal=true)
State(id=5,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
TerminalEdge(tail=0,head=1,terminal=Terminal("subClassOf_r"))
TerminalEdge(tail=0,head=4,terminal=Terminal("type_r"))
TerminalEdge(tail=1,head=3,terminal=Terminal("subClassOf"))
NonterminalEdge(tail=1,head=2,nonterminal=Nonterminal("S"))
TerminalEdge(tail=4,head=6,terminal=Terminal("type"))
NonterminalEdge(tail=4,head=5,nonterminal=Nonterminal("S"))
TerminalEdge(tail=2,head=3,terminal=Terminal("subClassOf"))
TerminalEdge(tail=5,head=6,terminal=Terminal("type"))
```## Performance
The GLL algorithm has been modified to support graph input.
The proposed modification has been evaluated on several real graphs for the scenario of finding all pairs of reachability.**Machine configuration**: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM.
**Enviroment configuration**:
* Java HotSpot(TM) 64-Bit server virtual machine (build 15.0.2+7-27, mixed mode, sharing).
* JVM heap configuration: 55Gb both xms and xmx.### Graphs
The graph data is selected from [CFPQ_Data dataset](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data).
A detailed description of the graphs is listed bellow.
#### RDF analysis graphs
| Graph name | \|*V*\| | \|*E*\| | #subClassOf | #type | #broaderTransitive |
|:------------|----------:|------------:|-------------:|-----------:|--------------------:|
| [Enzyme](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/enzyme.html#enzyme) | 48 815 | 86 543 | 8 163 | 14 989 | 8 156 |
| [Eclass](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/eclass.html#eclass) | 239 111 | 360 248 | 90 962 | 72 517 | 0 |
| [Go hierarchy](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/go_hierarchy.html#go-hierarchy) | 45 007 | 490 109 | 490 109 | 0 | 0 |
| [Go](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/go.html#go) | 582 929 | 1 437 437 | 94 514 | 226 481 | 0 |
| [Geospecies](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/geospecies.html#geospecies) | 450 609 | 2 201 532 | 0 | 89 065 | 20 867 |
| [Taxonomy](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/taxonomy.html#taxonomy) | 5 728 398 | 14 922 125 | 2 112 637 | 2 508 635 | 0 |### Grammars
All queries used in evaluation are variants of same-generation query.
The inverse of an ```x``` relation and the respective edge is denoted as ```x_r```.
Grammars used for **RDF** graphs:
**G1**
```
S -> subClassOf_r S subClassOf | subClassOf_r subClassOf
| type_r S type | type_r type
```The representation of **G1** context-free grammar in the repository can be found [here](https://github.com/vadyushkins/kotgll/blob/main/src/test/resources/cli/TestCFGReadWriteTXT/g1.txt).
The representation of **G1** context-free grammar as recursive automaton in the repository can be found [here](https://github.com/vadyushkins/kotgll/blob/main/src/test/resources/cli/TestRSMReadWriteTXT/g1.txt).
### Results
The results of the **all pairs reachability** queries evaluation on graphs related to **RDF analysis** are listed below.
In each row, the best mean time in seconds is highlighted in **bold**.
| Graph | [CFG](https://github.com/vadyushkins/kotgll/tree/main/src/main/kotlin/org/kotgll/cfg/graphinput/withoutsppf) | [RSM](https://github.com/vadyushkins/kotgll/tree/main/src/main/kotlin/org/kotgll/rsm/graphinput/withoutsppf) | [GLL4Graph](https://github.com/FormalLanguageConstrainedPathQuerying/GLL4Graph) |
|-------------- |:-----: |:---------: |:---------: |
| [Enzyme](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/enzyme.html#enzyme) | 0.107 | **0.044** | 0.22 |
| [Eclass](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/eclass.html#eclass) | 0.94 | **0.43** | 1.5 |
| [Go hierarchy](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/go_hierarchy.html#go-hierarchy) | 4.1 | **3.0** | 3.6 |
| [Go](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/go.html#go) | 3.2 | **1.86** | 5.55 |
| [Geospecies](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/geospecies.html#geospecies) | 0.97 | **0.34** | 2.89 |
| [Taxonomy](https://formallanguageconstrainedpathquerying.github.io/CFPQ_Data/graphs/data/taxonomy.html#taxonomy) | 31.2 | **14.8** | 45.4 |#### More results
More results, but in raw form, can be found in repository [kotgll_benchmarks](https://github.com/vadyushkins/kotgll_benchmarks/)