https://github.com/touhi99/parsing
Parsing Course Project - A CYK implementation with CNF
cky cnf cyk-algorithm nltk parsing
- Host: GitHub
- URL: https://github.com/touhi99/parsing
- Owner: touhi99
- Created: 2017-12-25T17:33:34.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-06-22T10:27:28.000Z (over 6 years ago)
- Last Synced: 2023-11-26T21:28:19.821Z (about 1 year ago)
- Topics: cky, cnf, cyk-algorithm, nltk, parsing
- Language: Python
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
CYK Parsing Implementation
Programming Language: Python 3.6.1
Overview:
CKY.py
* Class ConverCNF()
  * startSymbolAdd() - adds a new start symbol if the existing start symbol appears on the right-hand side of any rule
  * eliminateEpsilon() - removes every ‘ε’ from the right-hand sides of the rules and restructures the grammar accordingly
  * eliminateVariableUnit() - removes unit productions, i.e. rules whose right-hand side is a single non-terminal
  * moveTerminalToUnits() - adds a new non-terminal with a unit rule for each terminal
  * replaceLongProd() - splits any production with more than two non-terminals on the right-hand side into rules of the form A -> B C
* Class CYK()
  * readGrammar() - reads the input grammar and stores the rules in a dictionary, where each key is a left-hand side and each value is a list of its right-hand sides
  * readString() - reads the input strings from the text file
  * readOutput() - prints the modified grammar and the CKY table
  * parser() - the CKY parsing algorithm
Dependencies:
* re - included with Python by default
* NLTK
  * pip3 install nltk
Usage:
* In a terminal, in the directory containing the files, run
  python3 CYK.py
* A sample grammar file is included
  * The first line contains the start symbol
  * Epsilon is written with the ‘ε’ symbol
  * Terminals are enclosed in single quotes (‘’), e.g. ‘John’, ‘man’
  * Multiple terminals and non-terminals are separated by whitespace
* A sample string file is included; each line contains a separate string
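
The grammar format and the CKY step described above can be illustrated with a minimal sketch. It is not the code from this repository. The function names `read_grammar` and `cyk_recognize`, the `->` rule separator, the one-rule-per-line layout, and the straight single quotes around terminals are all assumptions made for the example. The sample grammar is assumed to be in CNF already, so no conversion step is shown.

```python
from collections import defaultdict

# Hypothetical grammar text: first line is the start symbol, terminals are
# quoted, symbols are separated by whitespace. The "->" separator is assumed.
SAMPLE_GRAMMAR = """\
S
S -> NP VP
VP -> V NP
NP -> 'John'
NP -> 'Mary'
V -> 'sees'
"""


def read_grammar(text):
    """Return (start_symbol, rules), where rules maps LHS -> list of RHS tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines[0]
    rules = defaultdict(list)
    for line in lines[1:]:
        lhs, rhs = line.split("->", 1)
        rules[lhs.strip()].append(tuple(rhs.split()))
    return start, dict(rules)


def cyk_recognize(words, start, rules):
    """Return True if the CNF grammar derives the given list of words."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Length-1 spans: match rules of the form A -> 'word'.
    for i, word in enumerate(words):
        quoted = "'" + word + "'"
        for lhs, rhss in rules.items():
            if (quoted,) in rhss:
                table[i][i].add(lhs)

    # Longer spans: try every split point k and every rule A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for lhs, rhss in rules.items():
                    for rhs in rhss:
                        if (len(rhs) == 2
                                and rhs[0] in table[i][k]
                                and rhs[1] in table[k + 1][j]):
                            table[i][j].add(lhs)

    return start in table[0][n - 1]


if __name__ == "__main__":
    start, rules = read_grammar(SAMPLE_GRAMMAR)
    for sentence in ["John sees Mary", "sees John Mary"]:
        print(sentence, "->", cyk_recognize(sentence.split(), start, rules))
```

The table is indexed by start and end position of a span, so the sentence is accepted when the start symbol ends up in the cell covering the whole input. A full parser would also store back-pointers in each cell to rebuild the parse tree; this sketch only answers yes or no.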