https://github.com/paebbels/pyvhdlparser
Streaming based VHDL parser.
- Host: GitHub
- URL: https://github.com/paebbels/pyvhdlparser
- Owner: Paebbels
- License: other
- Created: 2016-09-21T13:20:39.000Z (over 9 years ago)
- Default Branch: main
- Last Pushed: 2024-07-15T05:45:31.000Z (over 1 year ago)
- Last Synced: 2024-10-24T23:40:06.234Z (over 1 year ago)
- Topics: language-model, parser, python-3, vhdl
- Language: Python
- Homepage: https://paebbels.github.io/pyVHDLParser/
- Size: 7.51 MB
- Stars: 78
- Watchers: 11
- Forks: 15
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# pyVHDLParser
This is a token-stream based parser for VHDL-2008.
This project requires Python 3.8+.
## Introduction
### Main Goals
* **Parsing**
  * slice an input document into **tokens** and text **blocks** which are categorized
  * preserve case, whitespace and comments
  * recover on parsing errors
  * good error reporting / throw exceptions
* **Fast Processing**
  * multi-pass parsing and analysis
  * delay analysis if not needed at current pass
  * link tokens and blocks for fast-forward scanning
* **Generic VHDL Language Model**
  * assemble a document-object-model (Code-DOM)
  * provide an API for code introspection
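
The goal of linking tokens and blocks for fast-forward scanning can be illustrated with a small sketch. It is not pyVHDLParser's implementation: the classes `Token` and `Block` and the helpers `link_tokens`, `link_blocks` and `iter_blocks` are hypothetical, and blocks are simply closed at every `;`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Token:
    """Hypothetical token: a piece of text linked to its successor."""
    text: str
    next_token: Optional["Token"] = None


@dataclass(eq=False)
class Block:
    """Hypothetical block: references its first and last token and the next block."""
    start: Token
    end: Token
    next_block: Optional["Block"] = None


def link_tokens(texts: List[str]) -> List[Token]:
    """Create tokens and chain each one to the token that follows it."""
    tokens = [Token(text) for text in texts]
    for current, following in zip(tokens, tokens[1:]):
        current.next_token = following
    return tokens


def link_blocks(tokens: List[Token]) -> Optional[Block]:
    """Close a block at every ';' and chain the blocks; return the first block.

    Trailing tokens that are not terminated by ';' are not put into a block.
    """
    first = previous = start = None
    for token in tokens:
        if start is None:
            start = token
        if token.text == ";":
            block = Block(start, token)
            if previous is None:
                first = block
            else:
                previous.next_block = block
            previous, start = block, None
    return first


def iter_blocks(block: Optional[Block]) -> Iterator[Block]:
    """Fast-forward scan: hop from block to block without visiting every token."""
    while block is not None:
        yield block
        block = block.next_block


tokens = link_tokens(["library", "IEEE", ";", "use", "IEEE.std_logic_1164.all", ";"])
first_block = link_blocks(tokens)
block_starts = [block.start.text for block in iter_blocks(first_block)]
```

A scanner that only needs the first token of each statement can follow `next_block` and skip the tokens in between, while the full token chain stays reachable through `start` and `next_token`.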
### Use Cases
* generate documentation by using the fast-forward scanner
* generate a document/language model by using the grouped text-block scanner
* extract compile orders and other dependency graphs
* generate highlighted syntax
* re-annotate documenting comments to their objects for doc extraction
### Parsing approach
1. slice an input document into **tokens**
2. assemble tokens to text **blocks** which are categorized
3. assemble text blocks for fast-forward scanning into **groups**
4. translate groups into a document-object-model (DOM)
5. provide a generic VHDL language model
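
The first two steps of this approach can be sketched as a chain of Python generators, where each stage consumes the stream produced by the previous one. This is a simplified illustration and not the pyVHDLParser API: `TOKEN_PATTERN`, `tokenize` and `to_blocks` are hypothetical names, and a block is assumed to end at every `;`.

```python
import re
from typing import Iterable, Iterator, List

# Assumption for this sketch: a token is a run of word characters,
# a single whitespace character, or any other single character.
TOKEN_PATTERN = re.compile(r"\w+|\s|[^\w\s]")


def tokenize(text: str) -> Iterator[str]:
    """Step 1 (toy version): lazily slice the input into tokens."""
    for match in TOKEN_PATTERN.finditer(text):
        yield match.group()


def to_blocks(tokens: Iterable[str]) -> Iterator[List[str]]:
    """Step 2 (toy version): collect tokens until a ';' closes the block."""
    block: List[str] = []
    for token in tokens:
        block.append(token)
        if token == ";":
            yield block
            block = []
    if block:
        # Tokens after the last ';' form a final, unterminated block.
        yield block


source = "library IEEE;\nuse IEEE.std_logic_1164.all;\n"
blocks = list(to_blocks(tokenize(source)))
```

Because both stages are generators, nothing is read ahead of what the consumer requests, and joining all tokens of all blocks reproduces the input text unchanged.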
### Long-term goals
* A Sphinx language plugin for VHDL
TODO: Move the following documentation to ReadTheDocs and replace it with a more lightweight version.
## Basic Concept
[![][concept]][concept]
[concept]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/images/Linking_TokenBlockGroup.png
## Example 1
This is an input file:
```VHDL
-- Copryright 2016
library IEEE;
use IEEE.std_logic_1164.all;

entity myEntity is
	generic (
		BITS : positive := 8
	);
	port (
		Clock : in std_logic;
		Output : out std_logic_vector(BITS - 1 downto 0)
	);
end entity;

architecture rtl of myEntity is
	constant const0 : integer := 5;
begin
	process(Clock)
	begin
	end process;
end architecture;

library IEEE, PoC;
use PoC.Utils.all, PoC.Common.all;

package pkg0 is
	function func0(a : integer) return string;
end package;

package body Components is
	function func0(a : integer) return string is
		procedure proc0 is
		begin
		end procedure;
	begin
	end function;
end package body;
```
### Step 1
The input file (a stream of characters) is translated into a stream of basic tokens:
* `StartOfDocumentToken`
* `LinebreakToken`
* `SpaceToken`
  * `IndentationToken`
* `WordToken`
* `CharacterToken`
  * `FusedCharacterToken`
* `CommentToken`
  * `SingleLineCommentToken`
  * `MultiLineCommentToken`
* `EndOfDocumentToken`
The stream looks like this:
[![Screenshot][10]][10]
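
As a rough illustration of this step, the following sketch slices text into categorized tokens and records the position of each one. It is not the tokenizer shipped with pyVHDLParser: the `Token` dataclass, the four simplified categories and the functions `classify` and `tokenize` are assumptions made for this example, and comments, indentation and fused characters are not handled.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    kind: str    # "Word", "Space", "Linebreak" or "Character" (simplified categories)
    text: str
    line: int    # 1-based line number
    column: int  # 1-based column of the token's first character


def classify(char: str) -> str:
    """Map a single character to one of the simplified token categories."""
    if char == "\n":
        return "Linebreak"
    if char in " \t":
        return "Space"
    if char.isalnum() or char == "_":
        return "Word"
    return "Character"


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens in order; every input character ends up in exactly one token."""
    line, column, index = 1, 1, 0
    while index < len(text):
        kind = classify(text[index])
        end = index + 1
        # Words and spaces extend over runs of same-kind characters;
        # linebreaks and other characters are single-character tokens.
        if kind in ("Word", "Space"):
            while end < len(text) and classify(text[end]) == kind:
                end += 1
        yield Token(kind, text[index:end], line, column)
        if kind == "Linebreak":
            line, column = line + 1, 1
        else:
            column += end - index
        index = end


source = "library IEEE;\nuse IEEE.std_logic_1164.all;\n"
tokens = list(tokenize(source))
```

Since whitespace and linebreaks are kept as tokens of their own, concatenating the `text` of all tokens gives back the original input, which is what allows case, whitespace and comments to be preserved.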
### Step 2
The token stream from step 1 is translated into typed tokens such as
`DelimiterToken` (`:`), `EndToken` (`;`) or subtypes of `KeywordToken`.
These tokens are then grouped into blocks.
The example generates:
```
[StartOfDocumentBlock]
[Blocks.CommentBlock '-- Copryright 2016\n' at (line: 1, col: 1) .. (line: 1, col: 19)]
[LibraryStatement.LibraryBlock 'library ' at (line: 2, col: 1) .. (line: 2, col: 8)]
[LibraryStatement.LibraryNameBlock 'IEEE' at (line: 2, col: 9) .. (line: 2, col: 13)]
[LibraryStatement.LibraryEndBlock ';' at (line: 2, col: 13) .. (line: 2, col: 13)]
[LinebreakBlock at (line: 2, col: 14) .. (line: 2, col: 14)]
[Use.UseBlock 'use ' at (line: 3, col: 1) .. (line: 3, col: 8)]
[Use.UseNameBlock 'IEEE.std_logic_1164.all' at (line: 3, col: 9) .. (line: 3, col: 32)]
[Use.UseEndBlock ';' at (line: 3, col: 32) .. (line: 3, col: 32)]
[LinebreakBlock at (line: 3, col: 33) .. (line: 3, col: 33)]
[EmptyLineBlock at (line: 4, col: 1) .. (line: 4, col: 1)]
[Entity.NameBlock 'entity myEntity is' at (line: 5, col: 1) .. (line: 5, col: 19)]
[LinebreakBlock at (line: 5, col: 19) .. (line: 5, col: 19)]
[IndentationBlock length=1 (2) at (line: 6, col: 1) .. (line: 6, col: 1)]
[GenericList.OpenBlock 'generic (' at (line: 6, col: 2) .. (line: 6, col: 10)]
[LinebreakBlock at (line: 6, col: 11) .. (line: 6, col: 11)]
[IndentationBlock length=2 (4) at (line: 7, col: 1) .. (line: 7, col: 2)]
[GenericList.ItemBlock 'BITS : positive := 8\n\t' at (line: 7, col: 3) .. (line: 8, col: 1)]
[GenericList.CloseBlock ');' at (line: 8, col: 2) .. (line: 8, col: 3)]
[LinebreakBlock at (line: 8, col: 4) .. (line: 8, col: 4)]
[IndentationBlock length=1 (2) at (line: 9, col: 1) .. (line: 9, col: 1)]
[PortList.OpenBlock 'port (' at (line: 9, col: 2) .. (line: 9, col: 7)]
[LinebreakBlock at (line: 9, col: 8) .. (line: 9, col: 8)]
[IndentationBlock length=2 (4) at (line: 10, col: 1) .. (line: 10, col: 2)]
[PortList.ItemBlock 'Clock : in std_logic' at (line: 10, col: 3) .. (line: 10, col: 26)]
[PortList.DelimiterBlock ';' at (line: 10, col: 26) .. (line: 10, col: 26)]
[LinebreakBlock at (line: 10, col: 27) .. (line: 10, col: 27)]
[IndentationBlock length=2 (4) at (line: 11, col: 1) .. (line: 11, col: 2)]
[PortList.ItemBlock 'Output\t: out\tstd_logic_vector(BITS - 1 downto 0)\n\t' at (line: 11, col: 3) .. (line: 12, col: 1)]
[PortList.CloseBlock ');' at (line: 12, col: 2) .. (line: 12, col: 3)]
[LinebreakBlock at (line: 12, col: 4) .. (line: 12, col: 4)]
[Entity.EndBlock 'end entity;' at (line: 13, col: 1) .. (line: 13, col: 11)]
[LinebreakBlock at (line: 13, col: 12) .. (line: 13, col: 12)]
[EmptyLineBlock at (line: 14, col: 1) .. (line: 14, col: 1)]
[Architecture.NameBlock 'architecture rtl of myEntity is' at (line: 15, col: 1) .. (line: 15, col: 32)]
[LinebreakBlock at (line: 15, col: 32) .. (line: 15, col: 32)]
[IndentationBlock length=1 (2) at (line: 16, col: 1) .. (line: 16, col: 1)]
[Constant.ConstantBlock 'constant const0 : integer := 5;' at (line: 16, col: 2) .. (line: 16, col: 32)]
[LinebreakBlock at (line: 16, col: 33) .. (line: 16, col: 33)]
[EmptyLineBlock at (line: 17, col: 6) .. (line: 17, col: 6)]
[IndentationBlock length=1 (2) at (line: 18, col: 1) .. (line: 18, col: 1)]
[Process.OpenBlock 'process(' at (line: 18, col: 2) .. (line: 18, col: 9)]
[SensitivityList.ItemBlock 'Clock' at (line: 18, col: 10) .. (line: 18, col: 15)]
[Process.OpenBlock2* ')' at (line: 18, col: 15) .. (line: 18, col: 15)]
[LinebreakBlock at (line: 18, col: 16) .. (line: 18, col: 16)]
...
```
The following screenshot shows the resulting stream of blocks:
[![][20]][20]
[outdated]
The block stream can also be "opened" to show the stream of tokens within each block. This is shown in the next screenshot:
[![][21]][21]
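
The translation of basic tokens into typed tokens described in this step can be pictured with a short sketch. It is only an approximation under stated assumptions: tokens are modeled as `(kind, text)` tuples, `KEYWORDS` is a deliberately incomplete set, and `specialize` is a hypothetical helper. The actual parser may well decide token types depending on the surrounding context rather than per token.

```python
from typing import Iterable, Iterator, Tuple

BasicToken = Tuple[str, str]  # (kind, text); simplified stand-in for token objects

# A small, deliberately incomplete subset of VHDL reserved words.
KEYWORDS = {"library", "use", "entity", "is", "end", "architecture", "of", "begin"}


def specialize(tokens: Iterable[BasicToken]) -> Iterator[BasicToken]:
    """Replace generic kinds by more specific ones where the text allows it."""
    for kind, text in tokens:
        # VHDL is case-insensitive, so keywords are compared in lower case,
        # while the original spelling is preserved in the token text.
        if kind == "Word" and text.lower() in KEYWORDS:
            yield ("Keyword", text)
        elif kind == "Character" and text == ":":
            yield ("Delimiter", text)
        elif kind == "Character" and text == ";":
            yield ("End", text)
        else:
            yield (kind, text)


basic = [("Word", "LIBRARY"), ("Space", " "), ("Word", "IEEE"), ("Character", ";")]
typed = list(specialize(basic))
```

Here `typed` keeps the spelling `LIBRARY` but marks it as a keyword, and the closing `;` becomes an end token.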
### Step 3
The stream of blocks from step 2 is transformed into a stream of groups.
### Step 4
One of many possible post-processing steps is to remove whitespace, indentation and comment blocks by applying a filter to the block stream. Additionally, multi-part blocks (e.g. where a comment or linebreak was inserted between consecutive code sequences that belong to one block) can be fused into a single block.
This screenshot shows the filtered results:
[![][30]][30]
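
Filtering and fusing can be sketched as follows. The `Block` dataclass, its `multipart` flag, the set `IGNORED_KINDS` and the function `filter_and_fuse` are hypothetical and chosen for this example only; they are not taken from pyVHDLParser.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Block:
    kind: str
    text: str
    multipart: bool = False  # True if a later block of the same kind continues this one


# Block kinds that the filter drops (an assumption made for this sketch).
IGNORED_KINDS = {"Linebreak", "Indentation", "Comment", "EmptyLine"}


def filter_and_fuse(blocks: Iterable[Block]) -> List[Block]:
    """Drop ignored blocks and merge the parts of multi-part blocks."""
    result: List[Block] = []
    pending: Optional[Block] = None  # fused block still waiting for its continuation
    for block in blocks:
        if block.kind in IGNORED_KINDS:
            continue
        if pending is not None and block.kind == pending.kind:
            # Continuation of an interrupted block: append its text.
            pending.text += block.text
            pending.multipart = block.multipart
            if not block.multipart:
                pending = None
        else:
            # Work on a copy so that the input blocks stay untouched.
            fused = Block(block.kind, block.text, block.multipart)
            result.append(fused)
            pending = fused if block.multipart else None
    return result


stream = [
    Block("Use", "use IEEE.std_logic_1164.all", multipart=True),
    Block("Comment", "-- interrupting comment\n"),
    Block("Indentation", "\t"),
    Block("Use", ";"),
    Block("Linebreak", "\n"),
    Block("Use", "use IEEE.numeric_std.all;"),
]
fused_stream = filter_and_fuse(stream)
```

In this example the first use clause is interrupted by a comment, so its two parts are fused into one block, while the second use clause stays a separate block because the first one was already complete.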
[10]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/screens/TokenStream_Example_1.png
[20]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/screens/BlockStream_Example_1.png
[21]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/screens/BlockStream_Uses_Detailed.png
[30]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/screens/BlockStream_Uses_Fused.png
## Example 2 - Simple_1
This is an input file:
```VHDL
-- Copryright 2016
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity myEntity is
	generic (
		BITS : positive := 8
	);
	port (
		Clock : in std_logic;
		Reset : in std_logic;
		Output : out std_logic_vector(BITS - 1 downto 0)
	);
end entity;

architecture rtl of myEntity is
begin
end architecture;
```
This is the result stream:
[![][40]][40]
And this is the filtered and fused result stream:
[![][41]][41]
[40]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/screens/BlockStream_Simple_1.png
[41]: https://raw.githubusercontent.com/Paebbels/pyVHDLParser/master/doc/screens/BlockStream_Simple_1_Fused.png
## Contributors
* [Patrick Lehmann](https://github.com/Paebbels) (Maintainer)
* [and more...](https://github.com/Paebbels/pyVHDLParser/graphs/contributors)
## License
This Python package (source code) is licensed under [Apache License 2.0](LICENSE.md).
The accompanying documentation is licensed under [Creative Commons - Attribution 4.0 (CC-BY 4.0)](doc/Doc-License.rst).
-------------------------
SPDX-License-Identifier: Apache-2.0