Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/c2nes/javalang

Pure Python Java parser and tools
https://github.com/c2nes/javalang

java library parser python

Last synced: 9 days ago
JSON representation

Pure Python Java parser and tools

Awesome Lists containing this project

README

        

========
javalang
========

.. image:: https://travis-ci.org/c2nes/javalang.svg?branch=master
:target: https://travis-ci.org/c2nes/javalang

.. image:: https://badge.fury.io/py/javalang.svg
:target: https://badge.fury.io/py/javalang

javalang is a pure Python library for working with Java source
code. javalang provides a lexer and parser targeting Java 8. The
implementation is based on the Java language spec available at
http://docs.oracle.com/javase/specs/jls/se8/html/.

The following gives a very brief introduction to using javalang.

---------------
Getting Started
---------------

.. code-block:: python

>>> import javalang
>>> tree = javalang.parse.parse("package javalang.brewtab.com; class Test {}")

This will return a ``CompilationUnit`` instance. This object is the root of a
tree which may be traversed to extract different information about the
compilation unit,

.. code-block:: python

>>> tree.package.name
u'javalang.brewtab.com'
>>> tree.types[0]
ClassDeclaration
>>> tree.types[0].name
u'Test'

The string passed to ``javalang.parse.parse()`` must represent a complete unit
which simply means it should represent a complete, valid Java source file. Other
methods in the ``javalang.parse`` module allow for some smaller code snippets to
be parsed without providing an entire compilation unit.

Working with the syntax tree
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``CompilationUnit`` is a subclass of ``javalang.ast.Node``, as are its
descendants in the tree. The ``javalang.tree`` module defines the different
types of ``Node`` subclasses, each of which represent the different syntaxual
elements you will find in Java code. For more detail on what node types are
available, see the ``javalang/tree.py`` source file until the documentation is
complete.

``Node`` instances support iteration,

.. code-block:: python

>>> for path, node in tree:
... print path, node
...
() CompilationUnit
(CompilationUnit,) PackageDeclaration
(CompilationUnit, [ClassDeclaration]) ClassDeclaration

This iteration can also be filtered by type,

.. code-block:: python

>>> for path, node in tree.filter(javalang.tree.ClassDeclaration):
... print path, node
...
(CompilationUnit, [ClassDeclaration]) ClassDeclaration

---------------
Component Usage
---------------

Internally, the ``javalang.parse.parse`` method is a simple method which creates
a token stream for the input, initializes a new ``javalang.parser.Parser``
instance with the given token stream, and then invokes the parser's ``parse()``
method, returning the resulting ``CompilationUnit``. These components may be
also be used individually.

Tokenizer
^^^^^^^^^

The tokenizer/lexer may be invoked directly be calling ``javalang.tokenizer.tokenize``,

.. code-block:: python

>>> javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')

This returns a generator which provides a stream of ``JavaToken`` objects. Each
token carries position (line, column) and value information,

.. code-block:: python

>>> tokens = list(javalang.tokenizer.tokenize('System.out.println("Hello " + "world");'))
>>> tokens[6].value
u'"Hello "'
>>> tokens[6].position
(1, 19)

The tokens are not directly instances of ``JavaToken``, but are instead
instances of subclasses which identify their general type,

.. code-block:: python

>>> type(tokens[6])

>>> type(tokens[7])

**NOTE:** The shift operators ``>>`` and ``>>>`` are represented by multiple
``>`` tokens. This is because multiple ``>`` may appear in a row when closing
nested generic parameter/arguments lists. This abiguity is instead resolved by
the parser.

Parser
^^^^^^

To parse snippets of code, a parser may be used directly,

.. code-block:: python

>>> tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
>>> parser = javalang.parser.Parser(tokens)
>>> parser.parse_expression()
MethodInvocation

The parse methods are designed for incremental parsing so they will not restart
at the beginning of the token stream. Attempting to call a parse method more
than once will result in a ``JavaSyntaxError`` exception.

Invoking the incorrect parse method will also result in a ``JavaSyntaxError``
exception,

.. code-block:: python

>>> tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
>>> parser = javalang.parser.Parser(tokens)
>>> parser.parse_type_declaration()
Traceback (most recent call last):
File "", line 1, in
File "javalang/parser.py", line 336, in parse_type_declaration
return self.parse_class_or_interface_declaration()
File "javalang/parser.py", line 353, in parse_class_or_interface_declaration
self.illegal("Expected type declaration")
File "javalang/parser.py", line 122, in illegal
raise JavaSyntaxError(description, at)
javalang.parser.JavaSyntaxError

The ``javalang.parse`` module also provides convenience methods for parsing more
common types of code snippets.