https://github.com/autosoft-dev/tree-hugger
A light-weight, extendable, high level, universal code parser built on top of tree-sitter
- Host: GitHub
- URL: https://github.com/autosoft-dev/tree-hugger
- Owner: autosoft-dev
- License: mit
- Created: 2020-03-01T11:46:26.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-12-02T12:16:49.000Z (about 3 years ago)
- Last Synced: 2024-09-26T20:51:40.544Z (3 months ago)
- Topics: ast, cli, code-mining, cpp, data-mining, java, javascript, languages, machine-learning-on-source-code, parser, parsing, php, programming-language-theory, python, python-binding, tree-sitter, universal
- Language: Python
- Homepage:
- Size: 1.38 MB
- Stars: 126
- Watchers: 9
- Forks: 10
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
![Code mining at scale - tree hugger](https://github.com/autosoft-dev/tree-hugger/blob/master/tree-hugger%20schema.PNG)
[![Downloads](https://pepy.tech/badge/tree-hugger)](https://pepy.tech/project/tree-hugger)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
[![Support Python Version](https://img.shields.io/badge/python-3.6%7C3.7%7C3.8-brightgreen)](https://pypi.org/project/tree-hugger/)
[![PyPI version](https://badge.fury.io/py/tree-hugger.svg)](https://badge.fury.io/py/tree-hugger)
![](build_badges/macpass.svg)
![](build_badges/linuxpass.svg)
![](build_badges/windowsfail.svg)
[![autosoft-dev](https://circleci.com/gh/autosoft-dev/tree-hugger.svg?style=svg)](https://app.circleci.com/pipelines/github/autosoft-dev/tree-hugger)

![](logo/th-logo.png)

**For People in a Hurry :)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autosoft-dev/tree-hugger/blob/master/notebooks/Using_tree_hugger_to_Enhance_CodeXGLUE.ipynb)
# tree-hugger
Mine source code repositories at scale. Easily. Tree-hugger is a lightweight, high-level library that provides Pythonic APIs for mining Git repositories (in fact, it works on any collection of supported code files). Tree-hugger is built on top of [tree-sitter](https://tree-sitter.github.io/tree-sitter/).
Covered languages:
* Python
* PHP
* Java
* JavaScript
* C++

_System Requirement: Python 3.6_
## Contributors
Made with [contributors-img](https://contributors-img.web.app).
## Contents
Table of contents
---
- [Installation](#installation)
- [Setup](#setup)
- [Hello world example](#hello-world-example)
- [API reference](#api-reference)
- [Extending tree-hugger](#extending-tree-hugger)
- [Adding languages](#adding-languages)
- [Adding queries](#adding-queries)
- [Roadmap](#roadmap)

---
## Installation
### From pip:
```
pip install -U tree-hugger PyYAML
```

### From Source:
```
git clone https://github.com/autosoft-dev/tree-hugger.git
cd tree-hugger
pip install -e .
```

_The installation process is tested on macOS Mojave. We have a [separate docker binding](https://github.com/autosoft-dev/tree-sitter-docker) for compiling the libraries for Linux, and this library will soon be integrated into it as well._
_You may need to install libgit2. On macOS, run `brew install libgit2`._
## Setup
### Getting your .so files
### Update - 19.11.2021 -
**We can no longer support the S3-based download, so the `download_libs` command does not work. The libraries are now available from this release: https://github.com/autosoft-dev/tree-hugger/releases/tag/0.10.1. Please download the required zip file from there.**
_Please note that building the libraries has been tested on macOS Mojave with Apple LLVM version 10.0.1 (clang-1001.0.46.4). However, they should work on all mainstream Linux systems. We have not tested them on Windows._
### Environment variables
You can set the `TS_LIB_PATH` environment variable to the tree-sitter lib path (the .so files you just downloaded), and the library will then use them automatically. Alternatively, you can pass the path when creating any `Parser` object.

## Hello world example
1. **Generate the libraries**: run the above command to generate the libraries.
In our settings we use the `-c` flag to copy the generated `tree-sitter` library's `.so` file to our workspace. Once copied, we place it under a directory called `tslibs` (it is in the .gitignore).
⚠ If you are using Linux, you will need to use our [tree-sitter-docker](https://github.com/autosoft-dev/tree-sitter-docker) image and manually copy the final .so file, unless you are on a Debian-based distro, in which case you should probably use our pre-compiled version via the `download_libs` command as described above.

2. **Setup environment variable** (optional)
Assuming you have set up the necessary environment variable, the following code creates a `Parser` object for the language you want to analyse:

**Python**
```python
# Python
from tree_hugger.core import PythonParser
pp = PythonParser()
pp.parse_file("tests/assets/file_with_different_functions.py")
pp.get_all_function_names()
# Out[4]:
# ['first_child', 'second_child', 'say_whee', 'wrapper', 'my_decorator', 'parent']
```

**PHP**

```python
# PHP
from tree_hugger.core import PHPParser
phpp = PHPParser()
phpp.parse_file("tests/assets/file_with_different_functions.php")
phpp.get_all_function_names()
# Out[5]:
# ['foo', 'test', 'simple_params', 'variadic_param']
```

**Java**

```python
# Java
from tree_hugger.core import JavaParser
jp = JavaParser()
jp.parse_file("tests/assets/file_with_different_methods.java")
jp.get_all_class_names()
# Out[6]:
# ['HelloWorld', 'Animal', 'Dog']
```

**JavaScript**

```python
# JavaScript
from tree_hugger.core import JavascriptParser
jsp = JavascriptParser()
jsp.parse_file("tests/assets/file_with_different_functions.js")
jsp.get_all_function_names()
# Out[7]:
# ['test', 'utf8_to_b64', 'sum', 'multiply']
```

**C++**

```python
from tree_hugger.core import CPPParser
cp = CPPParser()
cp.parse_file("tests/assets/file_with_different_functions.cpp")
cp.get_all_function_names()
# Out[8]:
# ['foo', 'test', 'simple_params', 'variadic_param']
```

## API reference
| Language | Functions | Methods | Classes |
| ------------- |-------------|-------------|-------------|
| **Python** | `get_all_function_names`, `get_all_function_docstrings`, `get_all_function_names_and_params`, `get_all_function_bodies` | `get_all_class_method_names`, `get_all_method_docstrings`, `get_all_method_documentations`, `get_all_class_method_bodies` | `get_all_class_names`, `get_all_class_docstrings` |
| **PHP** | `get_all_function_names`, `get_all_function_names_with_params`, `get_all_function_bodies`, `get_all_function_docstrings`, `get_all_function_documentations` | `get_all_class_method_names`, `get_all_method_docstrings`, `get_all_method_documentations`, `get_all_class_method_bodies` | `get_all_class_names`, `get_all_class_docstrings`, `get_all_class_documentations` |
| **Java** | | `get_all_class_method_names`, `get_all_method_names_with_params`, `get_all_method_bodies`, `get_all_method_javadocs`, `get_all_method_documentations` | `get_all_class_names`, `get_all_class_javadocs`, `get_all_class_documentations` |
| **JavaScript** | `get_all_function_names`, `get_all_function_names_with_params`, `get_all_function_bodies`, `get_all_function_jsdocs`, `get_all_function_documentations` | `get_all_class_method_names`, `get_all_method_jsdocs`, `get_all_method_documentations` | `get_all_class_names`, `get_all_class_jsdocs`, `get_all_class_documentations` |
| **C++** | `get_all_function_names`, `get_all_function_names_with_params`, `get_all_function_commentdocs`, `get_all_function_documentations`, `get_all_function_bodies` | `get_all_class_method_names` | `get_all_class_names`, `get_all_class_commentdocs`, `get_all_class_documentations` |
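
Because the method names follow a shared pattern, one driver function can work across parsers. The following is a minimal sketch and not part of tree-hugger: `collect_names` is a hypothetical helper that only assumes a parser object exposing `parse_file` and, where available, `get_all_function_names` and `get_all_class_names` as listed in the table. `StubParser` is a hypothetical stand-in so the example runs without the compiled `.so` libraries; in real use you would pass one of the parser objects from the hello world examples.

```python
from typing import Dict, List


def collect_names(parser, path: str) -> Dict[str, List[str]]:
    """Parse one file and gather function and class names, if supported."""
    parser.parse_file(path)
    result = {}
    # Not every parser offers every method (see the empty cells in the table),
    # so each method is looked up defensively.
    for key, method_name in (
        ("functions", "get_all_function_names"),
        ("classes", "get_all_class_names"),
    ):
        method = getattr(parser, method_name, None)
        result[key] = list(method()) if callable(method) else []
    return result


class StubParser:
    """Hypothetical stand-in mimicking a parser without function-level methods."""

    def parse_file(self, path: str) -> bool:
        self.path = path
        return True

    def get_all_class_names(self) -> List[str]:
        return ["HelloWorld", "Animal", "Dog"]


names = collect_names(StubParser(), "tests/assets/file_with_different_methods.java")
print(names)
```

Looking methods up with `getattr` keeps the helper usable for languages whose table row has an empty column, such as Java's function column.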
## Extending tree-hugger
Extending tree-hugger to other languages, or adding functionality to the languages already provided, is easy.
1. ### Adding languages:
Support for new languages can be added by writing a parser class that derives from the `BaseParser` class. The only mandatory argument that a parser class must pass to the parent is `language`, a lower-case string such as `python`. Each parser class must also accept, in its constructor, the path to the tree-sitter library (the .so file used to parse the code) and the path to the queries yaml file.

The `BaseParser` class does a few things:
- Loading and preparing the .so file with respect to the language you just mentioned.
- Loading, preparing and parsing the query yaml file. (For the queries, we internally use an extended UserDict class.)
- Providing an API to parse a file and prepare it for queries: `BaseParser.parse_file`.

It also gives you another API (most likely not to be exposed outside), `_run_query_and_get_captures`, which lets you run any query and returns the matched results (if any) from the parsed tree.
We use those APIs once we have called `parse_file` and parsed the file.
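
The constructor contract described in this section can be pictured with a small, self-contained sketch. This is not tree-hugger's `BaseParser`: the class names `SketchBaseParser` and `SketchRubyParser`, the parameter names, the fallback order, and the library file name are all assumptions made for illustration. Only the `language` string, the `TS_LIB_PATH` environment variable and the `tslibs` directory come from this README.

```python
import os
from typing import Optional


class SketchBaseParser:
    """Hypothetical base class: stores the language and resolves file paths."""

    def __init__(self, language: str,
                 library_path: Optional[str] = None,
                 query_file_path: Optional[str] = None):
        self.language = language
        # Assumed precedence: an explicit argument wins, else TS_LIB_PATH.
        self.library_path = library_path or os.environ.get("TS_LIB_PATH")
        if self.library_path is None:
            raise ValueError("pass library_path or set TS_LIB_PATH")
        self.query_file_path = query_file_path


class SketchRubyParser(SketchBaseParser):
    """Hypothetical subclass: the only value it must supply is the language."""

    def __init__(self, library_path: Optional[str] = None,
                 query_file_path: Optional[str] = None):
        super().__init__("ruby", library_path, query_file_path)


# The file name below is a made-up placeholder, not a real artifact name.
parser = SketchRubyParser(library_path="tslibs/my-languages.so")
print(parser.language, parser.library_path)
```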
2. ### Adding queries:
Queries run on source code are s-expressions; they are listed in a `queries.yml` file for each parser class. Tree-hugger lets you write the queries for each parsed language in a yaml file.

**Query structure**: the name of a query followed by the query itself, written as an s-expression. *Example*:
```
all_function_docstrings:
"
(
function_definition
name: (identifier) @function.def
body: (block(expression_statement(string))) @function.docstring
)
"
```
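
Once loaded, a query file amounts to a mapping from query names to s-expression strings. Here is a minimal sketch of such a container, assuming nothing about tree-hugger's own implementation beyond the remark that it extends `UserDict`. `QuerySketch` and `get_query` are hypothetical names, and the mapping is written inline rather than read from a `queries.yml` file so that the example has no dependencies.

```python
from collections import UserDict


class QuerySketch(UserDict):
    """Hypothetical container mapping query names to s-expression strings."""

    def get_query(self, name: str) -> str:
        try:
            # Strip the surrounding whitespace left over from the block string.
            return self.data[name].strip()
        except KeyError:
            known = sorted(self.data)
            raise KeyError(f"no query named {name!r}; known: {known}") from None


queries = QuerySketch({
    "all_function_docstrings": """
        (
            function_definition
            name: (identifier) @function.def
            body: (block(expression_statement(string))) @function.docstring
        )
    """,
})

print(queries.get_query("all_function_docstrings"))
```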
You have to follow yaml grammar while writing these queries. You can read more about writing these queries in the [documentation of tree-sitter](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries).

Some example queries that you will find in the yaml file (and their corresponding API from the PythonParser class):
```
* all_function_names => get_all_function_names()
* all_function_docstrings => get_all_function_documentations()
* all_class_methods => get_all_class_method_names()
```

## Roadmap
* Documentation: tutorial on writing queries
* Write `*Parser` classes for other languages

| Languages | Status-Finished | Author |
| ------------- |:-------------:| :-----:|
| Python |✅ | [Shubhadeep](https://github.com/rcshubhadeep) |
| PHP | ✅ | [Clément](https://github.com/CDluznie) |
| Java | ✅ | [Clément](https://github.com/CDluznie) |
| JavaScript | ✅ | [Clément](https://github.com/CDluznie) |
| C++ | ✅ | [Clément](https://github.com/CDluznie) |

If you are using tree-hugger in your project, please consider putting [![parser: tree-hugger](https://img.shields.io/badge/parser-tree--hugger-lightgrey)](https://github.com/autosoft-dev/tree-hugger/) in your project :)