{"id":13585342,"url":"https://github.com/autosoft-dev/tree-hugger","last_synced_at":"2025-09-12T12:41:08.175Z","repository":{"id":53523651,"uuid":"244139058","full_name":"autosoft-dev/tree-hugger","owner":"autosoft-dev","description":"A light-weight, extendable, high level, universal code parser built on top of tree-sitter","archived":false,"fork":false,"pushed_at":"2021-12-02T12:16:49.000Z","size":1448,"stargazers_count":126,"open_issues_count":9,"forks_count":10,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-24T10:20:06.522Z","etag":null,"topics":["ast","cli","code-mining","cpp","data-mining","java","javascript","languages","machine-learning-on-source-code","parser","parsing","php","programming-language-theory","python","python-binding","tree-sitter","universal"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autosoft-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-01T11:46:26.000Z","updated_at":"2025-02-25T07:09:15.000Z","dependencies_parsed_at":"2022-09-13T17:11:42.906Z","dependency_job_id":null,"html_url":"https://github.com/autosoft-dev/tree-hugger","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Ftree-hugger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Ftree-hugger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Ftree-hugger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/autosoft-dev%2Ftree-hugger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autosoft-dev","download_url":"https://codeload.github.com/autosoft-dev/tree-hugger/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247773721,"owners_count":20993639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ast","cli","code-mining","cpp","data-mining","java","javascript","languages","machine-learning-on-source-code","parser","parsing","php","programming-language-theory","python","python-binding","tree-sitter","universal"],"created_at":"2024-08-01T15:04:53.091Z","updated_at":"2025-04-08T04:18:13.799Z","avatar_url":"https://github.com/autosoft-dev.png","language":"Python","readme":"\n![Code mining at scale - tree hugger](https://github.com/autosoft-dev/tree-hugger/blob/master/tree-hugger%20schema.PNG)\n\n[![Downloads](https://pepy.tech/badge/tree-hugger)](https://pepy.tech/project/tree-hugger)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)\n[![Support Python Version](https://img.shields.io/badge/python-3.6%7C3.7%7C3.8-brightgreen)](https://pypi.org/project/tree-hugger/)\n[![PyPI 
version](https://badge.fury.io/py/tree-hugger.svg)](https://badge.fury.io/py/tree-hugger)\n![](build_badges/macpass.svg)\n![](build_badges/linuxpass.svg)\n![](build_badges/windowsfail.svg)\n[![autosoft-dev](https://circleci.com/gh/autosoft-dev/tree-hugger.svg?style=svg)](https://app.circleci.com/pipelines/github/autosoft-dev/tree-hugger)\n\n\n![](logo/th-logo.png) **For People in a Hurry :)**\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autosoft-dev/tree-hugger/blob/master/notebooks/Using_tree_hugger_to_Enhance_CodeXGLUE.ipynb)\n\n# tree-hugger\nMine source code repositories at scale. Easily. Tree-hugger is a light-weight, high-level library that provides Pythonic APIs to mine through Git repositories (it works on any collection of supported code files, actually).\n\nTree-hugger is built on top of [tree-sitter](https://tree-sitter.github.io/tree-sitter/).\n\nCovered languages:\n* Python\n* PHP\n* Java\n* JavaScript\n* C++\n\n_System Requirement: Python 3.6_\n\n\n## Contributors\n\n\u003ca href=\"https://github.com/autosoft-dev/tree-hugger/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contributors-img.web.app/image?repo=autosoft-dev/tree-hugger\" /\u003e\n\u003c/a\u003e\n\nMade with [contributors-img](https://contributors-img.web.app).\n\n\n## Contents\n\n\u003cdetails\u003e\n  \u003csummary\u003eTable of contents\u003c/summary\u003e\n\n---\n\n- [Installation](#installation)\n- [Setup](#setup)\n- [Hello world example](#hello-world-example)\n- [API reference](#api-reference)\n- [Extending tree-hugger](#extending-tree-hugger)\n  - [Adding languages](#adding-languages)\n  - [Adding queries](#adding-queries)\n- [Roadmap](#roadmap)\n\n---\n\n\u003c/details\u003e\n\n\n## Installation\n\n### From pip:\n\n```\npip install -U tree-hugger PyYAML\n```\n\n### From Source:\n\n```\ngit clone https://github.com/autosoft-dev/tree-hugger.git\n\ncd tree-hugger\n\npip install -e .\n```\n\n_The 
installation process is tested on macOS Mojave. We have a [separate docker binding](https://github.com/autosoft-dev/tree-sitter-docker) for compiling the libraries for Linux, and this library will soon be integrated into it as well._\n\n_You may need to install libgit2. On macOS, just use `brew install libgit2`._\n\n## Setup\n\n### Getting your .so files\n\n### Update - 19.11.2021 - \n\n**We can no longer support the S3-based download, so the `download_libs` command does not work. The libraries are now available via this release: https://github.com/autosoft-dev/tree-hugger/releases/tag/0.10.1 Please download the required zip file from there.**\n\n_Please note that building the libraries has been tested on macOS Mojave with Apple LLVM version 10.0.1 (clang-1001.0.46.4). However, they should work on all mainstream Linux systems. We have not tested them on Windows._\n\n### Environment variables\nYou can set the `TS_LIB_PATH` environment variable to the tree-sitter lib path (the .so files you just downloaded), and the library will then use them automatically. Alternatively, you can pass the path when creating any `Parser` object.\n\n\n## Hello world example\n\n\n1. **Generate the libraries**: run the above command to generate the libraries. \n\n    In our settings we use the `-c` flag to copy the generated `tree-sitter` library's `.so` file to our workspace. Once copied, we place it under a directory called `tslibs` (it is in the .gitignore).\n    \n    ⚠ If you are using Linux, you will need to use our [tree-sitter-docker](https://github.com/autosoft-dev/tree-sitter-docker) image and manually copy the final .so file, unless you are on a Debian-based distro, in which case you should probably use our pre-compiled version via the `download_libs` command as described above.\n\n2. **Setup environment variable** (optional)\nAssuming that you have the necessary environment variable setup. 
The following line of code will create a `Parser` object according to the language you want to analyse: \n\n**Python**\n```python\n# Python\nfrom tree_hugger.core import PythonParser\npp = PythonParser()\npp.parse_file(\"tests/assets/file_with_different_functions.py\")\npp.get_all_function_names()\nOut[4]:\n['first_child', 'second_child', 'say_whee', 'wrapper', 'my_decorator', 'parent']\n```\n\n**PHP**\n```Python \n# PHP\nfrom tree_hugger.core import PHPParser\nphpp = PHPParser()\nphpp.parse_file(\"tests/assets/file_with_different_functions.php\")\nphpp.get_all_function_names() \nOut[5] :\n['foo', 'test', 'simple_params', 'variadic_param' ]\n```\n\n**Java**\n```python\n# Java \nfrom tree_hugger.core import JavaParser\njp = JavaParser()\njp.parse_file(\"tests/assets/file_with_different_methods.java\")\njp.get_all_class_names() \nOut[6] :\n['HelloWorld','Animal', 'Dog' ]\n```\n\n**JavaScript**\n```python\n# JavaScript\nfrom tree_hugger.core import JavascriptParser\njsp = JavascriptParser()\njsp.parse_file(\"tests/assets/file_with_different_functions.js\")\njsp.get_all_function_names() \nOut[7] :\n['test', 'utf8_to_b64',\t'sum', 'multiply' ]\n```\n\n**C++**\n``` python\nfrom tree_hugger.core import CPPParser\ncp = CPPParser()\ncp.parse_file(\"tests/assets/file_with_different_functions.cpp\")\ncp.get_all_function_names() \nOut[8] :\n['foo', 'test', 'simple_params', 'variadic_param' ]\n```\n\n\n## API reference\n\n\n| Language      | Functions        | Methods      | Classes |\n| ------------- |-------------|-------------|-------------|\n| **Python**        |  get_all_function_names get_all_function_doctrings  get_all_function_names_and_params  get_all_function_bodies  |  get_all_class_method_names  get_all_method_docstrings  get_all_method_documentations  get_all_class_method_bodies  |  get_all_class_names  get_all_class_docstrings |\n| **PHP**           | get_all_function_names  get_all_function_names_with_params   get_all_function_bodies  get_all_function_docstrings  
get_all_function_documentations | get_all_class_method_names  get_all_method_docstrings  get_all_method_documentations  get_all_class_method_bodies |  get_all_class_names  get_all_class_docstrings  get_all_class_documentations |\n| **Java**          |   |  get_all_class_method_names   get_all_method_names_with_params  get_all_method_bodies  get_all_method_javadocs  get_all_method_documentations |  get_all_class_names  get_all_class_javadocs  get_all_class_documentations |\n| **JavaScript**    | get_all_function_names  get_all_function_names_with_params  get_all_function_bodies  get_all_function_jsdocs  get_all_function_documentations  |  get_all_class_method_names  get_all_method_jsdocs  get_all_method_documentations |  get_all_class_names  get_all_class_jsdocs  get_all_class_documentations |\n| **C++**            | get_all_function_names  get_all_function_names_with_params  get_all_function_commentdocs  get_all_function_documentations  get_all_function_bodies  | get_all_class_method_names    |   get_all_class_names  get_all_class_commentdocs  get_all_class_documentations  |\n\n \n\n## Extending tree-hugger\n\nExtending tree-hugger to other languages, or adding functionality for the languages already provided, is easy. \n\n1. ### Adding languages:\nSupport for a new language is added by writing a parser class that derives from the BaseParser class. The only mandatory argument that a Parser class should pass to the parent is the `language`. This is a lower-case string such as `python`. Each parser class must accept, in its constructor, the path of the tree-sitter library (the .so file that we use to parse the code) and the path to the queries yaml file.\n\nThe BaseParser class does a few things: \n- Loading and preparing the .so file with respect to the language you just mentioned.\n- Loading, preparing and parsing the query yaml file. 
(for the queries, we internally use an extended UserDict class)\n- Providing an API to parse a file and prepare it for querying: `BaseParser.parse_file`\n\nIt also gives you another API (most likely not to be exposed outside), `_run_query_and_get_captures`, which lets you run any query and returns the matched results (if any) from the parsed tree.\n\nWe use those APIs once we have called `parse_file` and parsed the file.\n\n\n2. ### Adding queries: \nQueries run on source code are s-expressions. They are listed in a `queries.yml` file for each parser class. Tree-hugger gives you a way to write your queries in a yaml file for each language parsed.\n\n**Query structure**: The name of a query followed by the query itself, written as an s-expression. *Example*:\n\n```\nall_function_docstrings:\n        \"\n        (\n            function_definition\n            name: (identifier) @function.def\n            body: (block(expression_statement(string))) @function.docstring\n        )\n        \"\n```\nYou have to follow yaml grammar while writing these queries. You can read more about writing these queries in the [documentation of tree-sitter](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries). 
\n\nSome example queries that you will find in the yaml file (and their corresponding API from the PythonParser class): \n\n```\n* all_function_names =\u003e get_all_function_names()\n\n* all_function_docstrings =\u003e get_all_function_documentations()\n\n* all_class_methods =\u003e get_all_class_method_names()\n```\n\n\n## Roadmap\n\n\n * Documentation: tutorial on writing queries\n\n * Write *Parser class for other languages\n\n| Languages     | Status-Finished           | Author  |\n| ------------- |:-------------:| :-----:|\n| Python     |✅  | [Shubhadeep](https://github.com/rcshubhadeep) |\n| PHP      | ✅    |   [Clément](https://github.com/CDluznie) |\n| Java | ✅      |   [Clément](https://github.com/CDluznie)  |\n| JavaScript |  ✅  | [Clément](https://github.com/CDluznie) | \n| C++ |  ✅ | [Clément](https://github.com/CDluznie)  |\n\n\nIf you are using tree-hugger in your project, please consider putting [![parser: tree-hugger](https://img.shields.io/badge/parser-tree--hugger-lightgrey)](https://github.com/autosoft-dev/tree-hugger/) in your project :)\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautosoft-dev%2Ftree-hugger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautosoft-dev%2Ftree-hugger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautosoft-dev%2Ftree-hugger/lists"}