{"id":21168505,"url":"https://github.com/jetbrains-research/astminer","last_synced_at":"2025-04-05T20:09:34.702Z","repository":{"id":38983262,"uuid":"161813380","full_name":"JetBrains-Research/astminer","owner":"JetBrains-Research","description":"A library for mining of path-based representations of code (and more)","archived":false,"fork":false,"pushed_at":"2023-12-11T04:03:46.000Z","size":2010,"stargazers_count":287,"open_issues_count":12,"forks_count":81,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-29T19:08:16.832Z","etag":null,"topics":["antlr","code2vec","mining"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JetBrains-Research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-12-14T16:37:33.000Z","updated_at":"2025-03-18T01:51:37.000Z","dependencies_parsed_at":"2024-01-22T19:33:43.887Z","dependency_job_id":"264bf7f3-7ef9-4ab5-81e5-788a90e374c4","html_url":"https://github.com/JetBrains-Research/astminer","commit_stats":null,"previous_names":["vovak/astminer"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fastminer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fastminer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fastminer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fastminer/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/JetBrains-Research","download_url":"https://codeload.github.com/JetBrains-Research/astminer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393573,"owners_count":20931813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["antlr","code2vec","mining"],"created_at":"2024-11-20T15:14:39.245Z","updated_at":"2025-04-05T20:09:34.681Z","avatar_url":"https://github.com/JetBrains-Research.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![JetBrains Research](https://jb.gg/badges/research.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)\n![astminer version](https://img.shields.io/badge/astminer-v0.9.0-blue)\n\n# `astminer`\nA library for mining [path-based representations of code](https://arxiv.org/pdf/1803.09544.pdf) and more\nsupported by the\n[Machine Learning Methods for Software Engineering](https://research.jetbrains.org/groups/ml_methods)\ngroup at [JetBrains Research](https://research.jetbrains.org).\n\nSupported languages of the input:\n\n|         | Java | Python | C/C++ | JavaScript | PHP |\n|---------|------|--------|-------|------------|-----|\n| ANTLR   | ✅    | ✅      |       | ✅          | ✅   |\n| GumTree | ✅ (JDT and srcML)    | ✅      |       |            |     |\n| Fuzzy   |      |        | ✅     |            |     |\n| JavaParser | ✅ |        |        |             |      |\n| TreeSitter | ✅ |       |        |            |     |\n| JavaLang| ✅    |       |        |           |      |\n\n\n\n## 
About\n`astminer` lets you create an end-to-end pipeline to process code for machine learning models.\n\nCurrently, it supports the extraction of:\n* Path-based representations of files/methods\n* Raw ASTs of files/methods\n\n`astminer` was first implemented as a part of the pipeline in the [code style extraction project](https://arxiv.org/abs/2002.03997) and later converted into a reusable tool.\nIt is designed to be easily extensible to new languages.\n\n`astminer` allows you to convert source code cloned from VCSs to formats suitable for training.\nTo achieve that, `astminer` incorporates the following processing modules:\n- [Filters](./docs/filters.md) to remove redundant samples from data.\n- [Label extractors](./docs/label_extractors.md) to create a label for each tree.\n- [Storages](./docs/storages.md) to define the storage format.\n\n## Usage\nThere are two ways to use `astminer`:\n\n- [As a standalone CLI tool](#using-astminer-cli) with a pre-implemented logic for common processing and mining tasks.\n- [Integrated](#using-astminer-as-a-dependency) into your Kotlin/Java mining pipelines as a Gradle dependency.\n\n### Using `astminer` CLI\n\n1. [Build the CLI](./docs/cli.md#Getting+started) from the sources.\n\n2. Prepare your inputs and [configure](./docs/cli.md#Configuration) pipeline options. For config examples, see the [configs](./configs) directory. \n\n3. To run the CLI, pass the config to the shell script:\n    ```shell\n    ./cli.sh \u003cpath-to-YAML-config\u003e\n    ```\nAlternatively, you can run the tool inside the [Docker image](./docs/cli.md#Docker).\n\n### Using `astminer` as a dependency\n\n#### Import\n\n`astminer` is available in the JetBrains Space package repository. 
You can add the dependency in your `build.gradle` file:\n```\nrepositories {\n    maven {\n        url \"https://packages.jetbrains.team/maven/p/astminer/astminer\"\n    }\n}\n\ndependencies {\n    implementation 'io.github.vovak:astminer:\u003cVERSION\u003e'\n}\n```\n\nIf you use `build.gradle.kts`:\n```\nrepositories {\n    maven(url = uri(\"https://packages.jetbrains.team/maven/p/astminer/astminer\"))\n}\n\ndependencies {\n    implementation(\"io.github.vovak:astminer:\u003cVERSION\u003e\")\n}\n```\n\n#### Local development\n\nTo use a specific version of the library, navigate to the required branch and build a local version of `astminer`:\n```shell\n./gradlew publishToMavenLocal\n```\nAfter that, add `mavenLocal()` to the `repositories` section of your Gradle configuration.\n\n#### Examples\n\nIf you want to use `astminer` as a library in your Java/Kotlin-based data mining tool, check the following usage examples:\n\n* Simple standalone [example scripts](src/examples) in Java and Kotlin that call different `astminer` APIs.\n* [psiminer](https://github.com/JetBrains-Research/psiminer), a mining tool that uses `astminer` to extract paths from PSI trees. See the [code2seq storage implementation](https://github.com/JetBrains-Research/psiminer/blob/master/psiminer-core/src/main/kotlin/storage/paths/Code2SeqStorage.kt).\n\nPlease consider trying Kotlin for your data mining pipelines: in our experience, it is much better suited to data collection and transformation tools than Java.\n\n## Contribution\n\nWe believe that `astminer` can find use beyond our own mining tasks.\n\nPlease help make `astminer` easier to use by sharing your use cases. 
Pull requests are welcome as well.\nSupport for other languages and documentation are the key areas of improvement.\n\n## Citing `astminer`\n\nA [paper](https://zenodo.org/record/2595271) dedicated to `astminer` (more precisely, to its older version [PathMiner](https://github.com/vovak/astminer/tree/pathminer)) was presented at [MSR'19](https://2019.msrconf.org/). \nIf you use `astminer` in your academic work, please cite it.\n```\n@inproceedings{kovalenko2019pathminer,\n  title={PathMiner: a library for mining of path-based representations of code},\n  author={Kovalenko, Vladimir and Bogomolov, Egor and Bryksin, Timofey and Bacchelli, Alberto},\n  booktitle={Proceedings of the 16th International Conference on Mining Software Repositories},\n  pages={13--17},\n  year={2019},\n  organization={IEEE Press}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjetbrains-research%2Fastminer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjetbrains-research%2Fastminer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjetbrains-research%2Fastminer/lists"}