{"id":17959913,"url":"https://github.com/dargones/import_prediction","last_synced_at":"2025-06-15T21:16:01.163Z","repository":{"id":104892233,"uuid":"243832835","full_name":"Dargones/import_prediction","owner":"Dargones","description":"Code that allows predicting imports in Java code with GGNNs","archived":false,"fork":false,"pushed_at":"2020-05-09T03:53:43.000Z","size":38308,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-09T06:41:27.319Z","etag":null,"topics":["github","graph-neural-networks","imports","java","machine-learning"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dargones.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-28T18:52:02.000Z","updated_at":"2023-07-16T11:36:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"8d554131-6edd-4ca1-aa4a-06e2e5515956","html_url":"https://github.com/Dargones/import_prediction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dargones%2Fimport_prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dargones%2Fimport_prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dargones%2Fimport_prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dargones%2Fimport_prediction/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dargones","download_url":"https://codeload.github.com/Dargones/import_prediction/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247052624,"owners_count":20875685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["github","graph-neural-networks","imports","java","machine-learning"],"created_at":"2024-10-29T11:04:45.318Z","updated_at":"2025-04-03T18:21:23.911Z","avatar_url":"https://github.com/Dargones.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Java Local Import Prediction with Gated Graph Neural Networks\n\n- [Introduction](#introduction)\n- [Getting the Data](#getting-the-data)\n- [Data Preprocessing and Baselines](#data-preprocessing-and-baselines)\n- [Running the GGNNs](#running-the-ggnns)\n\n## Introduction\n\nThis repository contains the code for my Senior Thesis, in which I explore how\nGated Graph Neural Networks can be used to predict class-level imports in Java\ncode. More specifically, I attempt to predict which classes (or, more properly,\ncompilation units) defined in a project currently under development\nmight be imported into a class newly added to the same project, given the new\nclass's name and, possibly, its initial import statements.\n\nBecause the goal is to predict imports occurring within\na possibly unfinished project, one cannot gather enough import co-occurrence\nstatistics to reliably predict class-level imports. 
Instead, one must rely\nexclusively on information that is locally available. To approach this problem,\nI propose modeling the relationships between compilation units in a project with a\ngraph. A node in such a graph corresponds to a particular compilation unit.\nAn edge between two nodes marks a relationship between the two corresponding\ncompilation units.\n\nBelow is an example of such a graph built for one of the repositories in my\ndataset. Grey undirected edges connect compilation units defined within the same\npackage. Black directed edges correspond to import statements. Blue edges mark\nclass inheritance or interface implementation:\n\n![Graph Example](graph_example.png)\n\nThe code in this repository can build such graphs for arbitrary GitHub\nprojects and run GGNNs on these graphs to learn to predict future imports in\na way similar to how [Allamanis *et al.*](https://arxiv.org/abs/1711.00740)\nsolve the variable misuse task (they work at the level of a small piece of code\nand model relationships between variables, while I model relationships between\nfiles (compilation units) in the context of a repository). An initial comparison\nwith the baseline results I was able to achieve by other means\n(see notebooks/Baselines.ipynb) suggests that GGNNs are well suited for this task. For more details on the way I use GGNNs, see README.md in the python directory.\n\n\n## Getting the Data\n\nThe data for this project can be obtained by running a set of SQL queries on a\npublicly available BigQuery dataset. The README.md file in the SQL directory\ncontains detailed information about replicating this step and the criteria\nused to select the data.\n\n## Data Preprocessing and Baselines\n\nThe raw data can be downloaded from BigQuery via Google Cloud Storage (GCS) as a set of .json files,\neach of which contains millions of lines of Java code. 
This code can be\nparsed by running Parser.java (located in java/src), passing the name of one .json file as the first argument and the name of the output file as the second argument. You will need\n[JavaParser](https://javaparser.org/) to run this code.\n\nNext, the data goes through a series of preprocessing steps. To replicate them,\nrun the Filtering.ipynb and ConvertingToGraphs.ipynb notebooks. You may need the\nfollowing Python libraries to run this code: `pytorch`, `numpy`, `tqdm`, `joblib`,\n`networkx`, `sklearn`, `matplotlib`, `pandas`.\n\nMore information on data preprocessing and on the baselines used for import prediction\ncan be found in the README.md file in the notebooks directory.\n\n## Running the GGNNs\n\nA detailed description of the PyTorch implementation of GGNNs used for this\nproject can be found in the README.md file in the python directory. The same\ndirectory contains the Python modules that define the network structure, the\nloss function, and the way the data loader works. The file wrapper.py in the python directory shows an example of how to run the network.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdargones%2Fimport_prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdargones%2Fimport_prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdargones%2Fimport_prediction/lists"}