{"id":19462723,"url":"https://github.com/naghim/source-file-visualizer","last_synced_at":"2025-09-16T03:47:23.729Z","repository":{"id":161506413,"uuid":"186691345","full_name":"naghim/Source-File-Visualizer","owner":"naghim","description":"This is a repository containing an interactive visualizer for the source file analyzer code that we have implemented. The visualizer allows you to explore the output of the source file analyzer in an interactive and intuitive way, making it easier to understand and analyze the results.","archived":false,"fork":false,"pushed_at":"2023-07-06T19:36:28.000Z","size":41668,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-06-28T02:04:02.622Z","etag":null,"topics":["interactive-exploration","source-code-analysis","source-code-comprehension","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/naghim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-14T19:59:07.000Z","updated_at":"2024-04-12T14:23:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"9663c5e1-696e-4b60-9251-1f2a95544383","html_url":"https://github.com/naghim/Source-File-Visualizer","commit_stats":null,"previous_names":["naghim/source-file-visualizer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/naghim/Source-File-Visualizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naghim%2FSource-File-Visualizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naghim%2FSource-File-V
isualizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naghim%2FSource-File-Visualizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naghim%2FSource-File-Visualizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/naghim","download_url":"https://codeload.github.com/naghim/Source-File-Visualizer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naghim%2FSource-File-Visualizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275358838,"owners_count":25450443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-16T02:00:10.229Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["interactive-exploration","source-code-analysis","source-code-comprehension","visualization"],"created_at":"2024-11-10T18:05:07.284Z","updated_at":"2025-09-16T03:47:23.707Z","avatar_url":"https://github.com/naghim.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SourceFileAnalyzer - visualizer\n### About\nNowadays more and more people are seeking to apply, develop, and extend their skills in the computer science field. \nConsequently, a need has arisen to write clean, efficient, well-documented, and readable code. 
The purpose of this research is\nto measure programming style from source code. By doing so, it may not only be an effective tool for companies to filter people\nby their competence, but could also be a great tool for students hoping to improve their abilities. This research is intended \nto focus on the semantic aspects of the source code, using abstract syntax trees and deep learning in order to achieve \nthe results mentioned above. \n\nThis is a repository containing an interactive visualizer for the source file analyzer code that we have implemented ([github repository here](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification)). The visualizer allows you to explore the output of the source file analyzer in an interactive and intuitive way, making it easier to understand and analyze the results.\n\n#### Documentation (HU)\nDocumentation [here](https://drive.google.com/file/d/1kWhOHgtBw9a86DTgF4EhhcINgtFxaYog/view?usp=sharing)\n\nPresentation [here](https://drive.google.com/file/d/1iHppZWdPPIsRhucvu9Lp6l9nL2WHNmr-/view?usp=sharing)\n\n### Getting started\nSource Code Authorship Attribution by extracting layout, lexical and syntactical features and classifying them.\n\n### Main steps\n- Layout and Lexical feature extraction with Python code\n- AST extraction with Clang compiler and output processing (done using Linux OS)\n- Classification with Python frameworks (```sklearn, pandas, numpy```)\n\n## Dataset\nOur dataset contains 9 C/C++ source files per user for 100 users from the Google Code Jam 2015 programming competition [(GCJ_Dataset/Data)](https://github.com/kotunde/SourFileAnalyzer_featureSearch_and_classification/tree/master/GCJ_Dataset/Data), and an average of 27 C++ source files per user for 14 users from Sapientia EMTE University [(Sapi_Dataset/Data)](https://github.com/kotunde/SourFileAnalyzer_featureSearch_and_classification/tree/master/Sapi_Dataset/Data).\n\nThe main difference between the datasets is that the GCJ users were only given the 
task to solve, while Sapi users were also given the header files to work with. So the results are more remarkable on the GCJ dataset, since the Sapi submissions were similar in structure.\n\n## How it works\n\n### Prerequisites\nInstall Python 3.x\n```\n$ sudo apt install python3.7\n```\n#### Set up Clang compiler\nDownload LLVM from [here](http://releases.llvm.org/download.html). The following program is needed:\n\n```your/download/directory/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang-check```\n\n#### Python libraries\nOur programs require the following libraries: ```sklearn, pandas, numpy```. The easiest way to install these is as part of the [Anaconda](https://docs.continuum.io/anaconda/) distribution.\n\n\n### Running\n#### Layout and Lexical features\nRun this script to extract layout and lexical features into a CSV file: [extractAttributes.py](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/LL_features/extractAttributes.py).\n\nDon't forget to set the directory path, and set the parameters of the *main* function depending on your platform.\n```\n$ python extractAttributes.py\n```\nNow we have the layout and lexical features in the LL_features.csv file, where each column is a feature except the last, which is the \"author\", i.e. the class label. 
Our results: [GCJ L\u0026L Features CSV](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/GCJ_Dataset/CSV/GCJ_47.csv), [Sapi L\u0026L Features CSV](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Sapi_Dataset/CSV/SAPI_47.csv)\n\n#### Abstract Syntax Trees\nWe have a separate bash script for each dataset, since they differ in their directory structure ([gcj_data_ast_func.sh](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/AST_extraction/gcj_data_ast_func.sh), [sapi_data_ast_func.sh](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/AST_extraction/sapi_data_ast_func.sh)). The script walks through the source files in the data directory (given as the first parameter) and creates a second directory (its name given as the second parameter) with the same directory structure, containing .ast files with the same names as the respective source files.\n\n**Important!**\nIn [createAstByFuncName.py](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/AST_extraction/createAstByFuncName.py), set the path to the installed ```clang-check``` in the *command* variable.\n```\n$ bash gcj_data_ast_func.sh relative/path/Data Data_ast\n```\nor\n```\n$ bash sapi_data_ast_func.sh relative/path/Data Data_ast\n```\nThe script first extracts the function names from each source file, then creates an AST for each function.\n\nThen run the script named [extractNodes.py](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/AST_features/extractNodes.py), which extracts the nodes from the .ast files and writes the output to a file named ```extracted_ast_nodes.csv```. 
Then run the next script in the AST_features folder, namely [normalizeAstOutputFile.py](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/AST_features/normalizeAstOutputFile.py), which will normalize the output and match it with the other .csv file (```LL_features.csv```). \nThen repeat the process for the [extractBigrams.py](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/AST_features/extractBigrams.py) file.\n\n#### Classification\nRun the [rfc.py](https://github.com/kotunde/SourceFileAnalyzer_featureSearch_and_classification/blob/master/Programs/Classification/rfc.py) script to classify the resulting CSVs with a Random Forest Classifier.\nSet the directory path to the respective CSV file and format the output CSV's header.\n```\n$ python rfc.py\n```\n#### Illustration\nTo illustrate the output CSVs, you can use the ```matplotlib.pyplot``` module.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaghim%2Fsource-file-visualizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnaghim%2Fsource-file-visualizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaghim%2Fsource-file-visualizer/lists"}