{"id":18609587,"url":"https://github.com/secure-software-engineering/headergen","last_synced_at":"2025-04-10T22:31:32.150Z","repository":{"id":79129205,"uuid":"586294002","full_name":"secure-software-engineering/HeaderGen","owner":"secure-software-engineering","description":"HeaderGen annotates Jupyter notebooks using static analysis. Improves PyCG's call graph analysis by supporting external libraries and flow-sensitivity.","archived":false,"fork":false,"pushed_at":"2025-01-30T09:51:02.000Z","size":41985,"stargazers_count":14,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-20T22:51:11.822Z","etag":null,"topics":["call-graph-analysis","jupyter-notebook","python","static-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/secure-software-engineering.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-07T16:20:12.000Z","updated_at":"2025-02-09T00:43:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"d6b2bc95-f0ff-451e-bea8-a8aa1bb1bc8b","html_url":"https://github.com/secure-software-engineering/HeaderGen","commit_stats":null,"previous_names":["ashwinprasadme/headergen"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/secure-software-engineering%2FHeaderGen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/secure-software-engineering%2FHeaderGen/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/secure-software-engineering%2FHeaderGen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/secure-software-engineering%2FHeaderGen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/secure-software-engineering","download_url":"https://codeload.github.com/secure-software-engineering/HeaderGen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248309664,"owners_count":21082254,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["call-graph-analysis","jupyter-notebook","python","static-analysis"],"created_at":"2024-11-07T03:06:31.307Z","updated_at":"2025-04-10T22:31:27.132Z","avatar_url":"https://github.com/secure-software-engineering.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HeaderGen\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"headergen.png\" width=\"500\" align=\"center\"\u003e\n\u003c/p\u003e\n\nHeaderGen is a tool-based approach to enhance the comprehension and navigation of undocumented Python based Jupyter notebooks by automatically creating a narrative structure in the notebook.\n\nData scientists build an ML-based solution notebook by first preparing the data, then extracting key features, and then creating and training the model. 
HeaderGen leverages the implicit narrative structure of an ML notebook to add structural headers as annotations to the notebook.\n\n## Preview\n\n![](preview.gif)\n\n## Install HeaderGen\n\n```\npip install headergen\n```\n\n## Features\n\n- **Automated Markdown Header Insertion:** Through a taxonomy for machine-learning operations, HeaderGen annotates code cells with relevant markdown headers.\n\n- **Function Call Taxonomy:** Methodically classifies function calls based on a machine-learning operations taxonomy.\n\n- **Advanced Call Graph Analysis:** Enhances the PyCG framework with flow-sensitivity and external library return-type resolution.\n\n- **Precision in External Libraries:** Accurately resolves function return types from external libraries using type stubs.\n\n- **Syntax Pattern Matching:** Employs type data for pattern matching.\n\n## CLI Usage\n\n### `generate` Command:\n\nGenerate the HeaderGen annotated notebook in the current directory. Note that the caches will be created the first time HeaderGen is run.\n```bash\nheadergen generate -i /path/to/input.ipynb\n```\nTo generate a JSON metadata file that includes analysis information, use the `--json_output` or `-j` flag.\n\n```bash\nheadergen generate -i /path/to/input.ipynb -o /path/to/output/ -j\n```\n\n### `types` Command:\n\nRun type inference on the file and fetch type information.\n```bash\nheadergen types -i /path/to/input.ipynb\n```\nTo generate a JSON file with type information, use the `--json_output` or `-j` flag.\n\n```bash\nheadergen types -i /path/to/input.ipynb -o /path/to/output/ -j\n```\n\n\n### `server` Command:\n\nStart the server with:\n\n```\nheadergen server\n```\n\nThis starts the Uvicorn server listening on host 0.0.0.0 and port 54068.\n\n#### get_analysis_notebook Endpoint:\n\nThis endpoint returns the analysis of the specified notebook or Python script as a JSON response containing analysis data such as `cell_callsites` and `block_mapping`.\n\nExample using 
curl:\n\n```\ncurl \"http://0.0.0.0:54068/get_analysis_notebook?file_path=/absolute/path/to/your/file.ipynb\"\n```\n\n#### get_types Endpoint:\n\nThis endpoint returns type information for the specified notebook or Python script as a JSON response.\n\nExample using curl:\n\n```\ncurl \"http://0.0.0.0:54068/get_types?file_path=/absolute/path/to/your/file.ipynb\"\n```\n\n#### generate_annotated_notebook Endpoint:\n\nThis endpoint returns the annotated notebook based on the analysis. The response will be a file download.\n\nExample using curl:\n\n```\ncurl \"http://0.0.0.0:54068/generate_annotated_notebook?file_path=/absolute/path/to/your/file.ipynb\" --output annotated_file.ipynb\n```\n\n## Folder Structure\n\n+ `callsites-jupyternb-micro-benchmark`: Micro benchmark\n+ `callsites-jupyternb-real-world-benchmark`: Real-world benchmark\n+ `evaluation`: Contains manual header annotations and user study results\n+ `framework_models`: Mapping from function calls to the ML taxonomy\n+ `typestub-database`: Type stubs for ML libraries\n+ `headergen`: Source code of HeaderGen\n+ `pycg_extended`: Source code of extended PyCG\n+ `headergen-extension`: Jupyter notebook plugin for HeaderGen\n+ `headergen_output`: Folder where the generated notebooks from the Docker container are stored\n\n--------\n## 1. Build container\n\n+ Get source files\n\n      git clone --recursive\n      git submodule update --init --recursive\n      git pull --recurse-submodules\n\n+ Linux\n\n      docker build -t headergen .\n      docker run -v ${PWD}/headergen_output:/headergen_output -it headergen bash\n\n+ Windows\n\n      docker build -t headergen .\n      docker run -v \"%cd%\"/headergen_output:/headergen_output -it headergen bash\n\n\n## 2. 
Run HeaderGen benchmarks from inside the container\n\nOutput generated by the following commands, such as annotated notebooks, reports, callsites, and headers, is stored in the local folder `headergen_output` once the commands finish executing.\n\n+ Micro Benchmark (generates a CSV file with results)\n\n      make ROOT_PATH=/app/HeaderGen microbench\n\n+ Real-world Benchmark (generates annotated notebooks and a CSV file that reproduce Table 2)\n\n      make ROOT_PATH=/app/HeaderGen realworldbench\n\n+ Both Benchmarks\n\n      make ROOT_PATH=/app/HeaderGen all\n\n+ Clean generated output\n\n      make clean\n\n---\n\n## Building from Source\n\n+ Get source files\n\n      git clone --recursive\n      git submodule update --init --recursive\n      git pull --recurse-submodules\n\n+ Clear the caches if they exist\n\n      rm framework_models/models_cache.pickle\n      rm pycg_extended/machinery/pytd_cache.pickle\n\n+ Set up the venv and dependencies with the `setup.sh` script\n\n      ./setup.sh -i\n\n+ Micro Benchmark (generates a CSV file with results)\n\n      make ROOT_PATH=\u003cpath to repo root\u003e microbench\n\n+ Real-world Benchmark (generates annotated notebooks and a CSV file that reproduce Table 2)\n\n      make ROOT_PATH=\u003cpath to repo root\u003e realworldbench\n\n+ Both Benchmarks\n\n      make ROOT_PATH=\u003cpath to repo root\u003e all\n\n+ Clean generated output\n\n      make clean\n\n---\n\nThis repo contains the code for the paper **\"Enhancing Comprehension and Navigation in Jupyter Notebooks with Static Analysis\"**, published at the [SANER Conference 2023](https://arxiv.org/abs/2301.04419).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsecure-software-engineering%2Fheadergen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsecure-software-engineering%2Fheadergen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsecure-software-engineering%2Fheadergen/lists"}