{"id":18421951,"url":"https://github.com/spcl/fbacode","last_synced_at":"2025-04-07T14:32:09.137Z","repository":{"id":41050642,"uuid":"142330889","full_name":"spcl/FBACode","owner":"spcl","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-15T20:45:47.000Z","size":4476,"stargazers_count":8,"open_issues_count":1,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-17T07:21:11.247Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-25T17:13:23.000Z","updated_at":"2024-10-15T20:45:53.000Z","dependencies_parsed_at":"2022-09-21T08:08:45.909Z","dependency_job_id":null,"html_url":"https://github.com/spcl/FBACode","commit_stats":null,"previous_names":["spcl/fbacode"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2FFBACode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2FFBACode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2FFBACode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2FFBACode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spcl","download_url":"https://codeload.github.com/spcl/FBACode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223283923,"owners_count":17119575,"icon_url":"https://github.com/github.png","version":null
,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:27:24.958Z","updated_at":"2024-11-06T04:27:25.874Z","avatar_url":"https://github.com/spcl.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"*Fetch, Build and Analyze Code*\n\nThe project consists of three major components: [*fetcher*](#fetcher), [*builder*](#builder), and [*analyzer*](#analyzer).\nThe first one is responsible for discovering repositories and source code. The second one downloads the code, attempts a build, generates LLVM bitcode, and sends it to a remote server for storage. Finally, the analyzer downloads the build artifacts from the remote storage server and analyzes them.\n\nWe perform the analysis on the AST files produced by the builder. Specifically, the analysis is done by a C++ tool ([cxx-langstat](https://github.com/spcl/cxx-langstat)) that relies on the Clang compiler infrastructure to parse the ASTs. This introduces a requirement: the langstat tool must be built with the exact same compiler version that the AST files were built with. We achieve this by using Docker multi-stage builds.\n\n## Requirements\n\n* Docker: make sure that the Docker daemon is running and your user has sufficient permissions to use containers.\n* Python (3.6+) with virtual environments and pip. 
The following Python packages are installed through venv and used:\n  - [requests](https://pypi.org/project/requests/)\n  - [GitPython](https://pypi.org/project/GitPython/)\n  - [docker](https://pypi.org/project/docker/)\n  - [fabric](https://pypi.org/project/fabric/)\n  - [PyYaml](https://pypi.org/project/PyYAML/)\n  - [fuzzywuzzy](https://pypi.org/project/fuzzywuzzy/)\n\n## Installation\n\nRun the **install.py** script to create a Python virtual environment **fbacode-virtualenv**\nand install dependencies there. Use `source fbacode-virtualenv/bin/activate` to activate\nthe environment.\n\nUse **tools/build_docker_images.py** to build and update the Docker images necessary\nto run build steps.\n\n### Debian installation\n#### Creating the docker containers\nFirst, either pull the build and analyze images from DockerHub using:\n```bash\ndocker pull spcleth:debian-bookworm-clang\ndocker pull spcleth:debian-bookworm-cxxlangstat\n```\nOr build the images locally using `python3 tools/build_docker_images.py`.\nThe order in which to create the docker containers is:\n1. debian-bookworm-clang-base\n1. debian-bookworm-clang-base-beta\n1. debian-bookworm-clang\n1. debian-bookworm-cxxlangstat\n\nWhen building the `debian-bookworm-clang` image, make sure the cxx-langstat project can be found as a folder inside the root of FBACode.\n## Fetcher\n\nFetcher attempts to discover available repositories that could be built. The goal\nis to search known and popular platforms with open-source code. The output is a JSON\nfile with entries for each project. The current implementation supports [*GitHub*](#github)\nand [*Debian*](#debian) packages. 
To enable or disable sources, use the `fetch.cfg`\nfile and set the `active` field accordingly.\n\n#### GitHub\n\nThe implementation scans GH repositories tagged as C and C++ software and sorts\nthem according to the number of stars.\n\nTo avoid running into [rate limit issues](https://developer.github.com/v3/search/), please provide your [personal access token](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) in the user configuration file.\n\n#### Debian\n\nThe fetcher selects random projects containing C++ code, which it finds through the Debian package API.\nAt the moment only Debian 12 (Bookworm) is supported.\n\n\n## Builder\n\nThe builder component is responsible for downloading source code, discovering\nthe build system used by the project, looking up existing build instructions\nand dependencies, and attempting a build. For the build, we use LLVM's Clang and\ngenerate LLVM IR bitcode.\n\nThe Python implementation uses a thread pool to start multiple build processes\nat the same time. 
Each build is invoked in a Docker container for security\nand compatibility reasons.\n\nCurrent implementation supports two build systems: [*CMake*](#cmake) projects\nand [*Debian*](#debian-builder) source packages.\n\nTry builder with `builder.py examples/github-repo.json` or `builder.py examples/debian.json`.\n\nThe builder outputs several files to the `buildlogs` folder:\n- `summary-XXX.txt`: essentially the same output as the console, shows errors and other build statistics\n- `build_details_XXX.json`: contains all the information gathered for each project\n- `errorstats_XXX.json`: a sorted list of all errors found, with packages and numbers of occurrences\n- `rebuild_XXX.json`: a JSON file with all the failed projects, can be fed to the builder again\n- `dependencies_XXX.json`: a sorted list of all missing dependencies found\n\n#### CMake\n\nCurrent implementation supports default configuration without any configuration flags.\n\n#### Make\n\nCurrent implementation runs `./configure` and then `make`.\n\n#### Autotools\n\nCurrent implementation runs `autoreconf`, `./configure` and then `make`. Does not work correctly yet.\n\n#### Travis CI\n\nThe builder tries to pick a configuration with `os: linux`, `compiler: clang` and `arch: amd64`;\notherwise it picks the first configuration from the build matrix. The apt and snap addons are\nsupported, as well as stages.\n\n#### Debian Builder\n\nThe current implementation uses packages from the Debian 12 (Bookworm, latest LTS at the time) repository.\nThe packages are downloaded inside the Docker container, since we don't know if apt is\navailable on the host system. 
It runs `apt source package`, then\n`apt build-dep package` to install dependencies and finally `dpkg-buildpackage` to build it.\nBecause the configure and build steps are combined into one command, it is not possible to\ntime them separately.\n\n#### Missing features\n\n- Currently we don't perform any search for cloned repositories. Furthermore, it's possible that different platforms provide various versions of the same software, e.g. a GitHub project of a program delivered as a Debian package as well.\n- We need a fleet of Docker containers supporting different versions of LLVM.\n- Installing dependencies is missing.\n- Discovering dependencies is not very good yet.\n- We should use CI configurations (Travis, CircleCI, GH Actions) and package managers to discover configuration flags and dependencies.\n\n## Analyzer\n\nThe analyzer component is responsible for \"downloading\" the build artifacts from the remote storage server and then analyzing them. It uses a Docker image to start a container that decompresses the archive and runs [cxx-langstat](https://github.com/spcl/cxx-langstat) on the AST files.\n\n# Example build and analysis on Debian\nHere, we will showcase how to build and analyze projects from the Debian repository. We will use the `examples/debian-abseil.json` file, but it can be replaced with any other Debian JSON database.\n\n## Build\n```bash\npython3 builder.py examples/debian-abseil.json\n```\nAfter the build finishes, you should see an archive with the results of the build on the remote server specified in `build.cfg`. Depending on the contents of the config file, the build artifacts might exist on the build machine as well.\n\n## Analysis\nMake sure the root of FBACode contains a folder named `ast_archive`. This is where the analyzer will \"download\" the artifacts from the remote server. Additionally, the build artifacts folder must contain a `build_summary.json` file. 
This is similar to a database of the artifacts in that folder. If it does not exist, we recommend using the `all_built.json` in the build directory used by `builder.py` (by default it is `build/`).\n```bash\npython3 analyzer.py \u003cfolder containing the build artifacts\u003e # typically of form run_\u003ctimestamp\u003e\n```\n\nThen, in the `analyze/` folder, there should be a separate folder for each project that was analyzed. Each folder contains the statistics for that specific project.\n\nNext, the `data_analysis/` folder contains a Jupyter notebook that aggregates the per-project statistics to extract high-level insights.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Ffbacode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspcl%2Ffbacode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Ffbacode/lists"}