{"id":20713351,"url":"https://github.com/linaro/hpc_tensorflowci","last_synced_at":"2025-03-11T06:43:33.634Z","repository":{"id":90096729,"uuid":"221905829","full_name":"Linaro/hpc_tensorflowci","owner":"Linaro","description":"HPC-SIG Tensorflow CI","archived":false,"fork":false,"pushed_at":"2020-01-24T14:24:44.000Z","size":63,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-01-17T21:30:26.902Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Linaro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-15T10:59:27.000Z","updated_at":"2020-02-26T19:51:56.000Z","dependencies_parsed_at":"2024-04-21T04:47:38.135Z","dependency_job_id":null,"html_url":"https://github.com/Linaro/hpc_tensorflowci","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Linaro%2Fhpc_tensorflowci","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Linaro%2Fhpc_tensorflowci/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Linaro%2Fhpc_tensorflowci/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Linaro%2Fhpc_tensorflowci/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Linaro","download_url":"https://codeload.github.com/Linaro/hpc_tensorflowci/tar.gz/refs/h
eads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242988012,"owners_count":20217534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-17T02:24:48.971Z","updated_at":"2025-03-11T06:43:33.624Z","avatar_url":"https://github.com/Linaro.png","language":"Shell","readme":"# HPC SIG TensorFlow CI Recipes\n\n## Overview\n\nThese recipes are intended to be used in coordination with [hpc_lab_setup](https://github.com/Linaro/hpc_lab_setup) to set up a CI loop for TensorFlow on AArch64 CentOS7 and CentOS8.\nTo achieve this, the recipes build:\n- bazel (rpm, using a slightly altered version of vbatts' spec from Fedora)\n- numpy (wheel)\n- openblas (rpm, using either OpenHPC's spec for CentOS7 or upstream's for CentOS8)\n- tensorflow (wheel, 1.15.0 and 2.1.0 are supported)\n- dnnl (0.19, also works with Fujitsu's DNNL)\n\nEach of these, as well as the common parts, is separated into its own role, so they can be executed separately.\n\nNote that DNNL is not yet integrated into TensorFlow, since this requires patching the Bazel rules to build it.\n\n## How to Use\n\nTo use the recipes, refer to [the build_tensorflow Jenkins job of hpc_lab_setup](https://github.com/Linaro/hpc_lab_setup/blob/master/files/build_tensorflow.yml):\n- Run install_python3.yml first to make sure the Python requirements are present on the target build machine\n- Then run build_tensorflow.yml to build the stack\n\n### Exempli gratia\n\n```bash\n$ cat \u003c\u003c EOF \u003e hosts\n[target]\n$IP_OF_THE_MACHINE_YOU_WANT_TO_BUILD_ON\nEOF\n$ 
ansible-playbook -i hosts install_python3.yml # Add -v(vv) for verbose output\n$ ansible-playbook -i hosts build_tensorflow.yml # You can also override the default variables, although the defaults should build everything fine\n```\n\n## Overview of the build variables\n\n### Common\n\n- arch : architecture of the target machine (aarch64 by default)\n- user : name of the passwordless user that will be created to build everything\n- build_id : index of the build, in order to allow separate builds on the same machine (in separate virtual environments)\n- use_openhpc : boolean to toggle the use of OpenHPC's stack (true by default on CentOS7, false otherwise)\n\nThere are also several variables for the locations of the libraries that are build dependencies (openblas, fftw, hdf5, openmpi and the toolchain/GCC), followed by arrays listing the dependencies to be fetched from the OpenHPC or upstream repositories.\n\n### Bazel\n\nThe variables for each package built (that is, bazel, dnnl, openblas, numpy and tensorflow) are quite similar:\n- build_bazel : boolean that determines whether bazel should be built or installed from the repositories\n- bazel_version : 0.26.1 for TensorFlow 1.15.0 and 0.29.1 for TensorFlow 2.1.0\n- bazel_minorversion : used to determine the rpm name\n- bazel_rpm : path to the built bazel rpm\n- bazel_dir\\* : various directories used while building bazel\n- bazel : dictionary containing the release URL (release_url), zip name (release_zip) and binary name (binary)\n\n### DNNL\n\nThe variables are similar to Bazel's; here are the differences:\n- dnnl_prefix : where to install the library (since it is not a package), either in OpenHPC's tree if OpenHPC's toolchain is used, or in a standard path (/usr/local/mkl-dnn) if not\n- dnnl.arch_opt_flags : see the [Targeting Specific Architecture section of the upstream documentation](http://intel.github.io/mkl-dnn/dev_guide_build_options.html)\n- dnnl.build_type : DEBUG or RELEASE\n\n### NumPy\n\nSame as above, let's keep things 
DRY.\n- numpy_patching_required : see [this issue](https://github.com/ansible/ansible/issues/8603) and the section below\n- numpy.patch : name of the \"patch\" (really a workaround) for the issue below\n\n##### NumPy and GCC\n\nThis issue, https://github.com/ansible/ansible/issues/8603, details problems with building NumPy with older versions of GCC (i.e. the standard CentOS toolchains and OpenHPC's).\nThe workaround, or \"patch\", will be applied if numpy_patching_required is set to true.\nThe workaround is here: https://github.com/Linaro/hpc_tensorflowci/blob/master/roles/numpy/templates/numpy_centos7_aarch64.patch.j2\n\n##### NumPy and Environment\n\nNumPy hooks into the environment's libraries (i.e. BLAS, LAPACK, FFTW) via the \"numpy-site.cfg\" file, which must be placed in the HOME_DIR of the user building NumPy.\nNote that this also applies to SciPy.\n\n### OpenBLAS\n\n- openblas_target_micro : target microarchitecture for the OpenHPC spec file (upstream uses native)\n- openblas_additional_flags : additional compilation flags\n\n### TensorFlow\n\n- build_tf_mpi : boolean, whether to build TensorFlow with mpicc\n- tensorflow.cxx_optimizations : dictionary of flags given to g++ during the build\n- tensorflow.c_optimizations : dictionary of flags given to gcc during the build\n\n##### TensorFlow v2 and GCC\n\nOn CentOS8 aarch64 there is an issue with building v2 using the default gcc toolchain. 
Latest toolchains work fine.\n\n## Conclusion\n\nThere is still work to be done and parts of the stack to be added to this build: SciPy, Python itself, Keras, and maybe even GCC (if a decent enough spec can be found; hopefully OpenHPC v2 will make this a lot easier).\n\nThe next step is to add support for MLPerf or another standard benchmark suite, which would allow assessment of the relevance of the optimizations used.\nThen optimization iterations can begin.\n\nSVE support will first need to focus on OpenBLAS, then NumPy/SciPy, as well as TensorFlow's kernels.\n\nAnother task is to make sure that the recipes are able to build the \"master\" branch of the TensorFlow GitHub repo. This shouldn't be much of an issue as far as dependencies are concerned (the bazel rpm spec should build 1.1.0).\n\nThe optimization process entails a matrix job (that is, various optimizations along various dimensions, with each point corresponding to one build-and-benchmark job).\nAnalysing the results of each build may also require a tool to plot them.\n\n## Note To Developers\n\nThanks to Ansible's straightforward and readable syntax, I recommend reading the tasks to get a feel for the flow of the play.\nCare has also been taken to make each variable's function explicit.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinaro%2Fhpc_tensorflowci","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinaro%2Fhpc_tensorflowci","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinaro%2Fhpc_tensorflowci/lists"}