{"id":24772534,"url":"https://github.com/shoyamanishi/webassemblynumericalcomputing","last_synced_at":"2025-03-23T20:45:03.301Z","repository":{"id":274146968,"uuid":"920513042","full_name":"ShoYamanishi/WebAssemblyNumericalComputing","owner":"ShoYamanishi","description":"A study on the numerical computing with WebAssembly on the browsers","archived":false,"fork":false,"pushed_at":"2025-01-25T07:49:06.000Z","size":6487,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-23T20:44:58.082Z","etag":null,"topics":["cpp","emscripten","numerical-computation","simulation","webassembly"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShoYamanishi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-22T09:34:53.000Z","updated_at":"2025-01-25T07:49:10.000Z","dependencies_parsed_at":"2025-01-25T08:35:30.416Z","dependency_job_id":null,"html_url":"https://github.com/ShoYamanishi/WebAssemblyNumericalComputing","commit_stats":null,"previous_names":["shoyamanishi/webassemblynumericalcomputing"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShoYamanishi%2FWebAssemblyNumericalComputing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShoYamanishi%2FWebAssemblyNumericalComputing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShoYamanishi%2FWebAssemblyNumericalComput
ing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShoYamanishi%2FWebAssemblyNumericalComputing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShoYamanishi","download_url":"https://codeload.github.com/ShoYamanishi/WebAssemblyNumericalComputing/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245168813,"owners_count":20571799,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","emscripten","numerical-computation","simulation","webassembly"],"created_at":"2025-01-29T04:23:15.745Z","updated_at":"2025-03-23T20:45:03.257Z","avatar_url":"https://github.com/ShoYamanishi.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WebAssemblyNumericalComputing\n\nA study of numerical computing with WebAssembly in C++ in web browsers\n\n![](docs/banner.png)\n\n\n# Run the Test Scripts in Your Browser\nYou can run the test scripts in your browser without downloading this repo.\n\nA live demo is available at my personal website: [https://www.magicalpouch.com/webassembly](https://www.magicalpouch.com/webassembly)\n\n# Run the Test Scripts in Your Browser with a Local Server\nYou need to run a local server, such as `http.server` in Python 3.\nThe `./public` directory contains all the necessary files (*.html, *.css, *.js, *.wasm, 
*.data ) for the local HTTP server.\n\nFor example:\n```\n$ git clone git@github.com:ShoYamanishi/WebAssemblyNumericalComputing.git\n$ cd WebAssemblyNumericalComputing/public\n$ python3 -m http.server 5173\nServing HTTP on :: port 5173 (http://[::]:5173/) ...\n```\n\nThen open 'http://localhost:5173/' in your browser.\nThis opens the top-level page, which contains the links to the actual test pages.\nClicking on one of those links opens the page for a specific type of computation.\nThe page automatically starts the test script to measure the performance.\n\nThe test script is written in C++ and compiled to WASM by Emscripten.\nWhen it finishes, the page fills in the tables with the measured times in milliseconds.\n\nEach table is accompanied by a smaller table below it.\nThe smaller table contains the numbers sampled from the test program compiled\nfrom the same C++ code by clang++ and executed natively on a Mac Mini M1 (2020).\nYou can compare, side by side, the time taken in your browser and on the Mac for each problem type and size.\n\n# Description\nThis is a collection of scripts that perform some representative numerical computations in web browsers. The scripts are written in C++ with NEON SIMD intrinsics where applicable. The C++ code is compiled to WASM and then linked to HTML files with\nsome glue code in JS. 
The C++ code uses the following libraries where applicable, as well as the C++ standard library.\n\n- [Boost](https://www.boost.org).\n- [Eigen3](https://gitlab.com/libeigen/eigen).\n- [BLAS routines from CLAPACK's reference implementation from Netlib](https://www.netlib.org/clapack/).\n\nWhen you open one of the HTML files through an HTTP request in a browser, it will\nautomatically load the WASM code and start executing it to measure the performance.\nWhen it finishes its execution, the tables on the HTML page are updated and filled with\nnumbers in milliseconds.\nMost of the test data are artificially generated at runtime in the browser by random number generators, except for the LCP, for which data sampled from real rigid body simulations are used.\n\nThe types of computation and the problem sizes are chosen to reflect typical real use cases in interactive UI applications such as games.\nTherefore, use cases such as training large machine learning models or running large-scale computer simulations are not considered.\n\nThe following topics are covered in this project.\n\n- Memory copy\n- Element-wise multiplication of two vectors\n- Dot product calculation\n- Prefix-sum\n- In-place sort\n- N-body particle simulation\n- Convolution 2D\n- Sparse matrix-vector multiplication\n- Dense matrix-vector multiplication\n- Cholesky factorization\n- Jacobi solver\n- Gauss-Seidel solver\n- Lemke LCP solver\n- Conjugate gradient solver\n- 512-point radix-2 FFT\n\nThe GPU capacity available through WebGPU and WebGL is not considered at the moment\nfor the following reasons.\n\nBased on my experience, the applicability of GPGPU is limited to the following types\nof computations.\n\n- N-Body-type particle simulation, GPU-based collision detection, and shader-based artistic rendering.\n- Forward and backward propagation for neural network (NN) training.\n\nThe former is characterised by the high independence of its computations.\nThe update of the phase space for each particle 
does not depend on any other\ncomputations. The latter is characterised by the\nseries of matrix-matrix multiplications and additions, and element-wise function evaluations for the activation layers, where most of the data can be kept\nin the GPU memory.\n\nThe former is already popular and covered by many web pages and examples, many of\nwhich render stunningly beautiful animated graphics, so this project does not need\nto cover it. The latter is not a realistic use case in the web browser.\n\nIn general, for the other types of computations on a GPU, the problem sizes must be large enough to amortize the overhead of the GPU invocation and take\nadvantage of its parallelism, and for those large problem sizes, the time taken for the computation, either by CPU or GPU, will be too long (seconds or minutes) to be useful for interactive applications.\n\nIf you are interested in numerical computing on Mac and iOS devices,\nplease check my sister project [AppleNumericalComputing on GitHub](https://github.com/ShoYamanishi/AppleNumericalComputing).\nIt utilizes the Accelerate framework and Metal (GPU) compute shaders, as well as Arm NEON SIMD and CPU multi-threading.\n\n# C++ Implementations\n\n### Memory Copy\nThis is a simple copy of the content of a region in memory to another non-overlapping region.\nThe following implementations are tested.\n\n- plain implementation in C++.\n- memcpy()\n\nThe C++ test code is found in [src/test_memcpy.cpp](src/test_memcpy.cpp).\n\n### Element-wise Multiplication of Two Vectors\nThis is the saxpy/daxpy type of operation. Each element-wise multiplication is totally\nindependent of the others. 
This operation can be maximally parallelized.\n\nThe following implementations are tested.\n\n- plain implementation in C++.\n- C++ with NEON SIMD.\n- [the reference implementation of saxpy/daxpy from CLAPACK on Netlib](https://www.netlib.org/clapack/).\n\nThe C++ test code is found in [src/test_saxpy.cpp](src/test_saxpy.cpp).\n\n### Dot Product Calculation\nThis is the inner product operation for two vectors of various sizes.\n\nThe following implementations are tested.\n\n- plain implementation in C++.\n- C++ with NEON SIMD.\n- [the reference implementation of sdot/ddot from CLAPACK on Netlib](https://www.netlib.org/clapack/).\n\nThe C++ test code is found in [src/test_dot.cpp](src/test_dot.cpp).\n\n### Prefix-Sum\nThis is a scanning operation, which can be partially parallelized.\n\nThe following implementations are tested.\n\n- plain implementation in C++.\n- std::inclusive_scan()\n\nThe C++ test code is found in [src/test_prefix_sum.cpp](src/test_prefix_sum.cpp).\n\n### In-Place Sort\nThis measures the performance of in-place sorting algorithms.\n\nThe following implementations are tested.\n\n- std::sort()\n- boost::sort::spreadsort()\n- boost::sort::block_indirect_sort()\n\nThe C++ test code is found in [src/test_sort.cpp](src/test_sort.cpp).\n\n### N-Body Particle Simulation\nThis measures the performance of one step of the N-Body simulation,\nwhere each of N objects interacts with all the other N - 1 objects.\nIt is a simplified particle physics simulation in 3D.\nAt each step, for each particle, N - 1 forces are collected based on its distances to the others,\nand the velocity and the position are updated by a simple Euler step.\n\nThe following implementations are tested.\n\n- plain implementation in C++ with the 'array of structures' data arrangement.\n- plain implementation in C++ with the 'structure of arrays' data arrangement.\n- C++ with NEON SIMD with the 'structure of arrays' data arrangement.\n\nThe C++ test code is found in [src/test_nbody.cpp](src/test_nbody.cpp),\n 
 [src/nbody_elements_impl.h](src/nbody_elements_impl.h), and\n [src/nbody_elements.h](src/nbody_elements.h).\n\n### Convolution 2D\nThis is a 2D filtering operation: convolution with a 5x5 kernel.\nThis type of operation is usually done on a GPU, but this topic may still be\nuseful to get a rough idea of the performance you can get in the browser\nfor this type of operation. Only a plain implementation in C++ is considered.\n\nThe C++ test code is found in [src/test_convolution_2d.cpp](src/test_convolution_2d.cpp).\n\n### Sparse Matrix-Vector Multiplication\nThis measures the performance of sparse matrix-vector multiplication, where the matrix\nelements are stored in the CSR (compressed sparse row) form. Only a plain implementation in C++ is considered.\n\nThe C++ test code is found in [src/test_sparse_matrix_vector.cpp](src/test_sparse_matrix_vector.cpp).\n\n### Dense Matrix-Vector Multiplication\nThis measures the performance of dense matrix-vector multiplication.\n\nThe following implementations are tested.\n\n- plain implementation in C++.\n- C++ with NEON SIMD.\n- [the reference implementation of sgemv/dgemv from CLAPACK on Netlib](https://www.netlib.org/clapack/).\n\nThe C++ test code is found in [src/test_dense_matrix_vector.cpp](src/test_dense_matrix_vector.cpp).\n\n### Cholesky Factorization\nThis measures the performance of the inverse operation for positive-definite (PD) matrices using\nCholesky factorization.\n\nThe following implementations are tested.\n\n- plain implementations in C++.\n- [Eigen::LLT from Eigen3](https://eigen.tuxfamily.org/dox/classEigen_1_1LLT.html).\n\nThe C++ test code is found in [src/test_cholesky.cpp](src/test_cholesky.cpp),\n[src/test_case_cholesky_baseline.h](src/test_case_cholesky_baseline.h), and\n[src/test_case_cholesky_eigen3.h](src/test_case_cholesky_eigen3.h).\n\n### Jacobi Solver\nThis measures the performance of the iterative Jacobi solver.\nThe number of iterations is fixed to 10.\n\nThe following implementations are 
tested.\n\n- plain implementations in C++.\n- C++ with NEON SIMD.\n\nThe C++ test code is found in [src/test_jacobi_solver.cpp](src/test_jacobi_solver.cpp).\n\n### Gauss-Seidel Solver\nThis measures the performance of the iterative Gauss-Seidel solver.\nThe number of iterations is fixed to 10.\n\nThe following implementations are tested.\n\n- plain implementations in C++.\n- C++ with NEON SIMD.\n\nThe C++ test code is found in [src/test_gauss_seidel_solver.cpp](src/test_gauss_seidel_solver.cpp).\n\n### Lemke LCP Solver\nThis measures the performance of the Lemke LCP (Linear Complementarity Problem) solver.\nThe number of pivots cannot be fixed, but the input data have been sampled from a real velocity-space, constraint-based rigid body simulation with a hexagonal friction cone.\n\nThe following implementations are tested.\n\n- plain implementations in C++.\n- C++ with NEON SIMD.\n\nThe C++ test code is found in [src/test_lcp.cpp](src/test_lcp.cpp),\n[src/test_case_lcp_lemke_baseline.h](src/test_case_lcp_lemke_baseline.h),\n[src/test_case_lcp_lemke_neon.h](src/test_case_lcp_lemke_neon.h), and\n[src/test_case_lcp.h](src/test_case_lcp.h).\n\n### Conjugate Gradient Solver\nThis measures the performance of the conjugate gradient solver.\nThe test data are artificially generated with condition numbers of 10.0, 1000.0, and 100000.0. Only a plain implementation in C++ is considered.\n\nThe C++ test code is found in [src/test_conjugate_gradient_solver.cpp](src/test_conjugate_gradient_solver.cpp).\n\n### 512-Point Radix-2 FFT\nThis measures the performance of the 512-point radix-2 FFT.\nThis type of operation is usually done with a DSP, but this topic may still be\nuseful to get a rough idea of the performance you can get in the browser\nfor this type of operation. 
Only a plain implementation in C++ is considered.\n\nThe C++ test code is found in [src/test_fft.cpp](src/test_fft.cpp).\n\n# Build\nIf you want to build the contents (WASM, JS, HTML, and CSS) of the [public/](public/) directory from the\nC++ files, you can use the following instructions.\n\n## Requirements:\n\n### Emscripten\n\nFollow the instructions on [the official page](https://emscripten.org/).\n\n### Eigen3 (used for Cholesky factorization)\nYou can skip this step if you don't want to run the tests for Cholesky factorization.\n\n- Download the zip file from the following official location:\n https://gitlab.com/libeigen/eigen/-/releases/3.4.0 on GitLab.\n\n- Install it for Emscripten as follows.\n\n```\n% source \u003cpath/to\u003e/emsdk_env.sh\n% cd \u003cpath/to\u003eeigen-3.4.0\n% mkdir build\n% cd build\n% emcmake cmake -DCMAKE_BUILD_TYPE=Release ..\n% cmake --build . --target install\n```\n\n## Build\nRun the build script [build.sh](build.sh).\n```\n% source \u003cpath/to\u003e/emsdk_env.sh\n% cd \u003cpath/to\u003eWebAssemblyNumericalComputing\n% ./build.sh\n```\nIt's a small convenience script that invokes emcc to compile the C++ files to WASM.\nThis will arrange all the necessary files in [public/](public/).\n\n# Build for Native macOS\n\nYou can run the same C++ test programs natively on macOS on ARM.\nThey have the following dependencies.\n\n- [Eigen3](https://eigen.tuxfamily.org/dox/) used by [src/test_cholesky.cpp](src/test_cholesky.cpp).\n- [Boost](https://www.boost.org/) used by [src/test_sort.cpp](src/test_sort.cpp).\n\n\n## Build\nRun the build script [build_native_macos.sh](build_native_macos.sh).\n```\n% cd \u003cpath/to\u003eWebAssemblyNumericalComputing\n% ./build_native_macos.sh\n```\nThis will generate the binaries in `native_output/`.\nThe Boost library is assumed to be installed at `/opt/homebrew/include`.\n\nIt's a small convenience script to invoke clang++. 
It should work\non any other ARM environment that has LLVM, Eigen3, and Boost, after adjusting the include paths if necessary.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshoyamanishi%2Fwebassemblynumericalcomputing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshoyamanishi%2Fwebassemblynumericalcomputing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshoyamanishi%2Fwebassemblynumericalcomputing/lists"}