{"id":13831568,"url":"https://github.com/a2flo/floor","last_synced_at":"2025-05-16T17:04:52.894Z","repository":{"id":10350416,"uuid":"12487414","full_name":"a2flo/floor","owner":"a2flo","description":"A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.","archived":false,"fork":false,"pushed_at":"2025-05-09T13:40:45.000Z","size":14284,"stargazers_count":328,"open_issues_count":0,"forks_count":22,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-05-09T14:47:21.621Z","etag":null,"topics":["c-plus-plus","compiler","compute","cuda","graphics","ios","linux","macos","metal","opencl","openxr","rendering","spir","spir-v","virtual-reality","vulkan","windows"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/a2flo.png","metadata":{"files":{"readme":"README.asciidoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2013-08-30T13:48:07.000Z","updated_at":"2025-05-09T13:40:50.000Z","dependencies_parsed_at":"2024-03-23T23:22:15.204Z","dependency_job_id":"51343cc4-9679-45c1-a9ec-d0f364570663","html_url":"https://github.com/a2flo/floor","commit_stats":{"total_commits":1825,"total_committers":1,"mean_commits":1825.0,"dds":0.0,"last_synced_commit":"540e3c3c0c28e394bc76513e366a6a9646954419"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a2flo%2Ffloor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a2f
lo%2Ffloor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a2flo%2Ffloor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a2flo%2Ffloor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a2flo","download_url":"https://codeload.github.com/a2flo/floor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254573588,"owners_count":22093731,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","compiler","compute","cuda","graphics","ios","linux","macos","metal","opencl","openxr","rendering","spir","spir-v","virtual-reality","vulkan","windows"],"created_at":"2024-08-04T10:01:31.493Z","updated_at":"2025-05-16T17:04:52.877Z","avatar_url":"https://github.com/a2flo.png","language":"C++","readme":"\n:toc:\n\n= Flo's Open libRary =\n\n== What is it? 
==\n\nThis project provides a unified compute \u0026 graphics host API, as well as a unified compute \u0026 graphics C++ device language and library to enable same-source CUDA/Host/Metal/OpenCL/Vulkan programming and execution.\n\nThe unified host API is implemented at link:https://github.com/a2flo/floor/tree/master/src/device[device].\nAll backends (CUDA/Host/Metal/OpenCL/Vulkan) currently provide compute support, while graphics support is limited to Metal and Vulkan.\n\nTo provide a unified device language, a clang/LLVM/libc++ 14.0 toolchain has been link:https://github.com/a2flo/floor_llvm[modified].\n\nCertain parts of libfloor are used by both host and device code (link:https://github.com/a2flo/floor/tree/master/include/floor/math[math] and link:https://github.com/a2flo/floor/tree/master/include/floor/constexpr[constexpr]). Additional device library code is located at link:https://github.com/a2flo/floor/tree/master/include/floor/device/backend[backend].\n\nAdvanced examples can be found in the link:https://github.com/a2flo/floor_examples[floor_examples] repository.\n\n=== Example ===\nLet's take the fairly simple C++ kernel below, which computes the body/body interactions in an link:https://www.youtube.com/watch?v=DoLe1c-eokI[N-body simulation], and compile it for each backend. 
Note that loop unrolling is omitted for conciseness.\n[source,c++]\n----\n// define global constants\nstatic constexpr constant const uint32_t NBODY_TILE_SIZE { 256u };\nstatic constexpr constant const float NBODY_DAMPING { 0.999f };\nstatic constexpr constant const float NBODY_SOFTENING { 0.01f };\n// define a 1D kernel with a required local size of (NBODY_TILE_SIZE = 256, 1, 1)\nkernel_1d(NBODY_TILE_SIZE)\nvoid simplified_nbody(buffer\u003cconst float4\u003e in_positions, // read-only global memory buffer\n                      buffer\u003cfloat4\u003e out_positions, // read-write global memory buffer\n                      buffer\u003cfloat3\u003e inout_velocities, // read-write global memory buffer\n                      param\u003cfloat\u003e time_delta) { // read-only parameter\n  // each work-item represents/computes one body\n  const auto position = in_positions[global_id.x];\n  auto velocity = inout_velocities[global_id.x];\n  float3 acceleration; // vectors are automatically zero-initialized\n  local_buffer\u003cfloat4, NBODY_TILE_SIZE\u003e local_body_positions; // local memory array allocation\n  // loop over all bodies\n  for (uint32_t i = 0, tile = 0, count = global_size.x; i \u003c count; i += NBODY_TILE_SIZE, ++tile) {\n    // move resp. 
body position/mass from global to local memory\n    local_body_positions[local_id.x] = in_positions[tile * NBODY_TILE_SIZE + local_id.x];\n    local_barrier(); // barrier across all work-items in this work-group\n    // loop over bodies in this work-group\n    for (uint32_t j = 0; j \u003c NBODY_TILE_SIZE; ++j) {\n      const auto r = local_body_positions[j].xyz - position.xyz;\n      const auto dist_sq = r.dot(r) + (NBODY_SOFTENING * NBODY_SOFTENING);\n      const auto inv_dist = rsqrt(dist_sq);\n      const auto s = local_body_positions[j].w * (inv_dist * inv_dist * inv_dist); // .w is mass\n      acceleration += r * s;\n    }\n    local_barrier();\n  }\n  velocity = (velocity + acceleration * time_delta) * NBODY_DAMPING;\n  out_positions[global_id.x].xyz += velocity * time_delta; // update XYZ position\n  inout_velocities[global_id.x] = velocity; // update velocity\n}\n----\n\n_click to unfold the output for each backend_\n++++\n\u003cdetails\u003e\n  \u003csummary\u003eCUDA / PTX\u003c/summary\u003e\n  You can download the PTX file \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody.ptx\"\u003ehere\u003c/a\u003e and the CUBIN file \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody.cubin\"\u003ehere\u003c/a\u003e (note that building CUBINs is optional and requires \u003ccode\u003eptxas\u003c/code\u003e).\n  \n++++\n[source,Unix Assembly]\n----\n//\n// Generated by LLVM NVPTX Back-End\n//\n\n.version 8.4\n.target sm_86\n.address_size 64\n\n\t// .globl\tsimplified_nbody\n// _ZZ16simplified_nbodyE20local_body_positions has been demoted\n\n.visible .entry simplified_nbody(\n\t.param .u64 simplified_nbody_param_0,\n\t.param .u64 simplified_nbody_param_1,\n\t.param .u64 simplified_nbody_param_2,\n\t.param .f32 simplified_nbody_param_3\n)\n.reqntid 256, 1, 1\n{\n\t.reg .pred \t%p\u003c3\u003e;\n\t.reg .b32 \t%r\u003c25\u003e;\n\t.reg .f32 \t%f\u003c71\u003e;\n\t.reg .b64 \t%rd\u003c18\u003e;\n\t// demoted 
variable\n\t.shared .align 4 .b8 _ZZ16simplified_nbodyE20local_body_positions[4096];\n\tmov.u32 \t%r1, %tid.x;\n\tmov.u32 \t%r11, %ntid.x;\n\tmov.u32 \t%r12, %ctaid.x;\n\tmad.lo.s32 \t%r13, %r12, %r11, %r1;\n\tcvt.u64.u32 \t%rd3, %r13;\n\tmul.wide.u32 \t%rd7, %r13, 12;\n\tld.param.u64 \t%rd8, [simplified_nbody_param_2];\n\tcvta.to.global.u64 \t%rd9, %rd8;\n\tadd.s64 \t%rd4, %rd9, %rd7;\n\tld.global.f32 \t%f6, [%rd4+8];\n\tadd.s64 \t%rd6, %rd4, 8;\n\tld.global.f32 \t%f5, [%rd4+4];\n\tadd.s64 \t%rd5, %rd4, 4;\n\tld.global.f32 \t%f4, [%rd4];\n\tmul.wide.u32 \t%rd10, %r13, 16;\n\tld.param.u64 \t%rd11, [simplified_nbody_param_0];\n\tcvta.to.global.u64 \t%rd2, %rd11;\n\tadd.s64 \t%rd12, %rd2, %rd10;\n\tld.global.nc.f32 \t%f3, [%rd12+8];\n\tld.global.nc.f32 \t%f2, [%rd12+4];\n\tld.global.nc.f32 \t%f1, [%rd12];\n\tmov.u32 \t%r14, %nctaid.x;\n\tmul.lo.s32 \t%r2, %r14, %r11;\n\tshl.b32 \t%r15, %r1, 4;\n\tmov.u32 \t%r16, _ZZ16simplified_nbodyE20local_body_positions;\n\tadd.s32 \t%r3, %r16, %r15;\n\tld.param.u64 \t%rd13, [simplified_nbody_param_1];\n\tcvta.to.global.u64 \t%rd1, %rd13;\n\tmov.f32 \t%f68, 0f00000000;\n\tmov.u32 \t%r10, 0;\n\tld.param.f32 \t%f16, [simplified_nbody_param_3];\n\tmov.u32 \t%r22, %r10;\n\tmov.u32 \t%r23, %r10;\n\tmov.f32 \t%f69, %f68;\n\tmov.f32 \t%f70, %f68;\nLBB0_1:\n\tshl.b32 \t%r18, %r23, 8;\n\tadd.s32 \t%r19, %r18, %r1;\n\tmul.wide.u32 \t%rd14, %r19, 16;\n\tadd.s64 \t%rd15, %rd2, %rd14;\n\tld.global.nc.f32 \t%f18, [%rd15];\n\tst.shared.f32 \t[%r3], %f18;\n\tld.global.nc.f32 \t%f19, [%rd15+4];\n\tst.shared.f32 \t[%r3+4], %f19;\n\tld.global.nc.f32 \t%f20, [%rd15+8];\n\tst.shared.f32 \t[%r3+8], %f20;\n\tld.global.nc.f32 \t%f21, [%rd15+12];\n\tst.shared.f32 \t[%r3+12], %f21;\n\tbarrier.sync \t0;\n\tmov.u32 \t%r24, %r10;\nLBB0_2:\n\tadd.s32 \t%r21, %r16, %r24;\n\tld.shared.f32 \t%f22, [%r21+4];\n\tsub.f32 \t%f23, %f22, %f2;\n\tld.shared.f32 \t%f24, [%r21];\n\tsub.f32 \t%f25, %f24, %f1;\n\tfma.rn.f32 \t%f26, %f25, %f25, 0f38D1B717;\n\tfma.rn.f32 
\t%f27, %f23, %f23, %f26;\n\tld.shared.f32 \t%f28, [%r21+8];\n\tsub.f32 \t%f29, %f28, %f3;\n\tfma.rn.f32 \t%f30, %f29, %f29, %f27;\n\trsqrt.approx.ftz.f32 \t%f31, %f30;\n\tmul.f32 \t%f32, %f31, %f31;\n\tmul.f32 \t%f33, %f32, %f31;\n\tld.shared.f32 \t%f34, [%r21+12];\n\tmul.f32 \t%f35, %f33, %f34;\n\tfma.rn.f32 \t%f36, %f35, %f29, %f68;\n\tld.shared.f32 \t%f37, [%r21+20];\n\tsub.f32 \t%f38, %f37, %f2;\n\tld.shared.f32 \t%f39, [%r21+16];\n\tsub.f32 \t%f40, %f39, %f1;\n\tfma.rn.f32 \t%f41, %f40, %f40, 0f38D1B717;\n\tfma.rn.f32 \t%f42, %f38, %f38, %f41;\n\tld.shared.f32 \t%f43, [%r21+24];\n\tsub.f32 \t%f44, %f43, %f3;\n\tfma.rn.f32 \t%f45, %f44, %f44, %f42;\n\trsqrt.approx.ftz.f32 \t%f46, %f45;\n\tmul.f32 \t%f47, %f46, %f46;\n\tmul.f32 \t%f48, %f47, %f46;\n\tld.shared.f32 \t%f49, [%r21+28];\n\tmul.f32 \t%f50, %f48, %f49;\n\tfma.rn.f32 \t%f68, %f50, %f44, %f36;\n\tfma.rn.f32 \t%f51, %f35, %f23, %f69;\n\tfma.rn.f32 \t%f69, %f50, %f38, %f51;\n\tfma.rn.f32 \t%f52, %f35, %f25, %f70;\n\tfma.rn.f32 \t%f70, %f50, %f40, %f52;\n\tadd.s32 \t%r24, %r24, 32;\n\tsetp.eq.s32 \t%p1, %r24, 4096;\n\t@%p1 bra \tLBB0_3;\n\tbra.uni \tLBB0_2;\nLBB0_3:\n\tadd.s32 \t%r22, %r22, 256;\n\tsetp.lt.u32 \t%p2, %r22, %r2;\n\tbarrier.sync \t0;\n\tadd.s32 \t%r23, %r23, 1;\n\t@%p2 bra \tLBB0_1;\n\tfma.rn.f32 \t%f53, %f70, %f16, %f4;\n\tmul.f32 \t%f54, %f53, 0f3F7FBE77;\n\tshl.b64 \t%rd16, %rd3, 4;\n\tadd.s64 \t%rd17, %rd1, %rd16;\n\tld.global.f32 \t%f55, [%rd17];\n\tfma.rn.f32 \t%f56, %f54, %f16, %f55;\n\tst.global.f32 \t[%rd17], %f56;\n\tfma.rn.f32 \t%f57, %f69, %f16, %f5;\n\tmul.f32 \t%f58, %f57, 0f3F7FBE77;\n\tld.global.f32 \t%f59, [%rd17+4];\n\tfma.rn.f32 \t%f60, %f58, %f16, %f59;\n\tst.global.f32 \t[%rd17+4], %f60;\n\tfma.rn.f32 \t%f61, %f68, %f16, %f6;\n\tmul.f32 \t%f62, %f61, 0f3F7FBE77;\n\tld.global.f32 \t%f63, [%rd17+8];\n\tfma.rn.f32 \t%f64, %f62, %f16, %f63;\n\tst.global.f32 \t[%rd17+8], %f64;\n\tst.global.f32 \t[%rd4], %f54;\n\tst.global.f32 \t[%rd5], %f58;\n\tst.global.f32 \t[%rd6], 
%f62;\n\tret;\n\n}\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eHost-Compute (x86 CPU)\u003c/summary\u003e\n  Note that the compiler would usually directly output a \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody_x86_64.bin\"\u003e.bin file\u003c/a\u003e (ELF format). The output below comes from disassembling it with \u003ccode\u003eobjdump -d\u003c/code\u003e.\n  Also note that this has been compiled for the \u003ca href=\"https://github.com/a2flo/floor/blob/master/compute/host/host_common.hpp#L44\"\u003e\u003ccode\u003ex86-5\u003c/code\u003e target\u003c/a\u003e (AVX-512+).\n  \n++++\n[source,Assembly]\n----\nnbody.bin:     file format elf64-x86-64\n\n\nDisassembly of section .text:\n\n0000000000000000 \u003csimplified_nbody\u003e:\n       0:\t55                   \tpush   %rbp\n       1:\t48 89 e5             \tmov    %rsp,%rbp\n       4:\t41 57                \tpush   %r15\n       6:\t41 56                \tpush   %r14\n       8:\t41 55                \tpush   %r13\n       a:\t41 54                \tpush   %r12\n       c:\t53                   \tpush   %rbx\n       d:\t48 83 e4 c0          \tand    $0xffffffffffffffc0,%rsp\n      11:\t48 81 ec 40 09 00 00 \tsub    $0x940,%rsp\n      18:\t48 8d 05 f9 ff ff ff \tlea    -0x7(%rip),%rax        # 18 \u003csimplified_nbody+0x18\u003e\n      1f:\t49 be 00 00 00 00 00 \tmovabs $0x0,%r14\n      26:\t00 00 00 \n      29:\t48 89 4c 24 50       \tmov    %rcx,0x50(%rsp)\n      2e:\t48 89 74 24 68       \tmov    %rsi,0x68(%rsp)\n      33:\t48 89 7c 24 48       \tmov    %rdi,0x48(%rsp)\n      38:\t49 01 c6             \tadd    %rax,%r14\n      3b:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n      42:\t00 00 00 \n      45:\t49 8b 04 06          \tmov    (%r14,%rax,1),%rax\n      49:\t8b 00                \tmov    (%rax),%eax\n      4b:\t48 8d 0c 40          \tlea    (%rax,%rax,2),%rcx\n      4f:\t48 89 c6             \tmov    
%rax,%rsi\n      52:\t48 c1 e6 04          \tshl    $0x4,%rsi\n      56:\t48 89 74 24 58       \tmov    %rsi,0x58(%rsp)\n      5b:\t48 8d 04 8a          \tlea    (%rdx,%rcx,4),%rax\n      5f:\tc5 fa 10 04 8a       \tvmovss (%rdx,%rcx,4),%xmm0\n      64:\tc5 f9 6e 54 8a 04    \tvmovd  0x4(%rdx,%rcx,4),%xmm2\n      6a:\tc5 fa 10 4c 8a 08    \tvmovss 0x8(%rdx,%rcx,4),%xmm1\n      70:\t48 89 44 24 60       \tmov    %rax,0x60(%rsp)\n      75:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n      7c:\t00 00 00 \n      7f:\t49 8b 04 06          \tmov    (%r14,%rax,1),%rax\n      83:\t8b 18                \tmov    (%rax),%ebx\n      85:\tc5 fa 11 44 24 3c    \tvmovss %xmm0,0x3c(%rsp)\n      8b:\tc5 f9 7e 54 24 40    \tvmovd  %xmm2,0x40(%rsp)\n      91:\tc5 fa 11 4c 24 44    \tvmovss %xmm1,0x44(%rsp)\n      97:\t85 db                \ttest   %ebx,%ebx\n      99:\t0f 84 f9 16 00 00    \tje     1798 \u003csimplified_nbody+0x1798\u003e\n      9f:\t48 8b 44 24 48       \tmov    0x48(%rsp),%rax\n      a4:\t49 bd 00 00 00 00 00 \tmovabs $0x0,%r13\n      ab:\t00 00 00 \n      ae:\t45 31 ff             \txor    %r15d,%r15d\n      b1:\tc5 fa 10 04 30       \tvmovss (%rax,%rsi,1),%xmm0\n      b6:\tc5 fa 10 4c 30 04    \tvmovss 0x4(%rax,%rsi,1),%xmm1\n      bc:\tc5 fa 10 54 30 08    \tvmovss 0x8(%rax,%rsi,1),%xmm2\n      c2:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n      c9:\t00 00 00 \n      cc:\t49 8b 04 06          \tmov    (%r14,%rax,1),%rax\n      d0:\t48 89 44 24 78       \tmov    %rax,0x78(%rsp)\n      d5:\t4b 8d 04 2e          \tlea    (%r14,%r13,1),%rax\n      d9:\t48 89 44 24 70       \tmov    %rax,0x70(%rsp)\n      de:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n      e5:\t00 00 00 \n      e8:\t62 f2 7d 48 18 c0    \tvbroadcastss %xmm0,%zmm0\n      ee:\t4d 8b 24 06          \tmov    (%r14,%rax,1),%r12\n      f2:\t62 f2 7d 48 18 c9    \tvbroadcastss %xmm1,%zmm1\n      f8:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n      ff:\t00 00 00 \n     102:\t62 f1 7c 48 29 44 24 \tvmovaps 
%zmm0,0x700(%rsp)\n     109:\t1c \n     10a:\t62 f2 7d 48 18 c2    \tvbroadcastss %xmm2,%zmm0\n     110:\t62 d2 fd 48 5b 14 06 \tvbroadcasti64x4 (%r14,%rax,1),%zmm2\n     117:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n     11e:\t00 00 00 \n     121:\t62 f1 7c 48 29 4c 24 \tvmovaps %zmm1,0x6c0(%rsp)\n     128:\t1b \n     129:\t62 d2 fd 48 5b 0c 06 \tvbroadcasti64x4 (%r14,%rax,1),%zmm1\n     130:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n     137:\t00 00 00 \n     13a:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0x680(%rsp)\n     141:\t1a \n     142:\tc5 f8 57 c0          \tvxorps %xmm0,%xmm0,%xmm0\n     146:\tc5 f8 29 84 24 80 00 \tvmovaps %xmm0,0x80(%rsp)\n     14d:\t00 00 \n     14f:\t62 f1 fd 48 7f 54 24 \tvmovdqa64 %zmm2,0x640(%rsp)\n     156:\t19 \n     157:\t62 d2 fd 48 5b 14 06 \tvbroadcasti64x4 (%r14,%rax,1),%zmm2\n     15e:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n     165:\t00 00 00 \n     168:\t62 f1 fd 48 7f 4c 24 \tvmovdqa64 %zmm1,0x840(%rsp)\n     16f:\t21 \n     170:\t62 d2 7d 48 18 0c 06 \tvbroadcastss (%r14,%rax,1),%zmm1\n     177:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n     17e:\t00 00 00 \n     181:\t62 f1 fd 48 7f 54 24 \tvmovdqa64 %zmm2,0x800(%rsp)\n     188:\t20 \n     189:\t62 d2 fd 48 5b 14 06 \tvbroadcasti64x4 (%r14,%rax,1),%zmm2\n     190:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n     197:\t00 00 00 \n     19a:\t62 f1 7c 48 29 4c 24 \tvmovaps %zmm1,0x600(%rsp)\n     1a1:\t18 \n     1a2:\t62 d2 7d 48 18 0c 06 \tvbroadcastss (%r14,%rax,1),%zmm1\n     1a9:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n     1b0:\t00 00 00 \n     1b3:\t62 d2 7d 48 18 04 06 \tvbroadcastss (%r14,%rax,1),%zmm0\n     1ba:\t62 f1 fd 48 7f 54 24 \tvmovdqa64 %zmm2,0x7c0(%rsp)\n     1c1:\t1f \n     1c2:\t62 f1 7c 48 29 4c 24 \tvmovaps %zmm1,0x780(%rsp)\n     1c9:\t1e \n     1ca:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0x740(%rsp)\n     1d1:\t1d \n     1d2:\tc5 f8 57 c0          \tvxorps %xmm0,%xmm0,%xmm0\n     1d6:\tc5 f8 29 84 24 c0 00 \tvmovaps %xmm0,0xc0(%rsp)\n     
1dd:\t00 00 \n     1df:\tc5 f8 57 c0          \tvxorps %xmm0,%xmm0,%xmm0\n     1e3:\tc5 f8 29 84 24 00 01 \tvmovaps %xmm0,0x100(%rsp)\n     1ea:\t00 00 \n     1ec:\t0f 1f 40 00          \tnopl   0x0(%rax)\n     1f0:\t48 8b 44 24 78       \tmov    0x78(%rsp),%rax\n     1f5:\t48 8b 54 24 48       \tmov    0x48(%rsp),%rdx\n     1fa:\t8b 00                \tmov    (%rax),%eax\n     1fc:\t42 8d 0c 38          \tlea    (%rax,%r15,1),%ecx\n     200:\t48 c1 e0 04          \tshl    $0x4,%rax\n     204:\t48 c1 e1 04          \tshl    $0x4,%rcx\n     208:\tc5 f8 10 04 0a       \tvmovups (%rdx,%rcx,1),%xmm0\n     20d:\t48 8b 4c 24 70       \tmov    0x70(%rsp),%rcx\n     212:\tc5 f8 29 04 08       \tvmovaps %xmm0,(%rax,%rcx,1)\n     217:\tc5 f8 77             \tvzeroupper\n     21a:\t41 ff d4             \tcall   *%r12\n     21d:\t62 91 7c 48 28 5c 2e \tvmovaps 0x80(%r14,%r13,1),%zmm3\n     224:\t02 \n     225:\t62 f1 7c 48 28 64 24 \tvmovaps 0x640(%rsp),%zmm4\n     22c:\t19 \n     22d:\t62 81 7c 48 28 5c 2e \tvmovaps 0xc0(%r14,%r13,1),%zmm19\n     234:\t03 \n     235:\t62 91 7c 48 28 54 2e \tvmovaps 0x180(%r14,%r13,1),%zmm2\n     23c:\t06 \n     23d:\t62 11 7c 48 28 4c 2e \tvmovaps 0x100(%r14,%r13,1),%zmm9\n     244:\t04 \n     245:\t62 11 7c 48 28 6c 2e \tvmovaps 0x140(%r14,%r13,1),%zmm13\n     24c:\t05 \n     24d:\t62 81 7c 48 28 4c 2e \tvmovaps 0x1c0(%r14,%r13,1),%zmm17\n     254:\t07 \n     255:\t62 71 7c 48 28 74 24 \tvmovaps 0x800(%rsp),%zmm14\n     25c:\t20 \n     25d:\t62 91 7c 48 28 04 2e \tvmovaps (%r14,%r13,1),%zmm0\n     264:\t62 81 7c 48 28 54 2e \tvmovaps 0x40(%r14,%r13,1),%zmm18\n     26b:\t01 \n     26c:\t62 f1 7c 48 28 74 24 \tvmovaps 0x7c0(%rsp),%zmm6\n     273:\t1f \n     274:\t62 01 7c 48 28 44 2e \tvmovaps 0x280(%r14,%r13,1),%zmm24\n     27b:\t0a \n     27c:\t62 81 7c 48 28 74 2e \tvmovaps 0x200(%r14,%r13,1),%zmm22\n     283:\t08 \n     284:\t62 81 7c 48 28 6c 2e \tvmovaps 0x240(%r14,%r13,1),%zmm21\n     28b:\t09 \n     28c:\t62 81 7c 48 28 7c 2e \tvmovaps 
0x2c0(%r14,%r13,1),%zmm23\n     293:\t0b \n     294:\t62 01 7c 48 28 64 2e \tvmovaps 0x380(%r14,%r13,1),%zmm28\n     29b:\t0e \n     29c:\t62 01 7c 48 28 54 2e \tvmovaps 0x300(%r14,%r13,1),%zmm26\n     2a3:\t0c \n     2a4:\t62 01 7c 48 28 5c 2e \tvmovaps 0x3c0(%r14,%r13,1),%zmm27\n     2ab:\t0f \n     2ac:\t62 f1 7c 48 28 cb    \tvmovaps %zmm3,%zmm1\n     2b2:\t62 e1 7c 48 28 e2    \tvmovaps %zmm2,%zmm20\n     2b8:\t62 d1 7c 48 28 e9    \tvmovaps %zmm9,%zmm5\n     2be:\t62 61 7c 48 28 ca    \tvmovaps %zmm2,%zmm25\n     2c4:\t62 f1 7c 48 28 f8    \tvmovaps %zmm0,%zmm7\n     2ca:\t62 71 7c 48 28 fb    \tvmovaps %zmm3,%zmm15\n     2d0:\t62 e1 7c 48 28 c0    \tvmovaps %zmm0,%zmm16\n     2d6:\t62 71 7c 48 28 c3    \tvmovaps %zmm3,%zmm8\n     2dc:\t62 71 7c 48 28 e0    \tvmovaps %zmm0,%zmm12\n     2e2:\t62 71 7c 48 28 d2    \tvmovaps %zmm2,%zmm10\n     2e8:\t62 b2 4d 48 7f db    \tvpermt2ps %zmm19,%zmm6,%zmm3\n     2ee:\t62 b2 4d 48 7f c2    \tvpermt2ps %zmm18,%zmm6,%zmm0\n     2f4:\t62 61 7c 48 28 f4    \tvmovaps %zmm4,%zmm30\n     2fa:\t62 b2 4d 48 7f d1    \tvpermt2ps %zmm17,%zmm6,%zmm2\n     300:\t62 51 7c 48 28 d9    \tvmovaps %zmm9,%zmm11\n     306:\t62 01 7c 48 28 e8    \tvmovaps %zmm24,%zmm29\n     30c:\t62 01 7c 48 28 fc    \tvmovaps %zmm28,%zmm31\n     312:\t62 b2 5d 48 7f cb    \tvpermt2ps %zmm19,%zmm4,%zmm1\n     318:\t62 a2 5d 48 7f e1    \tvpermt2ps %zmm17,%zmm4,%zmm20\n     31e:\t62 d2 5d 48 7f ed    \tvpermt2ps %zmm13,%zmm4,%zmm5\n     324:\t62 22 0d 48 7f c9    \tvpermt2ps %zmm17,%zmm14,%zmm25\n     32a:\t62 b2 5d 48 7f fa    \tvpermt2ps %zmm18,%zmm4,%zmm7\n     330:\t62 d1 7c 48 28 e1    \tvmovaps %zmm9,%zmm4\n     336:\t62 32 0d 48 7f fb    \tvpermt2ps %zmm19,%zmm14,%zmm15\n     33c:\t62 a2 0d 48 7f c2    \tvpermt2ps %zmm18,%zmm14,%zmm16\n     342:\t62 52 4d 48 7f cd    \tvpermt2ps %zmm13,%zmm6,%zmm9\n     348:\t62 52 0d 48 7f dd    \tvpermt2ps %zmm13,%zmm14,%zmm11\n     34e:\t62 91 7c 48 28 f2    \tvmovaps %zmm26,%zmm6\n     354:\t62 22 0d 40 7f ef    
\tvpermt2ps %zmm23,%zmm30,%zmm29\n     35a:\t62 f3 fd 48 23 c3 e4 \tvshuff64x2 $0xe4,%zmm3,%zmm0,%zmm0\n     361:\t62 91 7c 48 28 dc    \tvmovaps %zmm28,%zmm3\n     367:\t62 f1 7c 48 29 4c 24 \tvmovaps %zmm1,0x140(%rsp)\n     36e:\t05 \n     36f:\t62 f1 7c 48 28 4c 24 \tvmovaps 0x840(%rsp),%zmm1\n     376:\t21 \n     377:\t62 b3 d5 48 23 ec e4 \tvshuff64x2 $0xe4,%zmm20,%zmm5,%zmm5\n     37e:\t62 61 7c 48 29 4c 24 \tvmovaps %zmm25,0x280(%rsp)\n     385:\t0a \n     386:\t62 01 7c 48 28 4c 2e \tvmovaps 0x340(%r14,%r13,1),%zmm25\n     38d:\t0d \n     38e:\t62 a1 7c 48 28 e6    \tvmovaps %zmm22,%zmm20\n     394:\t62 f3 b5 48 23 d2 e4 \tvshuff64x2 $0xe4,%zmm2,%zmm9,%zmm2\n     39b:\t62 71 7c 48 28 4c 24 \tvmovaps 0x640(%rsp),%zmm9\n     3a2:\t19 \n     3a3:\t62 92 0d 48 7f db    \tvpermt2ps %zmm27,%zmm14,%zmm3\n     3a9:\t62 f3 c5 48 23 7c 24 \tvshuff64x2 $0xe4,0x140(%rsp),%zmm7,%zmm7\n     3b0:\t05 e4 \n     3b2:\t62 a2 0d 48 7f e5    \tvpermt2ps %zmm21,%zmm14,%zmm20\n     3b8:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x140(%rsp)\n     3bf:\t05 \n     3c0:\t62 f1 fd 48 29 6c 24 \tvmovapd %zmm5,0x4c0(%rsp)\n     3c7:\t13 \n     3c8:\t62 f1 7c 48 28 6c 24 \tvmovaps 0x7c0(%rsp),%zmm5\n     3cf:\t1f \n     3d0:\t62 f1 fd 48 29 54 24 \tvmovapd %zmm2,0x500(%rsp)\n     3d7:\t14 \n     3d8:\t62 32 75 48 7f c3    \tvpermt2ps %zmm19,%zmm1,%zmm8\n     3de:\t62 32 75 48 7f e2    \tvpermt2ps %zmm18,%zmm1,%zmm12\n     3e4:\t62 a1 7c 48 28 de    \tvmovaps %zmm22,%zmm19\n     3ea:\t62 81 7c 48 28 d0    \tvmovaps %zmm24,%zmm18\n     3f0:\t62 32 75 48 7f d1    \tvpermt2ps %zmm17,%zmm1,%zmm10\n     3f6:\t62 81 7c 48 28 c8    \tvmovaps %zmm24,%zmm17\n     3fc:\t62 d2 75 48 7f e5    \tvpermt2ps %zmm13,%zmm1,%zmm4\n     402:\t62 11 7c 48 28 ee    \tvmovaps %zmm30,%zmm13\n     408:\t62 21 7c 48 28 f6    \tvmovaps %zmm22,%zmm30\n     40e:\t62 a2 0d 48 7f d7    \tvpermt2ps %zmm23,%zmm14,%zmm18\n     414:\t62 a2 75 48 7f cf    \tvpermt2ps %zmm23,%zmm1,%zmm17\n     41a:\t62 a2 75 48 7f dd    
\tvpermt2ps %zmm21,%zmm1,%zmm19\n     420:\t62 02 15 48 7f fb    \tvpermt2ps %zmm27,%zmm13,%zmm31\n     426:\t62 92 15 48 7f f1    \tvpermt2ps %zmm25,%zmm13,%zmm6\n     42c:\t62 22 15 48 7f f5    \tvpermt2ps %zmm21,%zmm13,%zmm30\n     432:\t62 11 7c 48 28 ec    \tvmovaps %zmm28,%zmm13\n     438:\t62 f1 fd 48 29 7c 24 \tvmovapd %zmm7,0x240(%rsp)\n     43f:\t09 \n     440:\t62 f3 a5 48 23 7c 24 \tvshuff64x2 $0xe4,0x280(%rsp),%zmm11,%zmm7\n     447:\t0a e4 \n     449:\t62 02 55 48 7f e3    \tvpermt2ps %zmm27,%zmm5,%zmm28\n     44f:\t62 22 55 48 7f c7    \tvpermt2ps %zmm23,%zmm5,%zmm24\n     455:\t62 a2 55 48 7f f5    \tvpermt2ps %zmm21,%zmm5,%zmm22\n     45b:\t62 12 75 48 7f eb    \tvpermt2ps %zmm27,%zmm1,%zmm13\n     461:\t62 81 7c 48 28 7c 2e \tvmovaps 0x4c0(%r14,%r13,1),%zmm23\n     468:\t13 \n     469:\t62 e1 7c 48 28 6c 24 \tvmovaps 0x6c0(%rsp),%zmm21\n     470:\t1b \n     471:\t62 d3 dd 48 23 c2 e4 \tvshuff64x2 $0xe4,%zmm10,%zmm4,%zmm0\n     478:\t62 53 fd 40 23 d7 e4 \tvshuff64x2 $0xe4,%zmm15,%zmm16,%zmm10\n     47f:\t62 11 7c 48 28 fa    \tvmovaps %zmm26,%zmm15\n     485:\t62 53 9d 48 23 c0 e4 \tvshuff64x2 $0xe4,%zmm8,%zmm12,%zmm8\n     48c:\t62 11 7c 48 28 e2    \tvmovaps %zmm26,%zmm12\n     492:\t62 02 55 48 7f d1    \tvpermt2ps %zmm25,%zmm5,%zmm26\n     498:\t62 81 7c 48 28 44 2e \tvmovaps 0x540(%r14,%r13,1),%zmm16\n     49f:\t15 \n     4a0:\t62 33 e5 40 23 d9 e4 \tvshuff64x2 $0xe4,%zmm17,%zmm19,%zmm11\n     4a7:\t62 a3 dd 40 23 d2 e4 \tvshuff64x2 $0xe4,%zmm18,%zmm20,%zmm18\n     4ae:\t62 81 7c 48 28 64 2e \tvmovaps 0x580(%r14,%r13,1),%zmm20\n     4b5:\t16 \n     4b6:\t62 81 7c 48 28 4c 2e \tvmovaps 0x500(%r14,%r13,1),%zmm17\n     4bd:\t14 \n     4be:\t62 12 0d 48 7f f9    \tvpermt2ps %zmm25,%zmm14,%zmm15\n     4c4:\t62 12 75 48 7f e1    \tvpermt2ps %zmm25,%zmm1,%zmm12\n     4ca:\t62 01 7c 48 28 4c 2e \tvmovaps 0x5c0(%r14,%r13,1),%zmm25\n     4d1:\t17 \n     4d2:\t62 93 8d 40 23 d5 e4 \tvshuff64x2 $0xe4,%zmm29,%zmm30,%zmm2\n     4d9:\t62 e1 7c 48 28 5c 24 
\tvmovaps 0x780(%rsp),%zmm19\n     4e0:\t1e \n     4e1:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x440(%rsp)\n     4e8:\t11 \n     4e9:\t62 93 cd 48 23 c7 e4 \tvshuff64x2 $0xe4,%zmm31,%zmm6,%zmm0\n     4f0:\t62 f1 fd 48 29 54 24 \tvmovapd %zmm2,0x200(%rsp)\n     4f7:\t08 \n     4f8:\t62 f1 7c 48 28 d5    \tvmovaps %zmm5,%zmm2\n     4fe:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x400(%rsp)\n     505:\t10 \n     506:\t62 93 cd 40 23 c0 e4 \tvshuff64x2 $0xe4,%zmm24,%zmm22,%zmm0\n     50d:\t62 81 7c 48 28 74 2e \tvmovaps 0x400(%r14,%r13,1),%zmm22\n     514:\t10 \n     515:\t62 01 7c 48 28 44 2e \tvmovaps 0x480(%r14,%r13,1),%zmm24\n     51c:\t12 \n     51d:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x480(%rsp)\n     524:\t12 \n     525:\t62 93 ad 40 23 e4 e4 \tvshuff64x2 $0xe4,%zmm28,%zmm26,%zmm4\n     52c:\t62 d3 9d 48 23 ed e4 \tvshuff64x2 $0xe4,%zmm13,%zmm12,%zmm5\n     533:\t62 f3 85 48 23 db e4 \tvshuff64x2 $0xe4,%zmm3,%zmm15,%zmm3\n     53a:\t62 21 7c 48 28 dc    \tvmovaps %zmm20,%zmm27\n     540:\t62 21 7c 48 28 e1    \tvmovaps %zmm17,%zmm28\n     546:\t62 f1 fd 48 29 64 24 \tvmovapd %zmm4,0x280(%rsp)\n     54d:\t0a \n     54e:\t62 91 7c 48 28 64 2e \tvmovaps 0x440(%r14,%r13,1),%zmm4\n     555:\t11 \n     556:\t62 21 7c 48 28 f4    \tvmovaps %zmm20,%zmm30\n     55c:\t62 21 7c 48 28 f9    \tvmovaps %zmm17,%zmm31\n     562:\t62 02 35 48 7f d9    \tvpermt2ps %zmm25,%zmm9,%zmm27\n     568:\t62 22 35 48 7f e0    \tvpermt2ps %zmm16,%zmm9,%zmm28\n     56e:\t62 02 0d 48 7f f1    \tvpermt2ps %zmm25,%zmm14,%zmm30\n     574:\t62 22 0d 48 7f f8    \tvpermt2ps %zmm16,%zmm14,%zmm31\n     57a:\t62 01 7c 48 28 d0    \tvmovaps %zmm24,%zmm26\n     580:\t62 31 7c 48 28 ee    \tvmovaps %zmm22,%zmm13\n     586:\t62 11 7c 48 28 f8    \tvmovaps %zmm24,%zmm15\n     58c:\t62 21 7c 48 28 ee    \tvmovaps %zmm22,%zmm29\n     592:\t62 22 35 48 7f d7    \tvpermt2ps %zmm23,%zmm9,%zmm26\n     598:\t62 32 75 48 7f ff    \tvpermt2ps %zmm23,%zmm1,%zmm15\n     59e:\t62 93 9d 40 23 f3 e4 \tvshuff64x2 
$0xe4,%zmm27,%zmm28,%zmm6\n     5a5:\t62 72 35 48 7f ec    \tvpermt2ps %zmm4,%zmm9,%zmm13\n     5ab:\t62 21 7c 48 28 e4    \tvmovaps %zmm20,%zmm28\n     5b1:\t62 62 0d 48 7f ec    \tvpermt2ps %zmm4,%zmm14,%zmm29\n     5b7:\t62 02 75 48 7f e1    \tvpermt2ps %zmm25,%zmm1,%zmm28\n     5bd:\t62 f1 fd 48 29 74 24 \tvmovapd %zmm6,0x1c0(%rsp)\n     5c4:\t07 \n     5c5:\t62 b1 7c 48 28 f6    \tvmovaps %zmm22,%zmm6\n     5cb:\t62 f2 75 48 7f f4    \tvpermt2ps %zmm4,%zmm1,%zmm6\n     5d1:\t62 93 95 48 23 c2 e4 \tvshuff64x2 $0xe4,%zmm26,%zmm13,%zmm0\n     5d8:\t62 71 7c 48 28 e9    \tvmovaps %zmm1,%zmm13\n     5de:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x180(%rsp)\n     5e5:\t06 \n     5e6:\t62 d3 cd 48 23 c7 e4 \tvshuff64x2 $0xe4,%zmm15,%zmm6,%zmm0\n     5ed:\t62 f1 7c 48 28 74 24 \tvmovaps 0x600(%rsp),%zmm6\n     5f4:\t18 \n     5f5:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x300(%rsp)\n     5fc:\t0c \n     5fd:\t62 b1 7c 48 28 c1    \tvmovaps %zmm17,%zmm0\n     603:\t62 b2 75 48 7f c0    \tvpermt2ps %zmm16,%zmm1,%zmm0\n     609:\t62 f1 7c 48 28 4c 24 \tvmovaps 0x240(%rsp),%zmm1\n     610:\t09 \n     611:\t62 93 fd 48 23 c4 e4 \tvshuff64x2 $0xe4,%zmm28,%zmm0,%zmm0\n     618:\t62 61 7c 48 28 e2    \tvmovaps %zmm2,%zmm28\n     61e:\t62 e2 1d 40 7f f4    \tvpermt2ps %zmm4,%zmm28,%zmm22\n     624:\t62 f1 7c 48 28 64 24 \tvmovaps 0x4c0(%rsp),%zmm4\n     62b:\t13 \n     62c:\t62 a2 1d 40 7f c8    \tvpermt2ps %zmm16,%zmm28,%zmm17\n     632:\t62 82 1d 40 7f e1    \tvpermt2ps %zmm25,%zmm28,%zmm20\n     638:\t62 e1 7c 48 28 44 24 \tvmovaps 0x1c0(%rsp),%zmm16\n     63f:\t07 \n     640:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x2c0(%rsp)\n     647:\t0b \n     648:\t62 91 7c 48 28 c0    \tvmovaps %zmm24,%zmm0\n     64e:\t62 22 6d 48 7f c7    \tvpermt2ps %zmm23,%zmm2,%zmm24\n     654:\t62 f1 7c 48 28 54 24 \tvmovaps 0x680(%rsp),%zmm2\n     65b:\t1a \n     65c:\t62 b2 0d 48 7f c7    \tvpermt2ps %zmm23,%zmm14,%zmm0\n     662:\t62 e1 7c 48 28 7c 24 \tvmovaps 0x740(%rsp),%zmm23\n     669:\t1d \n  
   66a:\t62 a3 f5 40 23 e4 e4 \tvshuff64x2 $0xe4,%zmm20,%zmm17,%zmm20\n     671:\t62 83 cd 40 23 f0 e4 \tvshuff64x2 $0xe4,%zmm24,%zmm22,%zmm22\n     678:\t62 f3 95 40 23 c0 e4 \tvshuff64x2 $0xe4,%zmm0,%zmm29,%zmm0\n     67f:\t62 03 85 40 23 ee e4 \tvshuff64x2 $0xe4,%zmm30,%zmm31,%zmm29\n     686:\t62 21 3c 48 5c f5    \tvsubps %zmm21,%zmm8,%zmm30\n     68c:\t62 71 7c 48 28 44 24 \tvmovaps 0x440(%rsp),%zmm8\n     693:\t11 \n     694:\t62 61 2c 48 5c fa    \tvsubps %zmm2,%zmm10,%zmm31\n     69a:\t62 61 44 48 5c da    \tvsubps %zmm2,%zmm7,%zmm27\n     6a0:\t62 b1 7c 48 28 fb    \tvmovaps %zmm19,%zmm7\n     6a6:\t62 f1 64 48 5c da    \tvsubps %zmm2,%zmm3,%zmm3\n     6ac:\t62 f1 7c 48 29 5c 24 \tvmovaps %zmm3,0x240(%rsp)\n     6b3:\t09 \n     6b4:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x5c0(%rsp)\n     6bb:\t17 \n     6bc:\t62 f1 7c 48 28 44 24 \tvmovaps 0x700(%rsp),%zmm0\n     6c3:\t1c \n     6c4:\t62 21 3c 48 5c d5    \tvsubps %zmm21,%zmm8,%zmm26\n     6ca:\t62 71 74 48 5c e0    \tvsubps %zmm0,%zmm1,%zmm12\n     6d0:\t62 f1 5c 48 5c e0    \tvsubps %zmm0,%zmm4,%zmm4\n     6d6:\t62 e1 7c 40 5c c0    \tvsubps %zmm0,%zmm16,%zmm16\n     6dc:\t62 51 7c 48 28 cc    \tvmovaps %zmm12,%zmm9\n     6e2:\t62 71 7c 48 28 c4    \tvmovaps %zmm4,%zmm8\n     6e8:\t62 e1 7c 48 29 44 24 \tvmovaps %zmm16,0x340(%rsp)\n     6ef:\t0d \n     6f0:\t62 72 1d 48 a8 ce    \tvfmadd213ps %zmm6,%zmm12,%zmm9\n     6f6:\t62 72 5d 48 a8 c6    \tvfmadd213ps %zmm6,%zmm4,%zmm8\n     6fc:\t62 12 0d 40 b8 ce    \tvfmadd231ps %zmm30,%zmm30,%zmm9\n     702:\t62 12 2d 40 b8 c2    \tvfmadd231ps %zmm26,%zmm26,%zmm8\n     708:\t62 12 05 40 b8 cf    \tvfmadd231ps %zmm31,%zmm31,%zmm9\n     70e:\t62 12 25 40 b8 c3    \tvfmadd231ps %zmm27,%zmm27,%zmm8\n     714:\t62 52 7d 48 4e d1    \tvrsqrt14ps %zmm9,%zmm10\n     71a:\t62 52 7d 48 4e f8    \tvrsqrt14ps %zmm8,%zmm15\n     720:\t62 51 34 48 59 ca    \tvmulps %zmm10,%zmm9,%zmm9\n     726:\t62 51 3c 48 59 c7    \tvmulps %zmm15,%zmm8,%zmm8\n     72c:\t62 32 2d 48 a8 cb  
  \tvfmadd213ps %zmm19,%zmm10,%zmm9\n     732:\t62 31 2c 48 59 d7    \tvmulps %zmm23,%zmm10,%zmm10\n     738:\t62 32 05 48 a8 c3    \tvfmadd213ps %zmm19,%zmm15,%zmm8\n     73e:\t62 51 2c 48 59 d1    \tvmulps %zmm9,%zmm10,%zmm10\n     744:\t62 31 04 48 59 cf    \tvmulps %zmm23,%zmm15,%zmm9\n     74a:\t62 71 7c 48 28 7c 24 \tvmovaps 0x200(%rsp),%zmm15\n     751:\t08 \n     752:\t62 d1 34 48 59 c8    \tvmulps %zmm8,%zmm9,%zmm1\n     758:\t62 31 24 48 5c cd    \tvsubps %zmm21,%zmm11,%zmm9\n     75e:\t62 71 6c 40 5c c2    \tvsubps %zmm2,%zmm18,%zmm8\n     764:\t62 71 7c 48 29 4c 24 \tvmovaps %zmm9,0x200(%rsp)\n     76b:\t08 \n     76c:\t62 71 7c 48 29 44 24 \tvmovaps %zmm8,0x3c0(%rsp)\n     773:\t0f \n     774:\t62 e1 04 48 5c d8    \tvsubps %zmm0,%zmm15,%zmm19\n     77a:\t62 31 7c 48 28 db    \tvmovaps %zmm19,%zmm11\n     780:\t62 72 65 40 a8 de    \tvfmadd213ps %zmm6,%zmm19,%zmm11\n     786:\t62 52 35 48 b8 d9    \tvfmadd231ps %zmm9,%zmm9,%zmm11\n     78c:\t62 71 7c 48 28 4c 24 \tvmovaps 0x400(%rsp),%zmm9\n     793:\t10 \n     794:\t62 52 3d 48 b8 d8    \tvfmadd231ps %zmm8,%zmm8,%zmm11\n     79a:\t62 31 54 48 5c c5    \tvsubps %zmm21,%zmm5,%zmm8\n     7a0:\t62 c2 7d 48 4e d3    \tvrsqrt14ps %zmm11,%zmm18\n     7a6:\t62 71 7c 48 29 44 24 \tvmovaps %zmm8,0x380(%rsp)\n     7ad:\t0e \n     7ae:\t62 31 24 48 59 da    \tvmulps %zmm18,%zmm11,%zmm11\n     7b4:\t62 72 6d 40 a8 df    \tvfmadd213ps %zmm7,%zmm18,%zmm11\n     7ba:\t62 a1 6c 40 59 d7    \tvmulps %zmm23,%zmm18,%zmm18\n     7c0:\t62 c1 6c 40 59 d3    \tvmulps %zmm11,%zmm18,%zmm18\n     7c6:\t62 61 6c 40 59 44 24 \tvmulps 0x480(%rsp),%zmm18,%zmm24\n     7cd:\t12 \n     7ce:\t62 71 34 48 5c f8    \tvsubps %zmm0,%zmm9,%zmm15\n     7d4:\t62 d1 7c 48 28 ef    \tvmovaps %zmm15,%zmm5\n     7da:\t62 f2 05 48 a8 ee    \tvfmadd213ps %zmm6,%zmm15,%zmm5\n     7e0:\t62 d2 3d 48 b8 e8    \tvfmadd231ps %zmm8,%zmm8,%zmm5\n     7e6:\t62 71 74 48 59 44 24 \tvmulps 0x500(%rsp),%zmm1,%zmm8\n     7ed:\t14 \n     7ee:\t62 f1 74 48 59 c9   
 \tvmulps %zmm1,%zmm1,%zmm1\n     7f4:\t62 f2 65 48 b8 eb    \tvfmadd231ps %zmm3,%zmm3,%zmm5\n     7fa:\t62 f1 2c 48 59 5c 24 \tvmulps 0x140(%rsp),%zmm10,%zmm3\n     801:\t05 \n     802:\t62 51 2c 48 59 d2    \tvmulps %zmm10,%zmm10,%zmm10\n     808:\t62 72 7d 48 4e dd    \tvrsqrt14ps %zmm5,%zmm11\n     80e:\t62 d1 54 48 59 eb    \tvmulps %zmm11,%zmm5,%zmm5\n     814:\t62 f2 25 48 a8 ef    \tvfmadd213ps %zmm7,%zmm11,%zmm5\n     81a:\t62 31 24 48 59 df    \tvmulps %zmm23,%zmm11,%zmm11\n     820:\t62 51 74 48 59 c0    \tvmulps %zmm8,%zmm1,%zmm8\n     826:\t62 91 7c 48 28 4c 2e \tvmovaps 0x780(%r14,%r13,1),%zmm1\n     82d:\t1e \n     82e:\t62 61 2c 48 59 cb    \tvmulps %zmm3,%zmm10,%zmm25\n     834:\tc4 41 28 57 d2       \tvxorps %xmm10,%xmm10,%xmm10\n     839:\tc4 63 29 0c 8c 24 00 \tvblendps $0x1,0x100(%rsp),%xmm10,%xmm9\n     840:\t01 00 00 01 \n     844:\t62 f1 24 48 59 ed    \tvmulps %zmm5,%zmm11,%zmm5\n     84a:\tc4 63 29 0c 9c 24 c0 \tvblendps $0x1,0xc0(%rsp),%xmm10,%xmm11\n     851:\t00 00 00 01 \n     855:\tc4 e3 29 0c 9c 24 80 \tvblendps $0x1,0x80(%rsp),%xmm10,%xmm3\n     85c:\t00 00 00 01 \n     860:\t62 71 7c 48 28 54 24 \tvmovaps 0x180(%rsp),%zmm10\n     867:\t06 \n     868:\t62 e1 2c 48 5c c8    \tvsubps %zmm0,%zmm10,%zmm17\n     86e:\t62 f1 7c 48 28 44 24 \tvmovaps 0x300(%rsp),%zmm0\n     875:\t0c \n     876:\t62 71 7c 48 28 54 24 \tvmovaps 0x2c0(%rsp),%zmm10\n     87d:\t0b \n     87e:\t62 f1 7c 48 29 5c 24 \tvmovaps %zmm3,0x100(%rsp)\n     885:\t04 \n     886:\t62 b1 6c 40 59 da    \tvmulps %zmm18,%zmm18,%zmm3\n     88c:\t62 e1 54 48 59 54 24 \tvmulps 0x280(%rsp),%zmm5,%zmm18\n     893:\t0a \n     894:\t62 f1 54 48 59 ed    \tvmulps %zmm5,%zmm5,%zmm5\n     89a:\t62 12 35 40 b8 de    \tvfmadd231ps %zmm30,%zmm25,%zmm11\n     8a0:\t62 01 7c 48 28 74 2e \tvmovaps 0x600(%r14,%r13,1),%zmm30\n     8a7:\t18 \n     8a8:\t62 52 35 40 b8 cc    \tvfmadd231ps %zmm12,%zmm25,%zmm9\n     8ae:\t62 01 64 48 59 c0    \tvmulps %zmm24,%zmm3,%zmm24\n     8b4:\t62 f1 14 40 5c 
da    \tvsubps %zmm2,%zmm29,%zmm3\n     8ba:\t62 21 7c 48 28 e8    \tvmovaps %zmm16,%zmm29\n     8c0:\t62 e1 7c 48 29 4c 24 \tvmovaps %zmm17,0x80(%rsp)\n     8c7:\t02 \n     8c8:\t62 e2 75 40 a8 ce    \tvfmadd213ps %zmm6,%zmm17,%zmm17\n     8ce:\t62 62 15 40 a8 ee    \tvfmadd213ps %zmm6,%zmm29,%zmm29\n     8d4:\t62 f1 3c 48 59 f4    \tvmulps %zmm4,%zmm8,%zmm6\n     8da:\t62 32 3d 40 b8 cb    \tvfmadd231ps %zmm19,%zmm24,%zmm9\n     8e0:\t62 81 7c 48 28 5c 2e \tvmovaps 0x700(%r14,%r13,1),%zmm19\n     8e7:\t1c \n     8e8:\t62 f1 7c 48 29 5c 24 \tvmovaps %zmm3,0x1c0(%rsp)\n     8ef:\t07 \n     8f0:\t62 a1 54 48 59 c2    \tvmulps %zmm18,%zmm5,%zmm16\n     8f6:\t62 e1 7c 48 28 54 24 \tvmovaps 0x640(%rsp),%zmm18\n     8fd:\t19 \n     8fe:\t62 d2 7d 40 b8 f7    \tvfmadd231ps %zmm15,%zmm16,%zmm6\n     904:\t62 11 7c 48 28 fc    \tvmovaps %zmm28,%zmm15\n     90a:\t62 b1 7c 48 5c c5    \tvsubps %zmm21,%zmm0,%zmm0\n     910:\t62 31 2c 48 5c d5    \tvsubps %zmm21,%zmm10,%zmm10\n     916:\t62 e1 7c 48 28 6c 24 \tvmovaps 0x5c0(%rsp),%zmm21\n     91d:\t17 \n     91e:\t62 11 7c 48 28 e6    \tvmovaps %zmm30,%zmm12\n     924:\t62 e2 7d 48 b8 c8    \tvfmadd231ps %zmm0,%zmm0,%zmm17\n     92a:\t62 42 2d 48 b8 ea    \tvfmadd231ps %zmm10,%zmm10,%zmm29\n     930:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0xc0(%rsp)\n     937:\t03 \n     938:\t62 71 7c 48 29 54 24 \tvmovaps %zmm10,0x140(%rsp)\n     93f:\t05 \n     940:\t62 51 7c 48 28 d3    \tvmovaps %zmm11,%zmm10\n     946:\t62 71 7c 48 28 de    \tvmovaps %zmm6,%zmm11\n     94c:\t62 62 65 48 b8 eb    \tvfmadd231ps %zmm3,%zmm3,%zmm29\n     952:\t62 b1 7c 48 28 f3    \tvmovaps %zmm19,%zmm6\n     958:\t62 92 7d 48 4e c5    \tvrsqrt14ps %zmm29,%zmm0\n     95e:\t62 f1 14 40 59 e8    \tvmulps %zmm0,%zmm29,%zmm5\n     964:\t62 f2 7d 48 a8 ef    \tvfmadd213ps %zmm7,%zmm0,%zmm5\n     96a:\t62 e1 54 40 5c ea    \tvsubps %zmm2,%zmm21,%zmm21\n     970:\t62 a2 55 40 b8 cd    \tvfmadd231ps %zmm21,%zmm21,%zmm17\n     976:\t62 e1 7c 48 29 6c 24 \tvmovaps 
%zmm21,0x180(%rsp)\n     97d:\t06 \n     97e:\t62 81 3c 48 59 ea    \tvmulps %zmm26,%zmm8,%zmm21\n     984:\t62 01 7c 48 28 54 2e \tvmovaps 0x940(%r14,%r13,1),%zmm26\n     98b:\t25 \n     98c:\t62 b2 7d 48 4e d1    \tvrsqrt14ps %zmm17,%zmm2\n     992:\t62 e2 7d 40 b8 6c 24 \tvfmadd231ps 0x380(%rsp),%zmm16,%zmm21\n     999:\t0e \n     99a:\t62 f1 74 40 59 e2    \tvmulps %zmm2,%zmm17,%zmm4\n     9a0:\t62 f2 6d 48 a8 e7    \tvfmadd213ps %zmm7,%zmm2,%zmm4\n     9a6:\t62 b1 6c 48 59 d7    \tvmulps %zmm23,%zmm2,%zmm2\n     9ac:\t62 f1 6c 48 59 d4    \tvmulps %zmm4,%zmm2,%zmm2\n     9b2:\t62 b1 7c 48 59 e7    \tvmulps %zmm23,%zmm0,%zmm4\n     9b8:\t62 81 3c 48 59 fb    \tvmulps %zmm27,%zmm8,%zmm23\n     9be:\t62 51 7c 48 28 c1    \tvmovaps %zmm9,%zmm8\n     9c4:\t62 01 7c 48 28 5c 2e \tvmovaps 0xb40(%r14,%r13,1),%zmm27\n     9cb:\t2d \n     9cc:\t62 61 5c 48 59 ed    \tvmulps %zmm5,%zmm4,%zmm29\n     9d2:\t62 f1 7c 48 28 6c 24 \tvmovaps 0x100(%rsp),%zmm5\n     9d9:\t04 \n     9da:\t62 f1 6c 48 59 e2    \tvmulps %zmm2,%zmm2,%zmm4\n     9e0:\t62 f1 4c 40 59 d2    \tvmulps %zmm2,%zmm22,%zmm2\n     9e6:\t62 81 7c 48 28 74 2e \tvmovaps 0x640(%r14,%r13,1),%zmm22\n     9ed:\t19 \n     9ee:\t62 e1 5c 48 59 ca    \tvmulps %zmm2,%zmm4,%zmm17\n     9f4:\t62 91 7c 48 28 54 2e \tvmovaps 0x6c0(%r14,%r13,1),%zmm2\n     9fb:\t1b \n     9fc:\t62 91 7c 48 28 64 2e \tvmovaps 0x740(%r14,%r13,1),%zmm4\n     a03:\t1d \n     a04:\t62 72 75 40 b8 44 24 \tvfmadd231ps 0x80(%rsp),%zmm17,%zmm8\n     a0b:\t02 \n     a0c:\t62 71 7c 48 29 44 24 \tvmovaps %zmm8,0x80(%rsp)\n     a13:\t02 \n     a14:\t62 71 7c 48 28 44 24 \tvmovaps 0x600(%rsp),%zmm8\n     a1b:\t18 \n     a1c:\t62 92 35 40 b8 ef    \tvfmadd231ps %zmm31,%zmm25,%zmm5\n     a22:\t62 01 7c 48 28 7c 2e \tvmovaps 0x680(%r14,%r13,1),%zmm31\n     a29:\t1a \n     a2a:\t62 01 7c 48 28 4c 2e \tvmovaps 0x7c0(%r14,%r13,1),%zmm25\n     a31:\t1f \n     a32:\t62 32 6d 40 7f e6    \tvpermt2ps %zmm22,%zmm18,%zmm12\n     a38:\t62 f2 15 48 7f f4    
\tvpermt2ps %zmm4,%zmm13,%zmm6\n     a3e:\t62 f2 3d 40 b8 6c 24 \tvfmadd231ps 0x3c0(%rsp),%zmm24,%zmm5\n     a45:\t0f \n     a46:\t62 f2 75 40 b8 6c 24 \tvfmadd231ps 0x180(%rsp),%zmm17,%zmm5\n     a4d:\t06 \n     a4e:\t62 91 7c 48 28 ff    \tvmovaps %zmm31,%zmm7\n     a54:\t62 11 7c 48 28 cf    \tvmovaps %zmm31,%zmm9\n     a5a:\t62 f2 6d 40 7f fa    \tvpermt2ps %zmm2,%zmm18,%zmm7\n     a60:\t62 72 15 48 7f ca    \tvpermt2ps %zmm2,%zmm13,%zmm9\n     a66:\t62 f1 7c 48 29 6c 24 \tvmovaps %zmm5,0x100(%rsp)\n     a6d:\t04 \n     a6e:\t62 f1 7c 48 28 6c 24 \tvmovaps 0x700(%rsp),%zmm5\n     a75:\t1c \n     a76:\t62 f3 9d 48 23 c7 e4 \tvshuff64x2 $0xe4,%zmm7,%zmm12,%zmm0\n     a7d:\t62 f1 7c 48 28 f9    \tvmovaps %zmm1,%zmm7\n     a83:\t62 31 7c 48 28 e3    \tvmovaps %zmm19,%zmm12\n     a89:\t62 92 6d 40 7f f9    \tvpermt2ps %zmm25,%zmm18,%zmm7\n     a8f:\t62 72 6d 40 7f e4    \tvpermt2ps %zmm4,%zmm18,%zmm12\n     a95:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x4c0(%rsp)\n     a9c:\t13 \n     a9d:\t62 f3 9d 48 23 df e4 \tvshuff64x2 $0xe4,%zmm7,%zmm12,%zmm3\n     aa4:\t62 71 7c 48 28 e1    \tvmovaps %zmm1,%zmm12\n     aaa:\t62 91 7c 48 28 fe    \tvmovaps %zmm30,%zmm7\n     ab0:\t62 12 15 48 7f e1    \tvpermt2ps %zmm25,%zmm13,%zmm12\n     ab6:\t62 b2 15 48 7f fe    \tvpermt2ps %zmm22,%zmm13,%zmm7\n     abc:\t62 f1 fd 48 29 5c 24 \tvmovapd %zmm3,0x500(%rsp)\n     ac3:\t14 \n     ac4:\t62 d3 cd 48 23 f4 e4 \tvshuff64x2 $0xe4,%zmm12,%zmm6,%zmm6\n     acb:\t62 53 c5 48 23 c9 e4 \tvshuff64x2 $0xe4,%zmm9,%zmm7,%zmm9\n     ad2:\t62 91 7c 48 28 ff    \tvmovaps %zmm31,%zmm7\n     ad8:\t62 62 1d 40 7f fa    \tvpermt2ps %zmm2,%zmm28,%zmm31\n     ade:\t62 11 7c 48 28 64 2e \tvmovaps 0x980(%r14,%r13,1),%zmm12\n     ae5:\t26 \n     ae6:\t62 f2 0d 48 7f fa    \tvpermt2ps %zmm2,%zmm14,%zmm7\n     aec:\t62 f1 7c 48 28 d1    \tvmovaps %zmm1,%zmm2\n     af2:\t62 92 1d 40 7f c9    \tvpermt2ps %zmm25,%zmm28,%zmm1\n     af8:\t62 f1 fd 48 29 74 24 \tvmovapd %zmm6,0x440(%rsp)\n     aff:\t11 \n     
b00:\t62 91 7c 48 28 f6    \tvmovaps %zmm30,%zmm6\n     b06:\t62 92 0d 48 7f d1    \tvpermt2ps %zmm25,%zmm14,%zmm2\n     b0c:\t62 22 1d 40 7f f6    \tvpermt2ps %zmm22,%zmm28,%zmm30\n     b12:\t62 01 7c 48 28 4c 2e \tvmovaps 0x880(%r14,%r13,1),%zmm25\n     b19:\t22 \n     b1a:\t62 b2 0d 48 7f f6    \tvpermt2ps %zmm22,%zmm14,%zmm6\n     b20:\t62 f3 cd 48 23 df e4 \tvshuff64x2 $0xe4,%zmm7,%zmm6,%zmm3\n     b27:\t62 91 5c 40 59 f5    \tvmulps %zmm29,%zmm20,%zmm6\n     b2d:\t62 93 8d 40 23 ff e4 \tvshuff64x2 $0xe4,%zmm31,%zmm30,%zmm7\n     b34:\t62 01 7c 48 28 74 2e \tvmovaps 0x9c0(%r14,%r13,1),%zmm30\n     b3b:\t27 \n     b3c:\t62 c1 7c 48 28 e4    \tvmovaps %zmm12,%zmm20\n     b42:\t62 01 7c 48 28 7c 2e \tvmovaps 0xa80(%r14,%r13,1),%zmm31\n     b49:\t2a \n     b4a:\t62 f1 fd 48 29 5c 24 \tvmovapd %zmm3,0x400(%rsp)\n     b51:\t10 \n     b52:\t62 d1 7c 48 28 da    \tvmovaps %zmm10,%zmm3\n     b58:\t62 31 7c 48 28 d3    \tvmovaps %zmm19,%zmm10\n     b5e:\t62 e2 1d 40 7f dc    \tvpermt2ps %zmm4,%zmm28,%zmm19\n     b64:\t62 f1 fd 48 29 7c 24 \tvmovapd %zmm7,0x280(%rsp)\n     b6b:\t0a \n     b6c:\t62 72 0d 48 7f d4    \tvpermt2ps %zmm4,%zmm14,%zmm10\n     b72:\t62 91 7c 48 28 64 2e \tvmovaps 0x800(%r14,%r13,1),%zmm4\n     b79:\t20 \n     b7a:\t62 f2 3d 40 b8 5c 24 \tvfmadd231ps 0x200(%rsp),%zmm24,%zmm3\n     b81:\t08 \n     b82:\t62 01 7c 48 28 44 2e \tvmovaps 0x8c0(%r14,%r13,1),%zmm24\n     b89:\t23 \n     b8a:\t62 f2 75 40 b8 5c 24 \tvfmadd231ps 0xc0(%rsp),%zmm17,%zmm3\n     b91:\t03 \n     b92:\t62 82 15 48 7f e6    \tvpermt2ps %zmm30,%zmm13,%zmm20\n     b98:\t62 81 7c 48 28 cf    \tvmovaps %zmm31,%zmm17\n     b9e:\t62 f3 e5 40 23 c1 e4 \tvshuff64x2 $0xe4,%zmm1,%zmm19,%zmm0\n     ba5:\t62 91 7c 48 28 4c 2e \tvmovaps 0x840(%r14,%r13,1),%zmm1\n     bac:\t21 \n     bad:\t62 81 7c 48 28 d9    \tvmovaps %zmm25,%zmm19\n     bb3:\t62 f3 ad 48 23 d2 e4 \tvshuff64x2 $0xe4,%zmm2,%zmm10,%zmm2\n     bba:\t62 11 14 40 59 d5    \tvmulps %zmm29,%zmm29,%zmm10\n     bc0:\t62 01 7c 48 28 
6c 2e \tvmovaps 0x900(%r14,%r13,1),%zmm29\n     bc7:\t24 \n     bc8:\t62 82 6d 40 7f d8    \tvpermt2ps %zmm24,%zmm18,%zmm19\n     bce:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x480(%rsp)\n     bd5:\t12 \n     bd6:\t62 f1 7c 48 29 5c 24 \tvmovaps %zmm3,0xc0(%rsp)\n     bdd:\t03 \n     bde:\t62 f1 fd 48 29 54 24 \tvmovapd %zmm2,0x200(%rsp)\n     be5:\t08 \n     be6:\t62 b1 7c 48 28 d7    \tvmovaps %zmm23,%zmm2\n     bec:\t62 e1 2c 48 59 fe    \tvmulps %zmm6,%zmm10,%zmm23\n     bf2:\t62 f1 7c 48 28 f4    \tvmovaps %zmm4,%zmm6\n     bf8:\t62 51 7c 48 28 d4    \tvmovaps %zmm12,%zmm10\n     bfe:\t62 12 6d 40 7f d6    \tvpermt2ps %zmm30,%zmm18,%zmm10\n     c04:\t62 72 45 40 b8 5c 24 \tvfmadd231ps 0x340(%rsp),%zmm23,%zmm11\n     c0b:\t0d \n     c0c:\t62 f2 7d 40 b8 54 24 \tvfmadd231ps 0x240(%rsp),%zmm16,%zmm2\n     c13:\t09 \n     c14:\t62 e2 45 40 b8 6c 24 \tvfmadd231ps 0x140(%rsp),%zmm23,%zmm21\n     c1b:\t05 \n     c1c:\t62 f2 6d 40 7f f1    \tvpermt2ps %zmm1,%zmm18,%zmm6\n     c22:\t62 f2 45 40 b8 54 24 \tvfmadd231ps 0x1c0(%rsp),%zmm23,%zmm2\n     c29:\t07 \n     c2a:\t62 e1 7c 48 29 6c 24 \tvmovaps %zmm21,0x140(%rsp)\n     c31:\t05 \n     c32:\t62 71 7c 48 29 5c 24 \tvmovaps %zmm11,0x240(%rsp)\n     c39:\t09 \n     c3a:\t62 81 7c 48 28 f5    \tvmovaps %zmm29,%zmm22\n     c40:\t62 91 7c 48 28 fd    \tvmovaps %zmm29,%zmm7\n     c46:\t62 82 6d 40 7f f2    \tvpermt2ps %zmm26,%zmm18,%zmm22\n     c4c:\t62 92 15 48 7f fa    \tvpermt2ps %zmm26,%zmm13,%zmm7\n     c52:\t62 f1 7c 48 29 54 24 \tvmovaps %zmm2,0x1c0(%rsp)\n     c59:\t07 \n     c5a:\t62 b3 cd 48 23 c3 e4 \tvshuff64x2 $0xe4,%zmm19,%zmm6,%zmm0\n     c61:\t62 91 7c 48 28 f1    \tvmovaps %zmm25,%zmm6\n     c67:\t62 92 15 48 7f f0    \tvpermt2ps %zmm24,%zmm13,%zmm6\n     c6d:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x380(%rsp)\n     c74:\t0e \n     c75:\t62 d3 cd 40 23 c2 e4 \tvshuff64x2 $0xe4,%zmm10,%zmm22,%zmm0\n     c7c:\t62 71 7c 48 28 d4    \tvmovaps %zmm4,%zmm10\n     c82:\t62 72 0d 48 7f d1    \tvpermt2ps 
%zmm1,%zmm14,%zmm10\n     c88:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x3c0(%rsp)\n     c8f:\t0f \n     c90:\t62 f1 7c 48 28 c4    \tvmovaps %zmm4,%zmm0\n     c96:\t62 f2 1d 40 7f e1    \tvpermt2ps %zmm1,%zmm28,%zmm4\n     c9c:\t62 f2 15 48 7f c1    \tvpermt2ps %zmm1,%zmm13,%zmm0\n     ca2:\t62 f3 fd 48 23 c6 e4 \tvshuff64x2 $0xe4,%zmm6,%zmm0,%zmm0\n     ca9:\t62 91 7c 48 28 74 2e \tvmovaps 0xb80(%r14,%r13,1),%zmm6\n     cb0:\t2e \n     cb1:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x2c0(%rsp)\n     cb8:\t0b \n     cb9:\t62 b3 c5 48 23 c4 e4 \tvshuff64x2 $0xe4,%zmm20,%zmm7,%zmm0\n     cc0:\t62 f1 7c 48 28 7c 24 \tvmovaps 0x6c0(%rsp),%zmm7\n     cc7:\t1b \n     cc8:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x340(%rsp)\n     ccf:\t0d \n     cd0:\t62 91 7c 48 28 c1    \tvmovaps %zmm25,%zmm0\n     cd6:\t62 02 1d 40 7f c8    \tvpermt2ps %zmm24,%zmm28,%zmm25\n     cdc:\t62 92 0d 48 7f c0    \tvpermt2ps %zmm24,%zmm14,%zmm0\n     ce2:\t62 41 7c 48 28 c4    \tvmovaps %zmm12,%zmm24\n     ce8:\t62 12 1d 40 7f e6    \tvpermt2ps %zmm30,%zmm28,%zmm12\n     cee:\t62 02 0d 48 7f c6    \tvpermt2ps %zmm30,%zmm14,%zmm24\n     cf4:\t62 01 7c 48 28 74 2e \tvmovaps 0xa40(%r14,%r13,1),%zmm30\n     cfb:\t29 \n     cfc:\t62 93 dd 48 23 c9 e4 \tvshuff64x2 $0xe4,%zmm25,%zmm4,%zmm1\n     d03:\t62 91 7c 48 28 64 2e \tvmovaps 0xbc0(%r14,%r13,1),%zmm4\n     d0a:\t2f \n     d0b:\t62 e3 ad 48 23 f0 e4 \tvshuff64x2 $0xe4,%zmm0,%zmm10,%zmm22\n     d12:\t62 91 7c 48 28 c5    \tvmovaps %zmm29,%zmm0\n     d18:\t62 02 1d 40 7f ea    \tvpermt2ps %zmm26,%zmm28,%zmm29\n     d1e:\t62 01 7c 48 28 64 2e \tvmovaps 0xa00(%r14,%r13,1),%zmm28\n     d25:\t28 \n     d26:\t62 11 7c 48 28 54 2e \tvmovaps 0xac0(%r14,%r13,1),%zmm10\n     d2d:\t2b \n     d2e:\t62 92 0d 48 7f c2    \tvpermt2ps %zmm26,%zmm14,%zmm0\n     d34:\t62 f1 fd 48 29 4c 24 \tvmovapd %zmm1,0x300(%rsp)\n     d3b:\t0c \n     d3c:\t62 91 7c 48 28 4c 2e \tvmovaps 0xb00(%r14,%r13,1),%zmm1\n     d43:\t2c \n     d44:\t62 61 7c 48 28 54 24 \tvmovaps 
0x780(%rsp),%zmm26\n     d4b:\t1e \n     d4c:\t62 d3 95 40 23 dc e4 \tvshuff64x2 $0xe4,%zmm12,%zmm29,%zmm3\n     d53:\t62 71 34 48 5c e7    \tvsubps %zmm7,%zmm9,%zmm12\n     d59:\t62 71 7c 48 28 4c 24 \tvmovaps 0x680(%rsp),%zmm9\n     d60:\t1a \n     d61:\t62 81 7c 48 28 fc    \tvmovaps %zmm28,%zmm23\n     d67:\t62 c2 6d 40 7f ca    \tvpermt2ps %zmm10,%zmm18,%zmm17\n     d6d:\t62 83 fd 48 23 c0 e4 \tvshuff64x2 $0xe4,%zmm24,%zmm0,%zmm16\n     d74:\t62 01 7c 48 28 c7    \tvmovaps %zmm31,%zmm24\n     d7a:\t62 01 7c 48 28 cc    \tvmovaps %zmm28,%zmm25\n     d80:\t62 82 6d 40 7f fe    \tvpermt2ps %zmm30,%zmm18,%zmm23\n     d86:\t62 42 15 48 7f c2    \tvpermt2ps %zmm10,%zmm13,%zmm24\n     d8c:\t62 02 15 48 7f ce    \tvpermt2ps %zmm30,%zmm13,%zmm25\n     d92:\t62 f1 fd 48 29 5c 24 \tvmovapd %zmm3,0x180(%rsp)\n     d99:\t06 \n     d9a:\t62 f1 7c 48 28 5c 24 \tvmovaps 0x4c0(%rsp),%zmm3\n     da1:\t13 \n     da2:\t62 c1 7c 40 5c c1    \tvsubps %zmm9,%zmm16,%zmm16\n     da8:\t62 b3 c5 40 23 c1 e4 \tvshuff64x2 $0xe4,%zmm17,%zmm23,%zmm0\n     daf:\t62 e1 7c 48 28 ce    \tvmovaps %zmm6,%zmm17\n     db5:\t62 e1 7c 48 28 f9    \tvmovaps %zmm1,%zmm23\n     dbb:\t62 93 b5 40 23 d0 e4 \tvshuff64x2 $0xe4,%zmm24,%zmm25,%zmm2\n     dc2:\t62 e2 6d 40 7f cc    \tvpermt2ps %zmm4,%zmm18,%zmm17\n     dc8:\t62 82 15 48 7f fb    \tvpermt2ps %zmm27,%zmm13,%zmm23\n     dce:\t62 71 64 48 5c dd    \tvsubps %zmm5,%zmm3,%zmm11\n     dd4:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x540(%rsp)\n     ddb:\t15 \n     ddc:\t62 f1 7c 48 28 c1    \tvmovaps %zmm1,%zmm0\n     de2:\t62 f1 fd 48 29 54 24 \tvmovapd %zmm2,0x8c0(%rsp)\n     de9:\t23 \n     dea:\t62 f1 7c 48 28 54 24 \tvmovaps 0x400(%rsp),%zmm2\n     df1:\t10 \n     df2:\t62 92 6d 40 7f c3    \tvpermt2ps %zmm27,%zmm18,%zmm0\n     df8:\t62 b3 fd 48 23 c1 e4 \tvshuff64x2 $0xe4,%zmm17,%zmm0,%zmm0\n     dff:\t62 e1 7c 48 28 ce    \tvmovaps %zmm6,%zmm17\n     e05:\t62 c1 6c 48 5c e1    \tvsubps %zmm9,%zmm2,%zmm20\n     e0b:\t62 f1 7c 48 28 54 24 \tvmovaps 
0x440(%rsp),%zmm2\n     e12:\t11 \n     e13:\t62 e1 7c 48 29 44 24 \tvmovaps %zmm16,0x440(%rsp)\n     e1a:\t11 \n     e1b:\t62 e2 15 48 7f cc    \tvpermt2ps %zmm4,%zmm13,%zmm17\n     e21:\t62 f1 fd 48 29 44 24 \tvmovapd %zmm0,0x880(%rsp)\n     e28:\t22 \n     e29:\t62 a3 c5 40 23 e9 e4 \tvshuff64x2 $0xe4,%zmm17,%zmm23,%zmm21\n     e30:\t62 81 7c 48 28 cf    \tvmovaps %zmm31,%zmm17\n     e36:\t62 42 05 48 7f fa    \tvpermt2ps %zmm10,%zmm15,%zmm31\n     e3c:\t62 e1 7c 48 28 7c 24 \tvmovaps 0x740(%rsp),%zmm23\n     e43:\t1d \n     e44:\t62 f1 6c 48 5c d7    \tvsubps %zmm7,%zmm2,%zmm2\n     e4a:\t62 c2 0d 48 7f ca    \tvpermt2ps %zmm10,%zmm14,%zmm17\n     e50:\t62 11 7c 48 28 d4    \tvmovaps %zmm28,%zmm10\n     e56:\t62 f1 7c 48 29 54 24 \tvmovaps %zmm2,0x580(%rsp)\n     e5d:\t16 \n     e5e:\t62 02 05 48 7f e6    \tvpermt2ps %zmm30,%zmm15,%zmm28\n     e64:\t62 12 0d 48 7f d6    \tvpermt2ps %zmm30,%zmm14,%zmm10\n     e6a:\t62 03 9d 40 23 f7 e4 \tvshuff64x2 $0xe4,%zmm31,%zmm28,%zmm30\n     e71:\t62 a3 ad 48 23 d9 e4 \tvshuff64x2 $0xe4,%zmm17,%zmm10,%zmm19\n     e78:\t62 51 7c 48 28 d3    \tvmovaps %zmm11,%zmm10\n     e7e:\t62 52 25 48 a8 d0    \tvfmadd213ps %zmm8,%zmm11,%zmm10\n     e84:\t62 52 1d 48 b8 d4    \tvfmadd231ps %zmm12,%zmm12,%zmm10\n     e8a:\t62 32 5d 40 b8 d4    \tvfmadd231ps %zmm20,%zmm20,%zmm10\n     e90:\t62 c2 7d 48 4e ca    \tvrsqrt14ps %zmm10,%zmm17\n     e96:\t62 31 2c 48 59 d1    \tvmulps %zmm17,%zmm10,%zmm10\n     e9c:\t62 12 75 40 a8 d2    \tvfmadd213ps %zmm26,%zmm17,%zmm10\n     ea2:\t62 a1 74 40 59 cf    \tvmulps %zmm23,%zmm17,%zmm17\n     ea8:\t62 d1 74 40 59 da    \tvmulps %zmm10,%zmm17,%zmm3\n     eae:\t62 71 7c 48 28 54 24 \tvmovaps 0x500(%rsp),%zmm10\n     eb5:\t14 \n     eb6:\t62 61 2c 48 5c cd    \tvsubps %zmm5,%zmm10,%zmm25\n     ebc:\t62 71 7c 48 28 54 24 \tvmovaps 0x200(%rsp),%zmm10\n     ec3:\t08 \n     ec4:\t62 81 7c 48 28 c9    \tvmovaps %zmm25,%zmm17\n     eca:\t62 c2 35 40 a8 c8    \tvfmadd213ps %zmm8,%zmm25,%zmm17\n     ed0:\t62 
e2 6d 48 b8 ca    \tvfmadd231ps %zmm2,%zmm2,%zmm17\n     ed6:\t62 d1 2c 48 5c c1    \tvsubps %zmm9,%zmm10,%zmm0\n     edc:\t62 71 7c 48 28 54 24 \tvmovaps 0x2c0(%rsp),%zmm10\n     ee3:\t0b \n     ee4:\t62 e2 7d 48 b8 c8    \tvfmadd231ps %zmm0,%zmm0,%zmm17\n     eea:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0x5c0(%rsp)\n     ef1:\t17 \n     ef2:\t62 d1 4c 40 5c c1    \tvsubps %zmm9,%zmm22,%zmm0\n     ef8:\t62 22 7d 48 4e c1    \tvrsqrt14ps %zmm17,%zmm24\n     efe:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0x200(%rsp)\n     f05:\t08 \n     f06:\t62 81 74 40 59 c8    \tvmulps %zmm24,%zmm17,%zmm17\n     f0c:\t62 82 3d 40 a8 ca    \tvfmadd213ps %zmm26,%zmm24,%zmm17\n     f12:\t62 21 3c 40 59 c7    \tvmulps %zmm23,%zmm24,%zmm24\n     f18:\t62 b1 3c 40 59 d1    \tvmulps %zmm17,%zmm24,%zmm2\n     f1e:\t62 e1 7c 48 28 4c 24 \tvmovaps 0x380(%rsp),%zmm17\n     f25:\t0e \n     f26:\t62 71 2c 48 5c d7    \tvsubps %zmm7,%zmm10,%zmm10\n     f2c:\t62 71 7c 48 29 54 24 \tvmovaps %zmm10,0x380(%rsp)\n     f33:\t0e \n     f34:\t62 61 74 40 5c c5    \tvsubps %zmm5,%zmm17,%zmm24\n     f3a:\t62 81 7c 48 28 c8    \tvmovaps %zmm24,%zmm17\n     f40:\t62 c2 3d 40 a8 c8    \tvfmadd213ps %zmm8,%zmm24,%zmm17\n     f46:\t62 c2 2d 48 b8 ca    \tvfmadd231ps %zmm10,%zmm10,%zmm17\n     f4c:\t62 71 7c 48 28 54 24 \tvmovaps 0x340(%rsp),%zmm10\n     f53:\t0d \n     f54:\t62 e2 7d 48 b8 c8    \tvfmadd231ps %zmm0,%zmm0,%zmm17\n     f5a:\t62 a2 7d 48 4e f1    \tvrsqrt14ps %zmm17,%zmm22\n     f60:\t62 a1 74 40 59 ce    \tvmulps %zmm22,%zmm17,%zmm17\n     f66:\t62 82 4d 40 a8 ca    \tvfmadd213ps %zmm26,%zmm22,%zmm17\n     f6c:\t62 a1 4c 40 59 f7    \tvmulps %zmm23,%zmm22,%zmm22\n     f72:\t62 a1 4c 40 59 f1    \tvmulps %zmm17,%zmm22,%zmm22\n     f78:\t62 e1 7c 48 28 4c 24 \tvmovaps 0x3c0(%rsp),%zmm17\n     f7f:\t0f \n     f80:\t62 71 2c 48 5c d7    \tvsubps %zmm7,%zmm10,%zmm10\n     f86:\t62 71 7c 48 29 54 24 \tvmovaps %zmm10,0x340(%rsp)\n     f8d:\t0d \n     f8e:\t62 e1 74 40 5c cd    \tvsubps 
%zmm5,%zmm17,%zmm17\n     f94:\t62 21 7c 48 28 e9    \tvmovaps %zmm17,%zmm29\n     f9a:\t62 42 75 40 a8 e8    \tvfmadd213ps %zmm8,%zmm17,%zmm29\n     fa0:\t62 42 2d 48 b8 ea    \tvfmadd231ps %zmm10,%zmm10,%zmm29\n     fa6:\t62 71 64 48 59 54 24 \tvmulps 0x280(%rsp),%zmm3,%zmm10\n     fad:\t0a \n     fae:\t62 f1 64 48 59 db    \tvmulps %zmm3,%zmm3,%zmm3\n     fb4:\t62 22 7d 40 b8 e8    \tvfmadd231ps %zmm16,%zmm16,%zmm29\n     fba:\t62 92 7d 48 4e c5    \tvrsqrt14ps %zmm29,%zmm0\n     fc0:\t62 e1 14 40 59 c0    \tvmulps %zmm0,%zmm29,%zmm16\n     fc6:\t62 61 7c 48 28 e9    \tvmovaps %zmm1,%zmm29\n     fcc:\t62 92 05 48 7f cb    \tvpermt2ps %zmm27,%zmm15,%zmm1\n     fd2:\t62 82 7d 48 a8 c2    \tvfmadd213ps %zmm26,%zmm0,%zmm16\n     fd8:\t62 b1 7c 48 59 c7    \tvmulps %zmm23,%zmm0,%zmm0\n     fde:\t62 d1 64 48 59 da    \tvmulps %zmm10,%zmm3,%zmm3\n     fe4:\t62 02 0d 48 7f eb    \tvpermt2ps %zmm27,%zmm14,%zmm29\n     fea:\t62 a1 7c 48 59 c0    \tvmulps %zmm16,%zmm0,%zmm16\n     ff0:\t62 f1 7c 48 28 c6    \tvmovaps %zmm6,%zmm0\n     ff6:\t62 f2 05 48 7f f4    \tvpermt2ps %zmm4,%zmm15,%zmm6\n     ffc:\t62 f2 0d 48 7f c4    \tvpermt2ps %zmm4,%zmm14,%zmm0\n    1002:\t62 f1 6c 48 59 64 24 \tvmulps 0x480(%rsp),%zmm2,%zmm4\n    1009:\t12 \n    100a:\t62 f1 6c 48 59 d2    \tvmulps %zmm2,%zmm2,%zmm2\n    1010:\t62 61 6c 48 59 e4    \tvmulps %zmm4,%zmm2,%zmm28\n    1016:\t62 f1 4c 40 59 54 24 \tvmulps 0x300(%rsp),%zmm22,%zmm2\n    101d:\t0c \n    101e:\t62 73 f5 48 23 d6 e4 \tvshuff64x2 $0xe4,%zmm6,%zmm1,%zmm10\n    1025:\t62 b1 4c 40 59 ce    \tvmulps %zmm22,%zmm22,%zmm1\n    102b:\t62 d1 64 40 5c f1    \tvsubps %zmm9,%zmm19,%zmm6\n    1031:\t62 e1 7c 40 59 5c 24 \tvmulps 0x180(%rsp),%zmm16,%zmm19\n    1038:\t06 \n    1039:\t62 a1 7c 40 59 c0    \tvmulps %zmm16,%zmm16,%zmm16\n    103f:\t62 63 95 40 23 e8 e4 \tvshuff64x2 $0xe4,%zmm0,%zmm29,%zmm29\n    1046:\t62 f1 7c 48 28 44 24 \tvmovaps 0x880(%rsp),%zmm0\n    104d:\t22 \n    104e:\t62 f1 7c 48 29 74 24 \tvmovaps 
%zmm6,0x280(%rsp)\n    1055:\t0a \n    1056:\t62 e1 74 48 59 f2    \tvmulps %zmm2,%zmm1,%zmm22\n    105c:\t62 f1 7c 48 28 54 24 \tvmovaps 0x540(%rsp),%zmm2\n    1063:\t15 \n    1064:\t62 91 7c 48 28 4c 2e \tvmovaps 0xd80(%r14,%r13,1),%zmm1\n    106b:\t36 \n    106c:\t62 51 14 40 5c c9    \tvsubps %zmm9,%zmm29,%zmm9\n    1072:\t62 71 7c 48 29 4c 24 \tvmovaps %zmm9,0x4c0(%rsp)\n    1079:\t13 \n    107a:\t62 61 7c 48 5c dd    \tvsubps %zmm5,%zmm0,%zmm27\n    1080:\t62 f1 7c 48 28 c7    \tvmovaps %zmm7,%zmm0\n    1086:\t62 f1 6c 48 5c e5    \tvsubps %zmm5,%zmm2,%zmm4\n    108c:\t62 f1 7c 48 28 54 24 \tvmovaps 0x8c0(%rsp),%zmm2\n    1093:\t23 \n    1094:\t62 91 7c 48 28 eb    \tvmovaps %zmm27,%zmm5\n    109a:\t62 61 7c 48 29 5c 24 \tvmovaps %zmm27,0x400(%rsp)\n    10a1:\t10 \n    10a2:\t62 f1 7c 48 29 64 24 \tvmovaps %zmm4,0x3c0(%rsp)\n    10a9:\t0f \n    10aa:\t62 d2 5d 48 a8 e0    \tvfmadd213ps %zmm8,%zmm4,%zmm4\n    10b0:\t62 d2 25 40 a8 e8    \tvfmadd213ps %zmm8,%zmm27,%zmm5\n    10b6:\t62 21 7c 40 59 db    \tvmulps %zmm19,%zmm16,%zmm27\n    10bc:\t62 11 7c 48 28 44 2e \tvmovaps 0xcc0(%r14,%r13,1),%zmm8\n    10c3:\t33 \n    10c4:\t62 81 7c 48 28 44 2e \tvmovaps 0xc40(%r14,%r13,1),%zmm16\n    10cb:\t31 \n    10cc:\t62 f1 6c 48 5c ff    \tvsubps %zmm7,%zmm2,%zmm7\n    10d2:\t62 f1 54 40 5c d0    \tvsubps %zmm0,%zmm21,%zmm2\n    10d8:\t62 e1 7c 48 28 e8    \tvmovaps %zmm0,%zmm21\n    10de:\t62 f1 7c 48 28 44 24 \tvmovaps 0x80(%rsp),%zmm0\n    10e5:\t02 \n    10e6:\t62 f2 45 48 b8 e7    \tvfmadd231ps %zmm7,%zmm7,%zmm4\n    10ec:\t62 f1 7c 48 29 7c 24 \tvmovaps %zmm7,0x480(%rsp)\n    10f3:\t12 \n    10f4:\t62 f2 6d 48 b8 ea    \tvfmadd231ps %zmm2,%zmm2,%zmm5\n    10fa:\t62 f1 7c 48 29 54 24 \tvmovaps %zmm2,0x500(%rsp)\n    1101:\t14 \n    1102:\t62 91 7c 48 28 54 2e \tvmovaps 0xc00(%r14,%r13,1),%zmm2\n    1109:\t30 \n    110a:\t62 f2 4d 48 b8 e6    \tvfmadd231ps %zmm6,%zmm6,%zmm4\n    1110:\t62 d2 35 48 b8 e9    \tvfmadd231ps %zmm9,%zmm9,%zmm5\n    1116:\t62 91 7c 48 28 
74 2e \tvmovaps 0xdc0(%r14,%r13,1),%zmm6\n    111d:\t37 \n    111e:\t62 f2 7d 48 4e fc    \tvrsqrt14ps %zmm4,%zmm7\n    1124:\t62 f1 5c 48 59 e7    \tvmulps %zmm7,%zmm4,%zmm4\n    112a:\t62 92 45 48 a8 e2    \tvfmadd213ps %zmm26,%zmm7,%zmm4\n    1130:\t62 b1 44 48 59 ff    \tvmulps %zmm23,%zmm7,%zmm7\n    1136:\t62 f1 44 48 59 e4    \tvmulps %zmm4,%zmm7,%zmm4\n    113c:\t62 f2 7d 48 4e fd    \tvrsqrt14ps %zmm5,%zmm7\n    1142:\t62 f1 54 48 59 ef    \tvmulps %zmm7,%zmm5,%zmm5\n    1148:\t62 d2 65 48 b8 c3    \tvfmadd231ps %zmm11,%zmm3,%zmm0\n    114e:\t62 92 45 48 a8 ea    \tvfmadd213ps %zmm26,%zmm7,%zmm5\n    1154:\t62 b1 44 48 59 ff    \tvmulps %zmm23,%zmm7,%zmm7\n    115a:\t62 71 7c 48 28 d9    \tvmovaps %zmm1,%zmm11\n    1160:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0x80(%rsp)\n    1167:\t02 \n    1168:\t62 f1 7c 48 28 44 24 \tvmovaps 0xc0(%rsp),%zmm0\n    116f:\t03 \n    1170:\t62 71 44 48 59 cd    \tvmulps %zmm5,%zmm7,%zmm9\n    1176:\t62 91 7c 48 28 6c 2e \tvmovaps 0xc80(%r14,%r13,1),%zmm5\n    117d:\t32 \n    117e:\t62 f1 0c 40 59 fc    \tvmulps %zmm4,%zmm30,%zmm7\n    1184:\t62 61 7c 48 28 ea    \tvmovaps %zmm2,%zmm29\n    118a:\t62 f1 5c 48 59 e4    \tvmulps %zmm4,%zmm4,%zmm4\n    1190:\t62 61 7c 48 28 f2    \tvmovaps %zmm2,%zmm30\n    1196:\t62 61 5c 48 59 ff    \tvmulps %zmm7,%zmm4,%zmm31\n    119c:\t62 91 7c 48 28 64 2e \tvmovaps 0xd40(%r14,%r13,1),%zmm4\n    11a3:\t35 \n    11a4:\t62 22 15 48 7f e8    \tvpermt2ps %zmm16,%zmm13,%zmm29\n    11aa:\t62 22 6d 40 7f f0    \tvpermt2ps %zmm16,%zmm18,%zmm30\n    11b0:\t62 72 15 48 7f de    \tvpermt2ps %zmm6,%zmm13,%zmm11\n    11b6:\t62 51 2c 48 59 d1    \tvmulps %zmm9,%zmm10,%zmm10\n    11bc:\t62 51 34 48 59 c9    \tvmulps %zmm9,%zmm9,%zmm9\n    11c2:\t62 f1 7c 48 28 f9    \tvmovaps %zmm1,%zmm7\n    11c8:\t62 f2 6d 40 7f fe    \tvpermt2ps %zmm6,%zmm18,%zmm7\n    11ce:\t62 d2 65 48 b8 c4    \tvfmadd231ps %zmm12,%zmm3,%zmm0\n    11d4:\t62 71 7c 48 28 64 24 \tvmovaps 0x100(%rsp),%zmm12\n    11db:\t04 \n    11dc:\t62 
f1 7c 48 29 44 24 \tvmovaps %zmm0,0xc0(%rsp)\n    11e3:\t03 \n    11e4:\t62 91 7c 48 28 44 2e \tvmovaps 0xd00(%r14,%r13,1),%zmm0\n    11eb:\t34 \n    11ec:\t62 32 65 48 b8 e4    \tvfmadd231ps %zmm20,%zmm3,%zmm12\n    11f2:\t62 e1 7c 48 28 e5    \tvmovaps %zmm5,%zmm20\n    11f8:\t62 f1 7c 48 28 dd    \tvmovaps %zmm5,%zmm3\n    11fe:\t62 c2 15 48 7f e0    \tvpermt2ps %zmm8,%zmm13,%zmm20\n    1204:\t62 d2 6d 40 7f d8    \tvpermt2ps %zmm8,%zmm18,%zmm3\n    120a:\t62 e1 7c 48 28 d8    \tvmovaps %zmm0,%zmm19\n    1210:\t62 72 4d 40 b8 64 24 \tvfmadd231ps 0x200(%rsp),%zmm22,%zmm12\n    1217:\t08 \n    1218:\t62 e2 6d 40 7f dc    \tvpermt2ps %zmm4,%zmm18,%zmm19\n    121e:\t62 72 05 40 b8 64 24 \tvfmadd231ps 0x280(%rsp),%zmm31,%zmm12\n    1225:\t0a \n    1226:\t62 a3 95 40 23 e4 e4 \tvshuff64x2 $0xe4,%zmm20,%zmm29,%zmm20\n    122d:\t62 61 7c 48 28 e8    \tvmovaps %zmm0,%zmm29\n    1233:\t62 63 8d 40 23 f3 e4 \tvshuff64x2 $0xe4,%zmm3,%zmm30,%zmm30\n    123a:\t62 d1 34 48 59 da    \tvmulps %zmm10,%zmm9,%zmm3\n    1240:\t62 71 7c 48 28 d2    \tvmovaps %zmm2,%zmm10\n    1246:\t62 b2 05 48 7f d0    \tvpermt2ps %zmm16,%zmm15,%zmm2\n    124c:\t62 62 15 48 7f ec    \tvpermt2ps %zmm4,%zmm13,%zmm29\n    1252:\t62 f1 7c 48 29 5c 24 \tvmovaps %zmm3,0x540(%rsp)\n    1259:\t15 \n    125a:\t62 f1 7c 48 28 dd    \tvmovaps %zmm5,%zmm3\n    1260:\t62 32 0d 48 7f d0    \tvpermt2ps %zmm16,%zmm14,%zmm10\n    1266:\t62 e1 7c 48 28 44 24 \tvmovaps 0x700(%rsp),%zmm16\n    126d:\t1c \n    126e:\t62 d2 05 48 7f e8    \tvpermt2ps %zmm8,%zmm15,%zmm5\n    1274:\t62 f3 e5 40 23 ff e4 \tvshuff64x2 $0xe4,%zmm7,%zmm19,%zmm7\n    127b:\t62 71 7c 48 29 64 24 \tvmovaps %zmm12,0x100(%rsp)\n    1282:\t04 \n    1283:\t62 d2 0d 48 7f d8    \tvpermt2ps %zmm8,%zmm14,%zmm3\n    1289:\t62 71 7c 48 28 c1    \tvmovaps %zmm1,%zmm8\n    128f:\t62 f2 05 48 7f ce    \tvpermt2ps %zmm6,%zmm15,%zmm1\n    1295:\t62 72 0d 48 7f c6    \tvpermt2ps %zmm6,%zmm14,%zmm8\n    129b:\t62 43 95 40 23 eb e4 \tvshuff64x2 
$0xe4,%zmm11,%zmm29,%zmm29\n    12a2:\t62 71 7c 48 28 d8    \tvmovaps %zmm0,%zmm11\n    12a8:\t62 f2 05 48 7f c4    \tvpermt2ps %zmm4,%zmm15,%zmm0\n    12ae:\t62 f3 ed 48 23 ed e4 \tvshuff64x2 $0xe4,%zmm5,%zmm2,%zmm5\n    12b5:\t62 b1 7c 48 28 d5    \tvmovaps %zmm21,%zmm2\n    12bb:\t62 73 ad 48 23 cb e4 \tvshuff64x2 $0xe4,%zmm3,%zmm10,%zmm9\n    12c2:\t62 f1 7c 48 28 5c 24 \tvmovaps 0x240(%rsp),%zmm3\n    12c9:\t09 \n    12ca:\t62 b1 0c 40 5c f0    \tvsubps %zmm16,%zmm30,%zmm6\n    12d0:\t62 21 7c 48 28 f0    \tvmovaps %zmm16,%zmm30\n    12d6:\t62 72 0d 48 7f dc    \tvpermt2ps %zmm4,%zmm14,%zmm11\n    12dc:\t62 f1 7c 48 28 64 24 \tvmovaps 0x80(%rsp),%zmm4\n    12e3:\t02 \n    12e4:\t62 71 14 40 5c d2    \tvsubps %zmm2,%zmm29,%zmm10\n    12ea:\t62 71 7c 48 29 54 24 \tvmovaps %zmm10,0x180(%rsp)\n    12f1:\t06 \n    12f2:\t62 e3 fd 48 23 d9 e4 \tvshuff64x2 $0xe4,%zmm1,%zmm0,%zmm19\n    12f9:\t62 b1 44 48 5c c0    \tvsubps %zmm16,%zmm7,%zmm0\n    12ff:\t62 e1 7c 48 28 44 24 \tvmovaps 0x600(%rsp),%zmm16\n    1306:\t18 \n    1307:\t62 53 a5 48 23 c0 e4 \tvshuff64x2 $0xe4,%zmm8,%zmm11,%zmm8\n    130e:\t62 71 7c 48 28 5c 24 \tvmovaps 0x140(%rsp),%zmm11\n    1315:\t05 \n    1316:\t62 f1 7c 48 29 44 24 \tvmovaps %zmm0,0x2c0(%rsp)\n    131d:\t0b \n    131e:\t62 92 1d 40 b8 d9    \tvfmadd231ps %zmm25,%zmm28,%zmm3\n    1324:\t62 21 5c 40 5c cd    \tvsubps %zmm21,%zmm20,%zmm25\n    132a:\t62 e1 7c 48 28 64 24 \tvmovaps 0x680(%rsp),%zmm20\n    1331:\t1a \n    1332:\t62 92 4d 40 b8 e0    \tvfmadd231ps %zmm24,%zmm22,%zmm4\n    1338:\t62 e1 7c 48 28 6c 24 \tvmovaps 0x1c0(%rsp),%zmm21\n    133f:\t07 \n    1340:\t62 01 7c 48 28 44 2e \tvmovaps 0xfc0(%r14,%r13,1),%zmm24\n    1347:\t3f \n    1348:\t62 b2 25 40 b8 d9    \tvfmadd231ps %zmm17,%zmm27,%zmm3\n    134e:\t62 61 7c 48 29 4c 24 \tvmovaps %zmm25,0x300(%rsp)\n    1355:\t0c \n    1356:\t62 f2 05 40 b8 64 24 \tvfmadd231ps 0x3c0(%rsp),%zmm31,%zmm4\n    135d:\t0f \n    135e:\t62 f1 7c 48 29 64 24 \tvmovaps %zmm4,0x80(%rsp)\n    
1365:\t02 \n    1366:\t62 b2 7d 48 a8 c0    \tvfmadd213ps %zmm16,%zmm0,%zmm0\n    136c:\t62 d2 2d 48 b8 c2    \tvfmadd231ps %zmm10,%zmm10,%zmm0\n    1372:\t62 71 7c 48 28 d3    \tvmovaps %zmm3,%zmm10\n    1378:\t62 f1 7c 48 28 5c 24 \tvmovaps 0xc0(%rsp),%zmm3\n    137f:\t03 \n    1380:\t62 72 1d 40 b8 5c 24 \tvfmadd231ps 0x580(%rsp),%zmm28,%zmm11\n    1387:\t16 \n    1388:\t62 f1 7c 48 29 74 24 \tvmovaps %zmm6,0x580(%rsp)\n    138f:\t16 \n    1390:\t62 b2 4d 48 a8 f0    \tvfmadd213ps %zmm16,%zmm6,%zmm6\n    1396:\t62 b1 34 48 5c fc    \tvsubps %zmm20,%zmm9,%zmm7\n    139c:\t62 92 35 40 b8 f1    \tvfmadd231ps %zmm25,%zmm25,%zmm6\n    13a2:\t62 01 7c 48 28 4c 2e \tvmovaps 0xf80(%r14,%r13,1),%zmm25\n    13a9:\t3e \n    13aa:\t62 31 3c 48 5c c4    \tvsubps %zmm20,%zmm8,%zmm8\n    13b0:\t62 11 7c 48 28 4c 2e \tvmovaps 0xf40(%r14,%r13,1),%zmm9\n    13b7:\t3d \n    13b8:\t62 e2 1d 40 b8 6c 24 \tvfmadd231ps 0x5c0(%rsp),%zmm28,%zmm21\n    13bf:\t17 \n    13c0:\t62 f2 45 48 b8 f7    \tvfmadd231ps %zmm7,%zmm7,%zmm6\n    13c6:\t62 f1 7c 48 29 7c 24 \tvmovaps %zmm7,0x240(%rsp)\n    13cd:\t09 \n    13ce:\t62 d2 3d 48 b8 c0    \tvfmadd231ps %zmm8,%zmm8,%zmm0\n    13d4:\t62 71 7c 48 29 44 24 \tvmovaps %zmm8,0x1c0(%rsp)\n    13db:\t07 \n    13dc:\t62 f2 7d 48 4e fe    \tvrsqrt14ps %zmm6,%zmm7\n    13e2:\t62 f2 7d 48 4e d0    \tvrsqrt14ps %zmm0,%zmm2\n    13e8:\t62 f1 4c 48 59 cf    \tvmulps %zmm7,%zmm6,%zmm1\n    13ee:\t62 f1 7c 48 59 f2    \tvmulps %zmm2,%zmm0,%zmm6\n    13f4:\t62 72 25 40 b8 5c 24 \tvfmadd231ps 0x340(%rsp),%zmm27,%zmm11\n    13fb:\t0d \n    13fc:\t62 92 45 48 a8 ca    \tvfmadd213ps %zmm26,%zmm7,%zmm1\n    1402:\t62 b1 44 48 59 ff    \tvmulps %zmm23,%zmm7,%zmm7\n    1408:\t62 92 6d 48 a8 f2    \tvfmadd213ps %zmm26,%zmm2,%zmm6\n    140e:\t62 e2 25 40 b8 6c 24 \tvfmadd231ps 0x440(%rsp),%zmm27,%zmm21\n    1415:\t11 \n    1416:\t62 f1 44 48 59 c9    \tvmulps %zmm1,%zmm7,%zmm1\n    141c:\t62 b1 6c 48 59 ff    \tvmulps %zmm23,%zmm2,%zmm7\n    1422:\t62 f1 44 48 59 f6    
\tvmulps %zmm6,%zmm7,%zmm6\n    1428:\t62 f1 74 48 59 c1    \tvmulps %zmm1,%zmm1,%zmm0\n    142e:\t62 f1 54 48 59 c9    \tvmulps %zmm1,%zmm5,%zmm1\n    1434:\t62 91 7c 48 28 7c 2e \tvmovaps 0xe80(%r14,%r13,1),%zmm7\n    143b:\t3a \n    143c:\t62 91 7c 48 28 6c 2e \tvmovaps 0xe00(%r14,%r13,1),%zmm5\n    1443:\t38 \n    1444:\t62 f2 4d 40 b8 5c 24 \tvfmadd231ps 0x380(%rsp),%zmm22,%zmm3\n    144b:\t0e \n    144c:\t62 81 7c 48 28 74 2e \tvmovaps 0xf00(%r14,%r13,1),%zmm22\n    1453:\t3c \n    1454:\t62 e1 7c 48 59 c9    \tvmulps %zmm1,%zmm0,%zmm17\n    145a:\t62 f1 64 40 59 c6    \tvmulps %zmm6,%zmm19,%zmm0\n    1460:\t62 71 4c 48 59 c6    \tvmulps %zmm6,%zmm6,%zmm8\n    1466:\t62 91 7c 48 28 4c 2e \tvmovaps 0xe40(%r14,%r13,1),%zmm1\n    146d:\t39 \n    146e:\t62 91 7c 48 28 74 2e \tvmovaps 0xec0(%r14,%r13,1),%zmm6\n    1475:\t3b \n    1476:\t62 71 7c 48 29 5c 24 \tvmovaps %zmm11,0x140(%rsp)\n    147d:\t05 \n    147e:\t62 f1 3c 48 59 d0    \tvmulps %zmm0,%zmm8,%zmm2\n    1484:\t62 11 7c 48 28 c1    \tvmovaps %zmm25,%zmm8\n    148a:\t62 12 6d 40 7f c0    \tvpermt2ps %zmm24,%zmm18,%zmm8\n    1490:\t62 f2 05 40 b8 5c 24 \tvfmadd231ps 0x480(%rsp),%zmm31,%zmm3\n    1497:\t12 \n    1498:\t62 61 7c 48 28 df    \tvmovaps %zmm7,%zmm27\n    149e:\t62 f1 7c 48 28 c5    \tvmovaps %zmm5,%zmm0\n    14a4:\t62 71 7c 48 28 df    \tvmovaps %zmm7,%zmm11\n    14aa:\t62 71 7c 48 28 e7    \tvmovaps %zmm7,%zmm12\n    14b0:\t62 e1 7c 48 28 dd    \tvmovaps %zmm5,%zmm19\n    14b6:\t62 21 7c 48 28 e6    \tvmovaps %zmm22,%zmm28\n    14bc:\t62 21 7c 48 28 ee    \tvmovaps %zmm22,%zmm29\n    14c2:\t62 62 6d 40 7f de    \tvpermt2ps %zmm6,%zmm18,%zmm27\n    14c8:\t62 f2 6d 40 7f c1    \tvpermt2ps %zmm1,%zmm18,%zmm0\n    14ce:\t62 72 15 48 7f de    \tvpermt2ps %zmm6,%zmm13,%zmm11\n    14d4:\t62 72 0d 48 7f e6    \tvpermt2ps %zmm6,%zmm14,%zmm12\n    14da:\t62 e2 0d 48 7f d9    \tvpermt2ps %zmm1,%zmm14,%zmm19\n    14e0:\t62 f2 05 48 7f fe    \tvpermt2ps %zmm6,%zmm15,%zmm7\n    14e6:\t62 42 6d 40 7f e1    
\tvpermt2ps %zmm9,%zmm18,%zmm28\n    14ec:\t62 81 7c 48 28 d1    \tvmovaps %zmm25,%zmm18\n    14f2:\t62 42 15 48 7f e9    \tvpermt2ps %zmm9,%zmm13,%zmm29\n    14f8:\t62 82 15 48 7f d0    \tvpermt2ps %zmm24,%zmm13,%zmm18\n    14fe:\t62 93 fd 48 23 e3 e4 \tvshuff64x2 $0xe4,%zmm27,%zmm0,%zmm4\n    1505:\t62 f1 7c 48 28 44 24 \tvmovaps 0x540(%rsp),%zmm0\n    150c:\t15 \n    150d:\t62 53 e5 40 23 e4 e4 \tvshuff64x2 $0xe4,%zmm12,%zmm19,%zmm12\n    1514:\t62 81 7c 48 28 d9    \tvmovaps %zmm25,%zmm19\n    151a:\t62 02 05 48 7f c8    \tvpermt2ps %zmm24,%zmm15,%zmm25\n    1520:\t62 53 9d 40 23 c0 e4 \tvshuff64x2 $0xe4,%zmm8,%zmm28,%zmm8\n    1527:\t62 61 7c 48 28 e5    \tvmovaps %zmm5,%zmm28\n    152d:\t62 f2 05 48 7f e9    \tvpermt2ps %zmm1,%zmm15,%zmm5\n    1533:\t62 82 0d 48 7f d8    \tvpermt2ps %zmm24,%zmm14,%zmm19\n    1539:\t62 62 15 48 7f e1    \tvpermt2ps %zmm1,%zmm13,%zmm28\n    153f:\t62 72 7d 48 b8 54 24 \tvfmadd231ps 0x400(%rsp),%zmm0,%zmm10\n    1546:\t10 \n    1547:\t62 e2 7d 48 b8 6c 24 \tvfmadd231ps 0x4c0(%rsp),%zmm0,%zmm21\n    154e:\t13 \n    154f:\t62 f3 d5 48 23 cf e4 \tvshuff64x2 $0xe4,%zmm7,%zmm5,%zmm1\n    1556:\t62 91 3c 48 5c ee    \tvsubps %zmm30,%zmm8,%zmm5\n    155c:\t62 31 1c 48 5c c4    \tvsubps %zmm20,%zmm12,%zmm8\n    1562:\t62 43 9d 40 23 db e4 \tvshuff64x2 $0xe4,%zmm11,%zmm28,%zmm27\n    1569:\t62 23 95 40 23 e2 e4 \tvshuff64x2 $0xe4,%zmm18,%zmm29,%zmm28\n    1570:\t62 e1 7c 48 28 54 24 \tvmovaps 0x140(%rsp),%zmm18\n    1577:\t05 \n    1578:\t62 71 7c 48 28 db    \tvmovaps %zmm3,%zmm11\n    157e:\t62 b1 7c 48 28 de    \tvmovaps %zmm22,%zmm3\n    1584:\t62 c2 05 48 7f f1    \tvpermt2ps %zmm9,%zmm15,%zmm22\n    158a:\t62 d2 0d 48 7f d9    \tvpermt2ps %zmm9,%zmm14,%zmm3\n    1590:\t62 71 7c 48 28 74 24 \tvmovaps 0x80(%rsp),%zmm14\n    1597:\t02 \n    1598:\t62 51 7c 48 28 eb    \tvmovaps %zmm11,%zmm13\n    159e:\t62 71 7c 48 28 5c 24 \tvmovaps 0x100(%rsp),%zmm11\n    15a5:\t04 \n    15a6:\t62 72 75 40 b8 6c 24 \tvfmadd231ps 
0x300(%rsp),%zmm17,%zmm13\n    15ad:\t0c \n    15ae:\t62 72 6d 48 b8 54 24 \tvfmadd231ps 0x2c0(%rsp),%zmm2,%zmm10\n    15b5:\t0b \n    15b6:\t62 e2 6d 48 b8 6c 24 \tvfmadd231ps 0x1c0(%rsp),%zmm2,%zmm21\n    15bd:\t07 \n    15be:\t62 e2 7d 48 b8 54 24 \tvfmadd231ps 0x500(%rsp),%zmm0,%zmm18\n    15c5:\t14 \n    15c6:\t62 91 5c 48 5c c6    \tvsubps %zmm30,%zmm4,%zmm0\n    15cc:\t62 f1 7c 48 28 64 24 \tvmovaps 0x6c0(%rsp),%zmm4\n    15d3:\t1b \n    15d4:\t62 a3 e5 48 23 db e4 \tvshuff64x2 $0xe4,%zmm19,%zmm3,%zmm19\n    15db:\t62 93 cd 40 23 d9 e4 \tvshuff64x2 $0xe4,%zmm25,%zmm22,%zmm3\n    15e2:\t62 71 7c 48 28 e0    \tvmovaps %zmm0,%zmm12\n    15e8:\t62 72 75 40 b8 74 24 \tvfmadd231ps 0x580(%rsp),%zmm17,%zmm14\n    15ef:\t16 \n    15f0:\t62 72 75 40 b8 5c 24 \tvfmadd231ps 0x240(%rsp),%zmm17,%zmm11\n    15f7:\t09 \n    15f8:\t62 32 7d 48 a8 e0    \tvfmadd213ps %zmm16,%zmm0,%zmm12\n    15fe:\t62 31 64 40 5c cc    \tvsubps %zmm20,%zmm19,%zmm9\n    1604:\t62 e2 6d 48 b8 54 24 \tvfmadd231ps 0x180(%rsp),%zmm2,%zmm18\n    160b:\t06 \n    160c:\t62 f1 24 40 5c f4    \tvsubps %zmm4,%zmm27,%zmm6\n    1612:\t62 f1 1c 40 5c fc    \tvsubps %zmm4,%zmm28,%zmm7\n    1618:\t62 f1 7c 48 28 e5    \tvmovaps %zmm5,%zmm4\n    161e:\t62 b2 55 48 a8 e0    \tvfmadd213ps %zmm16,%zmm5,%zmm4\n    1624:\t62 72 4d 48 b8 e6    \tvfmadd231ps %zmm6,%zmm6,%zmm12\n    162a:\t62 f2 45 48 b8 e7    \tvfmadd231ps %zmm7,%zmm7,%zmm4\n    1630:\t62 52 3d 48 b8 e0    \tvfmadd231ps %zmm8,%zmm8,%zmm12\n    1636:\t62 d2 35 48 b8 e1    \tvfmadd231ps %zmm9,%zmm9,%zmm4\n    163c:\t62 c2 7d 48 4e e4    \tvrsqrt14ps %zmm12,%zmm20\n    1642:\t62 e2 7d 48 4e c4    \tvrsqrt14ps %zmm4,%zmm16\n    1648:\t62 31 1c 48 59 e4    \tvmulps %zmm20,%zmm12,%zmm12\n    164e:\t62 a1 5c 48 59 d8    \tvmulps %zmm16,%zmm4,%zmm19\n    1654:\t62 12 5d 40 a8 e2    \tvfmadd213ps %zmm26,%zmm20,%zmm12\n    165a:\t62 a1 5c 40 59 e7    \tvmulps %zmm23,%zmm20,%zmm20\n    1660:\t62 b1 7c 40 59 e7    \tvmulps %zmm23,%zmm16,%zmm4\n    1666:\t62 82 
7d 40 a8 da    \tvfmadd213ps %zmm26,%zmm16,%zmm19\n    166c:\t62 51 5c 40 59 e4    \tvmulps %zmm12,%zmm20,%zmm12\n    1672:\t62 b1 5c 48 59 e3    \tvmulps %zmm19,%zmm4,%zmm4\n    1678:\t62 c1 1c 48 59 c4    \tvmulps %zmm12,%zmm12,%zmm16\n    167e:\t62 d1 74 48 59 cc    \tvmulps %zmm12,%zmm1,%zmm1\n    1684:\t62 f1 64 48 59 d4    \tvmulps %zmm4,%zmm3,%zmm2\n    168a:\t62 e1 5c 48 59 cc    \tvmulps %zmm4,%zmm4,%zmm17\n    1690:\t62 f1 7c 40 59 c9    \tvmulps %zmm1,%zmm16,%zmm1\n    1696:\t62 d1 7c 48 28 de    \tvmovaps %zmm14,%zmm3\n    169c:\t62 b1 7c 48 28 e2    \tvmovaps %zmm18,%zmm4\n    16a2:\t62 f1 74 40 59 d2    \tvmulps %zmm2,%zmm17,%zmm2\n    16a8:\t62 f2 75 48 b8 d8    \tvfmadd231ps %zmm0,%zmm1,%zmm3\n    16ae:\t62 72 6d 48 b8 d5    \tvfmadd231ps %zmm5,%zmm2,%zmm10\n    16b4:\t62 f2 6d 48 b8 e7    \tvfmadd231ps %zmm7,%zmm2,%zmm4\n    16ba:\t62 c2 6d 48 b8 e9    \tvfmadd231ps %zmm9,%zmm2,%zmm21\n    16c0:\t62 f1 2c 48 58 c3    \tvaddps %zmm3,%zmm10,%zmm0\n    16c6:\t62 d1 7c 48 28 dd    \tvmovaps %zmm13,%zmm3\n    16cc:\t62 f3 fd 48 1b c2 01 \tvextractf64x4 $0x1,%zmm0,%ymm2\n    16d3:\t62 f2 75 48 b8 de    \tvfmadd231ps %zmm6,%zmm1,%zmm3\n    16d9:\t62 f1 5c 48 58 db    \tvaddps %zmm3,%zmm4,%zmm3\n    16df:\t62 d1 7c 48 28 e3    \tvmovaps %zmm11,%zmm4\n    16e5:\t62 f1 7c 48 58 c2    \tvaddps %zmm2,%zmm0,%zmm0\n    16eb:\t62 d2 75 48 b8 e0    \tvfmadd231ps %zmm8,%zmm1,%zmm4\n    16f1:\tc4 e3 7d 19 c2 01    \tvextractf128 $0x1,%ymm0,%xmm2\n    16f7:\t62 f1 54 40 58 cc    \tvaddps %zmm4,%zmm21,%zmm1\n    16fd:\t62 f3 fd 48 1b dc 01 \tvextractf64x4 $0x1,%zmm3,%ymm4\n    1704:\tc5 f8 58 c2          \tvaddps %xmm2,%xmm0,%xmm0\n    1708:\t62 f1 64 48 58 dc    \tvaddps %zmm4,%zmm3,%zmm3\n    170e:\tc4 e3 7d 19 dc 01    \tvextractf128 $0x1,%ymm3,%xmm4\n    1714:\tc4 e3 79 05 d0 01    \tvpermilpd $0x1,%xmm0,%xmm2\n    171a:\tc5 e0 58 dc          \tvaddps %xmm4,%xmm3,%xmm3\n    171e:\tc5 f8 58 c2          \tvaddps %xmm2,%xmm0,%xmm0\n    1722:\tc5 fa 16 d0          
\tvmovshdup %xmm0,%xmm2\n    1726:\tc5 fa 58 c2          \tvaddss %xmm2,%xmm0,%xmm0\n    172a:\tc5 f8 29 84 24 00 01 \tvmovaps %xmm0,0x100(%rsp)\n    1731:\t00 00 \n    1733:\tc4 e3 79 05 c3 01    \tvpermilpd $0x1,%xmm3,%xmm0\n    1739:\tc5 e0 58 c0          \tvaddps %xmm0,%xmm3,%xmm0\n    173d:\t62 f3 fd 48 1b cb 01 \tvextractf64x4 $0x1,%zmm1,%ymm3\n    1744:\t62 f1 74 48 58 cb    \tvaddps %zmm3,%zmm1,%zmm1\n    174a:\tc5 fa 16 d0          \tvmovshdup %xmm0,%xmm2\n    174e:\tc5 fa 58 c2          \tvaddss %xmm2,%xmm0,%xmm0\n    1752:\tc5 f8 29 84 24 c0 00 \tvmovaps %xmm0,0xc0(%rsp)\n    1759:\t00 00 \n    175b:\tc4 e3 7d 19 c8 01    \tvextractf128 $0x1,%ymm1,%xmm0\n    1761:\tc5 f0 58 c0          \tvaddps %xmm0,%xmm1,%xmm0\n    1765:\tc4 e3 79 05 c8 01    \tvpermilpd $0x1,%xmm0,%xmm1\n    176b:\tc5 f8 58 c1          \tvaddps %xmm1,%xmm0,%xmm0\n    176f:\tc5 fa 16 c8          \tvmovshdup %xmm0,%xmm1\n    1773:\tc5 fa 58 c1          \tvaddss %xmm1,%xmm0,%xmm0\n    1777:\tc5 f8 29 84 24 80 00 \tvmovaps %xmm0,0x80(%rsp)\n    177e:\t00 00 \n    1780:\tc5 f8 77             \tvzeroupper\n    1783:\t41 ff d4             \tcall   *%r12\n    1786:\t41 81 c7 00 01 00 00 \tadd    $0x100,%r15d\n    178d:\t41 39 df             \tcmp    %ebx,%r15d\n    1790:\t0f 82 5a ea ff ff    \tjb     1f0 \u003csimplified_nbody+0x1f0\u003e\n    1796:\teb 27                \tjmp    17bf \u003csimplified_nbody+0x17bf\u003e\n    1798:\tc5 f8 57 c0          \tvxorps %xmm0,%xmm0,%xmm0\n    179c:\tc5 f8 29 84 24 00 01 \tvmovaps %xmm0,0x100(%rsp)\n    17a3:\t00 00 \n    17a5:\tc5 f8 57 c0          \tvxorps %xmm0,%xmm0,%xmm0\n    17a9:\tc5 f8 29 84 24 c0 00 \tvmovaps %xmm0,0xc0(%rsp)\n    17b0:\t00 00 \n    17b2:\tc5 f8 57 c0          \tvxorps %xmm0,%xmm0,%xmm0\n    17b6:\tc5 f8 29 84 24 80 00 \tvmovaps %xmm0,0x80(%rsp)\n    17bd:\t00 00 \n    17bf:\t48 8b 44 24 50       \tmov    0x50(%rsp),%rax\n    17c4:\tc5 f8 28 94 24 00 01 \tvmovaps 0x100(%rsp),%xmm2\n    17cb:\t00 00 \n    17cd:\tc5 f8 28 9c 24 
c0 00 \tvmovaps 0xc0(%rsp),%xmm3\n    17d4:\t00 00 \n    17d6:\tc5 f8 28 a4 24 80 00 \tvmovaps 0x80(%rsp),%xmm4\n    17dd:\t00 00 \n    17df:\t48 8b 4c 24 58       \tmov    0x58(%rsp),%rcx\n    17e4:\tc5 fa 10 00          \tvmovss (%rax),%xmm0\n    17e8:\t48 b8 00 00 00 00 00 \tmovabs $0x0,%rax\n    17ef:\t00 00 00 \n    17f2:\tc4 c1 7a 10 0c 06    \tvmovss (%r14,%rax,1),%xmm1\n    17f8:\t48 8b 44 24 68       \tmov    0x68(%rsp),%rax\n    17fd:\tc4 e2 79 a9 54 24 3c \tvfmadd213ss 0x3c(%rsp),%xmm0,%xmm2\n    1804:\tc4 e2 79 a9 5c 24 40 \tvfmadd213ss 0x40(%rsp),%xmm0,%xmm3\n    180b:\tc4 e2 79 a9 64 24 44 \tvfmadd213ss 0x44(%rsp),%xmm0,%xmm4\n    1812:\tc5 ea 59 d1          \tvmulss %xmm1,%xmm2,%xmm2\n    1816:\tc5 e2 59 d9          \tvmulss %xmm1,%xmm3,%xmm3\n    181a:\tc5 da 59 c9          \tvmulss %xmm1,%xmm4,%xmm1\n    181e:\tc5 fa 10 24 08       \tvmovss (%rax,%rcx,1),%xmm4\n    1823:\tc4 e2 69 b9 e0       \tvfmadd231ss %xmm0,%xmm2,%xmm4\n    1828:\tc5 fa 11 24 08       \tvmovss %xmm4,(%rax,%rcx,1)\n    182d:\tc5 fa 10 64 08 04    \tvmovss 0x4(%rax,%rcx,1),%xmm4\n    1833:\tc4 e2 61 b9 e0       \tvfmadd231ss %xmm0,%xmm3,%xmm4\n    1838:\tc5 fa 11 64 08 04    \tvmovss %xmm4,0x4(%rax,%rcx,1)\n    183e:\tc4 e2 71 a9 44 08 08 \tvfmadd213ss 0x8(%rax,%rcx,1),%xmm1,%xmm0\n    1845:\tc5 fa 11 44 08 08    \tvmovss %xmm0,0x8(%rax,%rcx,1)\n    184b:\t48 8b 4c 24 60       \tmov    0x60(%rsp),%rcx\n    1850:\tc5 fa 11 11          \tvmovss %xmm2,(%rcx)\n    1854:\tc5 fa 11 59 04       \tvmovss %xmm3,0x4(%rcx)\n    1859:\tc5 fa 11 49 08       \tvmovss %xmm1,0x8(%rcx)\n    185e:\t48 8d 65 d8          \tlea    -0x28(%rbp),%rsp\n    1862:\t5b                   \tpop    %rbx\n    1863:\t41 5c                \tpop    %r12\n    1865:\t41 5d                \tpop    %r13\n    1867:\t41 5e                \tpop    %r14\n    1869:\t41 5f                \tpop    %r15\n    186b:\t5d                   \tpop    %rbp\n    186c:\tc3                   
\tret\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eHost-Compute (ARM CPU)\u003c/summary\u003e\n  Note that the compiler would usually directly output a \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody_aarch64.bin\"\u003e.bin file\u003c/a\u003e (ELF format). The output below comes from disassembling it with \u003ccode\u003eobjdump -d\u003c/code\u003e.\n  Also note that this has been compiled for the \u003ca href=\"https://github.com/a2flo/floor/blob/master/compute/host/host_common.hpp#L62\"\u003e\u003ccode\u003earm-7\u003c/code\u003e target\u003c/a\u003e (ARMv8.6 + FP16 + FP16FML, e.g. Apple M2+/A15+).\n\n++++\n[source,Assembly]\n----\nnbody_aarch64.bin:\tfile format elf64-littleaarch64\n\n\nDisassembly of section .text:\n\n0000000000000000 \u003csimplified_nbody\u003e:\n   0:\td104c3ff \tsub\tsp, sp, #0x130\n   4:\t90000008 \tadrp\tx8, 0 \u003cfloor_global_idx\u003e\n   8:\t6d0a33ed \tstp\td13, d12, [sp, #160]\n   c:\t6d0b2beb \tstp\td11, d10, [sp, #176]\n  10:\t6d0c23e9 \tstp\td9, d8, [sp, #192]\n  14:\ta90d7bfd \tstp\tx29, x30, [sp, #208]\n  18:\t910343fd \tadd\tx29, sp, #0xd0\n  1c:\ta90e6ffc \tstp\tx28, x27, [sp, #224]\n  20:\ta90f67fa \tstp\tx26, x25, [sp, #240]\n  24:\ta9105ff8 \tstp\tx24, x23, [sp, #256]\n  28:\ta91157f6 \tstp\tx22, x21, [sp, #272]\n  2c:\ta9124ff4 \tstp\tx20, x19, [sp, #288]\n  30:\tf9400108 \tldr\tx8, [x8]\n  34:\tb9400117 \tldr\tw23, [x8]\n  38:\t52800188 \tmov\tw8, #0xc                   \t// #12\n  3c:\t9b080af6 \tmadd\tx22, x23, x8, x2\n  40:\t90000008 \tadrp\tx8, 0 \u003cfloor_global_work_size\u003e\n  44:\taa1603f8 \tmov\tx24, x22\n  48:\tf9400108 \tldr\tx8, [x8]\n  4c:\tfd4002c8 \tldr\td8, [x22]\n  50:\tbc408f09 \tldr\ts9, [x24, #8]!\n  54:\tb9400119 \tldr\tw25, [x8]\n  58:\t34000d79 \tcbz\tw25, 204 \u003csimplified_nbody+0x204\u003e\n  5c:\t2f00e403 \tmovi\td3, #0x0\n  60:\t8b171008 \tadd\tx8, x0, x23, lsl #4\n  64:\t9000001c 
\tadrp\tx28, 0 \u003cfloor_local_idx\u003e\n  68:\ta90007e3 \tstp\tx3, x1, [sp]\n  6c:\t90000013 \tadrp\tx19, 0 \u003csimplified_nbody\u003e\n  70:\t90000014 \tadrp\tx20, 0 \u003chost_compute_device_barrier\u003e\n  74:\taa0003f5 \tmov\tx21, x0\n  78:\t2a1f03fa \tmov\tw26, wzr\n  7c:\tf940039c \tldr\tx28, [x28]\n  80:\t3c9a03a3 \tstur\tq3, [x29, #-96]\n  84:\t2d400500 \tldp\ts0, s1, [x8]\n  88:\tbd400902 \tldr\ts2, [x8, #8]\n  8c:\t5296e2e8 \tmov\tw8, #0xb717                \t// #46871\n  90:\t4f03f603 \tfmov\tv3.4s, #1.000000000000000000e+00\n  94:\t72a71a28 \tmovk\tw8, #0x38d1, lsl #16\n  98:\t2a1f03fb \tmov\tw27, wzr\n  9c:\t3d8017e3 \tstr\tq3, [sp, #80]\n  a0:\t4e040403 \tdup\tv3.4s, v0.s[0]\n  a4:\t4e040d00 \tdup\tv0.4s, w8\n  a8:\tf9400273 \tldr\tx19, [x19]\n  ac:\tad018fe0 \tstp\tq0, q3, [sp, #48]\n  b0:\t2f00e400 \tmovi\td0, #0x0\n  b4:\t4e040423 \tdup\tv3.4s, v1.s[0]\n  b8:\t3d801be0 \tstr\tq0, [sp, #96]\n  bc:\t2f00e400 \tmovi\td0, #0x0\n  c0:\t3c9b03a0 \tstur\tq0, [x29, #-80]\n  c4:\t4e040440 \tdup\tv0.4s, v2.s[0]\n  c8:\tf9400294 \tldr\tx20, [x20]\n  cc:\tad008fe0 \tstp\tq0, q3, [sp, #16]\n  d0:\tb9400388 \tldr\tw8, [x28]\n  d4:\t0b1b2109 \tadd\tw9, w8, w27, lsl #8\n  d8:\t3ce95aa0 \tldr\tq0, [x21, w9, uxtw #4]\n  dc:\t3ca87a60 \tstr\tq0, [x19, x8, lsl #4]\n  e0:\td63f0280 \tblr\tx20\n  e4:\t6f00e400 \tmovi\tv0.2d, #0x0\n  e8:\t3cda03a4 \tldur\tq4, [x29, #-96]\n  ec:\t6f00e402 \tmovi\tv2.2d, #0x0\n  f0:\taa1f03e8 \tmov\tx8, xzr\n  f4:\t6f00e403 \tmovi\tv3.2d, #0x0\n  f8:\tad41abeb \tldp\tq11, q10, [sp, #48]\n  fc:\t6e040480 \tmov\tv0.s[0], v4.s[0]\n 100:\t6f00e401 \tmovi\tv1.2d, #0x0\n 104:\t6f00e405 \tmovi\tv5.2d, #0x0\n 108:\tad4293ff \tldp\tq31, q4, [sp, #80]\n 10c:\t6e040482 \tmov\tv2.s[0], v4.s[0]\n 110:\t3cdb03a4 \tldur\tq4, [x29, #-80]\n 114:\tad40b3ed \tldp\tq13, q12, [sp, #16]\n 118:\t6e040483 \tmov\tv3.s[0], v4.s[0]\n 11c:\t6f00e404 \tmovi\tv4.2d, #0x0\n 120:\t8b080269 \tadd\tx9, x19, x8\n 124:\t91020108 \tadd\tx8, x8, #0x80\n 128:\t4eab1d67 
\tmov\tv7.16b, v11.16b\n 12c:\tf140051f \tcmp\tx8, #0x1, lsl #12\n 130:\t4eab1d7b \tmov\tv27.16b, v11.16b\n 134:\t4cdf0930 \tld4\t{v16.4s-v19.4s}, [x9], #64\n 138:\t4eaad606 \tfsub\tv6.4s, v16.4s, v10.4s\n 13c:\t4eacd638 \tfsub\tv24.4s, v17.4s, v12.4s\n 140:\t4eadd659 \tfsub\tv25.4s, v18.4s, v13.4s\n 144:\t4e26ccc7 \tfmla\tv7.4s, v6.4s, v6.4s\n 148:\t4e38cf07 \tfmla\tv7.4s, v24.4s, v24.4s\n 14c:\t4c400934 \tld4\t{v20.4s-v23.4s}, [x9]\n 150:\t4e39cf27 \tfmla\tv7.4s, v25.4s, v25.4s\n 154:\t6ea1f8e7 \tfsqrt\tv7.4s, v7.4s\n 158:\t4eaad69a \tfsub\tv26.4s, v20.4s, v10.4s\n 15c:\t4eacd6bc \tfsub\tv28.4s, v21.4s, v12.4s\n 160:\t4eadd6dd \tfsub\tv29.4s, v22.4s, v13.4s\n 164:\t4e3acf5b \tfmla\tv27.4s, v26.4s, v26.4s\n 168:\t6e27ffe7 \tfdiv\tv7.4s, v31.4s, v7.4s\n 16c:\t4e3ccf9b \tfmla\tv27.4s, v28.4s, v28.4s\n 170:\t4e3dcfbb \tfmla\tv27.4s, v29.4s, v29.4s\n 174:\t6ea1fb7b \tfsqrt\tv27.4s, v27.4s\n 178:\t6e27dcfe \tfmul\tv30.4s, v7.4s, v7.4s\n 17c:\t6e27de67 \tfmul\tv7.4s, v19.4s, v7.4s\n 180:\t6e27dfc7 \tfmul\tv7.4s, v30.4s, v7.4s\n 184:\t6e3bfffb \tfdiv\tv27.4s, v31.4s, v27.4s\n 188:\t4e26cce3 \tfmla\tv3.4s, v7.4s, v6.4s\n 18c:\t4e38cce2 \tfmla\tv2.4s, v7.4s, v24.4s\n 190:\t4e39cce0 \tfmla\tv0.4s, v7.4s, v25.4s\n 194:\t6e3bdf70 \tfmul\tv16.4s, v27.4s, v27.4s\n 198:\t6e3bdef1 \tfmul\tv17.4s, v23.4s, v27.4s\n 19c:\t6e31de10 \tfmul\tv16.4s, v16.4s, v17.4s\n 1a0:\t4e3ace05 \tfmla\tv5.4s, v16.4s, v26.4s\n 1a4:\t4e3cce04 \tfmla\tv4.4s, v16.4s, v28.4s\n 1a8:\t4e3dce01 \tfmla\tv1.4s, v16.4s, v29.4s\n 1ac:\t54fffba1 \tb.ne\t120 \u003csimplified_nbody+0x120\u003e  // b.any\n 1b0:\t4e23d4a3 \tfadd\tv3.4s, v5.4s, v3.4s\n 1b4:\t4e20d420 \tfadd\tv0.4s, v1.4s, v0.4s\n 1b8:\t4e22d482 \tfadd\tv2.4s, v4.4s, v2.4s\n 1bc:\t6e20d461 \tfaddp\tv1.4s, v3.4s, v0.4s\n 1c0:\t6e20d442 \tfaddp\tv2.4s, v2.4s, v0.4s\n 1c4:\t6e20d400 \tfaddp\tv0.4s, v0.4s, v0.4s\n 1c8:\t7e30d821 \tfaddp\ts1, v1.2s\n 1cc:\t7e30d800 \tfaddp\ts0, v0.2s\n 1d0:\tad3d07a0 \tstp\tq0, q1, [x29, #-96]\n 1d4:\t7e30d841 \tfaddp\ts1, 
v2.2s\n 1d8:\t3d801be1 \tstr\tq1, [sp, #96]\n 1dc:\td63f0280 \tblr\tx20\n 1e0:\t1104035a \tadd\tw26, w26, #0x100\n 1e4:\t1100077b \tadd\tw27, w27, #0x1\n 1e8:\t6b19035f \tcmp\tw26, w25\n 1ec:\t54fff723 \tb.cc\td0 \u003csimplified_nbody+0xd0\u003e  // b.lo, b.ul, b.last\n 1f0:\tad7d07a2 \tldp\tq2, q1, [x29, #-96]\n 1f4:\t3dc01be0 \tldr\tq0, [sp, #96]\n 1f8:\ta94007e3 \tldp\tx3, x1, [sp]\n 1fc:\t6e0c0401 \tmov\tv1.s[1], v0.s[0]\n 200:\t14000003 \tb\t20c \u003csimplified_nbody+0x20c\u003e\n 204:\t2f00e401 \tmovi\td1, #0x0\n 208:\t2f00e402 \tmovi\td2, #0x0\n 20c:\t5297cee8 \tmov\tw8, #0xbe77                \t// #48759\n 210:\tbd400060 \tldr\ts0, [x3]\n 214:\t72a7efe8 \tmovk\tw8, #0x3f7f, lsl #16\n 218:\t8b171029 \tadd\tx9, x1, x23, lsl #4\n 21c:\ta9524ff4 \tldp\tx20, x19, [sp, #288]\n 220:\t0f801028 \tfmla\tv8.2s, v1.2s, v0.s[0]\n 224:\t1f022402 \tfmadd\ts2, s0, s2, s9\n 228:\t0e040d01 \tdup\tv1.2s, w8\n 22c:\t1e270103 \tfmov\ts3, w8\n 230:\tfd400124 \tldr\td4, [x9]\n 234:\ta94f67fa \tldp\tx26, x25, [sp, #240]\n 238:\t1e230842 \tfmul\ts2, s2, s3\n 23c:\t2e21dd01 \tfmul\tv1.2s, v8.2s, v1.2s\n 240:\tbd400923 \tldr\ts3, [x9, #8]\n 244:\ta94e6ffc \tldp\tx28, x27, [sp, #224]\n 248:\tbd000302 \tstr\ts2, [x24]\n 24c:\t0f801024 \tfmla\tv4.2s, v1.2s, v0.s[0]\n 250:\t1f000c40 \tfmadd\ts0, s2, s0, s3\n 254:\tfd0002c1 \tstr\td1, [x22]\n 258:\ta95157f6 \tldp\tx22, x21, [sp, #272]\n 25c:\ta9505ff8 \tldp\tx24, x23, [sp, #256]\n 260:\tfd000124 \tstr\td4, [x9]\n 264:\ta94d7bfd \tldp\tx29, x30, [sp, #208]\n 268:\tbd000920 \tstr\ts0, [x9, #8]\n 26c:\t6d4c23e9 \tldp\td9, d8, [sp, #192]\n 270:\t6d4b2beb \tldp\td11, d10, [sp, #176]\n 274:\t6d4a33ed \tldp\td13, d12, [sp, #160]\n 278:\t9104c3ff \tadd\tsp, sp, #0x130\n 27c:\td65f03c0 \tret\n\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eMetal / AIR\u003c/summary\u003e\n  Note that the compiler would usually directly output a \u003ca 
href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody.metallib\"\u003e.metallib file\u003c/a\u003e. The output below comes from disassembling it with \u003ccode\u003emetallib-dis\u003c/code\u003e (provided by the \u003ca href=\"#computegraphics-toolchain\"\u003etoolchain\u003c/a\u003e).\n  \n++++\n[source,LLVM]\n----\n; ModuleID = 'bc_module'\nsource_filename = \"simplified_nbody\"\ntarget datalayout = \"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32\"\ntarget triple = \"air64-apple-macosx14.0.0\"\n\n%class.vector4 = type { %union.anon }\n%union.anon = type { %struct.anon }\n%struct.anon = type { float, float, float, float }\n%class.vector3 = type { %union.anon.8 }\n%union.anon.8 = type { %struct.anon.9 }\n%struct.anon.9 = type { float, float, float }\n\n@_ZZ16simplified_nbodyE20local_body_positions = internal addrspace(3) unnamed_addr global [256 x %class.vector4] undef, align 16\n\n; Function Attrs: nounwind\ndefine void @simplified_nbody(%class.vector4 addrspace(1)* noalias nocapture readonly %0, %class.vector4 addrspace(1)* noalias nocapture %1, %class.vector3 addrspace(1)* noalias nocapture %2, float addrspace(2)* noalias nocapture readonly align 4 dereferenceable(4) %3, \u003c3 x i32\u003e %4, \u003c3 x i32\u003e %5, \u003c3 x i32\u003e %6, \u003c3 x i32\u003e %7, \u003c3 x i32\u003e %8, \u003c3 x i32\u003e %9, i32 %10, i32 %11, i32 %12, i32 %13) local_unnamed_addr #0 !reqd_work_group_size !33 !kernel_dim !34 {\n  %15 = extractelement \u003c3 x i32\u003e %4, i64 0\n  %16 = zext i32 %15 to i64\n  %17 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %16, i32 0, i32 0, i32 0\n  %18 = bitcast float addrspace(1)* %17 to \u003c3 x float\u003e addrspace(1)*\n  %19 = load \u003c3 x float\u003e, \u003c3 x float\u003e addrspace(1)* %18, align 4\n  %20 = 
extractelement \u003c3 x float\u003e %19, i64 0\n  %21 = getelementptr inbounds %class.vector3, %class.vector3 addrspace(1)* %2, i64 %16, i32 0, i32 0, i32 0\n  %22 = bitcast float addrspace(1)* %21 to \u003c3 x float\u003e addrspace(1)*\n  %23 = load \u003c3 x float\u003e, \u003c3 x float\u003e addrspace(1)* %22, align 4\n  %24 = extractelement \u003c3 x i32\u003e %5, i64 0\n  %25 = extractelement \u003c3 x i32\u003e %6, i64 0\n  %26 = zext i32 %25 to i64\n  %27 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @_ZZ16simplified_nbodyE20local_body_positions, i64 0, i64 %26, i32 0, i32 0, i32 0\n  %28 = bitcast float addrspace(3)* %27 to \u003c4 x float\u003e addrspace(3)*\n  %29 = shufflevector \u003c3 x float\u003e %19, \u003c3 x float\u003e undef, \u003c2 x i32\u003e \u003ci32 1, i32 2\u003e\n  br label %57\n\n30:                                               ; preds = %68\n  %31 = extractelement \u003c3 x float\u003e %23, i64 0\n  %32 = load float, float addrspace(2)* %3, align 4\n  %33 = fmul fast float %32, %100\n  %34 = insertelement \u003c2 x float\u003e undef, float %32, i64 0\n  %35 = shufflevector \u003c2 x float\u003e %34, \u003c2 x float\u003e undef, \u003c2 x i32\u003e zeroinitializer\n  %36 = fmul fast \u003c2 x float\u003e %35, %101\n  %37 = fadd fast float %33, %31\n  %38 = shufflevector \u003c3 x float\u003e %23, \u003c3 x float\u003e undef, \u003c2 x i32\u003e \u003ci32 1, i32 2\u003e\n  %39 = fadd fast \u003c2 x float\u003e %36, %38\n  %40 = fmul fast float %37, 0x3FEFF7CEE0000000\n  %41 = fmul fast \u003c2 x float\u003e %39, \u003cfloat 0x3FEFF7CEE0000000, float 0x3FEFF7CEE0000000\u003e\n  %42 = fmul fast float %40, %32\n  %43 = fmul fast \u003c2 x float\u003e %41, %35\n  %44 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %1, i64 %16, i32 0, i32 0, i32 0\n  %45 = bitcast float addrspace(1)* %44 to \u003c3 x float\u003e addrspace(1)*\n  %46 = load \u003c3 x float\u003e, \u003c3 x 
float\u003e addrspace(1)* %45, align 4, !tbaa !35\n  %47 = extractelement \u003c3 x float\u003e %46, i64 0\n  %48 = fadd fast float %42, %47\n  %49 = shufflevector \u003c3 x float\u003e %46, \u003c3 x float\u003e undef, \u003c2 x i32\u003e \u003ci32 1, i32 2\u003e\n  %50 = fadd fast \u003c2 x float\u003e %43, %49\n  %51 = insertelement \u003c3 x float\u003e undef, float %48, i64 0\n  %52 = shufflevector \u003c2 x float\u003e %50, \u003c2 x float\u003e undef, \u003c3 x i32\u003e \u003ci32 0, i32 1, i32 undef\u003e\n  %53 = shufflevector \u003c3 x float\u003e %51, \u003c3 x float\u003e %52, \u003c3 x i32\u003e \u003ci32 0, i32 3, i32 4\u003e\n  store \u003c3 x float\u003e %53, \u003c3 x float\u003e addrspace(1)* %45, align 4, !tbaa !35\n  %54 = insertelement \u003c3 x float\u003e undef, float %40, i64 0\n  %55 = shufflevector \u003c2 x float\u003e %41, \u003c2 x float\u003e undef, \u003c3 x i32\u003e \u003ci32 0, i32 1, i32 undef\u003e\n  %56 = shufflevector \u003c3 x float\u003e %54, \u003c3 x float\u003e %55, \u003c3 x i32\u003e \u003ci32 0, i32 3, i32 4\u003e\n  store \u003c3 x float\u003e %56, \u003c3 x float\u003e addrspace(1)* %22, align 4, !tbaa !35\n  ret void\n\n57:                                               ; preds = %68, %14\n  %58 = phi i32 [ 0, %14 ], [ %69, %68 ]\n  %59 = phi i32 [ 0, %14 ], [ %70, %68 ]\n  %60 = phi float [ 0.000000e+00, %14 ], [ %100, %68 ]\n  %61 = phi \u003c2 x float\u003e [ zeroinitializer, %14 ], [ %101, %68 ]\n  %62 = shl i32 %59, 8\n  %63 = add i32 %25, %62\n  %64 = zext i32 %63 to i64\n  %65 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %64, i32 0, i32 0, i32 0\n  %66 = bitcast float addrspace(1)* %65 to \u003c4 x float\u003e addrspace(1)*\n  %67 = load \u003c4 x float\u003e, \u003c4 x float\u003e addrspace(1)* %66, align 4\n  store \u003c4 x float\u003e %67, \u003c4 x float\u003e addrspace(3)* %28, align 4, !tbaa !35\n  tail call void @air.wg.barrier(i32 2, i32 1) #3\n  br label %72\n\n68:    
                                           ; preds = %72\n  tail call void @air.wg.barrier(i32 2, i32 1) #3\n  %69 = add i32 %58, 256\n  %70 = add i32 %59, 1\n  %71 = icmp ult i32 %69, %24\n  br i1 %71, label %57, label %30, !llvm.loop !38\n\n72:                                               ; preds = %72, %57\n  %73 = phi i32 [ 0, %57 ], [ %102, %72 ]\n  %74 = phi float [ %60, %57 ], [ %100, %72 ]\n  %75 = phi \u003c2 x float\u003e [ %61, %57 ], [ %101, %72 ]\n  %76 = zext i32 %73 to i64\n  %77 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @_ZZ16simplified_nbodyE20local_body_positions, i64 0, i64 %76, i32 0, i32 0, i32 0\n  %78 = bitcast float addrspace(3)* %77 to \u003c4 x float\u003e addrspace(3)*\n  %79 = load \u003c4 x float\u003e, \u003c4 x float\u003e addrspace(3)* %78, align 4\n  %80 = extractelement \u003c4 x float\u003e %79, i64 0\n  %81 = extractelement \u003c4 x float\u003e %79, i64 3\n  %82 = fsub fast float %80, %20\n  %83 = shufflevector \u003c4 x float\u003e %79, \u003c4 x float\u003e undef, \u003c2 x i32\u003e \u003ci32 1, i32 2\u003e\n  %84 = fsub fast \u003c2 x float\u003e %83, %29\n  %85 = fmul fast float %82, %82\n  %86 = fmul fast \u003c2 x float\u003e %84, %84\n  %87 = extractelement \u003c2 x float\u003e %86, i64 0\n  %88 = extractelement \u003c2 x float\u003e %86, i64 1\n  %89 = fadd fast float %85, 0x3F1A36E2E0000000\n  %90 = fadd fast float %89, %87\n  %91 = fadd fast float %90, %88\n  %92 = tail call fast float @air.fast_rsqrt.f32(float %91) #4\n  %93 = fmul fast float %92, %92\n  %94 = fmul fast float %93, %92\n  %95 = fmul fast float %94, %81\n  %96 = fmul fast float %95, %82\n  %97 = insertelement \u003c2 x float\u003e undef, float %95, i64 0\n  %98 = shufflevector \u003c2 x float\u003e %97, \u003c2 x float\u003e undef, \u003c2 x i32\u003e zeroinitializer\n  %99 = fmul fast \u003c2 x float\u003e %98, %84\n  %100 = fadd fast float %96, %74\n  %101 = fadd fast \u003c2 x float\u003e %99, %75\n  
%102 = add nuw nsw i32 %73, 1\n  %103 = icmp eq i32 %102, 256\n  br i1 %103, label %68, label %72, !llvm.loop !40\n}\n\n; Function Attrs: nounwind readnone\ndeclare float @air.fast_rsqrt.f32(float) local_unnamed_addr #1\n\n; Function Attrs: convergent noduplicate\ndeclare void @air.wg.barrier(i32, i32) local_unnamed_addr #2\n\nattributes #0 = { nounwind \"approx-func-fp-math\"=\"true\" \"frame-pointer\"=\"all\" \"less-precise-fpmad\"=\"true\" \"no-infs-fp-math\"=\"true\" \"no-nans-fp-math\"=\"true\" \"no-signed-zeros-fp-math\"=\"true\" \"no-trapping-math\"=\"true\" \"stack-protector-buffer-size\"=\"8\" \"uniform-work-group-size\"=\"true\" \"unsafe-fp-math\"=\"true\" }\nattributes #1 = { nounwind readnone \"approx-func-fp-math\"=\"true\" \"frame-pointer\"=\"all\" \"less-precise-fpmad\"=\"true\" \"no-infs-fp-math\"=\"true\" \"no-nans-fp-math\"=\"true\" \"no-signed-zeros-fp-math\"=\"true\" \"no-trapping-math\"=\"true\" \"stack-protector-buffer-size\"=\"8\" \"unsafe-fp-math\"=\"true\" }\nattributes #2 = { convergent noduplicate \"approx-func-fp-math\"=\"true\" \"frame-pointer\"=\"all\" \"less-precise-fpmad\"=\"true\" \"no-infs-fp-math\"=\"true\" \"no-nans-fp-math\"=\"true\" \"no-signed-zeros-fp-math\"=\"true\" \"no-trapping-math\"=\"true\" \"stack-protector-buffer-size\"=\"8\" \"unsafe-fp-math\"=\"true\" }\nattributes #3 = { convergent noduplicate nounwind }\nattributes #4 = { nounwind readnone }\n\n!air.kernel = !{!0}\n!air.version = !{!18}\n!air.language_version = !{!19}\n!air.compile_options = !{!20, !21, !22}\n!llvm.module.flags = !{!23, !24, !25, !26, !27, !28, !29, !30, !31}\n!llvm.ident = !{!32}\n\n!0 = !{void (%class.vector4 addrspace(1)*, %class.vector4 addrspace(1)*, %class.vector3 addrspace(1)*, float addrspace(2)*, \u003c3 x i32\u003e, \u003c3 x i32\u003e, \u003c3 x i32\u003e, \u003c3 x i32\u003e, \u003c3 x i32\u003e, \u003c3 x i32\u003e, i32, i32, i32, i32)* @simplified_nbody, !1, !2, !17}\n!1 = !{}\n!2 = !{!3, !4, !5, !6, !7, !8, !9, !10, !11, !12, !13, 
!14, !15, !16}\n!3 = !{i32 0, !\"air.buffer\", !\"air.location_index\", i32 0, i32 1, !\"air.read\", !\"air.address_space\", i32 1, !\"air.arg_type_size\", i32 16, !\"air.arg_type_align_size\", i32 16, !\"air.arg_type_name\", !\"float4\", !\"air.arg_name\", !\"in_positions\"}\n!4 = !{i32 1, !\"air.buffer\", !\"air.location_index\", i32 1, i32 1, !\"air.read_write\", !\"air.address_space\", i32 1, !\"air.arg_type_size\", i32 16, !\"air.arg_type_align_size\", i32 16, !\"air.arg_type_name\", !\"float4\", !\"air.arg_name\", !\"out_positions\"}\n!5 = !{i32 2, !\"air.buffer\", !\"air.location_index\", i32 2, i32 1, !\"air.read_write\", !\"air.address_space\", i32 1, !\"air.arg_type_size\", i32 12, !\"air.arg_type_align_size\", i32 12, !\"air.arg_type_name\", !\"float3\", !\"air.arg_name\", !\"inout_velocities\"}\n!6 = !{i32 3, !\"air.buffer\", !\"air.buffer_size\", i32 4, !\"air.location_index\", i32 3, i32 1, !\"air.read\", !\"air.address_space\", i32 2, !\"air.arg_type_size\", i32 4, !\"air.arg_type_align_size\", i32 4, !\"air.arg_type_name\", !\"float\", !\"air.arg_name\", !\"time_delta\"}\n!7 = !{i32 4, !\"air.thread_position_in_grid\", !\"air.arg_type_name\", !\"uint3\", !\"air.arg_name\", !\"__metal__global_id__\"}\n!8 = !{i32 5, !\"air.threads_per_grid\", !\"air.arg_type_name\", !\"uint3\", !\"air.arg_name\", !\"__metal__global_size__\"}\n!9 = !{i32 6, !\"air.thread_position_in_threadgroup\", !\"air.arg_type_name\", !\"uint3\", !\"air.arg_name\", !\"__metal__local_id__\"}\n!10 = !{i32 7, !\"air.threads_per_threadgroup\", !\"air.arg_type_name\", !\"uint3\", !\"air.arg_name\", !\"__metal__local_size__\"}\n!11 = !{i32 8, !\"air.threadgroup_position_in_grid\", !\"air.arg_type_name\", !\"uint3\", !\"air.arg_name\", !\"__metal__group_id__\"}\n!12 = !{i32 9, !\"air.threadgroups_per_grid\", !\"air.arg_type_name\", !\"uint3\", !\"air.arg_name\", !\"__metal__group_size__\"}\n!13 = !{i32 10, !\"air.simdgroup_index_in_threadgroup\", !\"air.arg_type_name\", !\"uint\", 
!\"air.arg_name\", !\"__metal__sub_group_id__\"}\n!14 = !{i32 11, !\"air.thread_index_in_simdgroup\", !\"air.arg_type_name\", !\"uint\", !\"air.arg_name\", !\"__metal__sub_group_local_id__\"}\n!15 = !{i32 12, !\"air.threads_per_simdgroup\", !\"air.arg_type_name\", !\"uint\", !\"air.arg_name\", !\"__metal__sub_group_size__\"}\n!16 = !{i32 13, !\"air.simdgroups_per_threadgroup\", !\"air.arg_type_name\", !\"uint\", !\"air.arg_name\", !\"__metal__num_sub_groups__\"}\n!17 = !{!\"air.max_work_group_size\", i32 256}\n!18 = !{i32 2, i32 6, i32 0}\n!19 = !{!\"Metal\", i32 3, i32 1, i32 0}\n!20 = !{!\"air.compile.denorms_disable\"}\n!21 = !{!\"air.compile.fast_math_enable\"}\n!22 = !{!\"air.compile.framebuffer_fetch_enable\"}\n!23 = !{i32 7, !\"air.max_device_buffers\", i32 31}\n!24 = !{i32 7, !\"air.max_constant_buffers\", i32 31}\n!25 = !{i32 7, !\"air.max_threadgroup_buffers\", i32 31}\n!26 = !{i32 7, !\"air.max_textures\", i32 128}\n!27 = !{i32 7, !\"air.max_read_write_textures\", i32 8}\n!28 = !{i32 7, !\"air.max_samplers\", i32 16}\n!29 = !{i32 1, !\"wchar_size\", i32 4}\n!30 = !{i32 7, !\"frame-pointer\", i32 2}\n!31 = !{i32 2, !\"SDK Version\", [2 x i32] [i32 14, i32 0]}\n!32 = !{!\"Apple metal version 32023.155 (metalfe-32023.155)\"}\n!33 = !{i32 256, i32 1, i32 1}\n!34 = !{i32 1}\n!35 = !{!36, !36, i64 0}\n!36 = !{!\"omnipotent char\", !37, i64 0}\n!37 = !{!\"Simple C++ TBAA\"}\n!38 = distinct !{!38, !39}\n!39 = !{!\"llvm.loop.mustprogress\"}\n!40 = distinct !{!40, !39}\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eOpenCL / SPIR\u003c/summary\u003e\n  Note that the compiler would usually directly output a \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody_cl.bc\"\u003e.bc file\u003c/a\u003e. The output below comes from disassembling it with \u003ccode\u003ellvm-dis\u003c/code\u003e (provided by the \u003ca href=\"#computegraphics-toolchain\"\u003etoolchain\u003c/a\u003e). 
Also note that the bitcode file is exported in an LLVM 3.2 / SPIR 1.2 compatible format, but the output below uses LLVM 14.0 syntax.\n  \n++++\n[source,LLVM]\n----\n; ModuleID = 'spir.bc'\nsource_filename = \"spir.bc\"\ntarget datalayout = \"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024\"\ntarget triple = \"spir64-unknown-unknown\"\n\n%class.vector4 = type { %union.anon }\n%union.anon = type { %struct.anon }\n%struct.anon = type { float, float, float, float }\n%class.vector3 = type { %union.anon.8 }\n%union.anon.8 = type { %struct.anon.9 }\n%struct.anon.9 = type { float, float, float }\n\n@simplified_nbody.local_body_positions = internal unnamed_addr addrspace(3) global [256 x %class.vector4] undef, align 4\n\ndefine floor_kernel void @simplified_nbody(%class.vector4 addrspace(1)* %0, %class.vector4 addrspace(1)* %1, %class.vector3 addrspace(1)* %2, float %3) {\n  %5 = tail call floor_func i64 @_Z13get_global_idj(i32 0), !range !14\n  %6 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %5, i32 0, i32 0, i32 0\n  %7 = load float, float addrspace(1)* %6, align 4\n  %8 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %5, i32 0, i32 0, i32 1\n  %9 = load float, float addrspace(1)* %8, align 4\n  %10 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %5, i32 0, i32 0, i32 2\n  %11 = load float, float addrspace(1)* %10, align 4\n  %12 = getelementptr inbounds %class.vector3, %class.vector3 addrspace(1)* %2, i64 %5, i32 0, i32 0, i32 0\n  %13 = load float, float addrspace(1)* %12, align 4\n  %14 = getelementptr inbounds %class.vector3, %class.vector3 addrspace(1)* %2, i64 %5, i32 0, i32 0, i32 1\n  %15 = load float, float addrspace(1)* %14, align 4\n  %16 = getelementptr inbounds %class.vector3, %class.vector3 addrspace(1)* %2, i64 
%5, i32 0, i32 0, i32 2\n  %17 = load float, float addrspace(1)* %16, align 4\n  %18 = tail call floor_func i64 @_Z15get_global_sizej(i32 0), !range !15\n  %19 = trunc i64 %18 to i32, !range !16\n  %20 = tail call floor_func i64 @_Z12get_local_idj(i32 0), !range !17\n  %21 = trunc i64 %20 to i32, !range !18\n  %22 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %20, i32 0, i32 0, i32 0\n  %23 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %20, i32 0, i32 0, i32 1\n  %24 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %20, i32 0, i32 0, i32 2\n  %25 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %20, i32 0, i32 0, i32 3\n  br label %48\n\n26:                                               ; preds = %65\n  %27 = fmul float %98, %3\n  %28 = fmul float %99, %3\n  %29 = fmul float %100, %3\n  %30 = fadd float %27, %13\n  %31 = fadd float %28, %15\n  %32 = fadd float %29, %17\n  %33 = fmul float %30, 0x3FEFF7CEE0000000\n  %34 = fmul float %31, 0x3FEFF7CEE0000000\n  %35 = fmul float %32, 0x3FEFF7CEE0000000\n  %36 = fmul float %33, %3\n  %37 = fmul float %34, %3\n  %38 = fmul float %35, %3\n  %39 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %1, i64 %5, i32 0, i32 0, i32 0\n  %40 = load float, float addrspace(1)* %39, align 4, !tbaa !19\n  %41 = fadd float %40, %36\n  store float %41, float addrspace(1)* %39, align 4, !tbaa !19\n  %42 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %1, i64 %5, i32 0, i32 0, i32 1\n  %43 = load float, float addrspace(1)* %42, align 4, !tbaa !19\n  %44 = fadd float %43, %37\n  store float %44, float addrspace(1)* %42, align 4, !tbaa !19\n  %45 = 
getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %1, i64 %5, i32 0, i32 0, i32 2\n  %46 = load float, float addrspace(1)* %45, align 4, !tbaa !19\n  %47 = fadd float %46, %38\n  store float %47, float addrspace(1)* %45, align 4, !tbaa !19\n  store float %33, float addrspace(1)* %12, align 4, !tbaa !19\n  store float %34, float addrspace(1)* %14, align 4, !tbaa !19\n  store float %35, float addrspace(1)* %16, align 4, !tbaa !19\n  ret void\n\n48:                                               ; preds = %65, %4\n  %49 = phi i32 [ 0, %4 ], [ %66, %65 ]\n  %50 = phi i32 [ 0, %4 ], [ %67, %65 ]\n  %51 = phi float [ 0.000000e+00, %4 ], [ %100, %65 ]\n  %52 = phi float [ 0.000000e+00, %4 ], [ %99, %65 ]\n  %53 = phi float [ 0.000000e+00, %4 ], [ %98, %65 ]\n  %54 = shl i32 %50, 8\n  %55 = add i32 %54, %21\n  %56 = zext i32 %55 to i64\n  %57 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %56, i32 0, i32 0, i32 0\n  %58 = load float, float addrspace(1)* %57, align 4\n  %59 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %56, i32 0, i32 0, i32 1\n  %60 = load float, float addrspace(1)* %59, align 4\n  %61 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %56, i32 0, i32 0, i32 2\n  %62 = load float, float addrspace(1)* %61, align 4\n  %63 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %56, i32 0, i32 0, i32 3\n  %64 = load float, float addrspace(1)* %63, align 4\n  store float %58, float addrspace(3)* %22, align 4, !tbaa !19\n  store float %60, float addrspace(3)* %23, align 4, !tbaa !19\n  store float %62, float addrspace(3)* %24, align 4, !tbaa !19\n  store float %64, float addrspace(3)* %25, align 4, !tbaa !19\n  tail call floor_func void @_Z7barrierj(i32 1)\n  br label %69\n\n65:                                               ; preds = %69\n  tail call floor_func void @_Z7barrierj(i32 1)\n  %66 = add i32 %49, 256\n  %67 = add i32 %50, 
1\n  %68 = icmp ult i32 %66, %19\n  br i1 %68, label %48, label %26, !llvm.loop !22\n\n69:                                               ; preds = %69, %48\n  %70 = phi i64 [ 0, %48 ], [ %101, %69 ]\n  %71 = phi float [ %51, %48 ], [ %100, %69 ]\n  %72 = phi float [ %52, %48 ], [ %99, %69 ]\n  %73 = phi float [ %53, %48 ], [ %98, %69 ]\n  %74 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %70, i32 0, i32 0, i32 0\n  %75 = load float, float addrspace(3)* %74, align 4\n  %76 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %70, i32 0, i32 0, i32 1\n  %77 = load float, float addrspace(3)* %76, align 4\n  %78 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %70, i32 0, i32 0, i32 2\n  %79 = load float, float addrspace(3)* %78, align 4\n  %80 = fsub float %75, %7\n  %81 = fsub float %77, %9\n  %82 = fsub float %79, %11\n  %83 = fmul float %80, %80\n  %84 = fmul float %81, %81\n  %85 = fmul float %82, %82\n  %86 = fadd float %83, 0x3F1A36E2E0000000\n  %87 = fadd float %86, %84\n  %88 = fadd float %87, %85\n  %89 = tail call floor_func float @_Z5rsqrtf(float %88)\n  %90 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @simplified_nbody.local_body_positions, i64 0, i64 %70, i32 0, i32 0, i32 3\n  %91 = load float, float addrspace(3)* %90, align 4, !tbaa !19\n  %92 = fmul float %89, %89\n  %93 = fmul float %92, %89\n  %94 = fmul float %93, %91\n  %95 = fmul float %94, %80\n  %96 = fmul float %94, %81\n  %97 = fmul float %94, %82\n  %98 = fadd float %95, %73\n  %99 = fadd float %96, %72\n  %100 = fadd float %97, %71\n  %101 = add nuw nsw i64 %70, 1\n  %102 = icmp eq i64 %101, 256\n  br i1 %102, label %65, label %69, !llvm.loop !24\n}\n\ndeclare floor_func i64 
@_Z13get_global_idj(i32)\n\ndeclare floor_func i64 @_Z15get_global_sizej(i32)\n\ndeclare floor_func i64 @_Z12get_local_idj(i32)\n\ndeclare floor_func float @_Z5rsqrtf(float)\n\ndeclare floor_func void @_Z7barrierj(i32)\n\n!opencl.kernels = !{!0}\n!llvm.linker.options = !{}\n!llvm.module.flags = !{!7, !8}\n!opencl.ocl.version = !{!9}\n!opencl.spir.version = !{!9}\n!opencl.enable.FP_CONTRACT = !{}\n!opencl.used.extensions = !{!10}\n!opencl.used.optional.core.features = !{!11}\n!opencl.compiler.options = !{!12}\n!llvm.ident = !{!13}\n\n!0 = !{void (%class.vector4 addrspace(1)*, %class.vector4 addrspace(1)*, %class.vector3 addrspace(1)*, float)* @simplified_nbody, !1, !2, !3, !4, !5, !6}\n!1 = !{!\"kernel_arg_addr_space\", i32 1, i32 1, i32 1, i32 0}\n!2 = !{!\"kernel_arg_access_qual\", !\"none\", !\"none\", !\"none\", !\"none\"}\n!3 = !{!\"kernel_arg_type\", !\"compute_global_buffer\u003cconst float4\u003e\", !\"compute_global_buffer\u003cfloat4\u003e\", !\"compute_global_buffer\u003cfloat3\u003e\", !\"param\u003cfloat\u003e\"}\n!4 = !{!\"kernel_arg_base_type\", !\"struct __class vector4\u003cfloat\u003e*\", !\"struct __class vector4\u003cfloat\u003e*\", !\"struct __class vector3\u003cfloat\u003e*\", !\"float\"}\n!5 = !{!\"kernel_arg_type_qual\", !\"restrict const\", !\"restrict\", !\"restrict\", !\"const\"}\n!6 = !{!\"kernel_arg_name\", !\"in_positions\", !\"out_positions\", !\"inout_velocities\", !\"time_delta\"}\n!7 = !{i32 1, !\"wchar_size\", i32 4}\n!8 = !{i32 7, !\"frame-pointer\", i32 2}\n!9 = !{i32 1, i32 2}\n!10 = !{!\"cl_khr_byte_addressable_store\", !\"cl_khr_global_int32_base_atomics\", !\"cl_khr_global_int32_extended_atomics\", !\"cl_khr_local_int32_base_atomics\", !\"cl_khr_local_int32_extended_atomics\", !\"cl_khr_fp64\", !\"cl_khr_fp16\", !\"cl_khr_gl_msaa_sharing\"}\n!11 = !{!\"cl_doubles\"}\n!12 = !{!\"-cl-kernel-arg-info\", !\"-cl-mad-enable\", !\"-cl-denorms-are-zero\", !\"-cl-unsafe-math-optimizations\"}\n!13 = !{!\"clang version 14.0.6 
(https://github.com/a2flo/floor_llvm.git 85a83a4073c340ac03ca1c8fcd131db30339db24)\"}\n!14 = !{i64 0, i64 4294967295}\n!15 = !{i64 1, i64 4294967295}\n!16 = !{i32 1, i32 -1}\n!17 = !{i64 0, i64 2048}\n!18 = !{i32 0, i32 2048}\n!19 = !{!20, !20, i64 0}\n!20 = !{!\"omnipotent char\", !21, i64 0}\n!21 = !{!\"Simple C++ TBAA\"}\n!22 = distinct !{!22, !23}\n!23 = !{!\"llvm.loop.mustprogress\"}\n!24 = distinct !{!24, !23}\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eOpenCL / SPIR-V\u003c/summary\u003e\n  Note that the compiler would usually directly output a \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody_cl.spv\"\u003e.spv file\u003c/a\u003e. The output below comes from disassembling it with \u003ccode\u003espirv-dis\u003c/code\u003e (provided by the \u003ca href=\"#computegraphics-toolchain\"\u003etoolchain\u003c/a\u003e).\n  Also note that the output below has been generated with extended readability (--debug-asm).\n\n++++\n[source,LLVM]\n----\n; SPIR-V\n; Version: 1.0\n; Generator: Khronos LLVM/SPIR-V Translator; 14\n; Bound: 153\n; Schema: 0\n                                         Capability Addresses\n                                         Capability Linkage\n                                         Capability Kernel\n                                         Capability Int64\n                                    %1 = ExtInstImport \"OpenCL.std\"\n                                         MemoryModel Physical64 OpenCL\n                                         EntryPoint Kernel %simplified_nbody \"simplified_nbody\" %__spirv_BuiltInGlobalInvocationId %__spirv_BuiltInGlobalSize %__spirv_BuiltInLocalInvocationId\n                                         ExecutionMode %simplified_nbody LocalSize 256 1 1\n                                         SourceExtension \"cl_khr_byte_addressable_store\"\n                                         SourceExtension \"cl_khr_fp16\"\n      
                                   SourceExtension \"cl_khr_fp64\"\n                                         SourceExtension \"cl_khr_gl_msaa_sharing\"\n                                         SourceExtension \"cl_khr_global_int32_base_atomics\"\n                                         SourceExtension \"cl_khr_global_int32_extended_atomics\"\n                                         SourceExtension \"cl_khr_local_int32_base_atomics\"\n                                         SourceExtension \"cl_khr_local_int32_extended_atomics\"\n                                         Source OpenCL_C 102000\n                                         Decorate %simplified_nbody.local_body_positions Alignment 4\n                                         Decorate %19 FuncParamAttr NoAlias\n                                         Decorate %19 FuncParamAttr NoCapture\n                                         Decorate %19 FuncParamAttr NoWrite\n                                         Decorate %20 FuncParamAttr NoAlias\n                                         Decorate %20 FuncParamAttr NoCapture\n                                         Decorate %21 FuncParamAttr NoAlias\n                                         Decorate %21 FuncParamAttr NoCapture\n                                         Decorate %__spirv_BuiltInGlobalInvocationId LinkageAttributes \"__spirv_BuiltInGlobalInvocationId\" Import\n                                         Decorate %__spirv_BuiltInGlobalInvocationId Constant\n                                         Decorate %__spirv_BuiltInGlobalInvocationId BuiltIn GlobalInvocationId\n                                         Decorate %__spirv_BuiltInGlobalSize LinkageAttributes \"__spirv_BuiltInGlobalSize\" Import\n                                         Decorate %__spirv_BuiltInGlobalSize Constant\n                                         Decorate %__spirv_BuiltInGlobalSize BuiltIn GlobalSize\n                                         Decorate 
%__spirv_BuiltInLocalInvocationId LinkageAttributes \"__spirv_BuiltInLocalInvocationId\" Import\n                                         Decorate %__spirv_BuiltInLocalInvocationId Constant\n                                         Decorate %__spirv_BuiltInLocalInvocationId BuiltIn LocalInvocationId\n                                         Decorate %70 FPFastMathMode Fast\n                                         Decorate %72 FPFastMathMode Fast\n                                         Decorate %74 FPFastMathMode Fast\n                                         Decorate %101 FPFastMathMode Fast\n                                         Decorate %102 FPFastMathMode Fast\n                                         Decorate %103 FPFastMathMode Fast\n                                         Decorate %104 FPFastMathMode Fast\n                                         Decorate %105 FPFastMathMode Fast\n                                         Decorate %106 FPFastMathMode Fast\n                                         Decorate %108 FPFastMathMode Fast\n                                         Decorate %109 FPFastMathMode Fast\n                                         Decorate %110 FPFastMathMode Fast\n                                         Decorate %114 FPFastMathMode Fast\n                                         Decorate %115 FPFastMathMode Fast\n                                         Decorate %116 FPFastMathMode Fast\n                                         Decorate %117 FPFastMathMode Fast\n                                         Decorate %118 FPFastMathMode Fast\n                                         Decorate %119 FPFastMathMode Fast\n                                         Decorate %131 FPFastMathMode Fast\n                                         Decorate %132 FPFastMathMode Fast\n                                         Decorate %133 FPFastMathMode Fast\n                                         Decorate %134 FPFastMathMode Fast\n                             
            Decorate %135 FPFastMathMode Fast\n                                         Decorate %136 FPFastMathMode Fast\n                                         Decorate %138 FPFastMathMode Fast\n                                         Decorate %139 FPFastMathMode Fast\n                                         Decorate %140 FPFastMathMode Fast\n                                         Decorate %141 FPFastMathMode Fast\n                                         Decorate %142 FPFastMathMode Fast\n                                         Decorate %143 FPFastMathMode Fast\n                                         Decorate %146 FPFastMathMode Fast\n                                         Decorate %149 FPFastMathMode Fast\n                                         Decorate %152 FPFastMathMode Fast\n                                %ulong = TypeInt 64 0\n                                 %uint = TypeInt 32 0\n                                %256ul = Constant %ulong 256\n                                   %0u = Constant %uint 0\n                                   %1u = Constant %uint 1\n                                   %2u = Constant %uint 2\n                                  %0ul = Constant %ulong 0\n                                   %3u = Constant %uint 3\n                                   %8u = Constant %uint 8\n                                 %272u = Constant %uint 272\n                                %0ul_0 = Constant %ulong 0\n                                  %1ul = Constant %ulong 1\n                                 %256u = Constant %uint 256\n                                %float = TypeFloat 32\n                          %struct.anon = TypeStruct %float %float %float %float\n                           %union.anon = TypeStruct %struct.anon\n                        %class.vector4 = TypeStruct %union.anon\n                 %class.vector4[256ul] = TypeArray %class.vector4 %256ul\n     %(Workgroup)class.vector4[256ul]* = TypePointer Workgroup 
%class.vector4[256ul]\n                                 %void = TypeVoid\n       %(CrossWorkgroup)class.vector4* = TypePointer CrossWorkgroup %class.vector4\n                        %struct.anon.9 = TypeStruct %float %float %float\n                         %union.anon.8 = TypeStruct %struct.anon.9\n                        %class.vector3 = TypeStruct %union.anon.8\n       %(CrossWorkgroup)class.vector3* = TypePointer CrossWorkgroup %class.vector3\n                             %void(#4) = TypeFunction %void %(CrossWorkgroup)class.vector4* %(CrossWorkgroup)class.vector4* %(CrossWorkgroup)class.vector3* %float\n                            %\u003c3xulong\u003e = TypeVector %ulong 3\n                    %(Input)\u003c3xulong\u003e* = TypePointer Input %\u003c3xulong\u003e\n               %(CrossWorkgroup)float* = TypePointer CrossWorkgroup %float\n                    %(Workgroup)float* = TypePointer Workgroup %float\n                                 %bool = TypeBool\n%simplified_nbody.local_body_positions = Variable %(Workgroup)class.vector4[256ul]* Workgroup\n    %__spirv_BuiltInGlobalInvocationId = Variable %(Input)\u003c3xulong\u003e* Input\n            %__spirv_BuiltInGlobalSize = Variable %(Input)\u003c3xulong\u003e* Input\n     %__spirv_BuiltInLocalInvocationId = Variable %(Input)\u003c3xulong\u003e* Input\n                                 %0.0f = Constant %float 0\n                      %9.99999975e-05f = Constant %float 9.99999975e-05\n                         %0.999000013f = Constant %float 0.999000013\n\nfunction void simplified_nbody ( %void(#4) ) {\n                                   %19 = FunctionParameter %(CrossWorkgroup)class.vector4*\n                                   %20 = FunctionParameter %(CrossWorkgroup)class.vector4*\n                                   %21 = FunctionParameter %(CrossWorkgroup)class.vector3*\n                                   %22 = FunctionParameter %float\n23:\n                                   %31 = Load %\u003c3xulong\u003e 
%__spirv_BuiltInGlobalInvocationId Aligned 32\n                                   %32 = CompositeExtract %ulong %31 0\n                                   %36 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %32 %0u %0u %0u\n                                   %37 = Load %float %36 Aligned 4\n                                   %39 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %32 %0u %0u %1u\n                                   %40 = Load %float %39 Aligned 4\n                                   %42 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %32 %0u %0u %2u\n                                   %43 = Load %float %42 Aligned 4\n                                   %44 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %21 %32 %0u %0u %0u\n                                   %45 = Load %float %44 Aligned 4\n                                   %46 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %21 %32 %0u %0u %1u\n                                   %47 = Load %float %46 Aligned 4\n                                   %48 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %21 %32 %0u %0u %2u\n                                   %49 = Load %float %48 Aligned 4\n                                   %51 = Load %\u003c3xulong\u003e %__spirv_BuiltInGlobalSize Aligned 32\n                                   %52 = CompositeExtract %ulong %51 0\n                                   %53 = UConvert %uint %52\n                                   %55 = Load %\u003c3xulong\u003e %__spirv_BuiltInLocalInvocationId Aligned 32\n                                   %56 = CompositeExtract %ulong %55 0\n                                   %57 = UConvert %uint %56\n                                   %60 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %56 %0u %0u %0u\n                                   %61 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %56 %0u %0u %1u\n                                   %62 = 
InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %56 %0u %0u %2u\n                                   %64 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %56 %0u %0u %3u\n                                         Branch %24\n\n24:\n                                   %66 = Phi %uint ( %65 \u003c- %26, %0u \u003c- %23 )\n                                   %68 = Phi %uint ( %67 \u003c- %26, %0u \u003c- %23 )\n                                   %71 = Phi %float ( %0.0f \u003c- %23, %70 \u003c- %26 )\n                                   %73 = Phi %float ( %0.0f \u003c- %23, %72 \u003c- %26 )\n                                   %75 = Phi %float ( %0.0f \u003c- %23, %74 \u003c- %26 )\n                                   %77 = ShiftLeftLogical %uint %68 %8u\n                                   %78 = IAdd %uint %77 %57\n                                   %79 = UConvert %ulong %78\n                                   %80 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %79 %0u %0u %0u\n                                   %81 = Load %float %80 Aligned 4\n                                   %82 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %79 %0u %0u %1u\n                                   %83 = Load %float %82 Aligned 4\n                                   %84 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %79 %0u %0u %2u\n                                   %85 = Load %float %84 Aligned 4\n                                   %86 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %19 %79 %0u %0u %3u\n                                   %87 = Load %float %86 Aligned 4\n                                         Store %60 %81 Aligned 4\n                                         Store %61 %83 Aligned 4\n                                         Store %62 %85 Aligned 4\n                                         Store %64 %87 Aligned 4\n                                         ControlBarrier %2u %2u 
%272u\n                                         Branch %25\n\n25:\n                                   %91 = Phi %ulong ( %89 \u003c- %25, %0ul_0 \u003c- %24 )\n                                   %92 = Phi %float ( %71 \u003c- %24, %70 \u003c- %25 )\n                                   %93 = Phi %float ( %73 \u003c- %24, %72 \u003c- %25 )\n                                   %94 = Phi %float ( %75 \u003c- %24, %74 \u003c- %25 )\n                                   %95 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %91 %0u %0u %0u\n                                   %96 = Load %float %95 Aligned 4\n                                   %97 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %91 %0u %0u %1u\n                                   %98 = Load %float %97 Aligned 4\n                                   %99 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %91 %0u %0u %2u\n                                  %100 = Load %float %99 Aligned 4\n                                  %101 = FSub %float %96 %37\n                                  %102 = FSub %float %98 %40\n                                  %103 = FSub %float %100 %43\n                                  %104 = FMul %float %101 %101\n                                  %105 = FMul %float %102 %102\n                                  %106 = FMul %float %103 %103\n                                  %108 = FAdd %float %104 %9.99999975e-05f\n                                  %109 = FAdd %float %108 %105\n                                  %110 = FAdd %float %109 %106\n                                  %111 = ExtInst %float %1 rsqrt %110\n                                  %112 = InBoundsPtrAccessChain %(Workgroup)float* %simplified_nbody.local_body_positions %0ul %91 %0u %0u %3u\n                                  %113 = Load %float %112 Aligned 4\n                                  %114 = FMul %float %111 %111\n   
                               %115 = FMul %float %114 %111\n                                  %116 = FMul %float %115 %113\n                                  %117 = FMul %float %116 %101\n                                  %118 = FMul %float %116 %102\n                                  %119 = FMul %float %116 %103\n                                   %74 = FAdd %float %117 %94\n                                   %72 = FAdd %float %118 %93\n                                   %70 = FAdd %float %119 %92\n                                   %89 = IAdd %ulong %91 %1ul\n                                  %126 = IEqual %bool %89 %256ul\n                                         BranchConditional %126 %26 %25\n\n26:\n                                         ControlBarrier %2u %2u %272u\n                                   %65 = IAdd %uint %66 %256u\n                                   %67 = IAdd %uint %68 %1u\n                                  %130 = ULessThan %bool %65 %53\n                                         BranchConditional %130 %24 %27\n\n27:\n                                  %131 = FMul %float %74 %22\n                                  %132 = FMul %float %72 %22\n                                  %133 = FMul %float %70 %22\n                                  %134 = FAdd %float %131 %45\n                                  %135 = FAdd %float %132 %47\n                                  %136 = FAdd %float %133 %49\n                                  %138 = FMul %float %134 %0.999000013f\n                                  %139 = FMul %float %135 %0.999000013f\n                                  %140 = FMul %float %136 %0.999000013f\n                                  %141 = FMul %float %138 %22\n                                  %142 = FMul %float %139 %22\n                                  %143 = FMul %float %140 %22\n                                  %144 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %20 %32 %0u %0u %0u\n                                  %145 = Load 
%float %144 Aligned 4\n                                  %146 = FAdd %float %145 %141\n                                         Store %144 %146 Aligned 4\n                                  %147 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %20 %32 %0u %0u %1u\n                                  %148 = Load %float %147 Aligned 4\n                                  %149 = FAdd %float %148 %142\n                                         Store %147 %149 Aligned 4\n                                  %150 = InBoundsPtrAccessChain %(CrossWorkgroup)float* %20 %32 %0u %0u %2u\n                                  %151 = Load %float %150 Aligned 4\n                                  %152 = FAdd %float %151 %143\n                                         Store %150 %152 Aligned 4\n                                         Store %44 %138 Aligned 4\n                                         Store %46 %139 Aligned 4\n                                         Store %48 %140 Aligned 4\n                                         Return\n}\n\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eVulkan / SPIR-V\u003c/summary\u003e\n  Note that the compiler would usually directly output a \u003ca href=\"https://github.com/a2flo/floor/blob/master/etc/example/nbody_vk.spvc\"\u003e.spvc file\u003c/a\u003e (a \u003ca href=\"https://github.com/a2flo/floor/blob/master/compute/spirv_handler.hpp#L30\"\u003esimple container format\u003c/a\u003e for multiple SPIR-V binaries). 
The output below comes from disassembling it with \u003ccode\u003espirv-dis\u003c/code\u003e (provided by the \u003ca href=\"#computegraphics-toolchain\"\u003etoolchain\u003c/a\u003e).\n  Also note that the output below has been generated with extended readability (--debug-asm).\n  \n++++\n[source,LLVM]\n----\n; SPIR-V\n; Version: 1.6\n; Generator: Khronos LLVM/SPIR-V Translator; 14\n; Bound: 210\n; Schema: 0\n                                                Capability Matrix\n                                                Capability Shader\n                                                Capability Int64\n                                                Capability GroupNonUniform\n                                                Capability VariablePointersStorageBuffer\n                                                Capability VariablePointers\n                                                Capability ShaderNonUniform\n                                                Capability UniformBufferArrayNonUniformIndexing\n                                                Capability SampledImageArrayNonUniformIndexing\n                                                Capability StorageBufferArrayNonUniformIndexing\n                                                Capability StorageImageArrayNonUniformIndexing\n                                                Capability VulkanMemoryModel\n                                                Capability VulkanMemoryModelDeviceScope\n                                                Capability PhysicalStorageBufferAddresses\n                                           %1 = ExtInstImport \"GLSL.std.450\"\n                                                MemoryModel PhysicalStorageBuffer64 Vulkan\n                                                EntryPoint GLCompute %simplified_nbody \"simplified_nbody\" %simplified_nbody.vulkan_uniform. 
%simplified_nbody.vulkan_uniform..1 %simplified_nbody.vulkan_uniform..2 %simplified_nbody.vulkan_uniform..3 %simplified_nbody.vulkan_builtin_input. %simplified_nbody.vulkan_builtin_input..4 %simplified_nbody.vulkan_builtin_input..5 %simplified_nbody.vulkan_builtin_input..6 %simplified_nbody.vulkan_builtin_input..7 %simplified_nbody.vulkan_builtin_input..8 %vulkan.immutable_sampler_0 %vulkan.immutable_sampler_1 %vulkan.immutable_sampler_2 %vulkan.immutable_sampler_3 %vulkan.immutable_sampler_4 %vulkan.immutable_sampler_5 %vulkan.immutable_sampler_6 %vulkan.immutable_sampler_7 %vulkan.immutable_sampler_8 %vulkan.immutable_sampler_9 %vulkan.immutable_sampler_10 %vulkan.immutable_sampler_11 %vulkan.immutable_sampler_12 %vulkan.immutable_sampler_13 %vulkan.immutable_sampler_14 %vulkan.immutable_sampler_15 %vulkan.immutable_sampler_16 %vulkan.immutable_sampler_17 %vulkan.immutable_sampler_18 %vulkan.immutable_sampler_19 %vulkan.immutable_sampler_20 %vulkan.immutable_sampler_21 %vulkan.immutable_sampler_22 %vulkan.immutable_sampler_23 %vulkan.immutable_sampler_24 %vulkan.immutable_sampler_25 %vulkan.immutable_sampler_26 %vulkan.immutable_sampler_27 %vulkan.immutable_sampler_28 %vulkan.immutable_sampler_29 %vulkan.immutable_sampler_30 %vulkan.immutable_sampler_31 %vulkan.immutable_sampler_32 %vulkan.immutable_sampler_33 %vulkan.immutable_sampler_34 %vulkan.immutable_sampler_35 %vulkan.immutable_sampler_36 %vulkan.immutable_sampler_37 %vulkan.immutable_sampler_38 %vulkan.immutable_sampler_39 %vulkan.immutable_sampler_40 %vulkan.immutable_sampler_41 %vulkan.immutable_sampler_42 %vulkan.immutable_sampler_43 %vulkan.immutable_sampler_44 %vulkan.immutable_sampler_45 %vulkan.immutable_sampler_46 %vulkan.immutable_sampler_47 %_ZZ16simplified_nbodyE20local_body_positions\n                                                ExecutionMode %simplified_nbody LocalSize 256 1 1\n                                                SourceExtension \"vk_capability_int16\"\n                         
                       SourceExtension \"vk_capability_int64\"\n                                                SourceExtension \"vk_capability_multiview\"\n                                                Source GLSL 450\n                                                Decorate %vulkan.immutable_sampler_0 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_0 Binding 0\n                                                Decorate %vulkan.immutable_sampler_1 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_1 Binding 1\n                                                Decorate %vulkan.immutable_sampler_2 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_2 Binding 2\n                                                Decorate %vulkan.immutable_sampler_3 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_3 Binding 3\n                                                Decorate %vulkan.immutable_sampler_4 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_4 Binding 4\n                                                Decorate %vulkan.immutable_sampler_5 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_5 Binding 5\n                                                Decorate %vulkan.immutable_sampler_6 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_6 Binding 6\n                                                Decorate %vulkan.immutable_sampler_7 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_7 Binding 7\n                                                Decorate %vulkan.immutable_sampler_8 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_8 
Binding 8\n                                                Decorate %vulkan.immutable_sampler_9 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_9 Binding 9\n                                                Decorate %vulkan.immutable_sampler_10 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_10 Binding 10\n                                                Decorate %vulkan.immutable_sampler_11 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_11 Binding 11\n                                                Decorate %vulkan.immutable_sampler_12 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_12 Binding 12\n                                                Decorate %vulkan.immutable_sampler_13 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_13 Binding 13\n                                                Decorate %vulkan.immutable_sampler_14 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_14 Binding 14\n                                                Decorate %vulkan.immutable_sampler_15 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_15 Binding 15\n                                                Decorate %vulkan.immutable_sampler_16 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_16 Binding 16\n                                                Decorate %vulkan.immutable_sampler_17 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_17 Binding 17\n                                                Decorate %vulkan.immutable_sampler_18 DescriptorSet 0\n                                                Decorate 
%vulkan.immutable_sampler_18 Binding 18\n                                                Decorate %vulkan.immutable_sampler_19 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_19 Binding 19\n                                                Decorate %vulkan.immutable_sampler_20 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_20 Binding 20\n                                                Decorate %vulkan.immutable_sampler_21 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_21 Binding 21\n                                                Decorate %vulkan.immutable_sampler_22 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_22 Binding 22\n                                                Decorate %vulkan.immutable_sampler_23 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_23 Binding 23\n                                                Decorate %vulkan.immutable_sampler_24 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_24 Binding 24\n                                                Decorate %vulkan.immutable_sampler_25 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_25 Binding 25\n                                                Decorate %vulkan.immutable_sampler_26 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_26 Binding 26\n                                                Decorate %vulkan.immutable_sampler_27 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_27 Binding 27\n                                                Decorate %vulkan.immutable_sampler_28 DescriptorSet 0\n                                               
 Decorate %vulkan.immutable_sampler_28 Binding 28\n                                                Decorate %vulkan.immutable_sampler_29 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_29 Binding 29\n                                                Decorate %vulkan.immutable_sampler_30 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_30 Binding 30\n                                                Decorate %vulkan.immutable_sampler_31 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_31 Binding 31\n                                                Decorate %vulkan.immutable_sampler_32 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_32 Binding 32\n                                                Decorate %vulkan.immutable_sampler_33 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_33 Binding 33\n                                                Decorate %vulkan.immutable_sampler_34 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_34 Binding 34\n                                                Decorate %vulkan.immutable_sampler_35 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_35 Binding 35\n                                                Decorate %vulkan.immutable_sampler_36 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_36 Binding 36\n                                                Decorate %vulkan.immutable_sampler_37 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_37 Binding 37\n                                                Decorate %vulkan.immutable_sampler_38 DescriptorSet 0\n                                     
           Decorate %vulkan.immutable_sampler_38 Binding 38\n                                                Decorate %vulkan.immutable_sampler_39 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_39 Binding 39\n                                                Decorate %vulkan.immutable_sampler_40 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_40 Binding 40\n                                                Decorate %vulkan.immutable_sampler_41 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_41 Binding 41\n                                                Decorate %vulkan.immutable_sampler_42 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_42 Binding 42\n                                                Decorate %vulkan.immutable_sampler_43 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_43 Binding 43\n                                                Decorate %vulkan.immutable_sampler_44 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_44 Binding 44\n                                                Decorate %vulkan.immutable_sampler_45 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_45 Binding 45\n                                                Decorate %vulkan.immutable_sampler_46 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_46 Binding 46\n                                                Decorate %vulkan.immutable_sampler_47 DescriptorSet 0\n                                                Decorate %vulkan.immutable_sampler_47 Binding 47\n                                                Decorate %class.vector4[256l] ArrayStride 16\n                                    
            MemberDecorate %class.vector4 0 Offset 0\n                                                MemberDecorate %union.anon 0 Offset 0\n                                                MemberDecorate %struct.anon 0 Offset 0\n                                                MemberDecorate %struct.anon 1 Offset 4\n                                                MemberDecorate %struct.anon 2 Offset 8\n                                                MemberDecorate %struct.anon 3 Offset 12\n                                                Decorate %enclose.class.vector4 Block\n                                                MemberDecorate %enclose.class.vector4 0 Offset 0\n                                                Decorate %class.vector4[] ArrayStride 16\n                                                Decorate %(StorageBuffer)enclose.class.vector4* ArrayStride 16\n                                                Decorate %simplified_nbody.vulkan_uniform. NonWritable\n                                                Decorate %simplified_nbody.vulkan_uniform. DescriptorSet 1\n                                                Decorate %simplified_nbody.vulkan_uniform. 
Binding 0\n                                                Decorate %enclose.class.vector4_0 Block\n                                                MemberDecorate %enclose.class.vector4_0 0 Offset 0\n                                                Decorate %class.vector4[]_0 ArrayStride 16\n                                                Decorate %(StorageBuffer)enclose.class.vector4_0* ArrayStride 16\n                                                Decorate %simplified_nbody.vulkan_uniform..1 DescriptorSet 1\n                                                Decorate %simplified_nbody.vulkan_uniform..1 Binding 1\n                                                Decorate %enclose.class.vector3 Block\n                                                MemberDecorate %enclose.class.vector3 0 Offset 0\n                                                Decorate %class.vector3[] ArrayStride 12\n                                                Decorate %(StorageBuffer)enclose.class.vector3* ArrayStride 12\n                                                MemberDecorate %class.vector3 0 Offset 0\n                                                MemberDecorate %union.anon.8 0 Offset 0\n                                                MemberDecorate %struct.anon.9 0 Offset 0\n                                                MemberDecorate %struct.anon.9 1 Offset 4\n                                                MemberDecorate %struct.anon.9 2 Offset 8\n                                                Decorate %simplified_nbody.vulkan_uniform..2 DescriptorSet 1\n                                                Decorate %simplified_nbody.vulkan_uniform..2 Binding 2\n                                                Decorate %enclose. Block\n                                                MemberDecorate %enclose. 
0 Offset 0\n                                                Decorate %simplified_nbody.vulkan_uniform..3 NonWritable\n                                                Decorate %simplified_nbody.vulkan_uniform..3 Uniform\n                                                Decorate %simplified_nbody.vulkan_uniform..3 DescriptorSet 1\n                                                Decorate %simplified_nbody.vulkan_uniform..3 Binding 3\n                                                Decorate %simplified_nbody.vulkan_builtin_input. BuiltIn WorkgroupId\n                                                Decorate %simplified_nbody.vulkan_builtin_input..4 BuiltIn NumWorkgroups\n                                                Decorate %simplified_nbody.vulkan_builtin_input..5 BuiltIn SubgroupId\n                                                Decorate %simplified_nbody.vulkan_builtin_input..6 BuiltIn SubgroupLocalInvocationId\n                                                Decorate %simplified_nbody.vulkan_builtin_input..7 BuiltIn SubgroupSize\n                                                Decorate %simplified_nbody.vulkan_builtin_input..8 BuiltIn NumSubgroups\n                                                Decorate %(Workgroup)class.vector4[256l]* ArrayStride 4096\n                                                Decorate %155 NoSignedWrap\n                                                Decorate %155 NoUnsignedWrap\n                                       %ilong = TypeInt 64 1\n                                        %iint = TypeInt 32 1\n                                        %256l = Constant %ilong 256\n                                          %8i = Constant %iint 8\n                                          %0i = Constant %iint 0\n                                          %1i = Constant %iint 1\n                                          %2i = Constant %iint 2\n                                          %3i = Constant %iint 3\n                                       %2504i 
= Constant %iint 2504\n                                          %0l = Constant %ilong 0\n                                          %1l = Constant %ilong 1\n                                        %256i = Constant %iint 256\n                                     %Sampler = TypeSampler\n                   %(UniformConstant)Sampler* = TypePointer UniformConstant %Sampler\n                                       %float = TypeFloat 32\n                                 %struct.anon = TypeStruct %float %float %float %float\n                                  %union.anon = TypeStruct %struct.anon\n                               %class.vector4 = TypeStruct %union.anon\n                         %class.vector4[256l] = TypeArray %class.vector4 %256l\n             %(Workgroup)class.vector4[256l]* = TypePointer Workgroup %class.vector4[256l]\n                                        %void = TypeVoid\n                                      %void() = TypeFunction %void\n                             %class.vector4[] = TypeRuntimeArray %class.vector4\n                       %enclose.class.vector4 = TypeStruct %class.vector4[]\n       %(StorageBuffer)enclose.class.vector4* = TypePointer StorageBuffer %enclose.class.vector4\n                           %class.vector4[]_0 = TypeRuntimeArray %class.vector4\n                     %enclose.class.vector4_0 = TypeStruct %class.vector4[]_0\n     %(StorageBuffer)enclose.class.vector4_0* = TypePointer StorageBuffer %enclose.class.vector4_0\n                               %struct.anon.9 = TypeStruct %float %float %float\n                                %union.anon.8 = TypeStruct %struct.anon.9\n                               %class.vector3 = TypeStruct %union.anon.8\n                             %class.vector3[] = TypeRuntimeArray %class.vector3\n                       %enclose.class.vector3 = TypeStruct %class.vector3[]\n       %(StorageBuffer)enclose.class.vector3* = TypePointer StorageBuffer %enclose.class.vector3\n                                 
   %enclose. = TypeStruct %float\n                          %(Uniform)enclose.* = TypePointer Uniform %enclose.\n                                    %\u003c3xiint\u003e = TypeVector %iint 3\n                            %(Input)\u003c3xiint\u003e* = TypePointer Input %\u003c3xiint\u003e\n                                %(Input)iint* = TypePointer Input %iint\n                       %(StorageBuffer)float* = TypePointer StorageBuffer %float\n                           %(Workgroup)float* = TypePointer Workgroup %float\n                                        %bool = TypeBool\n                             %(Uniform)float* = TypePointer Uniform %float\n                  %vulkan.immutable_sampler_0 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_1 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_2 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_3 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_4 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_5 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_6 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_7 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_8 = Variable %(UniformConstant)Sampler* UniformConstant\n                  %vulkan.immutable_sampler_9 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_10 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_11 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_12 = Variable %(UniformConstant)Sampler* UniformConstant\n                 
%vulkan.immutable_sampler_13 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_14 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_15 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_16 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_17 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_18 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_19 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_20 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_21 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_22 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_23 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_24 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_25 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_26 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_27 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_28 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_29 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_30 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_31 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_32 = Variable %(UniformConstant)Sampler* 
UniformConstant\n                 %vulkan.immutable_sampler_33 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_34 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_35 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_36 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_37 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_38 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_39 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_40 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_41 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_42 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_43 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_44 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_45 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_46 = Variable %(UniformConstant)Sampler* UniformConstant\n                 %vulkan.immutable_sampler_47 = Variable %(UniformConstant)Sampler* UniformConstant\n%_ZZ16simplified_nbodyE20local_body_positions = Variable %(Workgroup)class.vector4[256l]* Workgroup\n            %simplified_nbody.vulkan_uniform. 
= Variable %(StorageBuffer)enclose.class.vector4* StorageBuffer\n          %simplified_nbody.vulkan_uniform..1 = Variable %(StorageBuffer)enclose.class.vector4_0* StorageBuffer\n          %simplified_nbody.vulkan_uniform..2 = Variable %(StorageBuffer)enclose.class.vector3* StorageBuffer\n          %simplified_nbody.vulkan_uniform..3 = Variable %(Uniform)enclose.* Uniform\n      %simplified_nbody.vulkan_builtin_input. = Variable %(Input)\u003c3xiint\u003e* Input\n    %simplified_nbody.vulkan_builtin_input..4 = Variable %(Input)\u003c3xiint\u003e* Input\n    %simplified_nbody.vulkan_builtin_input..5 = Variable %(Input)iint* Input\n    %simplified_nbody.vulkan_builtin_input..6 = Variable %(Input)iint* Input\n    %simplified_nbody.vulkan_builtin_input..7 = Variable %(Input)iint* Input\n    %simplified_nbody.vulkan_builtin_input..8 = Variable %(Input)iint* Input\n                                        %0.0f = Constant %float 0\n                             %9.99999975e-05f = Constant %float 9.99999975e-05\n                                %0.999000013f = Constant %float 0.999000013\n\nfunction void simplified_nbody ( %void() ) {\n92:\n                                          %98 = Load %\u003c3xiint\u003e %simplified_nbody.vulkan_builtin_input. 
Aligned 16\n                                          %99 = CompositeExtract %iint %98 0\n                                         %101 = ShiftLeftLogical %iint %99 %8i\n                                         %102 = Load %iint %simplified_nbody.vulkan_builtin_input..6 Aligned 4\n                                         %103 = Load %iint %simplified_nbody.vulkan_builtin_input..5 Aligned 4\n                                         %104 = Load %iint %simplified_nbody.vulkan_builtin_input..7 Aligned 4\n                                         %105 = IMul %iint %103 %104\n                                         %106 = IAdd %iint %105 %102\n                                         %107 = IAdd %iint %101 %106\n                                         %108 = Load %\u003c3xiint\u003e %simplified_nbody.vulkan_builtin_input..4 Aligned 16\n                                         %109 = CompositeExtract %iint %108 0\n                                         %110 = ShiftLeftLogical %iint %109 %8i\n                                         %113 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. %0i %0i %107 %0i %0i %0i\n                                         %115 = Load %float %113 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %116 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. %0i %0i %107 %0i %0i %1i\n                                         %117 = Load %float %116 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %119 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. 
%0i %0i %107 %0i %0i %2i\n                                         %120 = Load %float %119 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %121 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform..2 %0i %0i %107 %0i %0i %0i\n                                         %122 = Load %float %121 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %123 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform..2 %0i %0i %107 %0i %0i %1i\n                                         %124 = Load %float %123 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %125 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform..2 %0i %0i %107 %0i %0i %2i\n                                         %126 = Load %float %125 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %128 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %106 %0i %0i %0i\n                                         %129 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %106 %0i %0i %1i\n                                         %130 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %106 %0i %0i %2i\n                                         %132 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %106 %0i %0i %3i\n                                                Branch %93\n\n93:\n                                         %134 = Phi %iint ( %133 \u003c- %96, %0i \u003c- %92 )\n                                         %136 = Phi %iint ( %135 \u003c- %96, %0i \u003c- %92 )\n                                         %139 = Phi %float ( %0.0f \u003c- %92, %138 \u003c- %96 )\n                                         %141 = Phi %float ( %0.0f \u003c- %92, %140 \u003c- %96 )\n 
                                        %143 = Phi %float ( %0.0f \u003c- %92, %142 \u003c- %96 )\n                                         %144 = ShiftLeftLogical %iint %136 %8i\n                                         %145 = IAdd %iint %106 %144\n                                         %146 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. %0i %0i %145 %0i %0i %0i\n                                         %147 = Load %float %146 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %148 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. %0i %0i %145 %0i %0i %1i\n                                         %149 = Load %float %148 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %150 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. %0i %0i %145 %0i %0i %2i\n                                         %151 = Load %float %150 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %152 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform. 
%0i %0i %145 %0i %0i %3i\n                                         %153 = Load %float %152 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                                Store %128 %147 Aligned 4\n                                                Store %129 %149 Aligned 4\n                                                Store %130 %151 Aligned 4\n                                                Store %132 %153 Aligned 4\n                                                ControlBarrier %2i %2i %2504i\n                                                LoopMerge %97 %96 None\n                                                Branch %94\n\n94:\n                                         %157 = Phi %ilong ( %155 \u003c- %94, %0l \u003c- %93 )\n                                         %158 = Phi %float ( %139 \u003c- %93, %138 \u003c- %94 )\n                                         %159 = Phi %float ( %141 \u003c- %93, %140 \u003c- %94 )\n                                         %160 = Phi %float ( %143 \u003c- %93, %142 \u003c- %94 )\n                                         %161 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %157 %0i %0i %0i\n                                         %162 = Load %float %161 Aligned 4\n                                         %163 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %157 %0i %0i %1i\n                                         %164 = Load %float %163 Aligned 4\n                                         %165 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %157 %0i %0i %2i\n                                         %166 = Load %float %165 Aligned 4\n                                         %167 = FSub %float %162 %115\n                                         %168 = FSub %float %164 %117\n                                         %169 = FSub %float %166 %120\n                                         %171 = 
ExtInst %float %1 Fma %167 %167 %9.99999975e-05f\n                                         %172 = ExtInst %float %1 Fma %168 %168 %171\n                                         %173 = ExtInst %float %1 Fma %169 %169 %172\n                                         %174 = ExtInst %float %1 InverseSqrt %173\n                                         %175 = PtrAccessChain %(Workgroup)float* %_ZZ16simplified_nbodyE20local_body_positions %0i %157 %0i %0i %3i\n                                         %176 = Load %float %175 Aligned 4\n                                         %177 = FMul %float %174 %174\n                                         %178 = FMul %float %177 %174\n                                         %179 = FMul %float %178 %176\n                                         %142 = ExtInst %float %1 Fma %179 %167 %160\n                                         %140 = ExtInst %float %1 Fma %179 %168 %159\n                                         %138 = ExtInst %float %1 Fma %179 %169 %158\n                                         %155 = IAdd %ilong %157 %1l\n                                         %186 = IEqual %bool %155 %256l\n                                                LoopMerge %95 %94 None\n                                                BranchConditional %186 %95 %94\n\n95:\n                                                Branch %96\n\n96:\n                                                ControlBarrier %2i %2i %2504i\n                                         %133 = IAdd %iint %134 %256i\n                                         %135 = IAdd %iint %136 %1i\n                                         %190 = ULessThan %bool %133 %110\n                                                BranchConditional %190 %93 %97\n\n97:\n                                         %192 = InBoundsAccessChain %(Uniform)float* %simplified_nbody.vulkan_uniform..3 %0i\n                                         %193 = Load %float %192 Aligned 4\n                                         
%194 = ExtInst %float %1 Fma %193 %142 %122\n                                         %195 = ExtInst %float %1 Fma %193 %140 %124\n                                         %196 = ExtInst %float %1 Fma %193 %138 %126\n                                         %198 = FMul %float %194 %0.999000013f\n                                         %199 = FMul %float %195 %0.999000013f\n                                         %200 = FMul %float %196 %0.999000013f\n                                         %201 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform..1 %0i %0i %107 %0i %0i %0i\n                                         %202 = Load %float %201 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %203 = ExtInst %float %1 Fma %198 %193 %202\n                                                Store %201 %203 Aligned|MakePointerAvailable|NonPrivatePointer 4 %1i\n                                         %204 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform..1 %0i %0i %107 %0i %0i %1i\n                                         %205 = Load %float %204 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %206 = ExtInst %float %1 Fma %199 %193 %205\n                                                Store %204 %206 Aligned|MakePointerAvailable|NonPrivatePointer 4 %1i\n                                         %207 = PtrAccessChain %(StorageBuffer)float* %simplified_nbody.vulkan_uniform..1 %0i %0i %107 %0i %0i %2i\n                                         %208 = Load %float %207 Aligned|MakePointerVisible|NonPrivatePointer 4 %1i\n                                         %209 = ExtInst %float %1 Fma %200 %193 %208\n                                                Store %207 %209 Aligned|MakePointerAvailable|NonPrivatePointer 4 %1i\n                                                Store %121 %198 Aligned|MakePointerAvailable|NonPrivatePointer 4 %1i\n                  
                              Store %123 %199 Aligned|MakePointerAvailable|NonPrivatePointer 4 %1i\n                                                Store %125 %200 Aligned|MakePointerAvailable|NonPrivatePointer 4 %1i\n                                                Return\n}\n----\n++++\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n++++\n\n\n== Requirements ==\n* OS:\n** only AMD64/Intel64/ARM64 are supported\n** Windows: 10+\n** macOS: 13.0+\n** iOS: 16.0+\n** Linux: any current x64 distribution\n** other Unix: if other requirements are met\n* compiler/toolchain:\n** Generic: link:https://clang.llvm.org[Clang] / link:https://llvm.org[LLVM] / link:https://libcxx.llvm.org[pass:[libc++]] 19.0+\n** macOS/iOS: link:https://developer.apple.com/xcode/downloads[Xcode 16.3+]\n** Windows (VS): link:https://visualstudio.microsoft.com/vs[VS2022] with provided clang/LLVM\n** Windows (MinGW): link:https://www.msys2.org[MSYS2] with Clang/LLVM/libc++ 19.0+\n* libraries and optional requirements:\n** link:https://www.libsdl.org[SDL3] 3.2.0+\n** (opt) OpenCL: requires OpenCL 1.2+ SDK and CPU/GPU drivers (link:https://software.intel.com/content/www/us/en/develop/tools/opencl-sdk.html[Intel], link:https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases[AMD])\n** (opt) CUDA: requires sm_50+/Maxwell+ GPU and CUDA 12.0+ drivers (CUDA SDK not required!)\n** (opt) Metal: requires iOS 16.0+ or macOS 13.0+, and a Metal 3.0 capable GPU\n** (opt) Host-Compute: requires just the compiler/toolchain that is stated above\n** (opt) Vulkan: requires 1.4.309+ link:https://vulkan.lunarg.com[ICD loader / headers / SDK], link:https://github.com/zeux/volk[volk] included as submodule\n** (opt) OpenVR: requires link:https://github.com/ValveSoftware/openvr[OpenVR]\n** (opt) OpenXR: requires link:https://www.khronos.org/OpenXR[OpenXR]\n\n== Build Instructions ==\n* ensure git submodules are cloned and up-to-date: `git submodule update --init --recursive`\n\n=== General / CLI ===\n* run 
`./build.sh` (use `./build.sh help` to get a list of all options)\n* configuration of optional parts:\n** to disable OpenCL:\n   define `FLOOR_NO_OPENCL` or `./build.sh no-opencl`\n** to disable CUDA:\n   define `FLOOR_NO_CUDA` or `./build.sh no-cuda`\n** to disable Metal (only affects macOS/iOS builds):\n   define `FLOOR_NO_METAL` or `./build.sh no-metal`\n** to disable Host-Compute:\n   define `FLOOR_NO_HOST_COMPUTE` or `./build.sh no-host-compute`\n** to disable Vulkan:\n   define `FLOOR_NO_VULKAN` or `./build.sh no-vulkan`\n** to disable OpenVR:\n   define `FLOOR_NO_OPENVR` or `./build.sh no-openvr`\n** to disable OpenXR:\n   define `FLOOR_NO_OPENXR` or `./build.sh no-openxr`\n** to build with pass:[libstdc++] (GCC 13.0+) instead of pass:[libc++]:\n   `./build.sh libstdc++`\n\n=== CMake / ninja / CLI ===\n* this is provided as an alternative to build.sh and Xcode\n* create a build folder and `cd` into it\n* run `cmake -G \"Ninja\" -S \"\u003cpath-to-libfloor\u003e\" \u003coptions\u003e`\n* options:\n** to build a static library instead of a shared/dynamic one: `-DBUILD_SHARED_LIBS=OFF`\n** to explicitly use libc++: `-DWITH_LIBCXX=ON`\n** to build with address sanitizer: `-DWITH_ASAN=ON`\n* run `ninja`\n\n=== Xcode (macOS / iOS) ===\n* open `floor.xcodeproj` and build\n* some notes:\n** almost all optional parts of floor are enabled here and you'll have to install all dependencies or disable them manually\n** link:https://brew.sh[Homebrew] is the recommended way to install additional dependencies: +\n`+/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"+`\n** (opt) download link:https://github.com/ValveSoftware/openvr/releases[OpenVR] and manually install it:\n*** `mkdir -p {/usr/local/include/openvr,/usr/local/lib}`\n*** `cp openvr/headers/* /usr/local/include/openvr/`\n*** `cp openvr/bin/osx32/libopenvr_api.dylib /usr/local/lib/`\n** command line tools might be necessary, install them with: `xcode-select 
--install`\n** on iOS, either copy dependencies into your iPhoneOS and iPhoneSimulator SDK, or `floor/ios/deps/{include,lib}`\n** iOS linker flags for a depending project: `-lSDL3 -lfloor`\n\n=== Visual Studio (Windows / CMake / vcpkg) ===\n* install link:https://visualstudio.microsoft.com/vs[Visual Studio 2022]\n* in \"Workloads\" select \"Desktop development with C++\", in \"Individual components\" search for and select all clang packages\n* install and wait\n* install link:https://vulkan.lunarg.com/sdk/home[Vulkan SDK]\n* install vcpkg (somewhere, not within libfloor):\n** `git clone https://github.com/Microsoft/vcpkg.git`\n** `cd vcpkg`\n** `.\\bootstrap-vcpkg.bat -disableMetrics`\n** `.\\vcpkg integrate install`\n* install vcpkg packages:\n** `.\\vcpkg --triplet x64-windows install sdl3 OpenCL vulkan openvr openxr-loader`\n* add a user (or system) environment variable `VCPKG_ROOT` that points to the vcpkg folder\n* in Visual Studio: Tools -\u003e Options -\u003e search for vcpkg and set the custom vcpkg.exe path\n* in Visual Studio: open folder `floor` (wait a little until build files are generated)\n* select `Debug` or `Release` configuration and build\n* NOTE: all dependencies (optional parts) are enabled here\n* NOTE: having other build environments/systems in `PATH` (e.g. 
MSYS2/MinGW) may result in install/build issues\n\n== Installation ==\n=== Installation (Unix / macOS) ===\n* `sudo mkdir -p /opt/floor/`\n* `sudo ln -sf /path/to/floor/include /opt/floor/include`\n* `sudo ln -sf /path/to/floor/bin /opt/floor/lib`\n* alternatively: copy these files/folders there\n\n=== Installation (Windows) ===\n* create a `%%ProgramFiles%%/floor` folder (C:/Program Files/floor)\n* inside this folder:\n** create a `lib` folder\n** VS2022:\n*** copy everything from bin/ in there (dlls/lib/exp)\n** MinGW/MSYS2:\n*** copy libfloor_static.a/libfloord_static.a there\n** copy the original floor `include` folder in there (containing all floor include files)\n\n== Compute/Graphics Toolchain ==\n* automated builds for Linux, macOS and Windows can be found at: https://libfloor.org/builds/toolchain\n* NOTE: compiling the toolchain yourself requires a Unix environment with all LLVM build dependencies installed - use MSYS2 on Windows\n* NOTE: the absolute build path must not contain spaces\n* compile the toolchain:\n** `cd floor/etc/llvm140/ \u0026\u0026 ./build.sh`\n** if successful, package it (in addition to a .zip file, this also creates a folder with all necessary binaries and include files): `./pkg.sh`\n* install the toolchain:\n** Unix:\n*** automatic:\n**** development: run `./deploy_dev.sh` from the floor/etc/llvm140/ folder (this will create symlinks to everything in floor and floor/etc/llvm140)\n**** release: run `./deploy_pkg.sh` from inside the toolchain package folder (floor/etc/llvm140/toolchain_140006_*; this will copy everything)\n*** manual:\n**** copy the toolchain folder as `toolchain` to `/opt/floor/` (should then be `/opt/floor/toolchain/{bin,clang,libcxx}`)\n**** inside `/opt/floor/toolchain`, add a symlink to the floor `include` folder: `sudo ln -sf ../include`\n** Windows:\n*** copy the toolchain folder as `toolchain` to `%%ProgramFiles%%/floor` (should then be `%%ProgramFiles%%/floor/toolchain/{bin,clang,libcxx}`)\n*** inside `%%ProgramFiles%%/floor/toolchain`, 
copy the floor `include` folder from the folder above it into this folder\n* NOTE: this is the expected default setup - paths can be changed inside config.json (toolchain.generic.paths)\n\n== Misc Hints ==\n* when using X11 forwarding, set this environment variable:\n** `export SDL_VIDEO_X11_NODIRECTCOLOR=yes`\n* depending on how your Linux distribution handles the OpenCL headers and library, you might need to manually install OpenCL 1.2+ compatible ones\n* Host-Compute device execution requires locked/pinned memory, which may be very limited in default Linux configurations (usually 64KiB)\n** libfloor will try to increase the limit to 32MiB per logical CPU core, but this may fail if the max limit is too low\n** to increase the max limit, link:https://man.archlinux.org/man/limits.conf.5[/etc/security/limits.conf] must be modified\n** as a simple workaround, add the following line to it (replace user_name with your user name) and log in again:\n*** `user_name hard memlock unlimited`\n** NOTE: when using ssh, PAM must be enabled for this to apply\n* depending on your Vulkan implementation, you may also need to increase the max number of open files (usual default is 1024 files)\n** libfloor will try to increase the limit to 256 files per logical CPU core, but this may fail if the max limit is too low\n** to increase the max limit, link:https://man.archlinux.org/man/limits.conf.5[/etc/security/limits.conf] must be modified\n** as a simple workaround, add the following line to it (replace user_name with your user name) and log in again:\n*** `user_name hard nofile unlimited`\n** NOTE: when using ssh, PAM must be enabled for this to apply\n\n== Projects and Examples using libfloor ==\n* link:https://github.com/a2flo/floor_examples[floor_examples] (dnn, nbody, warp, hlbvh, path tracer, other)\n* link:https://github.com/a2flo/libwarp[libwarp] (image-space warping library)\n* obsolete: link:https://github.com/a2flo/oclraster[oclraster] (Flexible Rasterizer in OpenCL)\n* obsolete: 
link:https://github.com/a2flo/a2elight[a2elight] (Albion 2 Engine)\n* obsolete: link:https://github.com/a2flo/unibot[unibot] (IRC bot)\n","funding_links":[],"categories":["C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa2flo%2Ffloor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fa2flo%2Ffloor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa2flo%2Ffloor/lists"}