{"id":15903509,"url":"https://github.com/vfdev-5/interpolate-tensoriterator","last_synced_at":"2025-10-28T22:10:02.016Z","repository":{"id":54613885,"uuid":"322588822","full_name":"vfdev-5/interpolate-tensoriterator","owner":"vfdev-5","description":"Prototype Torch Interpolate with TensorIterator","archived":false,"fork":false,"pushed_at":"2022-09-29T14:14:59.000Z","size":3600,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-08T10:43:56.277Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vfdev-5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-18T12:26:59.000Z","updated_at":"2022-09-29T14:15:08.000Z","dependencies_parsed_at":"2022-08-13T21:40:16.701Z","dependency_job_id":null,"html_url":"https://github.com/vfdev-5/interpolate-tensoriterator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2Finterpolate-tensoriterator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2Finterpolate-tensoriterator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2Finterpolate-tensoriterator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2Finterpolate-tensoriterator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vfdev-5","download_url":"https://codeload.github.com/vfdev-5/interpolate-tensoriterator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246884767,"owners_count":20849554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T12:02:19.743Z","updated_at":"2025-10-28T22:10:01.910Z","avatar_url":"https://github.com/vfdev-5.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Prototype Torch Interpolate with TensorIterator\n\nFMassa's code : https://github.com/fmassa/vision-1/commit/407e0430e14ca688b2fb6f03ec1122ba46527553\n\n## Goals\n\n- ND downsampling/upsampling with TensorIterator, mode: linear, nearest, cubic\n- Benchmark implementations vs original pytorch ones\n- Improve previous algorithms\n\n### Step 6: Back to basics (#5 by Francisco)\n\n- [x] Test the code with older compiler like gcc 5.4\n- [x] Inspect assembly code\n- [ ] Specialization tricks: https://github.com/pytorch/pytorch/blob/9cec8ae146c0b95b0f5dcd1c62ea4e83ee32f90c/aten/src/ATen/native/cpu/Loops.h#L387\n- [x] Explore C10_RESTRICT : https://en.wikipedia.org/wiki/Restrict\n  -  https://wiki.sei.cmu.edu/confluence/display/c/EXP43-C.+Avoid+undefined+behavior+when+using+restrict-qualified+pointers\n\n### Step 7: Generic implementation\n\nFollowing [results 16/03/2021](step_seven/results/custom_pr_1.9.0a0+git2c06596_vs_pth_1.9.0a0+gite8e570e_results.1.md)\n- [ ] Improve case: `upsample_nearest2d channels_first contiguous [32, 128, 64, 64] -\u003e (128, 128)`\n  - 6 threads `[32, 128, 64, 64] -\u003e (128, 128)  |        50420.0       |        53869.1`\n  - 1 thread  `[32, 128, 64, 64] -\u003e (128, 128)  |       195835.9       |       219061.2`\n- [ ] Improve case: `upsample_trilinear3d channels_first contiguous`:\n  - 1 thread  `[1, 3, 16, 320, 320] -\u003e [8, 256, 256]   |          5.4         |         11.5`\n  - 1 thread  `[1, 3, 16, 320, 320] -\u003e [32, 512, 512]  |        114.5         |        210.6`\n  - 6 threads `[1, 3, 16, 320, 320] -\u003e [8, 256, 256]   |          1.0         |          2.1`\n  - 6 threads `[1, 3, 16, 320, 320] -\u003e [32, 512, 512]  |         25.6         |         43.7`\n\n## Development\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\nClick here for details\n\u003c/summary\u003e\n\n\n```bash\ndocker run -it \\\n    --name=tv-interpolate \\\n    -v $PWD:/interpolate-tensoriterator \\\n    -v $PWD/../:/workspace \\\n    -w /interpolate-tensoriterator \\\n    -v /home/user/Documents/ml/pytorch/:/pytorch \\\n    --network=host --security-opt seccomp:unconfined --privileged --ipc=host \\\n    nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04 \\\n    /bin/bash\n```\n```\n# Renew nvidia signing key\n# https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/\napt-key del 7fa2af80 \u0026\u0026 \\\n    rm -rf /etc/apt/sources.list.d/nvidia-ml.list \u0026\u0026 \\\n    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub\n\napt-get update \u0026\u0026 ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime \u0026\u0026 \\\n    apt-get install -y tzdata \u0026\u0026 \\\n    dpkg-reconfigure --frontend noninteractive tzdata \u0026\u0026 \\\n    apt-get install -y git cmake python3 python3-pip numactl \u0026\u0026 \\\n    ln -s /usr/bin/python3 /usr/bin/python \u0026\u0026 \\\n    pip install numpy typing_extensions\n```\n\n\u003c/details\u003e\n\nSee [playground](playground) for details step by step.\n\n### Step 6 - Linear interpolation\n\n- Install GCC 7.3 as PyTorch Nightly\n\n```bash\necho \"deb http://archive.ubuntu.com/ubuntu/ bionic main\\n\" \u003e\u003e /etc/apt/sources.list\necho \"deb http://archive.ubuntu.com/ubuntu/ bionic universe\\n\" \u003e\u003e /etc/apt/sources.list\napt-get update\napt-get install g++-7=7.3.0-16ubuntu3 gcc-7-base=7.3.0-16ubuntu3 gcc-7=7.3.0-16ubuntu3 cpp-7=7.3.0-16ubuntu3 libgcc-7-dev=7.3.0-16ubuntu3 libstdc++-7-dev=7.3.0-16ubuntu3 libasan4=7.3.0-16ubuntu3 libubsan0=7.3.0-16ubuntu3 libcilkrts5=7.3.0-16ubuntu3\n```\n\n- Build\n\n```bash\ncd step_six \u0026\u0026 mkdir -p build \u0026\u0026 cd $_\nexport CC=/usr/bin/gcc-7\nexport CXX=/usr/bin/g++-7\nexport TORCH_PATH=/tmp/libtorch\ncmake -DTORCH_DIR=$TORCH_PATH ..\nmake\n```\n\n```bash\nmake \u0026\u0026 ./bench 20000\n```\n\n#### How to produce benchmarks:\n\nConfigure PR_TORCH_PATH for PR PyTorch build inside `run_python_pr_bench.sh`:\n```\nexport PR_TORCH_PATH=/workspace/pth-linear-interp/\n```\n\nand run:\n```\nsh run_python_pr_bench.sh\n\u003e pr_vs_pth_results.md\n```\n\n\n\n### Step 7 - Cubic/Nearest/Linear interpolations\n\n\n#### Cubic interpolation\n\n```bash\ncd step_seven/cubic \u0026\u0026 mkdir -p build \u0026\u0026 cd $_\nexport TORCH_PATH=/pytorch/torch\ncmake -DTORCH_DIR=$TORCH_PATH ..\nmake\n```\n\n```bash\nmake \u0026\u0026 ./bench\n```\n\n\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\n\nCubic 2d prelimiary results\n\n\u003c/summary\u003e\n\n```\nTorch config: PyTorch built with:\n  - GCC 9.3\n  - C++ Version: 201402\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - CPU capability usage: AVX2\n  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/lib/ccache/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopen\nmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas\n -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-\ndeclarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wn\no-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1\n.9.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON,\n\nNum threads: 6\n\n\n---- Benchmark 2D ----\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous : true\n\n- Bench upsample_bicubic2d (750 rounds) - downsampling to 256x256\nElapsed time (ms): 6.5751\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - downsampling to 256x256\nElapsed time (ms): 0.415758\n\n- Bench upsample_bicubic2d (750 rounds) - upsampling to 512x512\nElapsed time (ms): 25.2327\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - upsampling to 512x512\nElapsed time (ms): 1.57621\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous : false\n\n- Bench upsample_bicubic2d (750 rounds) - downsampling to 256x256\nElapsed time (ms): 6.54954\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - downsampling to 256x256\nElapsed time (ms): 0.413038\n\n- Bench upsample_bicubic2d (750 rounds) - upsampling to 512x512\nElapsed time (ms): 25.2994\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - upsampling to 512x512\nElapsed time (ms): 1.50504\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: true\nInput is_contiguous : false\n\n- Bench upsample_bicubic2d (750 rounds) - downsampling to 256x256\nElapsed time (ms): 6.58091\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - downsampling to 256x256\nElapsed time (ms): 0.752833\n\n- Bench upsample_bicubic2d (750 rounds) - upsampling to 512x512\nElapsed time (ms): 25.3467\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - upsampling to 512x512\nElapsed time (ms): 2.94774\n\n1 - Test size as in https://github.com/mingfeima/op_bench-py\n\nInput tensor: [32, 128, 64, 64]\nInput is_contiguous memory_format torch.channels_last: true\nInput is_contiguous : false\n\n- Bench upsample_bicubic2d (75 rounds) - upsampling to 128x128\nElapsed time (ms): 7296.32\n\n- Bench ti_upsample_bicubic2d_cpu (75 rounds) - upsampling to 128x128\nElapsed time (ms): 158.019\n\n2 - Test size as in https://github.com/mingfeima/op_bench-py\n\nInput tensor: [32, 128, 64, 64]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous : true\n\n- Bench upsample_bicubic2d (75 rounds) - upsampling to 128x128\nElapsed time (ms): 7249.08\n\n- Bench ti_upsample_bicubic2d_cpu (75 rounds) - upsampling to 128x128\nElapsed time (ms): 158.135\n\nInput tensor: [1, 3, 500, 500]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous : true\n\n- Bench upsample_bicubic2d (750 rounds) - downsampling to 256x256\nElapsed time (ms): 6.51921\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - downsampling to 256x256\nElapsed time (ms): 0.414213\n\n- Bench upsample_bicubic2d (750 rounds) - upsampling to 800x800\nElapsed time (ms): 61.1398\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - upsampling to 800x800\nElapsed time (ms): 3.62011\n\nInput tensor: [1, 3, 500, 500]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous : false\n\n- Bench upsample_bicubic2d (750 rounds) - downsampling to 256x256\nElapsed time (ms): 6.6466\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - downsampling to 256x256\nElapsed time (ms): 0.420774\n\n- Bench upsample_bicubic2d (750 rounds) - upsampling to 800x800\nElapsed time (ms): 61.3422\n\n- Bench ti_upsample_bicubic2d_cpu (750 rounds) - upsampling to 800x800\nElapsed time (ms): 3.62022\n\n---- END Benchmark 2D ----\n```\n\n\u003c/details\u003e\n\n\n#### Result 1\n\n[PyTorch nightly (66f07c0) vs This Prototype](step_seven/pth_vs_this_full_results.log.save)\n\n\n#### Any mode / Nd implementation results\n\n[results](step_seven/results)\n\n#### Notes\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\n\n17/03/2021\n\n\u003c/summary\u003e\n\n```\n- ti_upsample_bilinear2d_cpu on channels first\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous memory_format torch.channels_last_3d: false\nInput is_contiguous : true\n\nOutput tensor: [1, 3, 256, 256]\nOutput is_contiguous memory_format torch.channels_last: false\nOutput is_contiguous memory_format torch.channels_last_3d: false\nOutput is_contiguous : true\nTI_SHOW: N=256\nTI_SHOW_STRIDES: 4 0 | 0 0 0 0 | 8 4 8 4 |\nTI_BASIC_LOOP -\u003e CHANNELS_FIRST\n```\nand\n```\n- Bench ti_upsample_nearest2d (1 rounds) - upsampling to 512x512\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous memory_format torch.channels_last_3d: false\nInput is_contiguous : true\n\nOutput tensor: [1, 3, 512, 512]\nOutput is_contiguous memory_format torch.channels_last: false\nOutput is_contiguous memory_format torch.channels_last_3d: false\nOutput is_contiguous : true\nTI_SHOW: N=512\nTI_SHOW_STRIDES: 4 0 | 0 0 | 8 4 |\nTI_BASIC_LOOP -\u003e CHANNELS_FIRST\nElapsed time (ms): 1.41033\n```\nand\n```\n- Bench ti_upsample_bicubic2d_cpu (1 rounds) - upsampling to 512x512\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: false\nInput is_contiguous memory_format torch.channels_last_3d: false\nInput is_contiguous : true\n\nOutput tensor: [1, 3, 512, 512]\nOutput is_contiguous memory_format torch.channels_last: false\nOutput is_contiguous memory_format torch.channels_last_3d: false\nOutput is_contiguous : true\nTI_SHOW: N=512\nTI_SHOW_STRIDES: 4 0 | 0 0 0 0 0 0 0 0 | 8 4 8 4 8 4 8 4 |\nTI_BASIC_LOOP -\u003e CHANNELS_FIRST\nElapsed time (ms): 10.8974\n```\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\n\n18/03/2021 - loop1d vs loop2d\n\n\u003c/summary\u003e\n\n```\n# LOOP1D\nNum threads: 1\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: true\nInput is_contiguous : false\n\n- Bench ti_upsample_bilinear2d (1000 rounds) - downsampling to 256x256\nElapsed time (ms): 1.28927\n\n- Bench upsample_bilinear2d (1000 rounds) - downsampling to 256x256\nElapsed time (ms): 1.01537\n\n- Bench ti_upsample_bilinear2d (1000 rounds) - upsampling to 512x512\nElapsed time (ms): 5.06349\n\n- Bench upsample_bilinear2d (1000 rounds) - upsampling to 512x512\nElapsed time (ms): 4.03706\n```\nvs\n```\n# LOOP2D\n\nNum threads: 1\n\nInput tensor: [1, 3, 320, 320]\nInput is_contiguous memory_format torch.channels_last: true\nInput is_contiguous : false\n\n- Bench ti_upsample_bilinear2d (1000 rounds) - downsampling to 256x256\nElapsed time (ms): 1.14841\n\n- Bench upsample_bilinear2d (1000 rounds) - downsampling to 256x256\nElapsed time (ms): 1.01576\n\n- Bench ti_upsample_bilinear2d (1000 rounds) - upsampling to 512x512\nElapsed time (ms): 4.35938\n\n- Bench upsample_bilinear2d (1000 rounds) - upsampling to 512x512\nElapsed time (ms): 4.03187\n```\n\n```\nOutput tensor: [1, 3, 512, 512]\nOutput is_contiguous memory_format torch.channels_last: true\nOutput is_contiguous memory_format torch.channels_last_3d: false\nOutput is_contiguous : false\nTI_SHOW: size0=3\nTI_SHOW: size1=512\nTI_SHOW_STRIDES: 4 4 | 0 0 0 0 | 0 0 0 0 |\n - strides= 4 4 0 0 0 0 0 0 0 0\n - outer_strides= 12 0 0 0 0 0 8 4 8 4\nTI_BASIC_LOOP -\u003e CHANNELS_LAST\n```\n\n```\nOutput tensor: [1, 3, 512, 512]\nOutput is_contiguous memory_format torch.channels_last: false\nOutput is_contiguous memory_format torch.channels_last_3d: false\nOutput is_contiguous : true\nTI_SHOW: size0=512\nTI_SHOW: size1=512\nTI_SHOW_STRIDES: 4 0 | 0 0 0 0 | 8 4 8 4 |\n - strides= 4 0 0 0 0 0 8 4 8 4\n - outer_strides= 2048 0 8 4 8 4 0 0 0 0\n```\n\n```\nOutput tensor: [1, 3, 8, 256, 256]\nOutput is_contiguous memory_format torch.channels_last: false\nOutput is_contiguous memory_format torch.channels_last_3d: true\nOutput is_contiguous : false\nTI_SHOW: size0=3\nTI_SHOW: size1=256\nTI_SHOW_STRIDES: 4 4 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 |\n - strides= 4 4 0 0 0 0 0 0 0 0 0 0 0 0\n - outer_strides= 12 0 0 0 0 0 0 0 0 0 8 4 8 4\nTI_BASIC_LOOP -\u003e CHANNELS_LAST\n```\n\n```\nOutput tensor: [1, 3, 8, 256, 256]\nOutput is_contiguous memory_format torch.channels_last: false\nOutput is_contiguous memory_format torch.channels_last_3d: false\nOutput is_contiguous : true\nTI_SHOW: size0=256\nTI_SHOW: size1=256\nTI_SHOW_STRIDES: 4 0 | 0 0 0 0 | 0 0 0 0 | 8 4 8 4 |\n - strides= 4 0 0 0 0 0 0 0 0 0 8 4 8 4\n - outer_strides= 1024 0 0 0 0 0 8 4 8 4 0 0 0 0\nTI_BASIC_LOOP -\u003e CHANNELS_FIRST\n```\n\n\n\u003c/details\u003e\n\n\n### Step 8 - Backward with TI for Cubic/Nearest/Linear interpolations\n\n/!\\ Under development /!\\\n\n- an implementation with implicit 2d kernel and possible race condition with ChannelsLast mem format.\n\n- \"Separable implementation\" can possibly solve race condition.\n\n#### Linear interpolation\n\n```bash\ncd step_eight_backward/linear \u0026\u0026 mkdir -p build \u0026\u0026 cd $_\nexport TORCH_PATH=/pytorch/torch\ncmake -DTORCH_DIR=$TORCH_PATH ..\nmake\n```\n\n```bash\nmake \u0026\u0026 ./bench\n```\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\n\nEarly results with linear backward using TensorIterator\n\n\u003c/summary\u003e\n\n```\nTorch config: PyTorch built with:  - GCC 9.3  - C++ Version: 201402  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - CPU capability usage: AVX2  - Build settings: BUILD_TYPE=Release, CXX_COMPILER=/usr/lib/ccache/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type-Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=0, USE_OPENMP=ON,\n\nNum threads: 1\n\n\n---- Benchmark 2D ----\n\nGrad Output tensor: [1, 3, 320, 320]\nGrad Output is_contiguous memory_format torch.channels_last: false\nGrad Output is_contiguous : true\n\n- Bench ti_upsample_bilinear2d_cpu_backward (7500 rounds) - upsampling from 256x256\nElapsed time (ms): 0.781853\n\n- Bench upsample_bilinear2d_backward (7500 rounds) - upsampling from 256x256\nElapsed time (ms): 1.71152\n\n- Bench ti_upsample_bilinear2d_cpu_backward (7500 rounds) - downsampling from 512x512\nElapsed time (ms): 0.809508\n\n- Bench upsample_bilinear2d_backward (7500 rounds) - downsampling from 512x512\nElapsed time (ms): 1.80017\n\n---- END Benchmark 2D ----\n```\n\n\u003c/details\u003e\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvfdev-5%2Finterpolate-tensoriterator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvfdev-5%2Finterpolate-tensoriterator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvfdev-5%2Finterpolate-tensoriterator/lists"}