https://github.com/renderkit/embree
Embree ray tracing kernels repository.
- Host: GitHub
- URL: https://github.com/renderkit/embree
- Owner: RenderKit
- License: apache-2.0
- Created: 2012-11-30T21:33:29.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2025-04-14T13:06:35.000Z (about 1 year ago)
- Last Synced: 2025-04-23T21:44:09.355Z (about 1 year ago)
- Language: C++
- Homepage:
- Size: 258 MB
- Stars: 2,484
- Watchers: 125
- Forks: 398
- Open Issues: 81
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
- Security: SECURITY.md
% Embree: High Performance Ray Tracing Kernels 4.4.0
% Intel Corporation
Intel® Embree Overview
======================
Intel® Embree is a high-performance ray tracing library developed at
Intel and released as open source under the [Apache 2.0
license](http://www.apache.org/licenses/LICENSE-2.0). Intel® Embree
supports x86 CPUs under Linux, macOS, and Windows; ARM CPUs on Linux
and macOS; as well as Intel® GPUs under Linux and Windows.
Intel® Embree targets graphics application developers to improve the
performance of photo-realistic rendering applications. Embree is
optimized towards production rendering, by putting focus on incoherent
ray performance, high quality acceleration structure construction, a
rich feature set, accurate primitive intersection, and low memory
consumption.
Embree's feature set includes various primitive types such as
triangles (as well as quads and grids for lower memory consumption);
Catmull-Clark subdivision surfaces; various types of curve primitives,
such as flat curves (for distant views), round curves (for closeup
views), and normal oriented curves, all supported with different basis
functions (linear, Bézier, B-spline, Hermite, and Catmull-Rom);
point-like primitives, such as ray oriented discs, normal oriented
discs, and spheres; user defined geometries with a procedural
intersection function; multi-level instancing; filter callbacks
invoked for any hit encountered; motion blur including multi-segment
motion blur, deformation blur, and quaternion motion blur; and ray
masking.
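As an illustration of the ray masking feature listed above, the following minimal sketch shows the usual bitwise test: a geometry is considered for intersection only if its mask shares at least one bit with the ray's mask. The `Ray` and `Geometry` structs and both helper functions are hypothetical stand-ins for this example and are not Embree's own types or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not Embree's RTCRay or geometry types.
struct Ray      { std::uint32_t mask; };
struct Geometry { std::uint32_t mask; };

// A geometry is a candidate for a ray only if the two masks share a bit.
inline bool passes_ray_mask(const Ray& ray, const Geometry& geom) {
  return (ray.mask & geom.mask) != 0;
}

// Count the geometries in a scene that a given ray is allowed to hit.
inline std::size_t count_candidates(const Ray& ray,
                                    const std::vector<Geometry>& scene) {
  std::size_t n = 0;
  for (const Geometry& g : scene) {
    if (passes_ray_mask(ray, g)) ++n;
  }
  assert(n <= scene.size());
  return n;
}
```

A renderer could, for example, reserve one bit for shadow rays and clear that bit on geometry that should not cast shadows.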
Intel® Embree contains ray tracing kernels optimized for the latest
x86 processors with support for SSE, AVX, AVX2, and AVX-512
instructions, and uses runtime code selection to choose between these
kernels. Intel® Embree contains algorithms optimized for incoherent
workloads (e.g. Monte Carlo ray tracing algorithms) and coherent
workloads (e.g. primary visibility and hard shadow rays), and it
supports dynamic scenes by implementing high-performance two-level
spatial index structure construction algorithms.
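The idea of runtime code selection can be illustrated with a small dispatch sketch. This is not Embree's actual implementation: the kernel functions, `select_kernel`, and `selected_kernel_name` are hypothetical names, the "widest supported ISA wins" policy is an assumption, and the CPU query relies on the GCC/Clang builtin `__builtin_cpu_supports`, so the sketch only probes the CPU on x86 Linux and otherwise falls back to a generic path.

```cpp
#include <cassert>
#include <string>

// Hypothetical kernel variants for illustration only. A real library would
// compile each variant in its own translation unit with matching ISA flags.
using KernelFn = const char* (*)();

inline const char* kernel_generic() { return "generic"; }
inline const char* kernel_sse2()    { return "sse2"; }
inline const char* kernel_avx()     { return "avx"; }
inline const char* kernel_avx2()    { return "avx2"; }
inline const char* kernel_avx512()  { return "avx512"; }

// Pick the widest kernel the running CPU supports (assumed dispatch policy).
inline KernelFn select_kernel() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__) && defined(__linux__)
  __builtin_cpu_init();  // initialize the CPU feature data before querying it
  if (__builtin_cpu_supports("avx512f")) return kernel_avx512;
  if (__builtin_cpu_supports("avx2"))    return kernel_avx2;
  if (__builtin_cpu_supports("avx"))     return kernel_avx;
  if (__builtin_cpu_supports("sse2"))    return kernel_sse2;
#endif
  return kernel_generic;  // other CPUs, platforms, or compilers
}

// Name of the kernel chosen on this machine, e.g. for logging.
inline std::string selected_kernel_name() {
  KernelFn fn = select_kernel();
  assert(fn != nullptr);
  return fn();
}
```

Selecting once at startup and storing the function pointer keeps the per-ray cost of the dispatch at a single indirect call.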
Intel® Embree supports applications written with the Intel® Implicit
SPMD Program Compiler (Intel® ISPC) by
providing an ISPC interface to the core ray tracing
algorithms. This makes it possible to write a renderer that
automatically vectorizes and leverages SSE, AVX, AVX2, and AVX-512
instructions.
Intel® Embree supports Intel GPUs through the
[SYCL](https://www.khronos.org/sycl/) open standard programming
language. SYCL allows writing C++ code that can be run on various
devices, such as CPUs and GPUs. Using Intel® Embree, application
developers can write a single-source renderer that executes
efficiently on CPUs and GPUs. Maintaining just one code base
this way can significantly improve productivity and eliminate
inconsistencies between a CPU and GPU version of the renderer. Embree
supports GPUs based on the Xe HPG and Xe HPC microarchitectures,
which support hardware-accelerated ray tracing to deliver excellent
levels of ray tracing performance.
Supported Platforms
-------------------
Embree supports Windows (32-bit and 64-bit), Linux (64-bit), and macOS
(64-bit). Under Windows, Linux, and macOS, x86-based CPUs are supported,
while ARM CPUs are currently only supported under Linux and macOS (e.g.
Apple M1). ARM support for Windows is experimental.
Embree supports Intel GPUs based on the Xe HPG microarchitecture
(Intel® Arc™ GPU) under Linux and Windows and Xe HPC microarchitecture
(Intel® Data Center GPU Flex Series and Intel® Data Center GPU Max
Series) under Linux.
The code compiles with the Intel® Compiler, Intel® oneAPI DPC++
Compiler, GCC, Clang, and the Microsoft Compiler. To use Embree on the
GPU the Intel® oneAPI DPC++ Compiler must be used. Please see section
[Compiling Embree] for details on tested compiler versions.
Embree requires at least an x86 CPU with support for
SSE2 or an Apple M1 CPU.
Embree Support and Contact
--------------------------
If you encounter bugs please report them via [Embree's GitHub Issue
Tracker](https://github.com/embree/embree/issues).
For questions and feature requests please write us at
.
To receive notifications of updates and new features of Embree please
subscribe to the [Embree mailing
list](https://groups.google.com/d/forum/embree/).
Installation of Embree
======================
Windows Installation
--------------------
A pre-built version of Embree for Windows is provided as a ZIP archive
[embree-4.4.0.x64.windows.zip](https://github.com/embree/embree/releases/download/v4.4.0/embree-4.4.0.x64.windows.zip). After
unpacking this ZIP file, add the path to the `lib` folder
to your `PATH` environment variable so that applications can find
Embree.
Linux Installation
------------------
A pre-built version of Embree for Linux is provided as a `tar.gz` archive:
[embree-4.4.0.x86_64.linux.tar.gz](https://github.com/embree/embree/releases/download/v4.4.0/embree-4.4.0.x86_64.linux.tar.gz). Unpack
this file using `tar` and source the provided `embree-vars.sh` (if you
are using the bash shell) or `embree-vars.csh` (if you are using the C
shell) to set up the environment properly:
tar xzf embree-4.4.0.x86_64.linux.tar.gz
source embree-4.4.0.x86_64.linux/embree-vars.sh
We recommend adding a relative `RPATH` to your application that points
to the location where Embree (and TBB) can be found, e.g. `$ORIGIN/../lib`.
macOS Installation
------------------
The macOS version of Embree is also delivered as a ZIP file:
[embree-4.4.0.x86_64.macosx.zip](https://github.com/embree/embree/releases/download/v4.4.0/embree-4.4.0.x86_64.macosx.zip). Unpack
this file using `unzip` and source the provided `embree-vars.sh` (if you
are using the bash shell) or `embree-vars.csh` (if you are using the C
shell) to set up the environment properly:
unzip embree-4.4.0.x86_64.macosx.zip
source embree-4.4.0.x86_64.macosx/embree-vars.sh
If you want to ship Embree with your application, please use the Embree
library of the provided ZIP file. The library name of that Embree
library is of the form `@rpath/libembree.4.dylib`
(and similar also for the included TBB library). This ensures that you
can add a relative `RPATH` to your application that points to the location
where Embree (and TBB) can be found, e.g. `@loader_path/../lib`.
Building Embree Applications
----------------------------
The most convenient way to build an Embree application is through
CMake. Just let CMake find your unpacked Embree package using the
`FIND_PACKAGE` function inside your `CMakeLists.txt` file:
FIND_PACKAGE(embree 4 REQUIRED)
For CMake to properly find Embree you need to set the `embree_DIR` variable to
the folder containing the `embree-config.cmake` file. You might also have to
set the `TBB_DIR` variable to the path containing `TBBConfig.cmake` of a local
TBB install, in case you do not have TBB installed globally on your system,
e.g.:
cmake -D embree_DIR=path_to_embree_package/lib/cmake/embree-4.4.0/ \
-D TBB_DIR=path_to_tbb_package/lib/cmake/tbb/ \
..
The `FIND_PACKAGE` function will create an `embree` target that
you can add to your target link libraries:
TARGET_LINK_LIBRARIES(application embree)
For a full example on how to build an Embree application please have a
look at the `minimal` tutorial provided in the `src` folder of the
Embree package and also the contained `README.txt` file.
Building Embree SYCL Applications
----------------------------------
Building Embree SYCL applications is also best done using
CMake. Please first obtain a compatible SYCL compiler and set up the
environment as described in sections [Linux SYCL Compilation] and
[Windows SYCL Compilation].
Also perform the setup steps from the previous [Building Embree
Applications] section.
Please also have a look at the [Minimal] tutorial that is provided
with the Embree release, for an example of how to build a simple SYCL
application using CMake and Embree.
To properly compile your SYCL application you have to add additional
SYCL compile flags for each C++ file that contains SYCL device side
code or kernels as described next.
### JIT Compilation
We recommend using just in time compilation (JIT compilation) together
with [SYCL JIT caching] to compile Embree SYCL applications. For JIT
compilation add these options to the compilation phase of all C++
files that contain SYCL code:
-fsycl -Xclang -fsycl-allow-func-ptr -fsycl-targets=spir64
These options enable SYCL two phase compilation (`-fsycl` option),
enable function pointer support (`-Xclang -fsycl-allow-func-ptr`
option), and select just-in-time (JIT) compilation only
(`-fsycl-targets=spir64` option).
The following link options have to be added to the linking stage of
your application when using just-in-time compilation:
-fsycl -fsycl-targets=spir64
For a full example on how to build an Embree SYCL application please
have a look at the SYCL version of the `minimal` tutorial provided in
the `src` folder of the Embree package and also the contained
`README.txt` file.
Please have a look at the [Compiling Embree] section on how to create
an Embree package from sources if required.
### AOT Compilation
Ahead-of-time compilation (AOT compilation) speeds up the first
application start-up because device binaries are precompiled. We do
not recommend using AOT compilation as it does not allow the usage of
specialization constants to reduce code complexity.
For ahead of time compilation add these compile options to the
compilation phase of all C++ files that contain SYCL code:
-fsycl -Xclang -fsycl-allow-func-ptr -fsycl-targets=spir64_gen
These options enable SYCL two phase compilation (`-fsycl` option),
enable function pointer support (`-Xclang -fsycl-allow-func-ptr`
option), and ahead of time (AOT) compilation
(`-fsycl-targets=spir64_gen` option).
The following link options have to be added to the linking stage of
your application when compiling ahead of time for Xe HPG devices:
-fsycl -fsycl-targets=spir64_gen
-Xsycl-target-backend=spir64_gen "-device XE_HPG_CORE"
This in particular configures the devices for AOT compilation to
`XE_HPG_CORE`.
To get a list of all devices supported by AOT compilation, look at the
help of the device option in the `ocloc` tool:
ocloc compile --help
Building Embree Tests
---------------------
Embree is released with a bundle of tests in an optional testing package.
To run these tests, extract the testing package into the same folder as your Embree installation,
e.g.:
tar -xzf embree-4.4.0-testing.zip -C /path/to/installed/embree
The tests are extracted into a new folder inside your Embree installation and can be run with:
cd /path/to/installed/embree/testing
cmake -B build
cmake --build build --target tests
Compiling Embree
================
We recommend using the prebuilt Embree packages from
[https://github.com/embree/embree/releases](https://github.com/embree/embree/releases). If
you need to compile Embree yourself you need to use CMake as described
in the following.
Do not enable fast-math optimizations in your compiler as this mode is
not supported by Embree.
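If you want to catch an accidentally enabled fast-math mode in your own translation units, a compile-time guard along the following lines may help. This is only a sketch: it assumes the compiler advertises the mode through the `__FAST_MATH__` macro (GCC and Clang with `-ffast-math`) or the `_M_FP_FAST` macro (MSVC with `/fp:fast`), and the helper names `fast_math_enabled`, `nan_is_detectable`, and `check_float_environment` are hypothetical and not part of Embree.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True when this translation unit is compiled with fast-math enabled.
// __FAST_MATH__ is set by GCC/Clang under -ffast-math,
// _M_FP_FAST is set by MSVC under /fp:fast.
constexpr bool fast_math_enabled() {
#if defined(__FAST_MATH__) || defined(_M_FP_FAST)
  return true;
#else
  return false;
#endif
}

// Fail the build early instead of getting unsupported behavior at run time.
static_assert(!fast_math_enabled(),
              "fast-math is enabled, but Embree does not support this mode");

// Runtime sanity check: a quiet NaN must still be recognized as NaN.
inline bool nan_is_detectable() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  return std::isnan(nan);
}

// Optional helper to call once at application start-up (debug builds only).
inline void check_float_environment() {
  assert(nan_is_detectable());
}
```

The `static_assert` stops compilation as soon as one of the macros is defined, so the guard costs nothing at run time.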
Linux and macOS
---------------
To compile Embree you need a modern C++ compiler that supports
C++11. Embree is tested with the following compilers:
Linux
- Intel® oneAPI DPC++/C++ Compiler 2024.0.2
- oneAPI DPC++/C++ Compiler 2023-10-26
- Clang 5.0.0
- Clang 4.0.0
- GCC 10.0.1 (Fedora 32) AVX512 support
- GCC 8.3.1 (Fedora 29) AVX512 support
- Intel® Implicit SPMD Program Compiler 1.22.0
macOS x86_64
- Apple Clang 15
macOS Arm64
- Apple Clang 14
Embree supports using the Intel® Threading Building Blocks (TBB) as the
tasking system. For performance and flexibility reasons we recommend
using Embree with TBB, ideally together with TBB inside your
application as well. Optionally you can disable TBB
in Embree through the `EMBREE_TASKING_SYSTEM` CMake variable.
Embree supports the Intel® Implicit SPMD Program Compiler (Intel® ISPC), which allows
straightforward parallelization of an entire renderer. If you
want to use Intel® ISPC then you can enable `EMBREE_ISPC_SUPPORT` in
CMake. Download and install the Intel® ISPC binaries from
[ispc.github.io](https://ispc.github.io/downloads.html). After
installation, put the path to `ispc` permanently into your `PATH` environment
variable or set the `EMBREE_ISPC_EXECUTABLE` variable to point at the ISPC
executable during CMake configuration.
You additionally have to install CMake 3.1.0 or higher and the developer
version of [GLFW](https://www.glfw.org/) version 3.
Under macOS, all these dependencies can be installed
using [MacPorts](http://www.macports.org/):
sudo port install cmake tbb glfw-devel
Depending on your Linux distribution you can install these dependencies
using `yum` or `apt-get`. Some of these packages might already be
installed or might have slightly different names.
Type the following to install the dependencies using `yum`:
sudo yum install cmake
sudo yum install tbb-devel
sudo yum install glfw-devel
Type the following to install the dependencies using `apt-get`:
sudo apt-get install cmake-curses-gui
sudo apt-get install libtbb-dev
sudo apt-get install libglfw3-dev
Finally, you can compile Embree using CMake. Create a build directory
inside the Embree root directory and execute `ccmake ..` inside this
build directory.
mkdir build
cd build
ccmake ..
By default, CMake will use the compilers specified with the `CC` and
`CXX` environment variables. Should you want to use a different
compiler, run `cmake` first and set the `CMAKE_CXX_COMPILER` and
`CMAKE_C_COMPILER` variables to the desired compiler. For example, to
use the Clang compiler instead of the default GCC on most Linux machines
(`g++` and `gcc`), execute
cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang ..
Running `ccmake` will open a dialog where you can perform various
configurations as described below in [CMake Configuration]. After having
configured Embree, press `c` (for configure) and `g` (for generate) to
generate a Makefile and leave the configuration. The code can be
compiled by executing `make`:
make -j 8
The executables will be generated inside the build folder. We recommend
installing the Embree library and header files on your
system. To do so, set the `CMAKE_INSTALL_PREFIX` to `/usr` in CMake
and type:
sudo make install
If you keep the default `CMAKE_INSTALL_PREFIX` of `/usr/local` then
you have to make sure the path `/usr/local/lib` is in your
`LD_LIBRARY_PATH`.
You can also uninstall Embree again by executing:
sudo make uninstall
You can also create an Embree package using the following command:
make package
Please see the [Building Embree Applications] section on how to build
your application with such an Embree package.
Linux SYCL Compilation
-----------------------
There are two options to compile Embree with SYCL support:
The open source ["oneAPI DPC++ Compiler"](https://github.com/intel/llvm/) or
the ["Intel(R) oneAPI DPC++/C++ Compiler"](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#dpcpp-cpp).
Other SYCL compilers are not supported.
The "oneAPI DPC++ Compiler" is more up-to-date than the "Intel(R) oneAPI
DPC++/C++ Compiler" but less stable. The current tested version of the oneAPI
DPC++ compiler is
- [oneAPI DPC++ Compiler 2023-10-26](https://github.com/intel/llvm/releases/tag/nightly-2023-10-26)
The compiler can be downloaded and simply extracted. The oneAPI DPC++ compiler
can be set up by executing the following commands in a Linux (bash) shell:
export SYCL_BUNDLE_ROOT=path_to_dpcpp_compiler
export PATH=$SYCL_BUNDLE_ROOT/bin:$PATH
export CPATH=$SYCL_BUNDLE_ROOT/include:$CPATH
export LIBRARY_PATH=$SYCL_BUNDLE_ROOT/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=$SYCL_BUNDLE_ROOT/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$SYCL_BUNDLE_ROOT/linux/lib/x64:$LD_LIBRARY_PATH
where the `path_to_dpcpp_compiler` should point to the unpacked oneAPI DPC++
compiler. This will put `clang++` and `clang` from the oneAPI DPC++ Compiler
into your path.
Please also install all Linux packages described in the previous
section.
Now, you can configure Embree using CMake by executing the following command
in the Embree root directory:
cmake -B build \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_C_COMPILER=clang \
-DEMBREE_SYCL_SUPPORT=ON
This will create a directory `build` to use as the CMake build directory,
configure the usage of the oneAPI DPC++ Compiler, and turn on SYCL support
through `EMBREE_SYCL_SUPPORT=ON`.
Alternatively, you can download and run the installer of the
- [Intel(R) oneAPI DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#dpcpp-cpp).
After installation, you can set up the compiler by sourcing the
`vars.sh` script in the `env` directory of the compiler install directory, for example,
source /opt/intel/oneAPI/compiler/latest/env/vars.sh
This script will put the `icpx` and `icx` compiler executables from the
Intel(R) oneAPI DPC++/C++ Compiler in your path.
Now, you can configure Embree using CMake by executing the following command
in the Embree root directory:
cmake -B build \
-DCMAKE_CXX_COMPILER=icpx \
-DCMAKE_C_COMPILER=icx \
-DEMBREE_SYCL_SUPPORT=ON
More information about setting up the Intel(R) oneAPI DPC++/C++ compiler can be
found in the [Development Reference Guide](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-setup.html). Please note that the Intel(R) oneAPI DPC++/C++ compiler
requires [at least CMake version 3.20.5 on Linux](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-setup/use-the-command-line/use-cmake-with-the-compiler.html).
Independent of the DPC++ compiler choice, you can now build Embree using
cmake --build build -j 8
The executables will be generated inside the build folder. The
executable names of the SYCL versions of the tutorials end with
`_sycl`.
### Linux Graphics Driver Installation
To run the SYCL code you need to install the latest GPGPU drivers for
your Intel Xe HPG/HPC GPUs from here
[https://dgpu-docs.intel.com/](https://dgpu-docs.intel.com/). Follow
the driver installation instructions for your graphics card and
operating system.
After installing the drivers you have to install an additional package
manually using
sudo apt install intel-level-zero-gpu-raytracing
Windows
-------
Embree is tested using the following compilers under Windows:
- Intel® oneAPI DPC++/C++ Compiler 2024.0.2
- oneAPI DPC++/C++ Compiler 2023-10-26
- Visual Studio 2022
- Visual Studio 2019
- Visual Studio 2017
- Intel® Implicit SPMD Program Compiler 1.22.0
To compile Embree for AVX-512 you have to use the Intel® Compiler.
Embree supports using the Intel® Threading Building Blocks (TBB) as the
tasking system. For performance and flexibility reasons we recommend
using Embree with TBB, ideally together with TBB inside your
application as well. Optionally you can disable TBB
in Embree through the `EMBREE_TASKING_SYSTEM` CMake variable.
Embree will either find the Intel® Threading Building Blocks (TBB)
installation that comes with the Intel® Compiler, or you can install the
binary distribution of TBB directly from
[https://github.com/oneapi-src/oneTBB/releases](https://github.com/oneapi-src/oneTBB/releases)
into a folder named `tbb` in your Embree root directory. You also have
to make sure that the libraries `tbb.dll` and `tbb_malloc.dll` can be
found when executing your Embree applications, e.g. by putting the path
to these libraries into your `PATH` environment variable.
Embree supports the Intel® Implicit SPMD Program Compiler (Intel® ISPC), which
allows straightforward parallelization of an entire renderer. When installing
Intel® ISPC, make sure to download an Intel® ISPC version from
[ispc.github.io](https://ispc.github.io/downloads.html) that is compatible with
your Visual Studio version. After installation, put the path to `ispc.exe`
permanently into your `PATH` environment variable or set
the `EMBREE_ISPC_EXECUTABLE` variable during CMake configuration to point to
the ISPC executable. If you want to use Intel® ISPC, you have to enable
`EMBREE_ISPC_SUPPORT` in CMake.
You additionally have to install [CMake](http://www.cmake.org/download/)
(version 3.1 or higher). Note that you need a native Windows CMake
installation because CMake under Cygwin cannot generate solution files
for Visual Studio.
### Using the IDE
Run `cmake-gui`, browse to the Embree sources, set the build directory
and click Configure. Now you can select the Generator, e.g. "Visual
Studio 12 2013" for a 32-bit build or "Visual Studio 12 2013 Win64"
for a 64-bit build.
To use a different compiler than the Microsoft Visual C++ compiler, you
additionally need to specify the proper compiler toolset through the
option "Optional toolset to use (-T parameter)". E.g. to use Clang for
compilation set the toolset to "LLVM_v142".
Do not change the toolset manually in a solution file (neither through
the project properties dialog nor through the "Use Intel Compiler"
project context menu), because then some compiler-specific command line
options cannot be set by CMake.
Most configuration parameters described in the [CMake Configuration]
can be set under Windows as well. Finally, click "Generate" to create
the Visual Studio solution files.
The following CMake options are only available under Windows:
+ `CMAKE_CONFIGURATION_TYPES`: List of generated
configurations. The default value is Debug;Release;RelWithDebInfo.
+ `USE_STATIC_RUNTIME`: Use the static version of the C/C++ runtime
library. This option is turned OFF by default.
Use the generated Visual Studio solution file `embree4.sln` to compile
the project.
We recommend enabling syntax highlighting for the `.ispc` source and
`.isph` header files. To do so open Visual Studio, go to Tools ⇒
Options ⇒ Text Editor ⇒ File Extension and add the `isph` and `ispc`
extensions for the "Microsoft Visual C++" editor.
### Using the Command Line
Embree can also be configured and built without the IDE using the Visual
Studio command prompt:
cd path\to\embree
mkdir build
cd build
cmake -G "Visual Studio 16 2019" ..
cmake --build . --config Release
You can also build only some projects with the `--target` switch.
Additional parameters after "`--`" will be passed to `msbuild`. For
example, to build the Embree library in parallel use
cmake --build . --config Release --target embree -- /m
### Building Embree - Using vcpkg
You can download and install Embree using the [vcpkg](https://github.com/Microsoft/vcpkg) dependency manager:
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install embree3
The Embree port in vcpkg is kept up to date by Microsoft team members
and community contributors. If the version is out of date, please
[create an issue or pull request](https://github.com/Microsoft/vcpkg)
on the vcpkg repository.
Windows SYCL Compilation
-------------------------
There are two options to compile Embree with SYCL support:
The open source ["oneAPI DPC++ Compiler"](https://github.com/intel/llvm/) or
the ["Intel(R) oneAPI DPC++/C++ Compiler"](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#dpcpp-cpp).
Other SYCL compilers are not supported. You will also need an installed version
of Visual Studio that supports the C++17 standard, e.g. Visual Studio 2019.
The "oneAPI DPC++ Compiler" is more up-to-date than the "Intel(R) oneAPI
DPC++/C++ Compiler" but less stable. The current tested version of the oneAPI
DPC++ compiler is
- [oneAPI DPC++ Compiler 2023-10-26](https://github.com/intel/llvm/releases/tag/nightly-2023-10-26)
Download and unpack the archive and open the "x64 Native Tools Command Prompt"
of Visual Studio and execute the following lines to properly configure the
environment to use the oneAPI DPC++ compiler:
set "DPCPP_DIR=path_to_dpcpp_compiler"
set "PATH=%DPCPP_DIR%\bin;%PATH%"
set "PATH=%DPCPP_DIR%\lib;%PATH%"
set "CPATH=%DPCPP_DIR%\include;%CPATH%"
set "INCLUDE=%DPCPP_DIR%\include;%INCLUDE%"
set "LIB=%DPCPP_DIR%\lib;%LIB%"
The `path_to_dpcpp_compiler` should point to the unpacked oneAPI DPC++
compiler.
Now, you can configure Embree using CMake by executing the following command
in the Embree root directory:
cmake -B build ^
-G Ninja ^
-D CMAKE_BUILD_TYPE=Release ^
-D CMAKE_CXX_COMPILER=clang++ ^
-D CMAKE_C_COMPILER=clang ^
-D EMBREE_SYCL_SUPPORT=ON ^
-D TBB_ROOT=path_to_tbb\lib\cmake\tbb
This will create a directory `build` to use as the CMake build directory, and
configure a release build that uses `clang++` and `clang` from the oneAPI DPC++
compiler.
The [Ninja](https://ninja-build.org/) generator is currently the easiest way to
use the oneAPI DPC++ compiler.
We also enable SYCL support in Embree using the `EMBREE_SYCL_SUPPORT` CMake
option.
Alternatively, you can download and run the installer of the
- [Intel(R) oneAPI DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#dpcpp-cpp).
After installation, you can either open a regular `Command Prompt` and execute
the `vars.bat` script in the `env` directory of the compiler install directory,
for example
"C:\Program Files (x86)\Intel\oneAPI\compiler\latest\env\vars.bat"
or simply open the installed "Intel oneAPI command prompt for Intel 64 for Visual Studio".
Both ways will put the `icx` compiler executable from the
Intel(R) oneAPI DPC++/C++ compiler in your path.
Now, you can configure Embree using CMake by executing the following command
in the Embree root directory:
cmake -B build ^
-G Ninja ^
-D CMAKE_BUILD_TYPE=Release ^
-D CMAKE_CXX_COMPILER=icx ^
-D CMAKE_C_COMPILER=icx ^
-D EMBREE_SYCL_SUPPORT=ON ^
-D TBB_ROOT=path_to_tbb\lib\cmake\tbb
More information about setting up the Intel(R) oneAPI DPC++/C++ compiler can be
found in the [Development Reference Guide](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-setup.html). Please note that the Intel(R) oneAPI DPC++/C++ compiler
requires [at least CMake version 3.23 on Windows](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-setup/use-the-command-line/use-cmake-with-the-compiler.html).
Independent of the DPC++ compiler choice, you can now build Embree using
cmake --build build
If you have problems with Ninja re-running CMake in an infinite loop,
then first remove the "Re-run CMake if any of its inputs changed."
section from the `build.ninja` file and run the above command again.
You can also create an Embree package using the following command:
cmake --build build --target package
Please see the [Building Embree SYCL Applications] section on how to build
your application with such an Embree package.
### Windows Graphics Driver Installation
In order to run the SYCL tutorials on HPG hardware, you first need to
install the graphics drivers for your graphics card from
[https://www.intel.com](https://www.intel.com). Please make sure to
have installed version 31.0.101.4644 or newer.
CMake Configuration
-------------------
The default CMake configuration in the configuration dialog should be
appropriate for most usages. The following list describes all
parameters that can be configured in CMake:
+ `CMAKE_BUILD_TYPE`: Can be used to switch between Debug mode
(Debug), Release mode (Release) (default), and Release mode with
enabled assertions and debug symbols (RelWithDebInfo).
+ `EMBREE_STACK_PROTECTOR`: Enables protection of return address
from buffer overwrites. This option is OFF by default.
+ `EMBREE_ISPC_SUPPORT`: Enables Intel® ISPC support of Embree. This option
is OFF by default.
+ `EMBREE_SYCL_SUPPORT`: Enables GPU support using SYCL. When this
option is enabled you have to use a DPC++ compiler. Please see
the sections [Linux SYCL Compilation] and [Windows SYCL Compilation]
on supported DPC++ compilers. This option is OFF by default.
+ `EMBREE_SYCL_AOT_DEVICES`: Selects a list of GPU devices for
ahead-of-time (AOT) compilation of device code. Possible values are
either "none", which enables only just-in-time (JIT) compilation, or
a list of the Embree-supported Xe GPUs for AOT compilation:
* XE_HPG_CORE : Xe HPG devices
* XE_HPC_CORE : Xe HPC devices
One can also specify multiple devices separated by commas to
compile ahead of time for multiple devices,
e.g. "XE_HPG_CORE,XE_HPC_CORE". When enabling AOT compilation for one
or multiple devices, JIT compilation will always additionally be
enabled in case the code is executed on a device for which no code is
precompiled.
Execute "ocloc compile --help" for more details of possible devices
to pass. Embree is only supported on Xe HPG/HPC and newer devices.
By default, this option is set to "none" to enable JIT
compilation. We recommend using JIT compilation as this enables the
use of specialization constants to reduce code complexity.
+ `EMBREE_STATIC_LIB`: Builds Embree as a static library (OFF by
default). Furthermore, multiple static libraries are generated for the
different ISAs selected (e.g. `embree4.a`, `embree4_sse42.a`,
`embree4_avx.a`, `embree4_avx2.a`, `embree4_avx512.a`). You have
to link these libraries in exactly this order of increasing ISA.
+ `EMBREE_API_NAMESPACE`: Specifies a namespace name to put all Embree
API symbols inside. By default, no namespace is used and plain C symbols
are exported.
+ `EMBREE_LIBRARY_NAME`: Specifies the name of the Embree library file
created. By default, the name embree4 is used.
+ `EMBREE_IGNORE_CMAKE_CXX_FLAGS`: When enabled, Embree ignores
default CMAKE_CXX_FLAGS. This option is turned ON by default.
+ `EMBREE_TUTORIALS`: Enables build of Embree tutorials (default ON).
+ `EMBREE_BACKFACE_CULLING`: Enables backface culling, i.e. only
surfaces facing a ray can be hit. This option is turned OFF by
default.
+ `EMBREE_BACKFACE_CULLING_CURVES`: Enables backface culling for curves,
i.e. only surfaces facing a ray can be hit. This option is turned OFF
by default.
+ `EMBREE_BACKFACE_CULLING_SPHERES`: Enables backface culling for spheres,
i.e. only surfaces facing a ray can be hit. This option is turned OFF
by default.
+ `EMBREE_COMPACT_POLYS`: Enables compact tris/quads, i.e. only
geomIDs and primIDs are stored inside the leaf nodes.
+ `EMBREE_FILTER_FUNCTION`: Enables the intersection filter function
feature (ON by default).
+ `EMBREE_RAY_MASK`: Enables the ray masking feature (OFF by default).
+ `EMBREE_RAY_PACKETS`: Enables ray packet traversal kernels. This
feature is turned ON by default. When turned on packet traversal is
used internally and packets passed to rtcIntersect4/8/16 are kept
intact in callbacks (when the ISA of appropriate width is enabled).
+ `EMBREE_IGNORE_INVALID_RAYS`: Makes code robust against the risk of
full-tree traversals caused by invalid rays (e.g. rays containing
INF/NaN as origins). This option is turned OFF by default.
+ `EMBREE_TASKING_SYSTEM`: Chooses between Intel® Threading
Building Blocks (TBB), Parallel Patterns Library (PPL) (Windows
only), or an internal tasking system (INTERNAL). By default, TBB is
used.
+ `EMBREE_TBB_ROOT`: If Intel® Threading Building Blocks (TBB)
is used as a tasking system, search the library in this directory
tree.
+ `EMBREE_TBB_COMPONENT`: The component/library name of Intel® Threading
Building Blocks (TBB). Embree searches for this library name (default: tbb)
when TBB is used as the tasking system.
+ `EMBREE_TBB_POSTFIX`: If Intel® Threading Building Blocks (TBB)
is used as a tasking system, link to `tbb<EMBREE_TBB_POSTFIX>.(so,dll,lib)`.
Defaults to the empty string.
+ `EMBREE_TBB_DEBUG_ROOT`: If Intel® Threading Building Blocks (TBB)
is used as a tasking system, search the library in this directory
tree in Debug mode. Defaults to `EMBREE_TBB_ROOT`.
+ `EMBREE_TBB_DEBUG_POSTFIX`: If Intel® Threading Building Blocks (TBB)
is used as a tasking system, link to `tbb<EMBREE_TBB_DEBUG_POSTFIX>.(so,dll,lib)`
in Debug mode. Defaults to "_debug".
+ `EMBREE_MAX_ISA`: Select highest supported ISA (SSE2, SSE4.2, AVX,
AVX2, AVX512, or NONE). When set to NONE the
EMBREE_ISA_* variables can be used to enable ISAs individually. By
default, the option is set to AVX2.
+ `EMBREE_ISA_SSE2`: Enables SSE2 when EMBREE_MAX_ISA is set to
NONE. By default, this option is turned OFF.
+ `EMBREE_ISA_SSE42`: Enables SSE4.2 when EMBREE_MAX_ISA is set to
NONE. By default, this option is turned OFF.
+ `EMBREE_ISA_AVX`: Enables AVX when EMBREE_MAX_ISA is set to NONE. By
default, this option is turned OFF.
+ `EMBREE_ISA_AVX2`: Enables AVX2 when EMBREE_MAX_ISA is set to
NONE. By default, this option is turned OFF.
+ `EMBREE_ISA_AVX512`: Enables AVX-512 for Skylake when
EMBREE_MAX_ISA is set to NONE. By default, this option is turned OFF.
+ `EMBREE_GEOMETRY_TRIANGLE`: Enables support for triangle geometries
(ON by default).
+ `EMBREE_GEOMETRY_QUAD`: Enables support for quad geometries (ON by
default).
+ `EMBREE_GEOMETRY_CURVE`: Enables support for curve geometries (ON by
default).
+ `EMBREE_GEOMETRY_SUBDIVISION`: Enables support for subdivision
geometries (ON by default).
+ `EMBREE_GEOMETRY_INSTANCE`: Enables support for instances (ON by
default).
+ `EMBREE_GEOMETRY_INSTANCE_ARRAY`: Enables support for instance arrays (ON by
default).
+ `EMBREE_GEOMETRY_USER`: Enables support for user-defined geometries
(ON by default).
+ `EMBREE_GEOMETRY_POINT`: Enables support for point geometries
(ON by default).
+ `EMBREE_CURVE_SELF_INTERSECTION_AVOIDANCE_FACTOR`: Specifies a
factor that controls the self-intersection avoidance feature for flat
curves. Flat curve intersections which are closer than
curve_radius*`EMBREE_CURVE_SELF_INTERSECTION_AVOIDANCE_FACTOR` to
the ray origin are ignored. A value of 0.0f disables self-intersection
avoidance while 2.0f is the default value.
+ `EMBREE_DISC_POINT_SELF_INTERSECTION_AVOIDANCE`: Enables self-intersection
avoidance for RTC_GEOMETRY_TYPE_DISC_POINT geometry type (ON by default).
When enabled intersections are skipped if the ray origin lies inside the
sphere defined by the point primitive.
+ `EMBREE_MIN_WIDTH`: Enables the min-width feature, which allows
increasing the radius of curves and points to match some amount of
pixels. See [rtcSetGeometryMaxRadiusScale] for more details.
+ `EMBREE_MAX_INSTANCE_LEVEL_COUNT`: Specifies the maximum number of nested
instance levels. Should be greater than 0; the default value is 1.
Instances nested any deeper than this value will silently disappear in
release mode, and cause assertions in debug mode.
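
To make the `EMBREE_CURVE_SELF_INTERSECTION_AVOIDANCE_FACTOR` rule from
the list above concrete, the following C++ sketch restates the documented
threshold test. The function `isFlatCurveHitIgnored` and its parameter
names are hypothetical and introduced only for illustration; this is not
Embree's internal code, just one reading of the rule, assuming the hit
distance is measured from the ray origin.

```cpp
// Illustrative restatement of the documented rule; not Embree source code.
// All names below are hypothetical.

// Returns true if a flat-curve hit at distance `hitDistance` from the ray
// origin would be discarded for a curve of radius `curveRadius`.
// The default factor of 2.0f mirrors the documented default value.
inline bool isFlatCurveHitIgnored(float hitDistance, float curveRadius,
                                  float avoidanceFactor = 2.0f)
{
  // A factor of 0.0f gives a threshold of 0, so no hit is ever discarded.
  const float threshold = curveRadius * avoidanceFactor;
  return hitDistance < threshold;
}
```

With the default factor, a hit at distance 0.5 on a curve of radius 1 is
discarded, while a hit at distance 2.5 is kept.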
# Embree API
The Embree API is a low-level C99 ray tracing API which can be used to
build spatial index structures for 3D scenes and perform ray queries of
different types.
The API can be used on the CPU from standard C, C++, and ISPC code,
and on Intel GPUs from SYCL code.
The Intel® Implicit SPMD Program Compiler (Intel® ISPC) version of the
API is almost identical to the standard C99 version, but contains
additional functions that operate on ray packets with a size of the
native SIMD width used by Intel® ISPC.
The SYCL version of the API is also mostly identical to the C99 version
of the API, with some exceptions listed in section [Embree SYCL API].
For simplicity this document refers to the C99 version of the API
functions. For changes when upgrading from the Embree 3 to the current
Embree 4 API see Section [Upgrading from Embree 3 to Embree 4].
All API calls carry the prefix `rtc` (or `RTC` for types) which stands
for **r**ay **t**racing **c**ore. The API supports scenes consisting of
different geometry types such as triangle meshes, quad meshes (triangle
pairs), grid meshes, flat curves, round curves, oriented curves,
subdivision meshes, instances, and user-defined geometries. See Section
[Scene Object](#scene-object) for more information.
Finding the closest hit of a ray segment with the scene
(`rtcIntersect`-type functions), and determining whether any hit
between a ray segment and the scene exists (`rtcOccluded`-type
functions) are both supported. The API supports queries for single rays
and ray packets. See Section [Ray Queries](#ray-queries) for more
information.
The API is designed in an object-oriented manner, e.g. it contains
device objects (`RTCDevice` type), scene objects (`RTCScene` type),
geometry objects (`RTCGeometry` type), buffer objects (`RTCBuffer`
type), and BVH objects (`RTCBVH` type). All objects are reference
counted, and handles can be released by calling the appropriate release
function (e.g. `rtcReleaseDevice`) or retained by incrementing the
reference count (e.g. `rtcRetainDevice`). In general, API calls that
access the same object are not thread-safe, unless specified otherwise.
However, attaching geometries to the same scene and performing ray
queries in a scene is thread-safe.
Starting with Embree 4.4 intersection and occlusion queries on a SYCL
device require the use of the `rtcTraversableIntersect`-type functions
or the `rtcTraversableOccluded`-type function respectively. These
functions take a traversable object (`RTCTraversable` type) which
corresponds to a `RTCScene`. Traversable objects are not reference
counted and therefore they do not have to be released like the other
handles. Traversable objects grant read-only access to a scene object
on a SYCL device and are valid as long as the corresponding scene
object is valid.
## Device Object
Embree supports a device concept, which allows different components of
the application to use the Embree API without interfering with each
other. An application typically first creates a device using the
[rtcNewDevice] function (or [rtcNewSYCLDevice] when using SYCL for
the GPU). This device can then be used to construct further objects,
such as scenes and geometries. Before the application exits, it should
release all devices by invoking [rtcReleaseDevice]. An application
typically creates only a single device. If required differently, it
should only use a small number of devices at any given time.
Each user thread has its own error flag per device. If an error occurs
when invoking an API function, this flag is set to an error code (if it
isn't already set by a previous error). See Section
[rtcGetDeviceError] for information on how to read the error code and
Section [rtcSetDeviceErrorFunction] on how to register a callback
that is invoked for each error encountered. It is recommended to always
set an error callback function, to detect all errors.
## Scene Object
A scene is a container for a set of geometries, and contains a spatial
acceleration structure which can be used to perform different types of
ray queries.
A scene is created using the `rtcNewScene` function call, and released
using the `rtcReleaseScene` function call. To populate a scene with
geometries use the `rtcAttachGeometry` call, and to detach them use the
`rtcDetachGeometry` call. Once all scene geometries are attached, an
`rtcCommitScene` call (or `rtcJoinCommitScene` call) will finish the
scene description and trigger building of internal data structures.
After the scene has been committed, it is safe to perform ray queries (see
Section [Ray Queries](#ray-queries)) or to query the scene bounding box
(see [rtcGetSceneBounds] and [rtcGetSceneLinearBounds]).
If scene geometries get modified or attached or detached, the
`rtcCommitScene` call must be invoked before performing any further ray
queries for the scene; otherwise the effect of the ray query is
undefined. The modification of a geometry, committing the scene, and
tracing of rays must always happen sequentially, and never at the same
time. Any API call that sets a property of the scene or geometries
contained in the scene counts as a scene modification, including
setting of intersection filter functions.
When using SYCL, calls to `rtcCommitScene` trigger memory transfers
from the host (CPU) to the device (GPU). Calling `rtcCommitScene` will
be blocking and return only after the memory transfers are completed.
Embree also provides the function `rtcCommitSceneWithQueue` which takes
a SYCL queue as argument to which the memory transfer operations are
submitted. Calling `rtcCommitSceneWithQueue` will trigger the memory
transfers asynchronously and the application is responsible for
synchronizing commands on the queue properly to ensure the scene data is
available on a SYCL device when a SYCL kernel performs intersection
queries that rely on the scene data.
Scene flags can be used to configure a scene to use less memory
(`RTC_SCENE_FLAG_COMPACT`), use more robust traversal algorithms
(`RTC_SCENE_FLAG_ROBUST`), and to optimize for dynamic content. See
Section [rtcSetSceneFlags] for more details.
A build quality can be specified for a scene to balance between
acceleration structure build performance and ray query performance. See
Section [rtcSetSceneBuildQuality] for more details on build quality.
## Traversable Object
Starting with Embree 4.4 scene objects (`RTCScene` types) are not valid
handles on SYCL devices anymore and therefore can not be used for
Embree API calls in a SYCL kernel. Instead, Embree API calls in a SYCL
kernel have variants that use traversable objects (`RTCTraversable`
type).
Traversable objects grant read-only access to a scene object on a SYCL
device and are valid as long as the corresponding scene object is
valid. They can be queried from a scene object using the
`rtcGetSceneTraversable` function and used in
`rtcTraversableIntersect`-type functions or the
`rtcTraversableOccluded`-type function. They can also be used in CPU
code and Embree provides other API calls such as the
`rtcTraversablePointQuery` (which are not currently implemented for
SYCL) to help write portable code compatible with CPU and SYCL device
execution.
## Geometry Object
A new geometry is created using the `rtcNewGeometry` function.
Depending on the geometry type, different buffers must be bound (e.g.
using `rtcSetSharedGeometryBuffer`) to set up the geometry data. In
most cases, binding of a vertex and index buffer is required. The
number of primitives and vertices of that geometry is typically
inferred from the size of these bound buffers.
Changes to the geometry always must be committed using the
`rtcCommitGeometry` call before using the geometry. After committing, a
geometry is not included in any scene. A geometry can be added to a
scene by using the `rtcAttachGeometry` function (to automatically
assign a geometry ID) or using the `rtcAttachGeometryById` function (to
specify the geometry ID manually). A geometry can get attached to
multiple scenes.
All geometry types support multi-segment motion blur with an arbitrary
number of equidistant time steps (in the range of 2 to 129) inside a
user specified time range. Each geometry can have a different number of
time steps and a different time range. The motion blur geometry is
defined by linearly interpolating the geometries of neighboring time
steps. To construct a motion blur geometry, first the number of time
steps of the geometry must be specified using the
`rtcSetGeometryTimeStepCount` function, and then a vertex buffer for
each time step must be bound, e.g. using the
`rtcSetSharedGeometryBuffer` function. Optionally, a time range
defining the start (and end time) of the first (and last) time step can
be set using the `rtcSetGeometryTimeRange` function. This feature will
also allow geometries to appear and disappear during the camera shutter
time if the time range is a sub range of [0,1].
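
The interpolation described above can be sketched in a few lines of C++.
This is a simplified model under stated assumptions, not Embree's
implementation: the function `sampleMotionBlur` is a hypothetical name,
each time step is reduced to a single float (think of one vertex
coordinate), and a time outside the time range is modeled as "geometry
absent" by returning an empty optional.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Illustrative model of multi-segment motion blur; not Embree source code.
// `keys` holds one value per time step, spaced equidistantly over
// [rangeStart, rangeEnd]. All names are hypothetical.
inline std::optional<float> sampleMotionBlur(const std::vector<float>& keys,
                                             float rangeStart, float rangeEnd,
                                             float time)
{
  // At least 2 time steps and a non-empty range are needed to interpolate.
  if (keys.size() < 2 || !(rangeStart < rangeEnd)) return std::nullopt;

  // Outside the time range the geometry is treated as absent.
  if (time < rangeStart || time > rangeEnd) return std::nullopt;

  // Map the time to a continuous position in [0, segmentCount].
  const std::size_t segmentCount = keys.size() - 1;
  const float u =
      (time - rangeStart) / (rangeEnd - rangeStart) * float(segmentCount);

  // Pick the segment; clamp so that time == rangeEnd uses the last segment.
  const std::size_t i =
      std::min(static_cast<std::size_t>(u), segmentCount - 1);
  const float f = u - float(i);

  // Linear interpolation between the two neighboring time steps.
  return (1.0f - f) * keys[i] + f * keys[i + 1];
}
```

For three time steps with values 0, 10, and 30 over the time range [0,1],
a time of 0.25 falls halfway into the first segment and yields 5, while a
time of 0.75 falls halfway into the second segment and yields 20.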
## Ray Queries
The API supports finding the closest hit of a ray segment with the
scene (`rtcIntersect`-type functions), and determining whether any hit
between a ray segment and the scene exists (`rtcOccluded`-type
functions).
Supported are single ray queries (`rtcIntersect1` and `rtcOccluded1`)
as well as ray packet queries for ray packets of size 4
(`rtcIntersect4` and `rtcOccluded4`), ray packets of size 8
(`rtcIntersect8` and `rtcOccluded8`), and ray packets of size 16
(`rtcIntersect16` and `rtcOccluded16`).
See Sections [rtcIntersect1] and [rtcOccluded1] for a detailed
description of how to set up and trace a ray.
See tutorial [Triangle Geometry] for a complete example of how to
trace single rays and ray packets.
On SYCL devices the API functions `rtcTraversableIntersect` and
`rtcTraversableOccluded` have to be used.
## Point Queries
The API supports traversal of the BVH using a point query object that
specifies a location and a query radius. For all primitives
intersecting the according domain, a user defined callback function is
called which allows queries such as finding the closest point on the
surface geometries of the scene (see Tutorial [Closest Point]) or
nearest neighbour queries (see Tutorial [Voronoi]).
Point Queries can currently not be used on SYCL devices.
See Section [rtcPointQuery] for a detailed description of how to set
up point queries.
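
The following C++ sketch illustrates the idea behind a closest-point
query. It is a self-contained model, not Embree code: the types `Point3`
and `ClosestResult` and the function `closestPoint` are hypothetical, the
primitives are plain points, and a linear loop stands in for the BVH
traversal that Embree performs. The loop body plays the role of the user
callback, which records the best candidate and shrinks the query radius.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative model of a point query; not Embree source code.
// All types and names are hypothetical stand-ins.
struct Point3 { float x, y, z; };

struct ClosestResult {
  bool found = false;      // set once any primitive lies within the radius
  std::size_t primID = 0;  // index of the closest primitive found so far
  float distance = 0.0f;   // distance to that primitive
};

inline float distanceBetween(const Point3& a, const Point3& b)
{
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Visits every primitive (a linear loop stands in for BVH traversal) and
// runs the "callback" logic for those inside the current query radius.
inline ClosestResult closestPoint(const std::vector<Point3>& primitives,
                                  const Point3& query, float radius)
{
  ClosestResult result;
  for (std::size_t i = 0; i < primitives.size(); ++i) {
    const float d = distanceBetween(primitives[i], query);
    if (d > radius) continue;  // primitive lies outside the query domain

    // Callback body: record the candidate and shrink the radius so that
    // only closer primitives are considered from now on.
    result.found = true;
    result.primID = i;
    result.distance = d;
    radius = d;
  }
  return result;
}
```

Shrinking the radius inside the callback is what lets a real BVH
traversal skip subtrees that can no longer contain a closer primitive.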
## Collision Detection
The Embree API also supports collision detection queries between two
scenes consisting only of user geometries. Embree only performs
broadphase collision detection, the narrow phase detection can be
performed through a callback function.
Collision detection can currently not be used on SYCL devices.
See Section [rtcCollide] for a detailed description of how to set up
collision detection.
See tutorial [Collision Detection](#collision-detection) for a
complete example of collision detection being used on a simple cloth
solver.
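
The split between broadphase and narrow phase can be sketched as
follows. This is a self-contained C++ model, not Embree code: the type
`Aabb` and the functions `overlaps` and `broadphase` are hypothetical, and
a brute-force double loop stands in for the BVH-based traversal. The
callback receives candidate primitive pairs whose bounding boxes overlap;
an exact (narrow phase) test would be performed inside that callback.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative model of broadphase collision detection; not Embree code.
// All types and names are hypothetical stand-ins.
struct Aabb {
  float lower[3];
  float upper[3];
};

// Two axis-aligned boxes overlap if their intervals overlap on every axis.
inline bool overlaps(const Aabb& a, const Aabb& b)
{
  for (int axis = 0; axis < 3; ++axis)
    if (a.upper[axis] < b.lower[axis] || b.upper[axis] < a.lower[axis])
      return false;
  return true;
}

using CollideCallback =
    std::function<void(std::size_t primA, std::size_t primB)>;

// Reports every pair of primitives whose bounds overlap. The callback is
// where an application would run its exact narrow phase test.
inline void broadphase(const std::vector<Aabb>& sceneA,
                       const std::vector<Aabb>& sceneB,
                       const CollideCallback& onCandidate)
{
  for (std::size_t i = 0; i < sceneA.size(); ++i)
    for (std::size_t j = 0; j < sceneB.size(); ++j)
      if (overlaps(sceneA[i], sceneB[j]))
        onCandidate(i, j);
}
```

Because only bounding boxes are compared, a reported pair is a candidate
and not a confirmed collision.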
## Filter Functions
The API supports filter functions that are invoked for each
intersection found during the `rtcIntersect`-type or `rtcOccluded`-type
calls.
The filter functions can be set per-geometry using the
`rtcSetGeometryIntersectFilterFunction` and
`rtcSetGeometryOccludedFilterFunction` calls. The former ones are
called geometry intersection filter functions, the latter ones geometry
occlusion filter functions. These filter functions are designed to be
used to ignore intersections outside of a user-defined silhouette of a
primitive, e.g. to model tree leaves using transparency textures.
The filter function can also get passed as arguments directly to the
traversal functions, see section [rtcInitIntersectArguments] and
[rtcInitOccludedArguments] for more details. These argument filter
functions are designed to change the semantics of the ray query,
e.g. to accumulate opacity for transparent shadows, count the number of
surfaces along a ray, collect all hits along a ray, etc. The argument
filter function must be enabled to be used for a scene using the
`RTC_SCENE_FLAG_FILTER_FUNCTION_IN_ARGUMENTS` scene flag. The callback
is only invoked for geometries that enable the callback using the
`rtcSetGeometryEnableFilterFunctionFromArguments` call, or enabled for
all geometries when the `RTC_RAY_QUERY_FLAG_INVOKE_ARGUMENT_FILTER` ray
query flag is set.
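
How an argument filter can change the semantics of a ray query is
sketched below for the transparent-shadow case. This is a self-contained
C++ model, not Embree code: `CandidateHit`, `ShadowContext`,
`shadowFilter`, and `traceShadow` are hypothetical names, the per-hit
opacity is assumed to be known, and a boolean return value stands in for
accepting or rejecting a hit.

```cpp
#include <vector>

// Illustrative model of an opacity-accumulating filter; not Embree code.
// All types and names are hypothetical stand-ins.
struct CandidateHit {
  float distance;  // hit distance along the ray
  float opacity;   // 0 = fully transparent, 1 = fully opaque
};

struct ShadowContext {
  float transmittance = 1.0f;  // fraction of light still passing through
};

// Returns true to accept the hit (which ends the occlusion query) and
// false to reject it so that traversal continues to further surfaces.
inline bool shadowFilter(const CandidateHit& hit, ShadowContext& context)
{
  context.transmittance *= (1.0f - hit.opacity);
  return context.transmittance <= 0.0f;
}

// Feeds all candidate hits to the filter, as a traversal would, and
// returns the accumulated transmittance.
inline float traceShadow(const std::vector<CandidateHit>& hits)
{
  ShadowContext context;
  for (const CandidateHit& hit : hits)
    if (shadowFilter(hit, context)) break;
  return context.transmittance;
}
```

Since the transmittance is a product, the result does not depend on the
order in which the candidate hits are visited.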
## BVH Build API
The internal algorithms to build a BVH are exposed through the `RTCBVH`
object and `rtcBuildBVH` call. This call makes it possible to build a
BVH in a user-specified format over user-specified primitives. See the
documentation of the `rtcBuildBVH` call for more details.
# Embree SYCL API
Embree supports ray tracing on Intel GPUs by using the SYCL programming
language. SYCL is a Khronos standardized C++ based language for single
source heterogeneous programming for acceleration offload, see the [SYCL
webpage](https://www.khronos.org/sycl/) for details.
The Embree SYCL API is designed for photorealistic rendering use cases,
where scene setup is performed on the host, and rendering on the
device. The Embree SYCL API is very similar to the standard Embree C99
API, and supports most of its features, such as all triangle-type
geometries, all curve types and basis functions, point geometry types,
user geometries, filter callbacks, multi-level instancing, and motion
blur.
To enable SYCL support you have to include the `sycl.hpp` file before
the Embree API headers:

    #include <sycl/sycl.hpp>
    #include <embree4/rtcore.h>

Next you need to initialize an Embree SYCL device using the
`rtcNewSYCLDevice` API function by providing a SYCL context.
Embree provides the `rtcIsSYCLDeviceSupported` API function to check if
some SYCL device is supported by Embree. You can also use the
`rtcSYCLDeviceSelector` to conveniently select the first SYCL device
that is supported by Embree, e.g.:

    sycl::device sycl_device(rtcSYCLDeviceSelector);
    sycl::queue queue(sycl_device, exception_handler);
    sycl::context context(sycl_device);
    RTCDevice device = rtcNewSYCLDevice(context,"");

Scenes created with an Embree SYCL device can only get used to trace
rays using SYCL on the GPU, it is not possible to trace rays on the CPU
with such a device. To render on the CPU and GPU in parallel, the user
has to create a second Embree device and create a second scene to be
used on the CPU.
Starting with Embree 4.4 scene objects (`RTCScene` types) are not valid
handles on SYCL devices anymore and therefore can not be used for
Embree API calls in a SYCL kernel. Instead, Embree API calls in a SYCL
kernel have variants that use traversable objects (`RTCTraversable`
type). To get a traversable object for a scene object the application
can call `rtcGetSceneTraversable`.
Files containing SYCL code have to be compiled with the Intel® oneAPI
DPC++ compiler. Please see sections [Linux SYCL Compilation] and
[Windows SYCL Compilation] for supported compilers. The DPC++
compiler performs a two-phase compilation, where host code is compiled
in a first phase, and device code is compiled in a second compilation
phase.
Standard Embree API functions for scene construction can get used on
the host but not the device.
Before version 4.4, Embree made heavy use of unified shared memory
(USM) shared allocations, which simplify memory management with SYCL
devices by letting the SYCL runtime transfer data from host to device
implicitly. However, some applications require more control over when
and how data is migrated from CPU to GPU. Embree 4.4 allows the use of
explicit host and device memory allocations. See for example
`rtcSetNewGeometryBufferHostDevice`,
`rtcSetSharedGeometryBufferHostDevice`, `rtcNewBufferHostDevice`, and
`rtcNewSharedBufferHostDevice`. It is still possible to share data
buffers with Embree using SYCL USM shared memory by using the API calls
without the `HostDevice` suffix.
The easiest way to share data buffers with Embree (e.g. for vertex or
index buffers) is to allocate the data as USM shared memory, using the
`sycl::malloc` or `sycl::aligned_alloc` calls with the
`sycl::usm::alloc::shared` property, or the `sycl::aligned_alloc_shared`
call, e.g.:

    void* ptr = sycl::aligned_alloc(16, bytes, queue, sycl::usm::alloc::shared);

These shared allocations have to be valid during rendering, as Embree
may access contained data when tracing rays.
Device side rendering can be invoked by submitting a SYCL
`parallel_for` to the SYCL queue:

    const sycl::specialization_id<RTCFeatureFlags> feature_mask;

    RTCFeatureFlags required_features = RTC_FEATURE_FLAG_TRIANGLE;
    RTCTraversable traversable = rtcGetSceneTraversable(scene);

    queue.submit([=](sycl::handler& cgh)
    {
      // pass the required features to the kernel as a specialization constant
      cgh.set_specialization_constant<feature_mask>(required_features);

      cgh.parallel_for(sycl::range<1>(1),[=](sycl::id<1> item, sycl::kernel_handler kh)
      {
        RTCIntersectArguments args;
        rtcInitIntersectArguments(&args);

        const RTCFeatureFlags features = kh.get_specialization_constant<feature_mask>();
        args.feature_mask = features;

        struct RTCRayHit rayhit;
        rayhit.ray.org_x = ox;
        rayhit.ray.org_y = oy;
        rayhit.ray.org_z = oz;
        rayhit.ray.dir_x = dx;
        rayhit.ray.dir_y = dy;
        rayhit.ray.dir_z = dz;
        rayhit.ray.tnear = 0;
        rayhit.ray.tfar = std::numeric_limits<float>::infinity();
        rayhit.ray.mask = -1;
        rayhit.ray.flags = 0;
        rayhit.hit.geomID = RTC_INVALID_GEOMETRY_ID;
        rayhit.hit.instID[0] = RTC_INVALID_GEOMETRY_ID;

        rtcTraversableIntersect1(traversable, &rayhit, &args);

        // write the hit back to memory that is accessible from the host
        result->geomID = rayhit.hit.geomID;
        result->primID = rayhit.hit.primID;
        result->tfar = rayhit.ray.tfar;
      });
    });
    queue.wait_and_throw();

This example passes a feature mask using a specialization constant to
the `rtcTraversableIntersect1` function, which is recommended for GPU
rendering. For best performance, this feature mask should get used to
enable only features required by the application to render the scene,
e.g. just triangles in this example.
Inside the SYCL `parallel_for` loop you can use rendering related
functions, such as the `rtcTraversableIntersect1` and
`rtcTraversableOccluded1` functions to trace rays,
`rtcTraversableForwardIntersect1/Ex` and
`rtcTraversableForwardOccluded1/Ex` to continue object traversal from
inside a user geometry callback, and
`rtcGetGeometryUserDataFromTraversable` to get the user data pointer of
some geometry.
Have a look at the [Minimal] tutorial for a minimal SYCL example. The
[Host Device Memory] tutorial shows four different ways in which
data buffers can be created by or shared with Embree using explicit
host/device data buffers.
## SYCL JIT caching
Compile times for just in time compilation (JIT compilation) can be
large. To resolve this issue we recommend enabling persistent JIT
compilation caching inside your application, by setting the
`SYCL_CACHE_PERSISTENT` environment variable to `1`, and the
`SYCL_CACHE_DIR` environment variable to some proper directory where
the JIT cache should get stored. These environment variables have to
get set before the SYCL device is created, e.g.:

    setenv("SYCL_CACHE_PERSISTENT","1",1);
    setenv("SYCL_CACHE_DIR","cache_dir",1);
    sycl::device device(rtcSYCLDeviceSelector);
    ...

## SYCL Memory Pooling
Memory Pooling is a mechanism where small USM memory allocations are
packed into larger allocation blocks. This mode is required when your
application performs many small USM allocations, as otherwise only a
small fraction of GPU memory is usable and data transfer performance
will be low.
Memory pooling is supported for USM allocations that are read-only by
the device. The following example allocates device read-only memory
with memory pooling support:

    sycl::aligned_alloc_shared(align, bytes, queue,
      sycl::ext::oneapi::property::usm::device_read_only());

## Embree SYCL Limitations
Embree only supports Xe HPC and HPG GPUs as SYCL devices, thus in
particular the CPU and other GPUs cannot get used as a SYCL device. To
render on the CPU just use the standard C99 API without relying on
SYCL.
The SYCL language specification places some restrictions on device functions, such
as disallowing: global variable access, malloc, invocation of virtual
functions, function pointers, runtime type information, exceptions,
recursion, etc. See Section
`5.4. Language Restrictions for device functions` of the [SYCL
specification](https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:language.restrictions.kernels)
for more details.
When using Intel's oneAPI DPC++ compiler, invoking an indirectly called
function is allowed, but we do not recommend this for performance
reasons.
Some features are not supported by the Embree SYCL API thus cannot get
used on the GPU:
- Since Embree 4.4, all the ray query functions that take an
`RTCScene` object as argument cannot get used in SYCL device side
code. Instead, the API functions taking a `RTCTraversable` object
(e.g. `rtcTraversableIntersect1`) have to be used.
- The packet tracing functions `rtcTraversableIntersect4/8/16` and
`rtcTraversableOccluded4/8/16` are not supported in SYCL device
side code. Using these functions makes no sense for SYCL, as the
programming model is implicitly executed in SIMT mode on the GPU
anyway.
- Filter and user geometry callbacks stored inside the geometry
objects are not supported on SYCL. Please use the alternative
approach of passing the function pointer through the
`RTCIntersectArguments` (or `RTCOccludedArguments`) structures to
the tracing function, which enables inlining on the GPU.
- The `rtcInterpolate` function cannot get used on the device.
For most primitive types the vertex data interpolation is anyway a
trivial operation, and an API call just introduces overheads. On
the CPU that overhead is acceptable, but on the GPU it is not. The
`rtcInterpolate` function does not know the geometry type it is
interpolating over, thus its implementation on the GPU would
contain a large switch statement for all potential geometry types.
- Tracing rays using `rtcTraversableIntersect1` and
`rtcTraversableOccluded1` functions from user geometry callbacks is
not supported in SYCL. Please use the tail recursive
`rtcTraversableForwardIntersect1` and
`rtcTraversableForwardOccluded1` calls instead.
- Subdivision surfaces are not supported for Embree SYCL devices.
- Collision detection (`rtcCollide` API call) is not supported in
SYCL device side code.
- Point queries (`rtcPointQuery` API call) are not supported in SYCL
device side code.
## Embree SYCL Known Issues
- Compilation with build configuration "debug" is currently not
feasible because compilation times are very long.
# Upgrading from Embree 3 to Embree 4
This section summarizes API changes between Embree 3 and Embree 4. Most
of these changes are motivated by GPU performance and having a
consistent API that works properly for the CPU and GPU.
- The API include folder got renamed from embree3 to embree4, to be
able to install Embree 3 and Embree 4 side by side, without having
conflicts in API folder.
- The `RTCIntersectContext` is renamed to `RTCRayQueryContext` and
the `RTCIntersectContextFlags` got renamed to `RTCRayQueryFlags`.
- There are some changes to the `rtcIntersect` and `rtcOccluded`
functions. Most members of the old intersect context have been
moved to some optional `RTCIntersectArguments` (and
`RTCOccludedArguments`) structures, which also contains a pointer
to the new ray query context. The argument structs fulfill the task
of providing additional advanced arguments to the traversal
functions. The ray query context can get used to pass additional
data to callbacks, and to maintain an instID stack in case
instancing is done manually inside user geometry callbacks. The
arguments struct is not available inside callbacks. This change was
in particular necessary for SYCL to allow inlining of function
pointers provided to the traversal functions, and to reduce the
amount of state passed to callbacks, which both improves GPU
performance. Most applications can just drop passing the ray query
context to port to Embree 4.
- The `rtcFilterIntersection` and `rtcFilterOcclusion` API calls that
invoke both, the geometry and argument version of the filter
callback, from a user geometry callback are no longer supported.
Instead applications should use the
`rtcInvokeIntersectFilterFromGeometry` and
`rtcInvokeOccludedFilterFromGeometry` API calls that invoke just
the geometry version of the filter function, and invoke the
argument filter function manually if required.
- The filter function passed as arguments to `rtcIntersect` and
`rtcOccluded` functions is only invoked for some geometry if
enabled through `rtcSetGeometryEnableFilterFunctionFromArguments`
for that geometry. Alternatively, argument filter functions can get
enabled for all geometries using the
`RTC_RAY_QUERY_FLAG_INVOKE_ARGUMENT_FILTER` ray query flag.
- User geometry callbacks get a valid vector as input to identify
valid and invalid rays. In Embree 3 the user geometry callback just
had to update the ray hit members when an intersection was found
and perform no operation otherwise. In Embree 4 the callback
additionally has to return valid=-1 when a hit was found, and
valid=0 when no hit was found. This allows Embree to properly pass
the new hit distance to the ray tracing hardware only in the case a
hit was found.
- Further, ray masking is enabled by default now, as required by most
applications, and the default ray mask for geometries got changed
from 0xFFFFFFFF to 0x1.
- The stream tracing functions `rtcIntersect1M`, `rtcIntersect1Mp`,
`rtcIntersectNM`, `rtcIntersectNp`, `rtcOccluded1M`,
`rtcOccluded1Mp`, `rtcOccludedNM`, and `rtcOccludedNp` got removed
as they were rarely used and did not provide relevant performance
benefits. As an alternative, the application can just iterate over
`rtcIntersect1` and potentially `rtcIntersect4/8/16` to get similar
performance.
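
The effect of the changed default geometry mask mentioned in the list
above can be illustrated with a small C++ sketch. It assumes the usual
mask semantics, where a geometry is considered by a ray only if the
bitwise AND of the ray mask and the geometry mask is non-zero. The
constant and function names are hypothetical and this is not Embree
source code.

```cpp
#include <cstdint>

// Illustrative restatement of mask-based culling; not Embree source code.
// All names are hypothetical.
constexpr std::uint32_t kEmbree3DefaultGeometryMask = 0xFFFFFFFFu;
constexpr std::uint32_t kEmbree4DefaultGeometryMask = 0x1u;

// A geometry is considered by a ray only if both masks share a set bit.
constexpr bool rayConsidersGeometry(std::uint32_t rayMask,
                                    std::uint32_t geometryMask)
{
  return (rayMask & geometryMask) != 0u;
}
```

A ray whose mask has all bits set (`rayhit.ray.mask = -1`) still hits
geometries with the new default mask, whereas a ray with mask 0x2 hit
default-masked geometries under the old default but skips them under the
new one.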
To use Embree through SYCL on the CPU and GPU additional changes are
required:
- Embree 3 allows to use `rtcIntersect` recursively from a user
geometry or intersection filter callback to continue a ray inside
an instantiated object. In Embree 4 using `rtcIntersect`
recursively is disallowed on the GPU but still supported on the
CPU. To properly continue a ray inside an instantiated object use
the new `rtc(Traversable)ForwardIntersect1` and
`rtc(Traversable)ForwardOccluded1` functions.
- The geometry object and scene object of Embree 4 are host-side-only
objects, thus accessing them during rendering from the GPU is
not allowed. Thus all API functions that take an RTCGeometry object
or RTCScene object as argument cannot get used during rendering. In
particular the `rtcGetGeometryUserData(RTCGeometry)` call cannot
get used, but there is an alternative function
`rtcGetGeometryUserDataFromTraversable(RTCTraversable traversable,uint geomID)`
that should get used instead. To perform ray queries on the GPU
(e.g. `rtcTraversableIntersect1`) the application has to get a
`RTCTraversable` object first (using `rtcGetSceneTraversable`) and
pass it to the SYCL kernel.
- The user geometry callback and filter callback functions should get
passed through the intersection and occlusion argument structures
to the `rtcTraversableIntersect1` and `rtcTraversableOccluded1`
functions directly to allow inlining. The experimental geometry
version of the callbacks is disabled in SYCL and should not get
used.
- The feature flags should get used in SYCL to minimize the GPU code for
optimal performance.
- The `rtcInterpolate` function cannot get used on the device, and
vertex data interpolation should get implemented by the
application.
- Indirectly called functions must be declared with
`RTC_SYCL_INDIRECTLY_CALLABLE` when used as filter or user geometry
callbacks.
# Embree API Reference
## rtcNewDevice
#### NAME
rtcNewDevice - creates a new device
#### SYNOPSIS
    #include <embree4/rtcore.h>

    RTCDevice rtcNewDevice(const char* config);

#### DESCRIPTION
This function creates a new device to be used for CPU ray tracing and
returns a handle to this device. The device object is reference counted
with an initial reference count of 1. The handle can be released using
the `rtcReleaseDevice` API call.
The device object acts as a class factory for all other object types.
All objects created from the device (like scenes, geometries, etc.)
hold a reference to the device, thus the device will not be destroyed
unless these objects are destroyed first.
Objects are only compatible if they belong to the same device, e.g it
is not allowed to create a geometry in one device and attach it to a
scene created with a different device.
A configuration string (`config` argument) can be passed to the device
construction. This configuration string can be `NULL` to use the
default configuration.
The following configuration is supported:
- `threads=[int]`: Specifies a number of build threads to use. A
value of 0 enables all detected hardware threads. By default all
hardware threads are used.
- `user_threads=[int]`: Sets the number of user threads that can be
used to join and participate in a scene commit using
`rtcJoinCommitScene`. The tasking system will only use
`threads - user_threads` worker threads, so if the application wants to
use solely its own threads to commit scenes, set `threads` equal to
`user_threads`. This option only has an effect with the Intel(R)
Threading Building Blocks (TBB) tasking system.
- `set_affinity=[0/1]`: When enabled, build threads are affinitized
to hardware threads. This option is disabled by default on standard
CPUs, and enabled by default on Xeon Phi Processors.
- `start_threads=[0/1]`: When enabled, the build threads are started
upfront. This can be useful for benchmarking to exclude thread
creation time. This option is disabled by default.
- `isa=[sse2,sse4.2,avx,avx2,avx512]`: Use specified ISA. By default
the ISA is selected automatically.
- `max_isa=[sse2,sse4.2,avx,avx2,avx512]`: Configures the automated
ISA selection to use at most the specified ISA.
- `hugepages=[0/1]`: Enables or disables usage of huge pages. Under
Linux huge pages are used by default but under Windows and macOS
they are disabled by default.
- `enable_selockmemoryprivilege=[0/1]`: When set to 1, this enables
the `SeLockMemoryPrivilege` privilege, which is required to use huge
pages on Windows. This option has an effect only under Windows and
is ignored on other platforms. See Section [Huge Page Support]
for more details.
- `verbose=[0,1,2,3]`: Sets the verbosity of the output. When set to
0, no output is printed by Embree, when set to a higher level more
output is printed. By default Embree does not print anything on the
console.
- `frequency_level=[simd128,simd256,simd512]`: Specifies the
frequency level the application wants to run on, which can be
one of:
a) simd128 to run at the highest frequency
b) simd256 to run at the AVX2-heavy frequency level
c) simd512 to run at the heavy AVX512 frequency level
When a frequency level is specified, Embree avoids
optimizations that may reduce the frequency level below the
level specified. E.g. if your application does not use AVX
instructions, setting "frequency_level=simd128" will cause some
CPUs to run at the highest frequency, which may result in higher
application performance if you do much shading. If your
application heavily uses AVX code, you should set the frequency
level to simd256. By default Embree tries to avoid reducing the
frequency of the CPU by setting the simd256 level only when the
CPU has no significant down clocking.
Different configuration options should be separated by commas, e.g.:
rtcNewDevice("threads=1,isa=avx");
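The comma-separated format above can be assembled programmatically. The following is a minimal sketch, not part of the Embree API: `buildDeviceConfig` is a hypothetical helper that joins key/value pairs into a configuration string. It does not validate option names or values.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not an Embree function): joins key/value pairs into
// the "key=value,key=value" form expected by the `config` argument.
// An empty option list yields an empty string.
std::string buildDeviceConfig(
    const std::vector<std::pair<std::string, std::string>>& options)
{
  std::string config;
  for (const auto& option : options) {
    if (!config.empty()) config += ',';  // separate options by commas
    config += option.first + "=" + option.second;
  }
  return config;
}
```

The result could then be passed on as `rtcNewDevice(config.c_str())`, or `NULL` could be passed instead when no options are set.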
#### EXIT STATUS
On success returns a handle of the created device. On failure returns
`NULL` as device and sets a per-thread error code that can be queried
using `rtcGetDeviceError(NULL)`.
#### SEE ALSO
[rtcRetainDevice], [rtcReleaseDevice], [rtcNewSYCLDevice]
```{=tex}
```
## rtcNewSYCLDevice
#### NAME {#name}
rtcNewSYCLDevice - creates a new device to be used with SYCL
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
RTCDevice rtcNewSYCLDevice(sycl::context context, const char* config);
#### DESCRIPTION {#description}
This function creates a new device to be used with SYCL for GPU
rendering and returns a handle to this device. The device object is
reference counted with an initial reference count of 1. The handle can
get released using the `rtcReleaseDevice` API call.
The passed SYCL context (`context` argument) is used to allocate GPU
data, thus only devices contained inside this context can be used for
rendering. By default the GPU data is allocated on the first GPU device
of the context, but this behavior can get changed with the
[rtcSetDeviceSYCLDevice] function.
The device object acts as a class factory for all other object types.
All objects created from the device (like scenes, geometries, etc.)
hold a reference to the device, thus the device will not be destroyed
unless these objects are destroyed first.
Objects are only compatible if they belong to the same device, e.g. it
is not allowed to create a geometry in one device and attach it to a
scene created with a different device.
For an overview of configurations that can get passed (`config`
argument) please see the [rtcNewDevice] function description.
#### EXIT STATUS {#exit-status}
On success returns a handle of the created device. On failure returns
`NULL` as device and sets a per-thread error code that can be queried
using `rtcGetDeviceError(NULL)`.
#### SEE ALSO {#see-also}
[rtcRetainDevice], [rtcReleaseDevice], [rtcNewDevice]
```{=tex}
```
## rtcIsSYCLDeviceSupported
#### NAME {#name}
rtcIsSYCLDeviceSupported - checks if some SYCL device is supported by Embree
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
bool rtcIsSYCLDeviceSupported(const sycl::device sycl_device);
#### DESCRIPTION {#description}
This function can be used to check if some SYCL device (`sycl_device`
argument) is supported by Embree.
#### EXIT STATUS {#exit-status}
The function returns true if the SYCL device is supported by Embree and
false otherwise. On failure an error code is set that can get queried
using `rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcSYCLDeviceSelector]
```{=tex}
```
## rtcSYCLDeviceSelector
#### NAME {#name}
rtcSYCLDeviceSelector - SYCL device selector function to select
devices supported by Embree
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
int rtcSYCLDeviceSelector(const sycl::device sycl_device);
#### DESCRIPTION {#description}
This function checks if the passed SYCL device (`sycl_device`
argument) is supported by Embree or not. This function can be used
directly to select some supported SYCL device by using it as SYCL
device selector function. For instance, the following code sequence
selects an Embree supported SYCL device and creates an Embree device
from it:
sycl::device sycl_device(rtcSYCLDeviceSelector);
sycl::queue sycl_queue(sycl_device);
sycl::context sycl_context(sycl_device);
RTCDevice device = rtcNewSYCLDevice(sycl_context,nullptr);
#### EXIT STATUS {#exit-status}
The function returns 1 if the SYCL device is supported by Embree and -1
otherwise. On failure an error code is set that can get queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcIsSYCLDeviceSupported]
```{=tex}
```
## rtcSetDeviceSYCLDevice
#### NAME {#name}
rtcSetDeviceSYCLDevice - sets the SYCL device to be used for memory allocations
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcSetDeviceSYCLDevice(RTCDevice device, const sycl::device sycl_device);
#### DESCRIPTION {#description}
This function sets the SYCL device (`sycl_device` argument) to be used
to allocate GPU memory when using the specified Embree device (`device`
argument). This SYCL device must be one of the SYCL devices contained
inside the SYCL context used to create the Embree device.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can get queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcNewSYCLDevice]
```{=tex}
```
## rtcRetainDevice
#### NAME {#name}
rtcRetainDevice - increments the device reference count
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcRetainDevice(RTCDevice device);
#### DESCRIPTION {#description}
Device objects are reference counted. The `rtcRetainDevice` function
increments the reference count of the passed device object (`device`
argument). Together with `rtcReleaseDevice`, this function makes it
possible to use the internal reference counting in a C++ wrapper class
to manage the ownership of the object.
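One possible shape for such a wrapper class is sketched below. This is an illustration under stated assumptions, not a class shipped with Embree: `Ref`, `MockObject`, `mockRetain`, and `mockRelease` are hypothetical names, and the mock type stands in for a real handle so that the sketch compiles without the Embree headers.

```cpp
#include <utility>

// Hypothetical RAII wrapper around a reference-counted C handle.
// Retain/Release are the functions that increment/decrement the count.
template <typename Handle, void (*Retain)(Handle), void (*Release)(Handle)>
class Ref {
public:
  Ref() = default;
  // Adopts a handle whose reference the caller already owns,
  // e.g. one freshly returned by a creation function.
  explicit Ref(Handle h) : h_(h) {}
  // Copying shares the object and adds one reference.
  Ref(const Ref& o) : h_(o.h_) { if (h_) Retain(h_); }
  // Moving transfers the reference without touching the count.
  Ref(Ref&& o) noexcept : h_(std::exchange(o.h_, Handle{})) {}
  // Copy-and-swap: the old handle is released when `o` is destroyed.
  Ref& operator=(Ref o) noexcept { std::swap(h_, o.h_); return *this; }
  ~Ref() { if (h_) Release(h_); }
  Handle get() const { return h_; }
private:
  Handle h_{};
};

// Mock object used only to exercise the wrapper in this sketch.
struct MockObject { int refs = 1; };
void mockRetain(MockObject* o) { ++o->refs; }
void mockRelease(MockObject* o) { --o->refs; }
using MockRef = Ref<MockObject*, mockRetain, mockRelease>;
```

With the Embree headers available, an alias along the lines of `Ref<RTCDevice, rtcRetainDevice, rtcReleaseDevice>` would presumably play the same role for devices. Whether the API functions are usable as template arguments on every platform is an assumption that should be checked against the compiler in use.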
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcNewDevice], [rtcReleaseDevice]
```{=tex}
```
## rtcReleaseDevice
#### NAME {#name}
rtcReleaseDevice - decrements the device reference count
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcReleaseDevice(RTCDevice device);
#### DESCRIPTION {#description}
Device objects are reference counted. The `rtcReleaseDevice` function
decrements the reference count of the passed device object (`device`
argument). When the reference count falls to 0, the device gets
destroyed.
All objects created from the device (like scenes, geometries, etc.)
hold a reference to the device, thus the device will not get destroyed
unless these objects are destroyed first.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcNewDevice], [rtcRetainDevice]
```{=tex}
```
## rtcGetDeviceProperty
#### NAME {#name}
rtcGetDeviceProperty - queries properties of the device
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
ssize_t rtcGetDeviceProperty(
RTCDevice device,
enum RTCDeviceProperty prop
);
#### DESCRIPTION {#description}
The `rtcGetDeviceProperty` function can be used to query properties
(`prop` argument) of a device object (`device` argument). The returned
property is an integer of type `ssize_t`.
Possible properties to query are:
- `RTC_DEVICE_PROPERTY_VERSION`: Queries the combined version number
(MAJOR.MINOR.PATCH) with two decimal digits per component. E.g. for
Embree 2.8.3 the integer 20803 is returned.
- `RTC_DEVICE_PROPERTY_VERSION_MAJOR`: Queries the major version
number of Embree.
- `RTC_DEVICE_PROPERTY_VERSION_MINOR`: Queries the minor version
number of Embree.
- `RTC_DEVICE_PROPERTY_VERSION_PATCH`: Queries the patch version
number of Embree.
- `RTC_DEVICE_PROPERTY_NATIVE_RAY4_SUPPORTED`: Queries whether the
`rtcIntersect4` and `rtcOccluded4` functions preserve packet size
and ray order when invoking callback functions. This is only the
case if Embree is compiled with `EMBREE_RAY_PACKETS` and `SSE2` (or
`SSE4.2`) enabled, and if the machine it is running on supports
`SSE2` (or `SSE4.2`).
- `RTC_DEVICE_PROPERTY_NATIVE_RAY8_SUPPORTED`: Queries whether the
`rtcIntersect8` and `rtcOccluded8` functions preserve packet size
and ray order when invoking callback functions. This is only the
case if Embree is compiled with `EMBREE_RAY_PACKETS` and `AVX` (or
`AVX2`) enabled, and if the machine it is running on supports `AVX`
(or `AVX2`).
- `RTC_DEVICE_PROPERTY_NATIVE_RAY16_SUPPORTED`: Queries whether the
`rtcIntersect16` and `rtcOccluded16` functions preserve packet size
and ray order when invoking callback functions. This is only the
case if Embree is compiled with `EMBREE_RAY_PACKETS` and `AVX512`
enabled, and if the machine it is running on supports `AVX512`.
- `RTC_DEVICE_PROPERTY_RAY_MASK_SUPPORTED`: Queries whether ray masks
are supported. This is only the case if Embree is compiled with
`EMBREE_RAY_MASK` enabled.
- `RTC_DEVICE_PROPERTY_BACKFACE_CULLING_ENABLED`: Queries whether
back face culling is enabled. This is only the case if Embree is
compiled with `EMBREE_BACKFACE_CULLING` enabled.
- `RTC_DEVICE_PROPERTY_BACKFACE_CULLING_CURVES_ENABLED`: Queries
whether back face culling for curves is enabled. This is only the
case if Embree is compiled with `EMBREE_BACKFACE_CULLING_CURVES`
enabled.
- `RTC_DEVICE_PROPERTY_BACKFACE_CULLING_SPHERES_ENABLED`: Queries
whether back face culling for spheres is enabled. This is only the
case if Embree is compiled with `EMBREE_BACKFACE_CULLING_SPHERES`
enabled.
- `RTC_DEVICE_PROPERTY_COMPACT_POLYS_ENABLED`: Queries whether
compact polys are enabled. This is only the case if Embree is
compiled with `EMBREE_COMPACT_POLYS` enabled.
- `RTC_DEVICE_PROPERTY_FILTER_FUNCTION_SUPPORTED`: Queries whether
filter functions are supported, which is the case if Embree is
compiled with `EMBREE_FILTER_FUNCTION` enabled.
- `RTC_DEVICE_PROPERTY_IGNORE_INVALID_RAYS_ENABLED`: Queries whether
invalid rays are ignored, which is the case if Embree is compiled
with `EMBREE_IGNORE_INVALID_RAYS` enabled.
- `RTC_DEVICE_PROPERTY_TRIANGLE_GEOMETRY_SUPPORTED`: Queries whether
triangles are supported, which is the case if Embree is compiled
with `EMBREE_GEOMETRY_TRIANGLE` enabled.
- `RTC_DEVICE_PROPERTY_QUAD_GEOMETRY_SUPPORTED`: Queries whether
quads are supported, which is the case if Embree is compiled with
`EMBREE_GEOMETRY_QUAD` enabled.
- `RTC_DEVICE_PROPERTY_SUBDIVISION_GEOMETRY_SUPPORTED`: Queries
whether subdivision meshes are supported, which is the case if
Embree is compiled with `EMBREE_GEOMETRY_SUBDIVISION` enabled.
- `RTC_DEVICE_PROPERTY_CURVE_GEOMETRY_SUPPORTED`: Queries whether
curves are supported, which is the case if Embree is compiled with
`EMBREE_GEOMETRY_CURVE` enabled.
- `RTC_DEVICE_PROPERTY_POINT_GEOMETRY_SUPPORTED`: Queries whether
points are supported, which is the case if Embree is compiled with
`EMBREE_GEOMETRY_POINT` enabled.
- `RTC_DEVICE_PROPERTY_USER_GEOMETRY_SUPPORTED`: Queries whether user
geometries are supported, which is the case if Embree is compiled
with `EMBREE_GEOMETRY_USER` enabled.
- `RTC_DEVICE_PROPERTY_TASKING_SYSTEM`: Queries the tasking system
Embree is compiled with. Possible return values are:
0. internal tasking system
1. Intel Threading Building Blocks (TBB)
2. Parallel Patterns Library (PPL)
- `RTC_DEVICE_PROPERTY_JOIN_COMMIT_SUPPORTED`: Queries whether
`rtcJoinCommitScene` is supported. This is not the case when Embree
is compiled with PPL or older versions of TBB.
- `RTC_DEVICE_PROPERTY_PARALLEL_COMMIT_SUPPORTED`: Queries whether
`rtcCommitScene` can get invoked from multiple TBB worker threads
concurrently. This feature is only supported starting with TBB 2019
Update 9.
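The combined value returned for `RTC_DEVICE_PROPERTY_VERSION` can be split back into its components. The sketch below assumes the encoding MAJOR\*10000 + MINOR\*100 + PATCH, which is what "two decimal digits per component" implies (Embree 4.4.0 would then be 40400). `VersionParts` and `decodeVersion` are hypothetical names, not part of the API.

```cpp
// Hypothetical helper type holding the three version components.
struct VersionParts {
  int versionMajor;
  int versionMinor;
  int versionPatch;
};

// Splits a combined version number, assuming the layout
// MAJOR*10000 + MINOR*100 + PATCH (two decimal digits for minor and patch).
VersionParts decodeVersion(long long combined)
{
  VersionParts parts;
  parts.versionMajor = static_cast<int>(combined / 10000);
  parts.versionMinor = static_cast<int>((combined / 100) % 100);
  parts.versionPatch = static_cast<int>(combined % 100);
  return parts;
}
```

In an application the input would come from `rtcGetDeviceProperty(device, RTC_DEVICE_PROPERTY_VERSION)`. The separate `..._VERSION_MAJOR`, `..._VERSION_MINOR`, and `..._VERSION_PATCH` properties give the same information without any decoding.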
#### EXIT STATUS {#exit-status}
On success returns the value of the queried property. For properties
returning a boolean value, the return value 0 denotes `false` and 1
denotes `true`.
On failure zero is returned and an error code is set that can be
queried using `rtcGetDeviceError`.
```{=tex}
```
## rtcGetDeviceError
#### NAME {#name}
rtcGetDeviceError - returns the error code of the device
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
RTCError rtcGetDeviceError(RTCDevice device);
#### DESCRIPTION {#description}
Each thread has its own error code per device. If an error occurs when
calling an API function, this error code is set to that error, provided
no previous error is stored. The `rtcGetDeviceError` function reads
and returns the currently stored error and clears the error code. This
ensures that the returned error code is always the first error that
occurred since the last invocation of `rtcGetDeviceError`.
Possible error codes returned by `rtcGetDeviceError` are:
- `RTC_ERROR_NONE`: No error occurred.
- `RTC_ERROR_UNKNOWN`: An unknown error has occurred.
- `RTC_ERROR_INVALID_ARGUMENT`: An invalid argument was specified.
- `RTC_ERROR_INVALID_OPERATION`: The operation is not allowed for the
specified object.
- `RTC_ERROR_OUT_OF_MEMORY`: There is not enough memory left to
complete the operation.
- `RTC_ERROR_UNSUPPORTED_CPU`: The CPU is not supported as it does
not support the lowest ISA Embree is compiled for.
- `RTC_ERROR_CANCELLED`: The operation got canceled by a memory
monitor callback or progress monitor callback function.
- `RTC_ERROR_LEVEL_ZERO_RAYTRACING_SUPPORT_MISSING`: This error can
occur when creating an Embree device with SYCL support using
`rtcNewSYCLDevice` fails. This error probably means that the GPU
driver is too old or not installed properly. Install a new GPU
driver and on Linux make sure that the package
`intel-level-zero-gpu-raytracing` is installed. For general driver
installation information for Linux refer to
.
When the device construction fails, `rtcNewDevice` returns `NULL` as
device. To detect the error code of such a failed device
construction, pass `NULL` as device to the `rtcGetDeviceError`
function. For all other invocations of `rtcGetDeviceError`, a proper
device pointer must be specified.
The API function `rtcGetDeviceLastErrorMessage` can be used to get more
details about the last `RTCError` an `RTCDevice` encountered.
For convenient reporting of an `RTCError`, the API function
`rtcGetErrorString` can be used, which returns a string representation
of a given `RTCError`.
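The "first error wins, reading clears" behaviour described in this section can be modelled in a few lines. The sketch below is a simplified model for illustration only and is not Embree's implementation: `ErrorCode` and `ErrorSlot` are hypothetical stand-ins, and a single slot represents the error code of one thread for one device.

```cpp
// Stand-in for a subset of RTCError, so the sketch needs no Embree headers.
enum class ErrorCode { None, Unknown, InvalidArgument, InvalidOperation };

// Models one per-thread, per-device error slot.
class ErrorSlot {
public:
  // Called when an API function fails: only the first error is kept.
  void report(ErrorCode code)
  {
    if (stored_ == ErrorCode::None) stored_ = code;
  }

  // Mirrors the read-and-clear behaviour described for rtcGetDeviceError.
  ErrorCode take()
  {
    const ErrorCode result = stored_;
    stored_ = ErrorCode::None;
    return result;
  }

private:
  ErrorCode stored_ = ErrorCode::None;
};
```

A practical consequence of this model is that an application that checks for errors only occasionally sees the earliest failure since its last check, and later failures in that interval are not reported through the error code.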
#### EXIT STATUS {#exit-status}
Returns the error code for the device.
#### SEE ALSO {#see-also}
[rtcSetDeviceErrorFunction], [rtcGetDeviceLastErrorMessage],
[rtcGetErrorString]
```{=tex}
```
## rtcGetDeviceLastErrorMessage
#### NAME {#name}
rtcGetDeviceLastErrorMessage - returns a message corresponding
to the last error code
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
const char* rtcGetDeviceLastErrorMessage(RTCDevice device);
#### DESCRIPTION {#description}
This function can be used to get a message corresponding to the last
error code (returned by `rtcGetDeviceError`) which often provides
details about the error that happened. The message is the same as the
message that will be written to console when verbosity is \> 0 or which
is passed as the `str` argument of the `RTCErrorFunction` (see
[rtcSetDeviceErrorFunction]). However, when device construction fails
this is the only way to get additional information about the error. In
this case, `rtcNewDevice` returns `NULL` as device. To query the error
message for such a failed device construction, pass `NULL` as device to
the `rtcGetDeviceLastErrorMessage` function. For all other invocations
of `rtcGetDeviceLastErrorMessage`, a proper device pointer must be
specified.
#### EXIT STATUS {#exit-status}
Returns a message corresponding to the last error code.
#### SEE ALSO {#see-also}
[rtcGetDeviceError], [rtcSetDeviceErrorFunction]
```{=tex}
```
## rtcGetErrorString
#### NAME {#name}
rtcGetErrorString - returns a string representation
of a given RTCError
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
const char* rtcGetErrorString(RTCError code);
#### DESCRIPTION {#description}
Returns a string representation for a `RTCError` error code. For
example, for the `RTCError` RTC_ERROR_UNKNOWN this function will return
the string "Unknown Error". This is purely a convenience function for
printing error information on the user side.
The returned strings should not be used for comparing different
`RTCError` error codes or for making other decisions based on the type
of error that occurred. For such things only the `RTCError` enum values
should be used.
#### EXIT STATUS {#exit-status}
Returns a string representation of a given `RTCError` error code.
#### SEE ALSO {#see-also}
[rtcGetDeviceError]
```{=tex}
```
## rtcSetDeviceErrorFunction
#### NAME {#name}
rtcSetDeviceErrorFunction - sets an error callback function for the device
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
typedef void (*RTCErrorFunction)(
void* userPtr,
RTCError code,
const char* str
);
void rtcSetDeviceErrorFunction(
RTCDevice device,
RTCErrorFunction error,
void* userPtr
);
#### DESCRIPTION {#description}
Using the `rtcSetDeviceErrorFunction` call, it is possible to set a
callback function (`error` argument) with payload (`userPtr` argument),
which is called whenever an error occurs for the specified device
(`device` argument).
Only a single callback function can be registered per device, and
further invocations overwrite the previously set callback function.
Passing `NULL` as function pointer disables the registered callback
function.
When the registered callback function is invoked, it gets passed the
user-defined payload (`userPtr` argument as specified at registration
time), the error code (`code` argument) of the occurred error, as well
as a string (`str` argument) that further describes the error.
The error code is also set if an error callback function is registered.
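The payload mechanism can be used to route errors into an application-side object. The following sketch shows a callback with the same parameter layout as `RTCErrorFunction`. To keep it compilable without the Embree headers, `ErrorCode` is a plain `int` stand-in for `RTCError`, and `ErrorLog` and `collectError` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Stand-in for RTCError so the sketch needs no Embree headers.
using ErrorCode = int;

// Application-side sink that receives the reported errors.
struct ErrorLog {
  std::vector<std::string> messages;
};

// Same parameter layout as RTCErrorFunction: payload, error code, and a
// string describing the error. The payload is the pointer that was given
// at registration time. No locking is done here, so this sketch assumes
// the callback is not invoked from several threads at once.
void collectError(void* userPtr, ErrorCode code, const char* str)
{
  auto* log = static_cast<ErrorLog*>(userPtr);
  log->messages.push_back(
      "error " + std::to_string(code) + ": " + (str ? str : ""));
}
```

With the real types substituted, registration would presumably look like `rtcSetDeviceErrorFunction(device, collectError, &log)`, and the log object must outlive the registration.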
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcGetDeviceError]
```{=tex}
```
## rtcSetDeviceMemoryMonitorFunction
#### NAME {#name}
rtcSetDeviceMemoryMonitorFunction - registers a callback function
to track memory consumption
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
typedef bool (*RTCMemoryMonitorFunction)(
void* userPtr,
ssize_t bytes,
bool post
);
void rtcSetDeviceMemoryMonitorFunction(
RTCDevice device,
RTCMemoryMonitorFunction memoryMonitor,
void* userPtr
);
#### DESCRIPTION {#description}
Using the `rtcSetDeviceMemoryMonitorFunction` call, it is possible to
register a callback function (`memoryMonitor` argument) with payload
(`userPtr` argument) for a device (`device` argument), which is called
whenever internal memory is allocated or deallocated by objects of that
device. Using this memory monitor callback mechanism, the application
can track the memory consumption of an Embree device, and optionally
terminate API calls that consume too much memory.
Only a single callback function can be registered per device, and
further invocations overwrite the previously set callback function.
Passing `NULL` as function pointer disables the registered callback
function.
Once registered, the Embree device will invoke the memory monitor
callback function before or after it allocates or frees important
memory blocks. The callback function gets passed the payload as
specified at registration time (`userPtr` argument), the number of
bytes allocated or deallocated (`bytes` argument), and whether the
callback is invoked after the allocation or deallocation took place
(`post` argument). The callback function might get called from multiple
threads concurrently.
The application can track the current memory usage of the Embree device
by atomically accumulating the `bytes` input parameter provided to the
callback function. This parameter will be \>0 for allocations and \<0
for deallocations.
Embree will continue its operation normally when returning `true` from
the callback function. If `false` is returned, Embree will cancel the
current operation with the `RTC_ERROR_OUT_OF_MEMORY` error code.
Issuing multiple cancel requests from different threads is allowed.
Canceling will only happen when the callback was called for allocations
(bytes \> 0), otherwise the cancel request will be ignored.
If a callback to cancel was invoked before the allocation happens
(`post == false`), then the `bytes` parameter should not be
accumulated, as the allocation will never happen. If the callback to
cancel was invoked after the allocation happened (`post == true`), then
the `bytes` parameter should be accumulated, as the allocation properly
happened and a deallocation will later free that data block.
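The accumulation and cancellation rules above can be sketched as follows. This is an illustration under stated assumptions rather than reference code: `MemoryBudget` and `budgetMonitor` are hypothetical names, and `std::ptrdiff_t` stands in for the `ssize_t` parameter of `RTCMemoryMonitorFunction` so that the sketch compiles without the Embree headers.

```cpp
#include <atomic>
#include <cstddef>

// Application-side state passed to the callback through userPtr.
struct MemoryBudget {
  std::atomic<long long> used{0};  // bytes currently accounted for
  long long limit = 0;             // cancel allocations beyond this
};

// Same parameter layout as RTCMemoryMonitorFunction, with std::ptrdiff_t
// standing in for ssize_t. Returns false to request cancellation.
bool budgetMonitor(void* userPtr, std::ptrdiff_t bytes, bool post)
{
  auto* budget = static_cast<MemoryBudget*>(userPtr);

  // Atomically accumulate; bytes is > 0 for allocations, < 0 for frees.
  const long long now = budget->used.fetch_add(bytes) + bytes;

  // Only allocations can be cancelled.
  if (bytes > 0 && now > budget->limit) {
    // Pre-allocation cancel: the allocation never happens, so undo the
    // accumulation. Post-allocation cancel: the block exists and will be
    // freed later, so the accumulated bytes stay counted.
    if (!post) budget->used.fetch_sub(bytes);
    return false;
  }
  return true;
}
```

Because the counter is atomic, the callback can be invoked from multiple threads concurrently. The limit check is approximate under contention, which is usually acceptable for a budget of this kind.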
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcNewDevice]
```{=tex}
```
## rtcNewScene
#### NAME {#name}
rtcNewScene - creates a new scene
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
RTCScene rtcNewScene(RTCDevice device);
#### DESCRIPTION {#description}
This function creates a new scene bound to the specified device
(`device` argument), and returns a handle to this scene. The scene
object is reference counted with an initial reference count of 1. The
scene handle can be released using the `rtcReleaseScene` API call.
#### EXIT STATUS {#exit-status}
On success a scene handle is returned. On failure `NULL` is returned
and an error code is set that can be queried using `rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcRetainScene], [rtcReleaseScene]
```{=tex}
```
## rtcGetSceneDevice
#### NAME {#name}
rtcGetSceneDevice - returns the device the scene got created in
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
RTCDevice rtcGetSceneDevice(RTCScene scene);
#### DESCRIPTION {#description}
This function returns the device object the scene got created in. The
returned handle owns one additional reference to the device object, thus
you need to call `rtcReleaseDevice` when the returned handle is
no longer required.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcReleaseDevice]
```{=tex}
```
## rtcRetainScene
#### NAME {#name}
rtcRetainScene - increments the scene reference count
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcRetainScene(RTCScene scene);
#### DESCRIPTION {#description}
Scene objects are reference counted. The `rtcRetainScene` function
increments the reference count of the passed scene object (`scene`
argument). Together with `rtcReleaseScene`, this function makes it
possible to use the internal reference counting in a C++ wrapper class
to handle the ownership of the object.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcNewScene], [rtcReleaseScene]
```{=tex}
```
## rtcReleaseScene
#### NAME {#name}
rtcReleaseScene - decrements the scene reference count
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcReleaseScene(RTCScene scene);
#### DESCRIPTION {#description}
Scene objects are reference counted. The `rtcReleaseScene` function
decrements the reference count of the passed scene object (`scene`
argument). When the reference count falls to 0, the scene gets
destroyed.
The scene holds a reference to all attached geometries, thus if the
scene gets destroyed, all geometries get detached and their reference
count decremented.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcNewScene], [rtcRetainScene]
```{=tex}
```
## rtcAttachGeometry
#### NAME {#name}
rtcAttachGeometry - attaches a geometry to the scene
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
unsigned int rtcAttachGeometry(
RTCScene scene,
RTCGeometry geometry
);
#### DESCRIPTION {#description}
The `rtcAttachGeometry` function attaches a geometry (`geometry`
argument) to a scene (`scene` argument) and assigns a geometry ID to
that geometry. All geometries attached to a scene are defined to be
included inside the scene. A geometry can get attached to multiple
scenes. The geometry ID is unique for the scene, and is used to
identify the geometry when hit by a ray during ray queries.
This function is thread-safe, thus multiple threads can attach
geometries to a scene in parallel.
The geometry IDs are assigned sequentially, starting from 0, as long as
no geometry got detached. If geometries got detached, the
implementation will reuse IDs in an implementation dependent way.
Consequently, sequential assignment is no longer guaranteed, but the
range of IDs remains compact.
These rules allow the application to manage a dynamic array to
efficiently map from geometry IDs to its own geometry representation.
Alternatively, the application can also use per-geometry user data to
map to its geometry representation. See `rtcSetGeometryUserData` and
`rtcGetGeometryUserData` for more information.
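A dynamic array indexed by geometry ID, as mentioned above, might look like the following sketch. `AppMesh` and `GeometryTable` are hypothetical application-side names and not part of Embree. The table relies only on the IDs being compact, not on them being sequential.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Placeholder for the application's own geometry representation.
struct AppMesh {
  std::string name;
};

// Maps geometry IDs to application objects using a growable array.
class GeometryTable {
public:
  // Records the mesh for an ID, e.g. the value returned when attaching.
  void set(unsigned int geomID, const AppMesh* mesh)
  {
    const std::size_t index = geomID;
    if (index >= meshes_.size()) meshes_.resize(index + 1, nullptr);
    meshes_[index] = mesh;
  }

  // Clears the entry, e.g. after the geometry got detached.
  void clear(unsigned int geomID)
  {
    if (geomID < meshes_.size()) meshes_[geomID] = nullptr;
  }

  // Constant-time lookup, e.g. from the geometry ID reported for a hit.
  const AppMesh* find(unsigned int geomID) const
  {
    return geomID < meshes_.size() ? meshes_[geomID] : nullptr;
  }

private:
  std::vector<const AppMesh*> meshes_;
};
```

Since detached IDs may be reused, clearing an entry on detach keeps a stale pointer from being returned for a later geometry that receives the same ID.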
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcSetGeometryUserData], [rtcGetGeometryUserData]
```{=tex}
```
## rtcAttachGeometryByID
#### NAME {#name}
rtcAttachGeometryByID - attaches a geometry to the scene
using a specified geometry ID
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcAttachGeometryByID(
RTCScene scene,
RTCGeometry geometry,
unsigned int geomID
);
#### DESCRIPTION {#description}
The `rtcAttachGeometryByID` function attaches a geometry (`geometry`
argument) to a scene (`scene` argument) and assigns a user provided
geometry ID (`geomID` argument) to that geometry. All geometries
attached to a scene are defined to be included inside the scene. A
geometry can get attached to multiple scenes. The passed user-defined
geometry ID is used to identify the geometry when hit by a ray during
ray queries. Using this function, it is possible to share the same IDs
to refer to geometries inside the application and Embree.
This function is thread-safe, thus multiple threads can attach
geometries to a scene in parallel.
The user-provided geometry ID must be unused in the scene, otherwise
attaching the geometry will fail. Further, the user-provided
geometry IDs should be compact, as Embree internally creates a vector
whose size is equal to the largest geometry ID used. Creating very
large geometry IDs for small scenes would thus cause memory
consumption and performance overhead.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcAttachGeometry]
```{=tex}
```
## rtcDetachGeometry
#### NAME {#name}
rtcDetachGeometry - detaches a geometry from the scene
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcDetachGeometry(RTCScene scene, unsigned int geomID);
#### DESCRIPTION {#description}
This function detaches a geometry identified by its geometry ID
(`geomID` argument) from a scene (`scene` argument). When detached, the
geometry is no longer contained in the scene.
This function is thread-safe, thus multiple threads can detach
geometries from a scene at the same time.
#### EXIT STATUS {#exit-status}
On failure an error code is set that can be queried using
`rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcAttachGeometry], [rtcAttachGeometryByID]
```{=tex}
```
## rtcGetGeometry
#### NAME {#name}
rtcGetGeometry - returns the geometry bound to
the specified geometry ID
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
RTCGeometry rtcGetGeometry(RTCScene scene, unsigned int geomID);
#### DESCRIPTION {#description}
The `rtcGetGeometry` function returns the geometry that is bound to the
specified geometry ID (`geomID` argument) for the specified scene
(`scene` argument). This function just looks up the handle and does
*not* increment the reference count. If you want to get ownership of
the handle, you need to additionally call `rtcRetainGeometry`.
This function is not thread safe, which keeps it fast enough to be used during rendering.
However, it is generally recommended to store the geometry handle
inside the application's geometry representation and look up the
geometry handle from that representation directly.
If you need a thread safe version of this function please use
[rtcGetGeometryThreadSafe].
#### EXIT STATUS {#exit-status}
On failure `NULL` is returned and an error code is set that can be
queried using `rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcAttachGeometry], [rtcAttachGeometryByID],
[rtcGetGeometryThreadSafe]
```{=tex}
```
## rtcGetGeometryThreadSafe
#### NAME {#name}
rtcGetGeometryThreadSafe - returns the geometry bound to
the specified geometry ID
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
RTCGeometry rtcGetGeometryThreadSafe(RTCScene scene, unsigned int geomID);
#### DESCRIPTION {#description}
The `rtcGetGeometryThreadSafe` function returns the geometry that is
bound to the specified geometry ID (`geomID` argument) for the
specified scene (`scene` argument). This function just looks up the
handle and does *not* increment the reference count. If you want to get
ownership of the handle, you need to additionally call
`rtcRetainGeometry`.
This function is thread safe and should NOT get used during rendering.
If you need a fast non-thread safe version during rendering please use
the [rtcGetGeometry] function.
#### EXIT STATUS {#exit-status}
On failure `NULL` is returned and an error code is set that can be
queried using `rtcGetDeviceError`.
#### SEE ALSO {#see-also}
[rtcAttachGeometry], [rtcAttachGeometryByID], [rtcGetGeometry]
```{=tex}
```
## rtcCommitScene
#### NAME {#name}
rtcCommitScene - commits scene changes
#### SYNOPSIS {#synopsis}
#include <embree4/rtcore.h>
void rtcCommitScene(RTCScene scene);
#### DESCRIPTION {#description}
The `rtcCommitScene` function commits all changes for the specified
scene (`scene` argument). This internally triggers building of a
spatial acceleration structure for the scene using all available worker
threads. Ray queries can be performed only after committing all scene
changes.
If the application uses TBB 2019 Update 9 or later for parallelization
of rendering, lazy scene construction during rendering is supported by
`rtcCommitScene`. Therefore `rtcCommitScene` can get called from
multiple TBB worker threads concurrently for the same scene. The
`rtcCommitScene` function will then internally isolate the scene
construction using a `tbb::isolated_task_group`. The alternative approach
of using `rtcJoinCommitScene`, which uses a `tbb::task_arena` internally,
is not recommended due to its high runtime overhead.
If scene geometries get modified or attached or detached, the
`rtcCommitScene` call must be invoked before performing any further ray
queries for the scene; otherwise the effect of the ray query is
undefined. The modification of a geometry, committing the scene, and
tracing of rays must always happen sequentially, and never at the same
time. Any API call that sets a property of the scene or of geometries
contained in the scene counts as a scene modification, including
setting intersection filter functions.
The kind of acceleration structure built can be influenced using scene
flags (see `rtcSetSceneFlags`), and the quality can be specified using
the `rtcSetSceneBuildQuality` function.
Embree silently ignores primitives during spatial acceleration
structure construction that would cause numerical issues,
e.g. primitives containing NaNs, INFs, or values greater than 1.844E18f
(as no reasonable calculations can be performed with such values
without causing overflows).
In case the RTCDevice associated with the `scene` is a SYCL device,
`rtcCommitScene` will internally create a temporary SYCL queue to issue
memory transfers from host to device memory. In this case, the call to
`