Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yinqiwen/rapidudf
A High-Performance JIT-Based C++ Expression/Script Execution Engine with SIMD Vectorization Support
https://github.com/yinqiwen/rapidudf
cpp17 expression high-performance jit llvm llvm-jit math-expressions script simd udf vectorized
Last synced: about 2 months ago
JSON representation
A High-Performance JIT-Based C++ Expression/Script Execution Engine with SIMD Vectorization Support
- Host: GitHub
- URL: https://github.com/yinqiwen/rapidudf
- Owner: yinqiwen
- License: apache-2.0
- Created: 2024-08-03T00:52:45.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-10-21T09:42:21.000Z (4 months ago)
- Last Synced: 2024-10-21T13:53:34.741Z (4 months ago)
- Topics: cpp17, expression, high-performance, jit, llvm, llvm-jit, math-expressions, script, simd, udf, vectorized
- Language: C++
- Homepage:
- Size: 979 KB
- Stars: 53
- Watchers: 3
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
RapidUDF
`RapidUDF` is a high-performance SIMD vectorized expression/script computation execution engine library designed for online systems. It can be used in scenarios requiring high performance and flexibility such as rule engines, storage systems, and feature computation.
## Limitations
- C++17
## Features
- **Easy to Use**:
- Provides support for conventional expression syntax
- For more complex logic, supports a C-like DSL including if-elif-else* conditional control, while loop control, auto temporary variables, etc.;
- For columnar memory data (vector), provides dynamic Table APIs similar to Spark's DataFrame and operations like filter/order_by/topk/take;
- **High Performance**:
- Based on LLVM JIT compilation, startup and execution performance comparable to native cpp implementation;
- For columnar memory data (vector), provides SIMD vectorization acceleration implementation
- **Thread Safe**:
- State-less JIT-generated C methods are naturally thread-safe
- **FFI**:
- Supports zero-cost access to C++ defined class objects (custom classes/stl/protobufs/flatbuffers/...) in expressions/UDFs
- Supports zero-cost calls to methods/class methods defined in C++ within expressions/UDFs
- **Rich Built-in Data Types, Operators, and Functions**:
- [built-in data types](docs/dtype.md)
- [built-in operators](docs/operator.md)
- [built-in functions](docs/builtin_function.md)## Compilation and Installation
Compilation requires a compiler that supports C++17
### Bazel
Add in WORKSPACE:
```python
git_repository(
name = "rapidudf",
remote = "https://github.com/yinqiwen/rapidudf.git",
commit = "...",
)
load("@rapidudf//:rapidudf.bzl", "rapidudf_workspace")
rapidudf_workspace()
```
Add in the BUILD file for relevant code compilation rules:
```python
cc_library(
name = "mylib",
srcs = ["mylib.cc"],
hdrs = [
"mylib.h",
],
deps = [
"@rapidudf",
],
)
```
### CMake
First, compile and instal `rapidudf`
```bash
cd
mkdir build; cd build;
cmake ..
make install
```
Add the following to the CMake configuration of the related project:
```cmake
find_package(rapidudf REQUIRED)
....
# link rapidudf
target_link_libraries(mylib PRIVATE rapidudf::rapidudf)
```
[Example](rapidudf/examples/CMakeLists.txt)## Usage Overview
### Simple Expression
```cpp
#include "rapidudf/rapidudf.h"int main() {
// 1. If needed, set up rapidudf logger
// std::shared_ptr mylogger;
// rapidudf::set_default_logger(mylogger);
// 2. Expression string
std::string expression = "x >= 1 && y < 10";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent repeated execution; compilation usually takes between 10ms-100ms;
rapidudf::JitCompiler compiler;
// CompileExpression's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types;
// Variable names used in the expression need to be passed in as a parameter name list, otherwise compilation fails
auto result = compiler.CompileExpression(expression, {"x", "y"});
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
// 4. Execute function
rapidudf::JitFunction f = std::move(result.value());
bool v = f(2, 3); // true
v = f(0, 1); // false
return 0;
};
```### Simple UDF Script
Fibonacci function
```cpp
#include "rapidudf/rapidudf.h"int main() {
// 1. If needed, can set up rapidudf logger
// std::shared_ptr mylogger;
// rapidudf::set_default_logger(mylogger);
// 2. UDF string
std::string source = R"(
int fib(int n)
{
if (n <= 1){
return n;
}
// Supports cpp // comments
return fib(n - 1) + fib(n - 2); // Recursive call
}
)";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent repeated execution; compilation usually takes between 10ms-100ms;
rapidudf::JitCompiler compiler;
// CompileFunction's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types
auto result = compiler.CompileFunction(source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}// 4. Execute function
rapidudf::JitFunction f = std::move(result.value());
int n = 9;
int x = f(n); // 34
RUDF_INFO("fib({}):{}", n, x);
return 0;
};
```### Vector Calculation
```cpp
#include "rapidudf/rapidudf.h"using namespace rapidudf;
int main() {
// 2. UDF string
std::string source = R"(
simd_vector boost_scores(Context ctx, simd_vector location, simd_vector score)
{
auto boost=(location=="home"?2.0_f32:0_f32);
return score*boost;
}
)";// 3. Compile to generate Function, the generated Function object can be saved for subsequent use
rapidudf::JitCompiler compiler;
// CompileFunction's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types
// 'rapidudf::Context' is a mandatory parameter involved in arena memory allocation in the simd implementation
auto result =
compiler.CompileFunction, rapidudf::Context&, simd::Vector, simd::Vector>(
source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}// 4.1 Test data, need to convert raw data into columnar data
std::vector scores;
std::vector locations;
for (size_t i = 0; i < 4096; i++) {
scores.emplace_back(1.1 + i);
locations.emplace_back(i % 3 == 0 ? "home" : "other");
}// 5. Execute function
rapidudf::Context ctx;
auto f = std::move(result.value());
auto new_scores = f(ctx, ctx.NewSimdVector(locations), ctx.NewSimdVector(scores));
for (size_t i = 0; i < new_scores.Size(); i++) {
// RUDF_INFO("{}", new_scores[i]);
}
return 0;
};
```### Dynamic Vector Table
**RapidUDF** supports dynamically creating vector tables, allowing arbitrary computational operations on table columns (accelerated through SIMD) in expressions/UDFs;
The table class also provides operations similar to Spark DataFrame, such as:
- `.filter(simd::Vector)` returns a new table instance filtered by condition
- `.order_by(simd::Vector column, bool descending)` returns a new table instance sorted by condition
- `.topk(simd::Vector column, uint32_t k, bool descending)` returns a new table instance with top k entries
```cpp
#include "rapidudf/rapidudf.h"using namespace rapidudf;
struct Student {
std::string name;
uint16_t age = 0;
float score = 0;
bool gender = false;
};
RUDF_STRUCT_FIELDS(Student, name, age, score, gender)
int main() {
// 1. Create table schema
auto schema =
simd::TableSchema::GetOrCreate("Student", [](simd::TableSchema* s) { std::ignore = s->AddColumns(); });// 2. UDF string, table generic format where TABLE_NAME must match the previously created table schema name
// table supports filter/order_by/topk/take, etc. operations
std::string source = R"(
table select_students(Context ctx, table x)
{
auto filtered = x.filter(x.score >90 && x.age<10);
// Sort by score in descending order and take top 10
return filtered.topk(filtered.score,10,true);
}
)";// 3. Compile to generate Function, the generated Function object can be saved for subsequent use
rapidudf::JitCompiler compiler;
// CompileFunction's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types
auto result = compiler.CompileFunction(source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
auto f = std::move(result.value());// 4.1 Test data, need to convert raw data into columnar data
std::vector students;
for (size_t i = 0; i < 128; i++) {
float score = (i + 1) % 150;
uint16_t age = i % 5 + 8;
bool gender = i % 2 == 0;
students.emplace_back(Student{"test_" + std::to_string(i), age, score, gender});
}
// 4.2 Create table instance
rapidudf::Context ctx;
auto table = schema->NewTable(ctx);
std::ignore = table->AddRows(students);// 5. Execute function
auto result_table = f(ctx, table.get());
auto result_scores = result_table->Get("score").value();
auto result_names = result_table->Get("name").value();
auto result_ages = result_table->Get("age").value();
auto result_genders = result_table->Get("gender").value();
for (size_t i = 0; i < result_scores.Size(); i++) {
RUDF_INFO("name:{},score:{},age:{},gender:{}", result_names[i], result_scores[i], result_ages[i],
result_genders[i] ? true : false);
}
return 0;
};
```### Dynamic Vector Table Based on Protobuf/Flatbuffers/Struct
**RapidUDF** can also create a table from Protobuf/Flatbuffers, avoiding the tedious process of creating a TableSchema. Building table instances can be done directly from arrays of Protobuf objects such as `std::vector, std::vector, std::vector`.Here is an example of creating a vector table based on Protobuf;
Examples based on flatbuffers can be found in [fbs_vector_table_udf](rapidudf/examples/fbs_vector_table_udf.cc);
Examples based on struct can be found in [struct_vector_table_udf](rapidudf/examples/struct_vector_table_udf.cc);
```cpp
#include "rapidudf/examples/student.pb.h"
#include "rapidudf/rapidudf.h"using namespace rapidudf;
int main() {
// 1. Create table schema
auto schema = simd::TableSchema::GetOrCreate(
"Student", [](simd::TableSchema* s) { std::ignore = s->AddColumns(); });// 2. UDF string
std::string source = R"(
table select_students(Context ctx, table x)
{
auto filtered = x.filter(x.score >90 && x.age<10);
// Sort in descending order
return filtered.topk(filtered.score,10, true);
}
)";// 3. Compile to generate Function, the generated Function object can be saved for subsequent use
rapidudf::JitCompiler compiler;
auto result = compiler.CompileFunction(source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
auto f = std::move(result.value());// 4.1 Test data
std::vector students;
for (size_t i = 0; i < 150; i++) {
examples::Student student;
student.set_score((i + 1) % 150);
student.set_name("test_" + std::to_string(i));
student.set_age(i % 5 + 8);
students.emplace_back(std::move(student));
}
// 4.2 Create table instance and populate data
rapidudf::Context ctx;
auto table = schema->NewTable(ctx);
std::ignore = table->AddRows(students);// 5. Execute function
auto result_table = f(ctx, table.get());
// 5.1 Fetch columns
auto result_scores = result_table->Get("score").value();
auto result_names = result_table->Get("name").value();
auto result_ages = result_table->Get("age").value();for (size_t i = 0; i < result_scores.Size(); i++) {
RUDF_INFO("name:{},score:{},age:{}", result_names[i], result_scores[i], result_ages[i]);
}
return 0;
};
```### Compilation Cache
**RapidUDF** incorporates an LRU cache with keys as the string of expressions/UDFs. Users can retrieve compiled JitFunction objects from the cache to avoid parse/compile overhead each time they are used:
```cpp
std::vector vec{1, 2, 3};
JitCompiler compiler;
JsonObject json;
json["key"] = 123;std::string content = R"(
bool test_func(json x){
return x["key"] == 123;
}
)";
auto rc = GlobalJitCompiler::GetFunction(content);
ASSERT_TRUE(rc.ok());
auto f = std::move(rc.value());
ASSERT_TRUE(f(json));
ASSERT_FALSE(f.IsFromCache()); // 第一次编译rc = GlobalJitCompiler::GetFunction(content);
ASSERT_TRUE(rc.ok());
f = std::move(rc.value());
ASSERT_TRUE(f(json));
ASSERT_TRUE(f.IsFromCache()); //后续从cache中获取
```### More Examples and Usage
- [Using Custom C++ Classes in Expressions/UDFs](docs/ffi.md)
- [Using Member Functions of Custom C++ Classes in Expressions/UDFs](docs/ffi.md)
- [Using Protobuf Objects in Expressions/UDFs](docs/ffi.md)
- [Using FlatBuffers Objects in Expressions/UDFs](docs/ffi.md)
- [Using STL Objects in Expressions/UDFs](docs/ffi.md)There are more examples for different scenarios in the [tests](rapidudf/tests/) code directory.
## Performance
### Comparison with Native C++
Since RapidUDF is based on LLVM Jit, it theoretically can achieve performance very close to native C++ code. Comparison results for compiling the Fibonacci method with `O0`:
```
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_rapidudf_fib_func 22547 ns 22547 ns 31060
BM_native_fib_func 38933 ns 38933 ns 17964
```
Fibonacci method GCC `O2` compilation comparison results:
```
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_rapidudf_fib_func 22557 ns 22555 ns 31065
BM_native_fib_func 19246 ns 19239 ns 36395
```
Note: The Jit implementation currently uses the same jit compilation logic under `O0/O2` compilation switches; theoretically, the generated code should be identical.### Vectorized Acceleration Scenarios
The following tests were run on a CPU that supports `AVX2`, with the compilation optimization flag `O2`, and an array length of `4099`.
#### Complex Trigonometric Expression
The calculation is to execute the double array `x + (cos(y - sin(2 / x * pi)) - sin(x - cos(2 * y / pi))) - y`; theoretically, the acceleration ratio should be the multiple of the `AVX2` register width to the `double` width, which is `4`.
Actual results are as follows, showing that the acceleration ratio has exceeded `4` times, reaching **6.09**:
```
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
BM_rapidudf_expr_func 207713 ns 207648 ns 3362
BM_rapidudf_vector_expr_func 33962 ns 33962 ns 20594
BM_native_func 207145 ns 207136 ns 3387
```
#### Wilson Ctr
Original function prototype:
```cpp
float wilson_ctr(float exp_cnt, float clk_cnt) {
return std::log10(exp_cnt) *
(clk_cnt / exp_cnt + 1.96 * 1.96 / (2 * exp_cnt) -
1.96 / (2 * exp_cnt) * std::sqrt(4 * exp_cnt * (1 - clk_cnt / exp_cnt) * clk_cnt / exp_cnt + 1.96 * 1.96)) /
(1 + 1.96 * 1.96 / exp_cnt);
}
```Corresponding vector UDF script implementation:
```cpp
simd_vector wilson_ctr(Context ctx, simd_vector exp_cnt, simd_vector clk_cnt)
{
return log10(exp_cnt) *
(clk_cnt / exp_cnt + 1.96 * 1.96 / (2 * exp_cnt) -
1.96 / (2 * exp_cnt) * sqrt(4 * exp_cnt * (1 - clk_cnt / exp_cnt) * clk_cnt / exp_cnt + 1.96 * 1.96)) /
(1 + 1.96 * 1.96 / exp_cnt);
}
```
Theoretically, the acceleration ratio should be the multiple of the `AVX2` register width to the float width, which is `8`;
Actual results are as follows, showing that the acceleration ratio has exceeded `8` times, reaching **10.5**:
```
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
BM_native_wilson_ctr 69961 ns 69957 ns 9960
BM_rapidudf_vector_wilson_ctr 6661 ns 6659 ns 105270
```## Dependencies
- [LLVM](https://llvm.org/)
- [highway](https://github.com/google/highway)
- [x86-simd-sort](https://github.com/intel/x86-simd-sort)
- [sleef](https://github.com/shibatch/sleef)
- [fmtlib](https://github.com/fmtlib/fmt)
- [spdlog](https://github.com/gabime/spdlog)
- [abseil-cpp](https://github.com/abseil/abseil-cpp)
- boost
- [preprocessor](http://boost.org/libs/preprocessor)
- [parser](https://github.com/tzlaine/parser)
- [protobuf](https://github.com/protocolbuffers)
- [flatbuffers](https://github.com/google/flatbuffers)
- [json](https://github.com/nlohmann/json)