https://github.com/skailasa/rust-m2l
Multipole to Local Field Translation Experiments
https://github.com/skailasa/rust-m2l
Last synced: 5 months ago
JSON representation
Multipole to Local Field Translation Experiments
- Host: GitHub
- URL: https://github.com/skailasa/rust-m2l
- Owner: skailasa
- Created: 2023-09-28T09:41:17.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-09-28T13:46:16.000Z (about 2 years ago)
- Last Synced: 2023-09-28T15:33:19.466Z (about 2 years ago)
- Language: Rust
- Size: 215 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Rust SIMD Experiments
A huge problem in modern CPUs is the massive difference in latencies of in
computation in comparison to memory access. Memory accesses are a couple of
orders of magnitude slower than basic CPU instructions.This problem dominates the runtime of naively implemented algorithms. Where
cache misses cause the runtime to repeatedly access main memory. Algorithms
must instead be designed to reduce the number of memory accesses as much as
possible.For problems that have significant data dependencies, such as the Fast Multipole
Method (FMM), this becomes challenging. A naive implementation will potentially result
in a cache miss for every iteration, as data will have to be looked up from anywhere
in global buffer.In this crate I try out a few different experiments to alleviate global memory
access problems for a problem that has a similar logic to the field translations
in the FMM.A strategy for coping with this is to gather mutable pointers to save locations
and explicitly load them into SIMD registers, while computing the results of the
field translation in place in a vectorised manner.## Important SIMD terminology
This isn't a resource on SIMD programming, but some notes that I've made of things
that I need to commit to memory.* Vector - a SIMD value is called a vector. It's of fixed size known at compile
time. Vectors are aligned with respect to their entire size (similar, but different
concept to rust stack allocated arrays)* Lane - A single element position within a SIMD vector is called a lane. Accessing
individual lane values is relatively expensive. A SIMD vector has to be pushed out of
the register and onto the stack before an individual lane can be accessed on most
architectures. The vector then has to potentially be pushed back onto the stack,
lane access should be in general avoided in any hot loop.* Bit Widths - the bit widths in this context are the size of the vectors involved,
not the individual elements.* Vector Register - name for the extra wide registers used for SIMD operations.
Sometimes called floating point registers as it's common for the registers to be
used with both scalar and vectorised floating point operations.* Vertical - when an operation is vertical, each lane processes individually without
regard to the other lanes in the same vector. Most SIMD operations are vertical.* Reducing/Reduce - `reduce_*` operations, the lanes within a single vector are
merged using some operation.* Target Feature - Rust calls a CPU architecture extension a `target_feature`.
Nothing protects you from running binaries not built for your architecture, this
is automatically undefined behaviour.