https://github.com/cometscome/latticematrices.jl
High-performance matrix fields on arbitrary D-dimensional lattices in Julia.
- Host: GitHub
- URL: https://github.com/cometscome/latticematrices.jl
- Owner: cometscome
- License: mit
- Created: 2025-07-25T01:05:17.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2026-01-30T23:12:02.000Z (5 months ago)
- Last Synced: 2026-01-31T01:56:00.387Z (5 months ago)
- Language: Julia
- Homepage:
- Size: 310 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# LatticeMatrices.jl
[CI status](https://github.com/cometscome/MPILattice.jl/actions/workflows/CI.yml?query=branch%3Amain)
High-performance **matrix fields on arbitrary D-dimensional lattices** in Julia.
- Per-site matrices (size `NC1×NC2`) stored in **column-major layout**:
`(NC1, NC2, X, Y, Z, …)`
- **MPI** domain decomposition via a Cartesian communicator (halo width `nw`, periodic BCs).
- **GPU-ready** through **[JACC.jl](https://github.com/JuliaORNL/JACC.jl)** (portable CPU/GPU kernels; CUDA/ROCm/Threads).
- Fast, allocation-free **indexing helpers** for kernels: `DIndexer`, `linearize`, `delinearize`, `shiftindices`.
> This package focuses on scalable, halo-exchange–based lattice algorithms with minimal allocations and clean multi-backend execution.
**Applications**: This package is designed to support large-scale simulations on structured lattices. A key application area is lattice QCD, where gauge fields and fermion fields are represented as matrix-valued objects on a multi-dimensional lattice. LatticeMatrices.jl is planned for integration into [Gaugefields.jl](https://github.com/akio-tomiya/Gaugefields.jl) and [LatticeDiracOperators.jl](https://github.com/akio-tomiya/LatticeDiracOperators.jl), where it would provide the underlying data structures and linear algebra kernels for gauge and fermion dynamics.
**Current limitation.** Multi‑GPU execution and hybrid MPI+threads parallelism are **experimental** and **not yet thoroughly tested**; treat them as provisional.
---
## Installation
```julia
pkg> add LatticeMatrices
```
Requirements:
- Julia ≥ 1.11
---
## Quick tour
### 1) D-dimensional indexing helpers (GPU-kernel friendly)
```julia
using LatticeMatrices
# Build an indexer for a D-dimensional lattice (1-based indices)
gsize = (16, 16, 16, 16) # global lattice size
d = DIndexer(gsize) # computes strides internally (first index varies fastest)
# Convert between linear and multi-index (1-based)
L = linearize(d, (1, 1, 1, 1)) # -> 1
ix = delinearize(d, 4) # -> (4, 1, 1, 1) on this shape
# Apply shifts componentwise
p = shiftindices((4, 1, 1, 1), (1, 0, 0, 0)) # -> (5, 1, 1, 1)
```
**Signatures**
```julia
struct DIndexer{D,dims,strides} end
DIndexer(dims_in::NTuple{D,<:Integer}) where {D}
DIndexer(dims_in::AbstractVector{<:Integer})
# 1-based linearization/delinearization (no heap allocs; GPU-friendly)
linearize(::DIndexer{D,dims,strides}, idx::NTuple{D,Int32})::Int32
delinearize(::DIndexer{D,dims,strides}, L::Integer, offset::Int32=0)::NTuple{D,Int32}
# elementwise shifting for index tuples
shiftindices(indices, shift)
```
- The optional third positional argument `offset` of `delinearize` is handy for **mapping into halo regions**, e.g. pass `Int32(nw)`.
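To make the index arithmetic concrete, here is a minimal plain-Julia sketch of 1-based, first-index-fastest linearization with an optional halo offset. The names `lin` and `delin` are hypothetical and this is not the package's implementation (it allocates a small vector, which the package's helpers are described as avoiding). It only reproduces the values shown in the example above.

```julia
# Standalone sketch; `lin` and `delin` are hypothetical names,
# not the package's `linearize` / `delinearize`.

# Multi-index -> linear index (1-based, first index varies fastest).
function lin(dims::NTuple{D,Int}, idx::NTuple{D,Int}) where {D}
    L, stride = 1, 1
    for i in 1:D
        L += (idx[i] - 1) * stride
        stride *= dims[i]
    end
    return L
end

# Linear index -> multi-index; `offset` is added to every component,
# which moves a core index into a halo-padded array when offset == nw.
function delin(dims::NTuple{D,Int}, L::Integer, offset::Integer=0) where {D}
    r = Int(L) - 1                      # zero-based remainder
    idx = Vector{Int}(undef, D)
    for i in 1:D
        idx[i] = r % dims[i] + 1 + offset
        r ÷= dims[i]
    end
    return Tuple(idx)
end

dims = (16, 16, 16, 16)
lin(dims, (1, 1, 1, 1))          # 1
delin(dims, 4)                   # (4, 1, 1, 1)
delin(dims, 4, 1)                # (5, 2, 2, 2), i.e. shifted by a halo width of 1
(4, 1, 1, 1) .+ (1, 0, 0, 0)     # (5, 1, 1, 1), componentwise shift
```

The ordering matches Julia's built-in `LinearIndices` and `CartesianIndices`, which can serve as a cross-check on the host.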
---
### 2) Lattice containers (MPI + halos + JACC arrays)
The core container stores a **halo-padded** array on each rank and manages halo exchange without MPI derived datatypes (faces are packed into contiguous buffers).
```julia
using LatticeMatrices, MPI, JACC, LinearAlgebra
JACC.@init_backend
MPI.Init()
dim = 4
gsize = ntuple(_ -> 16, dim) # global spatial size per dimension
nw = 1 # ghost width
NC = 3 # per-site matrix size (NC×NC)
# Choose a Cartesian process grid (PEs) of length `dim`
nprocs = MPI.Comm_size(MPI.COMM_WORLD)
n1 = max(nprocs ÷ 2, 1)
PEs = ntuple(i -> i == 1 ? n1 : (i == 2 ? nprocs ÷ n1 : 1), dim)
# Construct an empty lattice matrix (device array via JACC.zeros)
M = LatticeMatrix(NC, NC, dim, gsize, PEs; nw, elementtype=ComplexF64)
# Or initialize from an existing array (broadcast to ranks)
A = rand(ComplexF64, NC, NC, gsize...)
M2 = LatticeMatrix(A, dim, PEs; nw)
# Halo exchange across all spatial dimensions
set_halo!(M)
# Global gather helpers (host reconstruction on rank 0)
G = gather_matrix(M; root=0) # rank 0: Array(NC, NC, gsize...)
Gall = gather_and_bcast_matrix(M; root=0) # all ranks receive the same Array
```
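As a rough illustration of how the process grid relates to per-rank storage, the sketch below reproduces the grid rule from the snippet above and derives a local interior size and halo-padded shape. Both helpers (`grid_2d`, `local_shape`) are hypothetical. The sketch assumes that `gsize` divides evenly by `PEs` and that halos add `nw` cells on each side of every spatial dimension; neither detail is confirmed by the package here.

```julia
# Hypothetical helpers for illustration; neither is part of LatticeMatrices.jl.

# Same process-grid rule as the snippet above: split ranks over the first two dims.
function grid_2d(nprocs::Int, dim::Int)
    n1 = max(nprocs ÷ 2, 1)
    return ntuple(i -> i == 1 ? n1 : (i == 2 ? nprocs ÷ n1 : 1), dim)
end

# Interior size per rank and halo-padded local array shape.
# Assumptions: gsize divides evenly by PEs; halos add nw cells on both sides.
function local_shape(NC1::Int, NC2::Int, gsize::NTuple{D,Int},
                     PEs::NTuple{D,Int}, nw::Int) where {D}
    all(gsize .% PEs .== 0) ||
        throw(ArgumentError("gsize must be divisible by PEs in every dimension"))
    PN = gsize .÷ PEs                          # interior sites per rank
    return PN, (NC1, NC2, (PN .+ 2 * nw)...)   # padded shape
end

PEs = grid_2d(4, 4)                                       # (2, 2, 1, 1)
PN, shape = local_shape(3, 3, (16, 16, 16, 16), PEs, 1)
# PN == (8, 8, 16, 16); shape == (3, 3, 10, 10, 18, 18)
```

Note that this grid rule does not always use every rank: for example, `grid_2d(5, 4)` gives `(2, 2, 1, 1)`, whose product is 4, so it may be worth checking `prod(PEs) == nprocs` before constructing a lattice.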
**Key type**
```julia
struct LatticeMatrix{D,T,AT,NC1,NC2,nw,DI} <: Lattice{D,T,AT}
nw::Int
phases::SVector{D,T} # per-direction phase (applied at wrap boundaries)
NC1::Int
NC2::Int
gsize::NTuple{D,Int}
cart::MPI.Comm # Cartesian communicator
coords::NTuple{D,Int} # 0-based Cartesian coords
dims::NTuple{D,Int} # process grid (PEs)
nbr::NTuple{D,NTuple{2,Int}} # neighbors (minus, plus)
A::AT # local array (NC1, NC2, X, Y, Z, …) with halos
buf::Vector{AT} # four face buffers per spatial dim
myrank::Int
PN::NTuple{D,Int} # local interior size per dim (no halos)
comm::MPI.Comm # original communicator
indexer::DI # DIndexer for global sizes
end
```
**Constructors**
```julia
LatticeMatrix(NC1, NC2, dim, gsize, PEs;
nw=1, elementtype=ComplexF64, phases=ones(dim), comm0=MPI.COMM_WORLD)
LatticeMatrix(A, dim, PEs; nw=1, phases=ones(dim), comm0=MPI.COMM_WORLD)
```
- **Layout**: `(NC1, NC2, X, Y, Z, …)`; halos are the outer `nw` cells on each spatial dim.
- **Phases**: wrap-around phases per dimension (applied on the boundary faces during exchange).
- **Exchange**: `set_halo!(ls)` calls `exchange_dim!(ls, d)` for each spatial dimension `d`.
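The effect of a periodic halo fill can be sketched on a single process in one dimension, where both neighbours are the rank itself. `fill_halo_1d!` is a hypothetical name; boundary phases, MPI communication, and buffer packing are all left out, so this only shows which interior cells end up in which halo cells.

```julia
# Single-process 1D sketch of a periodic halo fill; `fill_halo_1d!` is a
# hypothetical helper, not the package's `set_halo!` or `exchange_dim!`.
# Array layout: [nw halo cells | n interior cells | nw halo cells].
function fill_halo_1d!(a::AbstractVector, nw::Int)
    n = length(a) - 2 * nw               # number of interior cells
    n >= nw || throw(ArgumentError("interior must be at least nw cells wide"))
    for k in 1:nw
        a[k]          = a[n + k]         # lower halo <- last nw interior cells
        a[nw + n + k] = a[nw + k]        # upper halo <- first nw interior cells
    end
    return a
end

a = [0, 10, 20, 30, 40, 0]               # nw = 1, interior = 10, 20, 30, 40
fill_halo_1d!(a, 1)                      # [40, 10, 20, 30, 40, 10]
```

In the multi-rank case the two copies become a send/receive pair with the minus and plus neighbours along each dimension.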
---
### 3) Linear algebra on lattices
Per-site matrix operations follow BLAS-like semantics (`mul!(C, A, B)` writes `A * B` into `C` at every site). The test suite exercises plain, adjoint, and shifted operands:
```julia
# Random per-site matrices
A1 = rand(ComplexF64, NC, NC, gsize...)
A2 = rand(ComplexF64, NC, NC, gsize...)
A3 = rand(ComplexF64, NC, NC, gsize...)
M1 = LatticeMatrix(NC, NC, dim, gsize, PEs; nw)
M2 = LatticeMatrix(A2, dim, PEs; nw)
M3 = LatticeMatrix(A3, dim, PEs; nw)
# Choose a site (using DIndexer + halos)
indexer = DIndexer(gsize)
L = 4
idx_halo = Tuple(delinearize(indexer, L, Int32(nw))) # with halo offset
idx_core = Tuple(delinearize(indexer, L, Int32(0))) # core (no halo)
# Reference (host) product at a single site:
a1 = A1[:, :, idx_core...]
a2 = A2[:, :, idx_core...]
a3 = A3[:, :, idx_core...]
mul!(a1, a2, a3)
# Lattice product (device-backed); updates M1.A at that site:
mul!(M1, M2, M3)
m1 = M1.A[:, :, idx_halo...]
@assert a1 ≈ m1
# Matrix exponential at each site (in-place):
expt!(M1, M2, 1)
m1 = M1.A[:, :, idx_halo...]
a1 = exp(a2)
@assert a1 ≈ m1
# Trace and sum over all sites (returns a scalar)
println(tr(M1))
```
Adjoints and **shifted** operands are supported via lightweight wrappers:
```julia
M3p = Shifted_Lattice(M3, (1, 0, 0, 0)) # shift by +1 along X (periodic)
mul!(M1, M2', M3p) # all combinations in tests:
# (A, B, C), (A, B', C), (A, B, C'), etc.
```
**Convenience**
```julia
# Reduced sums (interior region only)
s = allsum(M) # MPI.Reduce to root (returns the global sum on rank 0)
```
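For checking reductions on the host, a reference that sums per-site traces over the interior of a halo-padded array can be useful. The helper below is a hypothetical sketch in plain Julia, not the package's `tr` or `allsum`. It assumes the `(NC1, NC2, X, Y, …)` layout with `nw` halo cells on each side of every spatial dimension.

```julia
using LinearAlgebra

# Hypothetical host-side reference: sum of tr(site matrix) over interior sites.
function interior_trace_sum(A::AbstractArray, nw::Int)
    # Interior index range along each spatial dimension (dims 3 and up).
    ranges = map(n -> (nw + 1):(n - nw), size(A)[3:end])
    return sum(tr(view(A, :, :, I)) for I in CartesianIndices(ranges))
end

A = ones(ComplexF64, 2, 2, 6, 6)     # 4×4 interior plus a halo of width 1
A[:, :, 1, :] .= 100                 # halo values must not contribute
A[:, :, 6, :] .= 100
A[:, :, :, 1] .= 100
A[:, :, :, 6] .= 100
interior_trace_sum(A, 1)             # 32.0 + 0.0im (trace 2 at each of 16 sites)
```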
## Examples: matrix multiplication on lattices
### 1) Plain matrix multiplication at each lattice site
```julia
using LatticeMatrices, MPI, JACC, LinearAlgebra
JACC.@init_backend
MPI.Init()
dim = 2
gsize = (8, 8)
NC = 3
PEs = (2, 2) # process grid (2×2)
M1 = LatticeMatrix(NC, NC, dim, gsize, PEs)
M2 = LatticeMatrix(rand(ComplexF64, NC, NC, gsize...), dim, PEs)
M3 = LatticeMatrix(rand(ComplexF64, NC, NC, gsize...), dim, PEs)
mul!(M1, M2, M3) # per-site product: M1 = M2 * M3
```
### 2) Multiplication with a shifted lattice
```julia
shift = (1, 0) # shift by +1 along X
M2s = Shifted_Lattice(M2, shift)
mul!(M1, M2s, M3) # M1 = (M2 shifted) * M3
```
The shift is applied with periodic wrapping across the global lattice size.
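A host-side reference for the shifted product can help when validating results on small lattices. The sketch below is plain Julia on ordinary arrays, and `shifted_mul_ref` is a hypothetical name. It assumes that a shift of `(1, 0)` means site `(x, y)` reads the shifted operand at `(x+1, y)`; the package's sign convention should be checked against its tests.

```julia
# Host-side reference for a shifted per-site product on a 2D lattice.
# Assumption: a shift of (1, 0) means site (x, y) reads A2 at (x + 1, y).
function shifted_mul_ref(A2::AbstractArray{T,4}, A3::AbstractArray{T,4},
                         shift::NTuple{2,Int}) where {T}
    NC1, _, X, Y = size(A2)
    C = similar(A2, NC1, size(A3, 2), X, Y)
    for y in 1:Y, x in 1:X
        xs = mod1(x + shift[1], X)       # periodic wrap in X
        ys = mod1(y + shift[2], Y)       # periodic wrap in Y
        C[:, :, x, y] = A2[:, :, xs, ys] * A3[:, :, x, y]
    end
    return C
end

A2 = rand(ComplexF64, 3, 3, 8, 8)
A3 = rand(ComplexF64, 3, 3, 8, 8)
C = shifted_mul_ref(A2, A3, (1, 0))
# At the upper X boundary the shifted operand wraps around to x = 1:
C[:, :, 8, 1] ≈ A2[:, :, 1, 1] * A3[:, :, 8, 1]
```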
---
### 3) Multiplication with conjugate-transposed matrices
```julia
mul!(M1, M2', M3) # M1 = adjoint(M2) * M3
mul!(M1, M2, M3') # M1 = M2 * adjoint(M3)
mul!(M1, M2', M3') # M1 = adjoint(M2) * adjoint(M3)
```
All combinations of shifted and adjoint operands are supported and tested in `test/runtests.jl`.
---
## Automatic differentiation (Enzyme)
**Experimental (versions above v0.3).** The package provides Enzyme-based AD extensions and test cases. See `test/adtest/ad.jl` for a concrete comparison between
automatic differentiation and numerical differentiation using `calc_action_loopfn`. The loop body is factored
into a small helper function (`_calc_action_step!`), which makes Enzyme AD more reliable for loop-heavy code.
The following example runs the AD versus numerical comparison with `calc_action_loopfn`:
```julia
using Enzyme
using LatticeMatrices, MPI, JACC
JACC.@init_backend
MPI.Init()
include("test/adtest/ad.jl") # runs main() in the script
```
Note: the AD result here follows Enzyme's complex differentiation convention. For a complex variable
`U = X + iY`, the gradient reported by Enzyme is
`dS/dUij = dS/dXij + i dS/dYij`.
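The convention can be illustrated without Enzyme by a finite-difference sketch. The action `S(U) = tr(U'U)` is chosen here purely as an example, and `grad_fd` is a hypothetical helper. Since `S` is the sum of `Xij^2 + Yij^2`, the quantity `dS/dXij + i dS/dYij` equals `2 Uij`, which the numerical estimate reproduces.

```julia
using LinearAlgebra

# Example action (an assumption for illustration): S(U) = tr(U'U) = sum of |Uij|^2.
S(U) = real(tr(U' * U))

# Central-difference estimate of dS/dXij + i*dS/dYij for U = X + iY.
# `grad_fd` is a hypothetical helper, not part of LatticeMatrices.jl or Enzyme.
function grad_fd(S, U::AbstractMatrix{<:Complex}; h=1e-6)
    G = zero(U)
    for ij in eachindex(U)
        E = zero(U)
        E[ij] = 1                                             # perturb one entry
        dX = (S(U + h * E) - S(U - h * E)) / (2 * h)          # real-part derivative
        dY = (S(U + im * h * E) - S(U - im * h * E)) / (2 * h) # imaginary-part derivative
        G[ij] = dX + im * dY
    end
    return G
end

U = [1+2im 3-1im; 0.5im 2]
G = grad_fd(S, U)
isapprox(G, 2 * U; atol=1e-6)    # true under the convention above
```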
---
## Running the test example
These commands run the checks in `test/runtests.jl`:
```bash
# CPU single process
julia --project -e 'using Pkg; Pkg.test("LatticeMatrices")'
# MPI (choose ranks and an MPI launcher)
mpiexec -n 4 julia --project test/runtests.jl
# With GPUs (example; make sure CUDA/ROCm works and select a JACC backend)
julia --project -e 'using JACC; JACC.@init_backend; using Pkg; Pkg.test()'
```
Internally, the tests:
- sweep `dim = 1:4` and `NC = 2:4`,
- construct `LatticeMatrix` objects on a Cartesian grid `PEs`,
- verify `mul!` for all nine combinations with/without adjoint and with/without shifts,
- use `DIndexer` to map between linear and multi-indices, including halo offsets.
---
## API reference (selected)
```julia
# Indexing
DIndexer(::NTuple{D,<:Integer})
DIndexer(::AbstractVector{<:Integer})
linearize(::DIndexer{D,dims,strides}, ::NTuple{D,Int32})::Int32
delinearize(::DIndexer{D,dims,strides}, ::Integer, ::Int32=0)::NTuple{D,Int32}
shiftindices(indices, shift)
# Lattice
LatticeMatrix(NC1, NC2, dim, gsize, PEs; nw=1, elementtype=ComplexF64,
phases=ones(dim), comm0=MPI.COMM_WORLD)
LatticeMatrix(A, dim, PEs; nw=1, phases=ones(dim), comm0=MPI.COMM_WORLD)
set_halo!(ls)
exchange_dim!(ls, d::Int)
gather_matrix(ls; root=0)::Union{Array{T},Nothing}
gather_and_bcast_matrix(ls; root=0)::Array{T}
allsum(ls) # Reduce(SUM) to root over interior
# Lightweight wrappers
struct Shifted_Lattice{D,shift}; data::D; end
struct Adjoint_Lattice{D}; data::D; end
# Base.adjoint(::Lattice) and Base.adjoint(::Shifted_Lattice) return Adjoint_Lattice
```
---
## License
MIT (see `LICENSE`).
---
## Acknowledgements
Built on the excellent Julia HPC stack: **MPI.jl**, **JACC.jl**, and the Julia standard libraries.
---
### References
- MPI.jl: https://github.com/JuliaParallel/MPI.jl
- JACC.jl: https://github.com/JuliaORNL/JACC.jl
---
## Selecting & switching GPU/CPU backends (via JACC.jl)
LatticeMatrices.jl uses [JACC.jl](https://github.com/JuliaORNL/JACC.jl) for performance‑portable execution. Follow JACC’s
recommended flow to select **one** backend per project/session:
1) **Set a backend** (writes/updates `LocalPreferences.toml` and adds the backend package):
```julia
julia> import JACC
julia> JACC.set_backend("cuda") # or "amdgpu" or "threads" (default)
```
2) **Initialize at top level** so your code doesn’t need backend‑specific imports:
```julia
import JACC
JACC.@init_backend # must be at top-level scope
```
3) **Switching backends.** Re-run `JACC.set_backend("amdgpu")` (or `"threads"`, `"cuda"`) in the same project to switch; this updates `LocalPreferences.toml`. Restart your Julia session so extensions load for the new backend, then call `JACC.@init_backend` again.
> Notes:
> - Without calling `@init_backend`, using a non-`"threads"` backend will raise
> errors like `get_backend(::Val(:cuda))` when invoking JACC functions.
> - `JACC.array` / `JACC.array_type()` help you stay backend‑agnostic in your APIs.
References: JACC quick start and usage in the upstream README.