https://github.com/dkobylianskii/torch-lap-cuda
A fast CUDA implementation of the Linear Assignment Problem (LAP) solver for PyTorch.
- Host: GitHub
- URL: https://github.com/dkobylianskii/torch-lap-cuda
- Owner: dkobylianskii
- License: other
- Created: 2025-06-27T16:50:14.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2025-08-14T12:15:55.000Z (10 months ago)
- Last Synced: 2025-11-28T06:45:39.948Z (7 months ago)
- Topics: cuda, python, pytorch
- Language: Cuda
- Homepage:
- Size: 2.19 MB
- Stars: 9
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# CUDA LAP Solver
[PyPI version](https://badge.fury.io/py/torch-lap-cuda)
[Downloads](https://pepy.tech/project/torch-lap-cuda)
[License: MIT](https://opensource.org/licenses/MIT)
[Installation](#installation) | [Usage](#usage) | [Benchmarks](#benchmarks)
A fast CUDA implementation of the Linear Assignment Problem (LAP) solver for PyTorch. This project provides a GPU-accelerated implementation of the HyLAC algorithm that efficiently handles batched inputs.
It is based on the HyLAC code: https://github.com/Nagi-Research-Group/HyLAC/tree/Block-LAP
Please cite the original work if you use this code in your research: https://doi.org/10.1016/j.jpdc.2024.104838
## Features
- Fast CUDA-based implementation of the LAP solver
- Batched processing support for multiple cost matrices
- Seamless integration with PyTorch
- Supports 32-bit and 64-bit integer and floating-point dtypes: `torch.int32, torch.int64, torch.float32, torch.float64`
## Requirements
- Python >= 3.9
- CUDA >= 10.0
- PyTorch
- NVIDIA GPU with compute capability >= 7.5
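The Python and compute-capability floors listed above can be checked before attempting a build. The following is a minimal sketch, not part of the package: `meets_requirements` and `probe_compute_capability` are hypothetical helper names, and the CUDA toolkit version is not checked.

```python
import sys

# Floors taken from the requirements list above.
MIN_PYTHON = (3, 9)
MIN_COMPUTE_CAPABILITY = (7, 5)


def meets_requirements(python_version, compute_capability):
    """Return True if both (major, minor) tuples satisfy the floors."""
    python_ok = tuple(python_version[:2]) >= MIN_PYTHON
    gpu_ok = tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY
    return python_ok and gpu_ok


def probe_compute_capability():
    """Return the current GPU's (major, minor) capability, or None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    if not torch.cuda.is_available():
        return None  # no usable CUDA device
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    capability = probe_compute_capability()
    if capability is None:
        print("No CUDA-capable PyTorch setup detected")
    elif meets_requirements(sys.version_info, capability):
        print("Python and GPU meet the listed floors")
    else:
        print("Python or GPU is below the listed floors")
```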
## Installation
To install the package, you can use pip:
```bash
pip install torch-lap-cuda --no-build-isolation
```
Alternatively, install the package directly from source:
```bash
git clone https://github.com/dkobylianskii/torch-lap-cuda.git
cd torch-lap-cuda
pip install . --no-build-isolation
```
## Usage
Here's a simple example of how to use the LAP solver:
```python
import torch
from torch_lap_cuda import solve_lap
# Create a random cost matrix (batch_size x N x N)
batch_size = 128
size = 256
cost_matrix = torch.randn((batch_size, size, size), device="cuda")
# Solve the assignment problem
# assignments shape will be (batch_size, size)
# Each batch element contains the column indices for optimal assignment
assignments = solve_lap(cost_matrix)
# Calculate total costs
batch_idxs = torch.arange(batch_size, device=assignments.device).unsqueeze(1)
row_idxs = torch.arange(size, device=assignments.device).unsqueeze(0)
total_cost = cost_matrix[batch_idxs, row_idxs, assignments].sum()
```
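To make the output convention concrete, here is a brute-force reference in pure Python for tiny matrices. It is only an illustration and not how the CUDA solver works. It assumes the usual minimisation form of the LAP and that `assignments[i]` is the column chosen for row `i`, as the comments in the example above state. `brute_force_lap` is a hypothetical name.

```python
from itertools import permutations


def brute_force_lap(cost):
    """Try every row-to-column permutation and keep the cheapest.

    Runs in O(N!) time, so it is only usable for very small N.
    Assumption: the goal is to minimise the total cost.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[row] is the column assigned to that row
        total = sum(cost[row][col] for row, col in enumerate(perm))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return list(best_perm), best_cost


cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = brute_force_lap(cost)
print(assignment, total)  # [1, 0, 2] 5
```

Row 0 takes column 1, row 1 takes column 0, and row 2 takes column 2, for a total cost of 1 + 2 + 2 = 5. A reference like this can be handy for spot-checking solver output on small inputs.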
The solver also supports 2D inputs for single matrices:
```python
# Single cost matrix (N x N)
cost_matrix = torch.randn((size, size), device="cuda")
assignments = solve_lap(cost_matrix) # Shape: (size,)
```
On multi-GPU systems, you can choose the device the LAP solver runs on with the `device` argument:
```python
cost_matrix = torch.randn((batch_size, size, size), device="cuda:0")
assignments = solve_lap(cost_matrix, device="cuda:1") # assignments will be on cuda:0
```
## Input Requirements
- Cost matrices must be on a CUDA device
- Input can be either 2D (N x N) or 3D (batch_size x N x N)
- Matrices must be square
- Supported dtypes are 32-bit and 64-bit integers and floats: `torch.int32, torch.int64, torch.float32, torch.float64`
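The requirements above can be turned into a small pre-flight check. This is a sketch under stated assumptions: `check_lap_input` is a hypothetical helper, not part of `torch_lap_cuda`, and it works on plain descriptions of a tensor (shape, dtype name, device type) so that it runs without a GPU.

```python
# Dtype names as produced by str(tensor.dtype) in PyTorch.
SUPPORTED_DTYPES = {"torch.int32", "torch.int64", "torch.float32", "torch.float64"}


def check_lap_input(shape, dtype_name, device_type):
    """Validate a cost-matrix description against the listed requirements.

    Returns (batch_size, n) on success and raises ValueError otherwise.
    """
    if device_type != "cuda":
        raise ValueError(f"expected a CUDA tensor, got device type {device_type!r}")
    if len(shape) not in (2, 3):
        raise ValueError(f"expected a 2D or 3D input, got {len(shape)}D")
    if shape[-1] != shape[-2]:
        raise ValueError(f"matrices must be square, got {shape[-2]} x {shape[-1]}")
    if dtype_name not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype {dtype_name}")
    # A 2D input is treated here as a batch of one matrix.
    batch_size = shape[0] if len(shape) == 3 else 1
    return batch_size, shape[-1]


print(check_lap_input((128, 256, 256), "torch.float32", "cuda"))  # (128, 256)
```

For a real tensor `t`, the three arguments would be `tuple(t.shape)`, `str(t.dtype)`, and `t.device.type`.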
## Benchmarks
Benchmarks were run on an Intel Xeon Gold 6530 CPU and an NVIDIA A6000 Ada GPU with CUDA 12.5 and PyTorch 2.6.0.
`Scipy (MP)` and `Scipy (MT)` denote the multiprocessing and multithreading versions of the SciPy baseline, respectively; both used 32 processes/threads.
To run the benchmarks, execute:
```bash
python tests/benchmark.py
```
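For ad-hoc timing outside the benchmark script, keep in mind that CUDA work is generally launched asynchronously, so the GPU should be synchronized before the clock is read. The helper below is a minimal sketch and is not taken from `tests/benchmark.py`; `time_call` and its parameters are hypothetical names.

```python
import time


def time_call(fn, sync=None, warmup=1, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of fn.

    `sync` is an optional zero-argument callable invoked before the clock
    is read, so that pending asynchronous work is included in the timing.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return min(timings)
```

With the solver, a call might look like `time_call(lambda: solve_lap(cost_matrix), sync=torch.cuda.synchronize)`.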
### Benchmark for uniform random distribution:

### Benchmark for normal random distribution:

### Benchmark for integer random distribution:

## Testing
To run the test suite:
```bash
pytest tests/
```