{"id":29514157,"url":"https://github.com/dkobylianskii/torch-lap-cuda","last_synced_at":"2026-05-05T17:31:55.756Z","repository":{"id":301596027,"uuid":"1009754497","full_name":"dkobylianskii/torch-lap-cuda","owner":"dkobylianskii","description":"A fast CUDA implementation of the Linear Assignment Problem (LAP) solver for PyTorch.","archived":false,"fork":false,"pushed_at":"2025-08-14T12:15:55.000Z","size":2293,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-11-28T06:45:39.948Z","etag":null,"topics":["cuda","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dkobylianskii.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-27T16:50:14.000Z","updated_at":"2025-10-17T09:08:02.000Z","dependencies_parsed_at":"2025-08-14T13:14:48.413Z","dependency_job_id":"7a7da242-352a-4601-9c1f-cbe51aebcc12","html_url":"https://github.com/dkobylianskii/torch-lap-cuda","commit_stats":null,"previous_names":["dkobylianskii/lap_cuda","dkobylianskii/torch-lap-cuda"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/dkobylianskii/torch-lap-cuda","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkobylianskii%2Ftorch-lap-cuda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkobylianskii%2Ftorch-lap-cuda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkobylianskii%2Ftorch-lap-cu
da/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkobylianskii%2Ftorch-lap-cuda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dkobylianskii","download_url":"https://codeload.github.com/dkobylianskii/torch-lap-cuda/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkobylianskii%2Ftorch-lap-cuda/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32660207,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-05T11:29:49.557Z","status":"ssl_error","status_checked_at":"2026-05-05T11:29:48.587Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","python","pytorch"],"created_at":"2025-07-16T14:00:55.785Z","updated_at":"2026-05-05T17:31:55.749Z","avatar_url":"https://github.com/dkobylianskii.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CUDA LAP Solver\n[![PyPI version](https://badge.fury.io/py/torch-lap-cuda.svg)](https://badge.fury.io/py/torch-lap-cuda)\n[![Downloads](https://static.pepy.tech/badge/torch-lap-cuda)](https://pepy.tech/project/torch-lap-cuda)\n[![License](https://img.shields.io/badge/MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\n\u003ch4 align=\"left\"\u003e\n    \u003cp\u003e\n        
\u003ca href=\"#Installation\"\u003eInstallation\u003c/a\u003e |\n        \u003ca href=\"#Usage\"\u003eUsage\u003c/a\u003e |\n        \u003ca href=\"#Benchmarks\"\u003eBenchmarks\u003c/a\u003e\n    \u003cp\u003e\n\u003c/h4\u003e\n\nA fast CUDA implementation of the Linear Assignment Problem (LAP) solver for PyTorch. This project provides a GPU-accelerated implementation of the HyLAC algorithm that efficiently handles batched inputs.\n\nBased on the HyLAC code: https://github.com/Nagi-Research-Group/HyLAC/tree/Block-LAP\nPlease cite the original work if you use this code in your research: https://doi.org/10.1016/j.jpdc.2024.104838\n\n## Features\n\n- Fast CUDA-based implementation of the LAP solver\n- Batched processing support for multiple cost matrices\n- Seamless integration with PyTorch\n- Supports 32-bit and 64-bit integer and floating-point types: `torch.int32, torch.int64, torch.float32, torch.float64`\n\n## Requirements\n\n- Python \u003e= 3.9\n- CUDA \u003e= 10.0\n- PyTorch\n- NVIDIA GPU with compute capability \u003e= 7.5\n\n## Installation\n\nTo install the package with pip:\n\n```bash\npip install torch-lap-cuda --no-build-isolation\n```\n\nAlternatively, install the package directly from source:\n\n```bash\ngit clone https://github.com/dkobylianskii/torch-lap-cuda.git\ncd torch-lap-cuda\npip install . 
--no-build-isolation\n```\n\n## Usage\n\nHere's a simple example of how to use the LAP solver:\n\n```python\nimport torch\nfrom torch_lap_cuda import solve_lap\n\n# Create a random cost matrix (batch_size x N x N)\nbatch_size = 128\nsize = 256\ncost_matrix = torch.randn((batch_size, size, size), device=\"cuda\")\n\n# Solve the assignment problem\n# assignments shape will be (batch_size, size)\n# Each batch element contains the column indices for optimal assignment\nassignments = solve_lap(cost_matrix)\n\n# Calculate the total cost, summed over the whole batch\nbatch_idxs = torch.arange(batch_size, device=assignments.device).unsqueeze(1)\nrow_idxs = torch.arange(size, device=assignments.device).unsqueeze(0)\ntotal_cost = cost_matrix[batch_idxs, row_idxs, assignments].sum()\n```\n\nThe solver also supports 2D inputs for single matrices:\n\n```python\n# Single cost matrix (N x N)\ncost_matrix = torch.randn((size, size), device=\"cuda\")\nassignments = solve_lap(cost_matrix)  # Shape: (size,)\n```\n\nIf you have multiple GPUs, you can specify the device the LAP solver runs on with the `device` argument:\n\n```python\ncost_matrix = torch.randn((batch_size, size, size), device=\"cuda:0\")\nassignments = solve_lap(cost_matrix, device=\"cuda:1\")  # assignments will be on cuda:0\n```\n\n## Input Requirements\n\n- Cost matrices must be on a CUDA device\n- Input can be either 2D (N x N) or 3D (batch_size x N x N)\n- Matrices must be square\n- Supported dtypes (32-bit and 64-bit integer and floating-point): `torch.int32, torch.int64, torch.float32, torch.float64`\n\n## Benchmarks\n\nTests were performed on an Intel(R) Xeon(R) Gold 6530 CPU and an NVIDIA A6000 Ada GPU with CUDA 12.5 and PyTorch 2.6.0.\n\n`Scipy (MP)` denotes the multiprocessing version and `Scipy (MT)` the multithreading version; both used 32 processes/threads.\n\nTo run the benchmarks, execute:\n\n```bash\npython tests/benchmark.py\n```\n\n### Benchmark for uniform random distribution:\n\n![Benchmark results for uniform random cost 
matrices](figs/benchmark_uniform.png)\n\n### Benchmark for normal random distribution:\n\n![Benchmark results for normal random cost matrices](figs/benchmark_normal.png)\n\n### Benchmark for integer random distribution:\n\n![Benchmark results for integer random cost matrices](figs/benchmark_integer.png)\n\n## Testing\n\nTo run the test suite:\n\n```bash\npytest tests/\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkobylianskii%2Ftorch-lap-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdkobylianskii%2Ftorch-lap-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkobylianskii%2Ftorch-lap-cuda/lists"}