https://github.com/vmikk/distwiz
distwiz - Sparse to full matrix converter
- Host: GitHub
- URL: https://github.com/vmikk/distwiz
- Owner: vmikk
- License: mit
- Created: 2024-04-01T17:14:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-03T18:23:11.000Z (about 1 year ago)
- Last Synced: 2024-04-04T16:02:45.364Z (about 1 year ago)
- Language: Go
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Sparse to full matrix converter
## Overview
This utility converts a sparse matrix
(such as those produced by `usearch -calc_distmx`)
into a full square matrix.

## Notes and limitations
This project is experimental and may need further performance optimization.
In `auto` mode, the choice between the in-memory and disk-based processing methods
depends on the number of unique labels in the input data (the default threshold is 10,000 sequences).

- The in-memory solution requires approximately 100 GB of RAM for 30,000 objects (equivalent to around 450 million pairwise distances)
- The disk-based solution, while significantly more memory-efficient, is I/O-intensive and much slower

## Usage
```shell
distwiz --input mx.txt --output dist.txt.gz
```

Supported arguments:
- `--input`: Path to the input file containing the sparse distance matrix
- `--output`: Path to the output file (GZIP-compressed)
- `--mode`: Processing mode: `auto`, `mem` (in-memory), or `disk` (disk-based). Default is `auto`
- `--compresslevel`: GZIP compression level (1-9). The default is 4

## Installation
Compile the program using `go build`.