{"id":28236453,"url":"https://github.com/passionlab/elba","last_synced_at":"2025-07-22T11:04:07.059Z","repository":{"id":40293673,"uuid":"269807480","full_name":"PASSIONLab/ELBA","owner":"PASSIONLab","description":"Parallel String Graph Construction, Transitive Reduction, and Contig Generation for De Novo Genome Assembly","archived":false,"fork":false,"pushed_at":"2024-06-11T00:32:02.000Z","size":111240,"stargazers_count":16,"open_issues_count":0,"forks_count":10,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-06-10T13:49:02.276Z","etag":null,"topics":["combblas","genome","genome-assembly","hpc","hpc-applications","kmer-counting","longread","longread-aligner","overlapping","spgemm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PASSIONLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"licenses/hipmer.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-06-05T23:26:27.000Z","updated_at":"2025-04-09T10:16:13.000Z","dependencies_parsed_at":"2025-06-10T13:43:10.545Z","dependency_job_id":null,"html_url":"https://github.com/PASSIONLab/ELBA","commit_stats":null,"previous_names":["passionlab/dibella.2d"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PASSIONLab/ELBA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PASSIONLab%2FELBA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PASSIONLab%2FELBA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PASSIONLab%2FELBA/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PASSIONLab%2FELBA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PASSIONLab","download_url":"https://codeload.github.com/PASSIONLab/ELBA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PASSIONLab%2FELBA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266481732,"owners_count":23935938,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["combblas","genome","genome-assembly","hpc","hpc-applications","kmer-counting","longread","longread-aligner","overlapping","spgemm"],"created_at":"2025-05-19T00:14:43.934Z","updated_at":"2025-07-22T11:04:07.052Z","avatar_url":"https://github.com/PASSIONLab.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ELBA\n## Parallel String Graph Construction, Transitive Reduction, and Contig Generation for De Novo Genome Assembly\n\n## Prerequisites\n\n1. Operating System.\n  * ELBA is tested and known to work on the following operating systems.\n    *  SUSE Linux Enterprise Server 15.\n    *  Ubuntu 14.10.\n    *  MacOS.\n    \n2. GCC/G++ version 8.2.0 or above.\n\n3. CMake 3.11 or above.\n\n## Dependencies\n    \n0. 
STILL NEED TO EDIT THIS, WILL BE INCONSISTENT WITH ACTUAL REPO FOR A BIT.\n\n1. CombBLAS.\n  * Download or clone CombBLAS from `https://github.com/PASSIONLab/CombBLAS.git`.\n  * Export the path to this directory as an environment variable `COMBBLAS_HOME`.\n   ```\n      git clone https://github.com/PASSIONLab/CombBLAS.git\n      export COMBBLAS_HOME=$PWD\n   ```\n  * The following commands can be used to build and install CombBLAS:\n  ```\n    cd $COMBBLAS_HOME/CombBLAS\n    mkdir build\n    mkdir install\n    cd build\n    cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install ../\n    make -j4\n    make install         \n  ```\n2. SeqAn (included in this repository).\n  * Create an environment variable, `SEQAN_HOME`, pointing to it:\n  ```\n    export SEQAN_HOME=/path/to/seqan\n    export BLOOM_HOME=src/libbloom/\n  ```\n  * This is a header-only library, so there's no need to build it.\n\n# Build ELBA\nTo build ELBA, you can use the following commands:\n  ```\n    mkdir build_release\n    cd build_release\n    cmake ..\n    make -j4  \n  ```\nDefault macro definitions in CMakeLists.txt:\n  ```\n    #define MAX_KMER_SIZE  32\n    #define LOWER_KMER_FREQ 2\n    #define UPPER_KMER_FREQ 8\n  ```\nDepending on the dataset, you may want to change these definitions. **UPPER_KMER_FREQ** is the reliable k-mer upper bound (8 works for E. coli (Sample) 30X, and 4 works for Human 10X and C. elegans 40X, which you can find [here](https://portal.nersc.gov/project/m1982/dibella.2d/inputs/)); **LOWER_KMER_FREQ** is the reliable k-mer lower bound.\n\nYou can change the default settings at compile time by using the following command instead of ```cmake ..```:\n```\ncmake -DLOWER_KMER_FREQ=\u003cnew-lower-bound\u003e -DUPPER_KMER_FREQ=\u003cnew-upper-bound\u003e .. \n```\n\n# Run ELBA\n\nYou can run ELBA in parallel by specifying the number of processes to the mpirun or mpiexec command. 
The number of processes must be a perfect square.\n\n## Input data samples\nA few input data sets can be downloaded [here](https://portal.nersc.gov/project/m1982/dibella.2d/inputs/). If you have your own FASTQ files, you can convert them to FASTA using [seqtk](https://github.com/lh3/seqtk):\n\n  ```\n    cd ../seqtk\n    ./seqtk seq -a \u003cname\u003e.fastq/fq \u003e \u003cname\u003e.fa\n  ```\nA tiny example `ecsample-sub1.fa` can be found in this repository.\n\n## Ready to run\nThe parameters and options of ELBA are as follows:\n- ```-i \u003cstring\u003e```: Input FASTA file.\n- ```-c \u003cinteger\u003e```: Number of sequences in the FASTA file.\n- ```--sc \u003cinteger\u003e```: Seed count. ```[default: 2]```\n- ```-k \u003cinteger\u003e```: K-mer length.\n- ```-s \u003cinteger\u003e```: K-mer stride. ```[default: 1]```\n- ```--ma \u003cinteger\u003e```: Base match score (positive). ```[default: 1]```\n- ```--mi \u003cinteger\u003e```: Base mismatch score (negative). ```[default: -1]```\n- ```-g \u003cinteger\u003e```: Gap open penalty (negative). ```[default: 0]```\n- ```-e \u003cinteger\u003e```: Gap extension penalty (negative). ```[default: -1]```\n- ```-O \u003cinteger\u003e```: Number of bytes to overlap when reading the input file in parallel. ```[default: 10000]```\n- ```--afreq \u003cinteger\u003e```: Alignment write frequency.\n- ```--na```: Do not perform alignment.\n- ```--fa```: Full Smith-Waterman alignment.\n- ```--xa \u003cinteger\u003e```: X-drop alignment with the indicated drop value.\n- ```--of \u003cstring\u003e```: Overlap file.\n- ```--af \u003cstring\u003e```: Output file to write alignment information. 
\n- ```--idxmap \u003cstring\u003e```: Output file mapping input sequences to the IDs used in ELBA.\n- ```--alph \u003cdna|protein\u003e```: Alphabet.\n\n## Run test program\nYou can run the test dataset ```ecsample-sub1.fa``` on one node as follows (it is too small to run on multiple nodes). This command runs ELBA using x-drop alignment with ```x = 5```:\n```\nexport OMP_NUM_THREADS=1\nmpirun -np 1 ./elba -i /path/to/ecsample-sub1.fa -k 17 --idxmap elba-test -c 135 --alph dna --of overlap-test --af alignment-test -s 1 -O 100000 --afreq 100000 --xa 5\n```\nTo run on multiple nodes, for example on 4 nodes using 4 MPI ranks per node, please download ```ecsample30x.fa``` from [here](https://portal.nersc.gov/project/m1982/dibella.2d/inputs/) and run as follows:\n```\nexport OMP_NUM_THREADS=1\nmpirun -np 16 ./elba -i /path/to/ecsample30x.fa -k 17 --idxmap elba-ecsample -c 16890 --alph dna --of overlap-ecsample --af alignment-ecsample -s 1 -O 100000 --afreq 100000 --xa 5\n```\nYou need to use a perfect square number of processes to match our 2D decomposition. Recall that ```-c``` should match the number of sequences in the input FASTA.\n\n# Citation\nTo cite our work or to know more about our methods, please refer to:\n\n\u003e Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydın Buluç. [Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly](https://arxiv.org/pdf/2010.10055.pdf). Proceedings of the IPDPS, 2021.\n\n\u003e Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydın Buluç. [Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly](https://arxiv.org/pdf/2207.04350.pdf). Proceedings of the ICPP, 2022.\n\nFurther design choices and results in terms of accuracy can be found here:\n\n\u003e Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç. 
[BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper](https://drive.google.com/file/d/132i0RAKyIIWk_BEl1jpf9R_V5eVkKkxT/view). bioRxiv 464420; doi: https://doi.org/10.1101/464420. Proceedings of the SIAM ACDA, 2021.\n\n# Copyright\n\ndiBELLA 2D: Parallel String Graph Construction and Transitive Reduction for De Novo Assembly (diBELLA 2D) Copyright (c) 2021, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy).  All rights reserved.\n\nIf you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.\n\nNOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpassionlab%2Felba","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpassionlab%2Felba","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpassionlab%2Felba/lists"}