{"id":18189934,"url":"https://github.com/kullrich/korthocpp","last_synced_at":"2025-04-07T14:50:25.613Z","repository":{"id":225908737,"uuid":"767194097","full_name":"kullrich/korthoCPP","owner":"kullrich","description":"korthoCPP calculates pairwise kmer jaccard distance between two peptide fasta files","archived":false,"fork":false,"pushed_at":"2024-03-05T07:35:48.000Z","size":615,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-13T17:29:03.926Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kullrich.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-04T21:35:45.000Z","updated_at":"2024-03-04T21:37:53.000Z","dependencies_parsed_at":"2024-12-21T04:29:06.706Z","dependency_job_id":"7e7f171e-8b61-444e-a02a-3db71d2f884c","html_url":"https://github.com/kullrich/korthoCPP","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"5238ba46db4f3693796700ebf217b45f54ad7bd6"},"previous_names":["kullrich/korthocpp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2FkorthoCPP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2FkorthoCPP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2FkorthoCPP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/kullrich%2FkorthoCPP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kullrich","download_url":"https://codeload.github.com/kullrich/korthoCPP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247675627,"owners_count":20977376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-03T04:04:36.355Z","updated_at":"2025-04-07T14:50:25.595Z","avatar_url":"https://github.com/kullrich.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# korthoCPP \u003ca href=\"https://github.com/kullrich/korthoCPP\"\u003e\u003cimg src=\"man/figures/korthoCPP_logo.png\" align=\"right\" height=\"160\" /\u003e\u003c/a\u003e\nkorthoCPP calculates pairwise kmer jaccard distance between two peptide fasta files\n\n## Download and compile\n\n```\ngit clone https://github.com/kullrich/korthoCPP.git\ncd korthoCPP \u0026\u0026 make clean \u0026\u0026 make\n```\n\n## Usage\n\n```\nUsage: korthoCPP [options] -q \u003cquery.fasta\u003e -t \u003ctarget.fasta\u003e\nOptions:\n  -q FILE    query peptide fasta file\n  -t FILE    target peptide fasta file\n  -o FILE    output file to write jaccard distances\n  -k INT     kmer length [default: 6]\n  -m DOUBLE  min jaccard distance to report pair [default: 0.01]\n  -s DOUBLE  sparse threshold to switch search strategy [default: 0.1]\n  -n INT     number of kmers to check for sparse [default: 20]\n  -p INT     number of threads [default: 1]\n  -d         debug\n```\n\n## Compare two peptide fasta files and 
calculate jaccard distances\n\n```\nkorthoCPP -q query.fasta -t target.fasta\n```\n\n### use multiple threads\n\n```\nkorthoCPP -q query.fasta -t target.fasta -p 2\n```\n\n### use different min jaccard\n\n```\nkorthoCPP -q query.fasta -t target.fasta -m 0.02\n```\n\n## Example\n\n```\nwget https://www.pseudomonas.com/downloads/pseudomonas/pgd_r_22_1/Pseudomonas_aeruginosa_PAO1_107/Pseudomonas_aeruginosa_PAO1_107.faa.gz\nwget https://www.pseudomonas.com/downloads/pseudomonas/pgd_r_22_1/Pseudomonas_fluorescens_SBW25_116/Pseudomonas_fluorescens_SBW25_116.faa.gz\ngunzip Pseudomonas_aeruginosa_PAO1_107.faa.gz\ngunzip Pseudomonas_fluorescens_SBW25_116.faa.gz\n./korthoCPP -q Pseudomonas_aeruginosa_PAO1_107.faa -t Pseudomonas_fluorescens_SBW25_116.faa -p 4 -d\n```\n\n```\nTime taken: kmer extraction 7059 milliseconds\nnumber of sequences Q: 5586\nnumber of sequences T: 5861\nTime taken: QkmerMap creation 7091 milliseconds\nnumber of kmers QkmerMap: 1612859\nTime taken: TkmerMap creation 8469 milliseconds\nnumber of kmers TkmerMap: 1734068\nnumber of sparse candidate pairs: 345\nsparse value: 1.05377e-05 \u003c sparse threshold: 0.1 \u003e\u003e\u003e search strategy one vs one\nTime taken: check sparse threshold 1 milliseconds\nTime taken: candidate pairs creation 1423 milliseconds\nnumber of candidate pairs: 409274\nTime taken: distance calculation 44714 milliseconds\n```\n\n```\nhead 
output.txt\n###\nqname\ttname\tjaccard\tmash\tani\tsumdist\nPA0034\tPFLU1157\t0.0303797\t0.471793\t0.528207\t0.941032\nPA0032\tPFLU0028\t0.087344\t0.304749\t0.695251\t0.839344\nPA0029\tPFLU0025\t0.057554\t0.369641\t0.630359\t0.891156\nPA0030\tPFLU0026\t0.0470383\t0.401602\t0.598398\t0.91015\nPA0031\tPFLU0027\t0.215854\t0.172576\t0.827424\t0.644935\nPA0035\tPFLU0035\t0.272947\t0.141111\t0.858889\t0.571157\nPA0036\tPFLU0036\t0.473297\t0.0737314\t0.926269\t0.3575\nPA0037\tPFLU0037\t0.110476\t0.269099\t0.730901\t0.801029\nPA0038\tPFLU0039\t0.147826\t0.226074\t0.773926\t0.742424\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkullrich%2Fkorthocpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkullrich%2Fkorthocpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkullrich%2Fkorthocpp/lists"}