{"id":13756328,"url":"https://github.com/edawson/gfakluge","last_synced_at":"2025-06-29T14:32:05.325Z","repository":{"id":45120315,"uuid":"47132080","full_name":"edawson/gfakluge","owner":"edawson","description":"A C++ library and utilities for manipulating the Graphical Fragment Assembly format.","archived":false,"fork":false,"pushed_at":"2022-05-17T21:07:45.000Z","size":1965,"stargazers_count":54,"open_issues_count":22,"forks_count":20,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-27T02:42:06.101Z","etag":null,"topics":["genomics","gfa","graph-representation","parsing"],"latest_commit_sha":null,"homepage":"http://edawson.github.io/gfakluge/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/edawson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2015-11-30T16:40:53.000Z","updated_at":"2025-02-13T17:09:40.000Z","dependencies_parsed_at":"2022-09-16T13:51:10.518Z","dependency_job_id":null,"html_url":"https://github.com/edawson/gfakluge","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/edawson/gfakluge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edawson%2Fgfakluge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edawson%2Fgfakluge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edawson%2Fgfakluge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edawson%2Fgfakluge/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/edawson","download_url":"https://codeload.github.com/edawson/gfakluge/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edawson%2Fgfakluge/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262357384,"owners_count":23298448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genomics","gfa","graph-representation","parsing"],"created_at":"2024-08-03T11:00:41.986Z","updated_at":"2025-06-29T14:32:05.267Z","avatar_url":"https://github.com/edawson.png","language":"C++","funding_links":[],"categories":["A list of software capable of analyzing mainly **eukaryotic** genomes for pangenomics."],"sub_categories":[],"readme":"gfakluge\n--------------------\n\n[![DOI](https://zenodo.org/badge/47132080.svg)](https://zenodo.org/badge/latestdoi/47132080)\n[![status](http://joss.theoj.org/papers/d731f6dfc6b77013caaccfd8333c684a/status.svg)](http://joss.theoj.org/papers/d731f6dfc6b77013caaccfd8333c684a)\n\n[![Build Status](https://dev.azure.com/ericco92/ericco92/_apis/build/status/edawson.gfakluge?branchName=master)](https://dev.azure.com/ericco92/ericco92/_build/latest?definitionId=3\u0026branchName=master)  \n\n\n## What is it?  
\nGFAKluge is a C++ parser/writer and\na set of command line utilities for manipulating [GFA files](http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format).\nIt parses GFA to a set of data structures that represent the encoded graph.\nYou can use these components and their fields/members to build up your own\ngraph representation. You can also convert between GFA 0.1 \u003c-\u003e 1.0 \u003c-\u003e 2.0\nto glue programs that use different GFA versions together.\n\n\n**Homepage**: https://github.com/edawson/gfakluge  \n**License**: MIT  \n\n## Dependencies\nA C++11 compliant compiler (we recommend GCC or clang)  \nOpenMP (via GCC or clang)  \n**NB**: GFAKluge cannot be compiled with Apple clang, as it does not include OpenMP.\n\n## Command line utilities\nWhen `make` is run, the `gfak` binary is built in the top level directory. It offers the following subcommands:  \n+ gfak extract : transform the GFA segment lines to a FASTA file.  \n+ gfak fillseq : fill in the sequence field of S lines with placeholders using sequences from a FASTA file.\n+ gfak diff : check if two GFA files are different (not very sophisticated at the moment)  \n+ gfak sort : change the line order of a GFA file so that lines proceed in\nHeader -\u003e Segment -\u003e Link/Edge/Containment -\u003e Path order.  \n+ gfak convert : convert between the different GFA specifications (e.g. GFA1 -\u003e GFA2).  \n+ gfak stats : get the assembly stats of a GFA file (e.g. N50, L50)  \n+ gfak subset : extract a subgraph between two Segment IDs in a GFA file.  \n+ gfak ids : manually coordinate / increment the ID spaces of two graphs, so that they can be concatenated.  \n+ gfak merge : merge (i.e. concatenate) multiple GFA files. NB: Obliterates nodes with the same ID.  \n\nFor CLI usage, run any of the above (including `gfak` with no subcommand) with no arguments or `-h`. To change specification version, most commands take the `-S` flag and a single `double` argument.  
\n\n## Example CLI Usage\nExamples of various commands are included in the [examples.md file](https://github.com/edawson/gfakluge/blob/master/examples.md).\n\n## C++ API\nExamples of the C++ API are included in the [interface.md file](https://github.com/edawson/gfakluge/blob/master/interface.md).  \n\n## How do I build it?  \nThe `gfak` utilities are available via homebrew: `brew install brewsci/bio/gfakluge`  \n\nBuilding GFAKluge from source requires OpenMP. This should be supported on Linux by default. On Apple Mac OS X, we recommend installing gcc:  \n\n```\nbrew install gcc@8\nmake CXX=g++-8\n```  \nor  \n```\nsudo port install gcc8\nmake\n```\n\nYou can then build libgfakluge and the command line `gfak` utilities by typing ``make`` in the repo.  \nTo use GFAKluge in your program, you'll need to\nadd a few lines to your code. First, add the necessary include line to your C++ code:\n\n                #include \"gfakluge.hpp\"\n\nNext, make sure that the library is on the proper system paths and compile line:\n\n                g++ -o my_exe my_exe.cpp -L/path/to/gfakluge/ -lgfakluge\n\n\nYou should then be able to parse and manipulate GFA from your program:  \n\n                    GFAKluge gg;\n                    gg.parse_gfa_file(my_gfa_file);\n\n                    cout \u003c\u003c gg \u003c\u003c endl;\n\n\n## Why gfak / gfakluge?\n+ Simple command line utilities (no awk foo needed!)  \n+ High level C++ API for many graph manipulations.  \n+ Easy to build - no external dependencies beyond OpenMP; build with just a modern C++ compiler supporting C++11.\n+ Easy to develop with - Backing library is mostly STL containers and a handful of structs.  \n+ Performance - gfakluge is fast and relies on standard STL containers and basic structs.  
\n\n\n## Internal Structures\nInternally, lines of GFA are represented as structs with member variables that correspond to their defined fields.\nHere's the definition for a sequence line, for example:\n\n                struct sequence_elem{\n                    std::string sequence;\n                    std::string name;\n                    map\u003cstring, string\u003e opt_fields;\n                    long id;\n                };\n\nThe structs for contained elements, link elements, and alignment elements are very similar. These individual structs\nare then wrapped in a set of standard containers for easy access:\n\n                map\u003cstd::string, std::string\u003e header;\n                map\u003cstring, sequence_elem\u003e name_to_seq;\n                map\u003cstd::string, vector\u003ccontained_elem\u003e \u003e seq_to_contained;\n                map\u003cstd::string, vector\u003clink_elem\u003e \u003e seq_to_link;\n                map\u003cstring, vector\u003calignment_elem\u003e \u003e seq_to_alignment;\n\nAll of these structures can be accessed using the ``get_\u003cThing\u003e`` method, where \\\u003cThing\\\u003e is the name of the map you would like to retrieve.\nThey reside in gfakluge.hpp.  \n\n## GFA2\nGFAKluge now supports GFA2! This brings with it four new structs: `edge_elem`, `gap_elem`, `fragment_elem`, and `group_elem`. They're contained in maps much like those for the GFA1 structs.  \n\nA few caveats apply:  \n    1. As GFA2 is a **superset** of GFA1, we only support legal GFA2 -\u003e GFA1 conversions. Information can be lost along the way (e.g. unordered groups won't be output).\n    2. Our GFA2 testing is a bit limited, but we've verified several times that the output is on-spec.\n\nTags we specifically do not (i.e. cannot) support in GFA2 -\u003e GFA1 conversion: G - gap, U - unordered group, F - fragment.\nLinks and containments should get converted to edges correctly. 
Sequence elements should get converted, but watch out for the length field if you hit issues.\n\nGFAKluge is fully compliant with reading GFA2 and GFA0.1 \u003c-\u003e GFA1.0 -\u003e GFA2.0 conversion as of September 2017.\n\n## Reading GFA\n                GFAKluge gg;\n                gg.parse_gfa_file(\"my_gfa.gfa\");\n\nYou can then iterate over the aforementioned maps/structs and build out your own graph representation.\n\nI'm working on a low-memory API for reading lines / emitting structs but it won't be this pretty.\n\n## Writing GFA\n                GFAKluge og;\n\n                sequence_elem s;\n                s.sequence = \"GATTACA\";\n                s.name = \"seq1\";\n                og.add_sequence(s);\n\n                sequence_elem t;\n                t.sequence = \"AATTGN\";\n                t.name = \"seq2\";\n                og.add_sequence(t);\n\n                // Link seq1 -\u003e seq2, both in forward orientation\n                link_elem l;\n                l.source = s.name;\n                l.sink = t.name;\n                l.source_orientation_forward = true;\n                l.sink_orientation_forward = true;\n                l.pos = 0;\n                l.cigar = \"\";\n\n                og.add_link(l.source, l);\n\n                cout \u003c\u003c og \u003c\u003c endl;\n                ofstream f(\"my_file.gfa\");\n                // Write GFA1\n                f \u003c\u003c og;\n\n                // To convert to GFA2:\n                og.set_version(2.0);\n                f \u003c\u003c og;\n\n## Status\n- GFAKluge is essentially a set of dumb containers - it does no error checking of your structure to detect if it is\nvalid GFA. This may change as the GFA spec becomes more formal.  \n- Diff is not a useful tool yet.\n- Parses JSON structs in optional fields of sequence lines (just as strings though).  \n- Full GFA1/GFA2 compatibility and interconversion is now implemented.  
\n- CLI has been refactored to a single executable.\n- Memory usage for to\\_string is a bit high - be careful with large graphs.\n- API for input / spec conversion / output is stable. API for merging graphs and coordinating ID namespaces may change slightly, but will strive for backwards compatibility.\n\n\n## Getting Help \nEric T Dawson   \ngithub: [edawson](https://github.com/edawson)   \nPlease post an issue for help.\n\n## Contributing\nGFAKluge is open-source and community contributions are welcome and appreciated! Please keep the following in mind when contributing to the repo:  \n\n1. Please treat others with kindness and professionalism. Everyone is welcome and we will not tolerate harassment for any reason.\n2. Please keep `gfakluge.hpp` header-only and update the build process if a modification alters it.  \n3. Please update the dependency list if one is added.  \n4. Please use semantic versioning. Minor changes bump the third versioning digit (e.g. 1.0.0 -\u003e 1.0.1).  \nAdditional features, or changes that may or may not partially break backward compatibility \nbut which do not require significant modifications to code depending on the library, bump the second versioning digit (e.g. 1.0.0 -\u003e 1.1.0).  \nChanges which significantly alter the API require a bump in the major version digit (e.g. 1.0.0 -\u003e 2.0.0).\n5. Please fully specify all namespace items (e.g. `std::stream` in place of just `stream`).  \n6. To incorporate changes, please file a pull request on the GitHub page.\n7. Bug reports or feature requests should be posted as \"issues\" on the GitHub page with the appropriate tag and referenced in any relevant pull requests.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedawson%2Fgfakluge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fedawson%2Fgfakluge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedawson%2Fgfakluge/lists"}