{"id":20031749,"url":"https://github.com/astrodynamic/dna_analazer-algorithms-for-working-with-text-in-cpp","last_synced_at":"2025-09-20T07:31:47.219Z","repository":{"id":155833119,"uuid":"626990418","full_name":"Astrodynamic/DNA_Analazer-Algorithms-for-working-with-text-in-CPP","owner":"Astrodynamic","description":"This project implements substring search and sequence alignment algorithms for molecular sequences analysis. It includes the Rabin-Karp algorithm for substring search and the Needleman-Wunsch algorithm for sequence alignment. Developed in C++17, the code follows Google Style and includes a Makefile for building and testing the program.","archived":false,"fork":false,"pushed_at":"2023-05-09T16:10:11.000Z","size":891,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-04-03T03:51:12.703Z","etag":null,"topics":["algorithms","analayze","cmake","cmakelists","console-application","console-applications","cpp","cpp17","dna","dna-sequences","hashing","learning","makefile","rabin-karp-algorithm","regex","reusable","testing","text-algorithms","text-summarization"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Astrodynamic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-12T15:02:47.000Z","updated_at":"2024-02-23T04:40:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"77083d67-84af-4802-b8d1-7e5e78a42546","html_url":"https://github.com/Astrodynamic/DNA_Analazer-Algorithms-for-work
ing-with-text-in-CPP","commit_stats":null,"previous_names":["astrodynamic/dna_analazer-algorithms-for-working-with-text-in-cpp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Astrodynamic/DNA_Analazer-Algorithms-for-working-with-text-in-CPP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Astrodynamic%2FDNA_Analazer-Algorithms-for-working-with-text-in-CPP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Astrodynamic%2FDNA_Analazer-Algorithms-for-working-with-text-in-CPP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Astrodynamic%2FDNA_Analazer-Algorithms-for-working-with-text-in-CPP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Astrodynamic%2FDNA_Analazer-Algorithms-for-working-with-text-in-CPP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Astrodynamic","download_url":"https://codeload.github.com/Astrodynamic/DNA_Analazer-Algorithms-for-working-with-text-in-CPP/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Astrodynamic%2FDNA_Analazer-Algorithms-for-working-with-text-in-CPP/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276064222,"owners_count":25578997,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-20T02:00:10.207Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_na
mes","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","analayze","cmake","cmakelists","console-application","console-applications","cpp","cpp17","dna","dna-sequences","hashing","learning","makefile","rabin-karp-algorithm","regex","reusable","testing","text-algorithms","text-summarization"],"created_at":"2024-11-13T09:34:35.419Z","updated_at":"2025-09-20T07:31:46.951Z","avatar_url":"https://github.com/Astrodynamic.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text Algorithms in CPP\n\nText Algorithms is a C++ project that implements substring search and sequence alignment algorithms. This project can be useful for bioinformatics and other full-text search tasks.\n\n## Dependencies\n\nThe project requires the following dependencies:\n\n- CMake \u003e= 3.15\n- C++17-compatible compiler\n\n## Build\n\nTo build the project, follow these steps:\n\n1. Clone the repository:\n\n```bash\ngit clone https://github.com/Astrodynamic/DNA_Analazer-Algorithms-for-working-with-text-in-CPP.git\n```\n\n2. Navigate to the project directory:\n\n```bash\ncd DNA_Analazer-Algorithms-for-working-with-text-in-CPP\n```\n\n3. Run the following commands:\n\n```bash\ncmake -S . -B ./build\ncmake --build ./build\n```\n\n## Usage\n\n\n\u003cimg align=\"center\" src=\"img/init.png\" alt=\"Alt Text\" style=\"display:block; margin:auto;\"\u003e\n\n\n### Substring Search\n\nThe project implements the Rabin-Karp algorithm for substring search. To use it, include the `SubstringSearch.h` header and call the `rabinKarp` function with the haystack and needle strings:\n\n```cpp\n#include \"SubstringSearch.h\"\n\n// ...\n\nstd::string haystack = \"Madam, I'm Adam\";\nstd::string needle = \"am\";\nstd::vector\u003cint\u003e matches = rabinKarp(haystack, needle);\n// matches contains the positions of the needle occurrences in the haystack\n```\n\n### Sequence Alignment\n\nThe project implements the Needleman-Wunsch algorithm for sequence alignment. 
To use it, include the `SequenceAlignment.h` header and call the `needlemanWunsch` function with the two sequences and the similarity matrix:\n\n```cpp\n#include \"SequenceAlignment.h\"\n\n// ...\n\nstd::string seq1 = \"GGGCGACACTCCACCATAGA\";\nstd::string seq2 = \"GGCGACACCCACCATACAT\";\n// similarityMatrix is the scoring matrix; it must be defined by the caller (not shown here)\nstd::vector\u003cstd::string\u003e alignment = needlemanWunsch(seq1, seq2, similarityMatrix);\n// alignment contains the two sequences aligned with gaps\n```\n\n## Examples\n\n### Substring Search\n\nFind all occurrences of the string \"AAGCCTCTCAAT\" in the HIV sequence:\n\n```cpp\n#include \"SubstringSearch.h\"\n#include \u003cfstream\u003e\n#include \u003ciostream\u003e\n#include \u003citerator\u003e\n#include \u003cstring\u003e\n#include \u003cvector\u003e\n\nint main() {\n  std::ifstream file(\"HIV.txt\");\n  if (!file) {\n    std::cerr \u003c\u003c \"Cannot open HIV.txt\" \u003c\u003c std::endl;\n    return 1;\n  }\n  // Read the whole file into a single string\n  std::string haystack((std::istreambuf_iterator\u003cchar\u003e(file)), std::istreambuf_iterator\u003cchar\u003e());\n  std::string needle = \"AAGCCTCTCAAT\";\n  std::vector\u003cint\u003e matches = rabinKarp(haystack, needle);\n  for (int match : matches) {\n    std::cout \u003c\u003c \"Match at position \" \u003c\u003c match \u003c\u003c std::endl;\n  }\n  return 0;\n}\n```\n\n### Sequence Alignment\n\nAlign two DNA sequences using a similarity matrix:\n\n```cpp\n#include \"SequenceAlignment.h\"\n#include \u003ciostream\u003e\n#include \u003cstring\u003e\n#include \u003cvector\u003e\n\nint main() {\n  std::string seq1 = \"GGGCGACACTCCACCATAGA\";\n  std::string seq2 = \"GGCGACACCCACCATACAT\";\n  // similarityMatrix is the scoring matrix; it must be defined by the caller (not shown here)\n  std::vector\u003cstd::string\u003e alignment = needlemanWunsch(seq1, seq2, similarityMatrix);\n  std::cout \u003c\u003c alignment[0] \u003c\u003c std::endl \u003c\u003c alignment[1] \u003c\u003c std::endl;\n  return 0;\n}\n```\n\n### Matching regular expressions\n\nThe program checks whether a sequence over the alphabet `{A, C, G, T}` matches a regular expression. \\\nThe input of the program is a file with *two* lines. The first line contains the sequence to be checked for a match. 
The second line contains a pattern that includes characters from the alphabet and the following characters:\n- `.` -- matches any single character from the alphabet;\n- `?` -- matches any single character from the alphabet or the absence of a character;\n- `+` -- matches zero or more repetitions of the previous element;\n- `*` -- matches any sequence of characters from the alphabet or the absence of characters.\n\nThe output of the program is *True*/*False* - whether the given sequence matches the pattern.\n\nExample input:\n```\nGGCGACACCCACCATACAT\nG?G*AC+A*A.\n```\n\nExample output:\n```\nTrue\n```\n\n### K-similar strings\n\nStrings s1 and s2 are k-similar (for some non-negative integer *k*) if it is possible to swap two letters in s1 exactly *k* times so that the resulting string is equal to s2.\n\nThe program checks k-similarity of two sequences over the alphabet `{A, C, G, T}`. \\\nThe input of the program is a file with *two* lines. The output of the program is the smallest *k* for which s1 and s2 are k-similar. If the strings are not anagrams, print an error message.\n\nExample input:\n```\nGGCGACACC\nAGCCGCGAC\n```\n\nExample output:\n```\n3\n```\n\n### Minimum Window Substring\n\nA program for finding the minimum window substring for a sequence over the alphabet `{A, C, G, T}`.\nThe input to the program is a file containing *two* lines: s and t. A window substring of string s is a substring that contains all characters present in string t (including duplicates).\nThe output of the program is the minimum length window substring. If there is no window substring, return an empty string.\n\nExample input:\n```\nGGCGACACCCACCATACAT\nTGT\n```\n\nExample output:\n```\nGACACCCACCATACAT\n```\n\n## License\n\nThis project is licensed under the terms of the MIT license. 
See [LICENSE](LICENSE) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrodynamic%2Fdna_analazer-algorithms-for-working-with-text-in-cpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrodynamic%2Fdna_analazer-algorithms-for-working-with-text-in-cpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrodynamic%2Fdna_analazer-algorithms-for-working-with-text-in-cpp/lists"}