{"id":15285827,"url":"https://github.com/sirwumpus/erlang-fgrep","last_synced_at":"2026-01-04T19:36:50.110Z","repository":{"id":57493721,"uuid":"88000745","full_name":"SirWumpus/erlang-fgrep","owner":"SirWumpus","description":"Erlang version of fgrep(1).","archived":false,"fork":false,"pushed_at":"2023-01-04T12:00:07.000Z","size":7,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-23T21:13:39.353Z","etag":null,"topics":["erlang","fgrep","string-search"],"latest_commit_sha":null,"homepage":null,"language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SirWumpus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-04-12T02:28:04.000Z","updated_at":"2023-01-04T12:00:10.000Z","dependencies_parsed_at":"2023-02-02T11:30:48.615Z","dependency_job_id":null,"html_url":"https://github.com/SirWumpus/erlang-fgrep","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SirWumpus%2Ferlang-fgrep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SirWumpus%2Ferlang-fgrep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SirWumpus%2Ferlang-fgrep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SirWumpus%2Ferlang-fgrep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SirWumpus","download_url":"https://codeload.github.com/SirWumpus/erlang-fgrep/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.
com","kind":"github","repositories_count":245168912,"owners_count":20571804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erlang","fgrep","string-search"],"created_at":"2024-09-30T15:07:46.758Z","updated_at":"2026-01-04T19:36:50.073Z","avatar_url":"https://github.com/SirWumpus.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"Erlang fgrep(1)\n===============\n\n```\nusage: efgrep [-Fln][-k max] string [file ...]\n\n-F              frame the found string in square brackets\n-l              list files with a matching line\n-k max          max. number of pattern mismatches\n-n              number the output lines\n```\n\nThis is intended more as an Erlang learning exercise than anything really practical.\n\nDescription\n-----------\n\nThe paper on [Approximate Boyer-Moore String Matching][TU90] uses the [Boyer-Moore-Horspool][WKHORS] algorithm to implement two approximate string matching algorithms for k-mismatches and k-differences.  However, [Daniel Sunday's][DMS90] Boyer-Moore [Quick Search][SUNQS] variant is slightly more efficient than [Horspool's][HORSPOOL].\n\nThis program implements generalised Boyer-Moore-Sunday approximate string matching for k-mismatches.  For k=0, the program performs exact string searching.  
The implementation turns out to be slightly easier than the Horspool version presented by Tarhio \u0026 Ukkonen or [Kuei-Hao Chen's slides][KHC1].\n\nThe Sunday algorithm notes that the text character one past the pattern window will factor into the next round of comparisons and can be used to determine the offset of the next pattern window.  Horspool, by contrast, relies on the text character at the end of the current pattern window to compute the offset of the next window.  Sunday precomputes a bad-character shift table where every character not in the pattern is assigned the pattern length plus one and every character in the pattern is assigned the offset of its rightmost occurrence from the right end of the pattern, plus one.  \n\nSo for the pattern \"AGCT\", m (the pattern length) is 4 and for k = 0 mismatches, the shift table looks like:\n\n        k  A C G T *\n        -------------\n        0: 4 2 3 1 5\n\nThe shift table for k \u003e 0 mismatches is built simply by shortening the pattern by k characters from the right.  So the shift table for k = 1 looks like:\n\n        k  A C G T *\n        -------------\n        0: 4 2 3 1 5\n        1: 3 1 2 4 4\n\nThe pattern is compared with the text from left to right.  If a mismatch occurs, the text character just right of the pattern window is used to determine the pattern window shift; otherwise the pattern matches and the position in the text can be reported.  Note that the Sunday algorithm can do the character comparisons in any order, unlike regular Boyer-Moore, which compares strictly right to left.\n\n\nExamples\n--------\n\nThese examples show how the pattern window shifts each iteration for values of 0 \u003c= k \u003c m.  A dot (.) indicates a mismatch and equals (=) a match.  
\n\n\n* k=0 m=4 pat=AGCT\n\n        k  A C G T *\n        ------------\n        0: 4 2 3 1 5\n        \n        T T A A C G T A A T G C A G C T A\n            ^   ^ ^       ^   ^\n        A G C T\n        .\n                2 = shift[0][C]\n        \n            A G C T\n            = .\n                    1 = shift[0][T]\n        \n              A G C T\n              = .\n                      4 = shift[0][A]\n        \n                      A G C T\n                      = .\n                              2 = shift[0][C]\n        \n                          A G C T\n                          .\n                                  3 = shift[0][G]\n        \n                                A G C T\n                                = = = =\n\n\n* k=1 m=4 pat=AGCT\n\n        k  A C G T *\n        -------------\n        0: 4 2 3 1 5\n        1: 3 1 2 4 4\n        \n        T T A A C G T A A T G C A G C T A\n        \n        A G C T\n        . .\n              3 2    min(shift[1][A], shift[0][C])\n        \n            A G C T\n            = . = .\n                  2 1    min(shift[1][G], shift[0][T])\n        \n              A G C T\n              = . .\n                    4 4    min(shift[1][T], shift[0][A])\n        \n                      A G C T\n                      = . .\n                            2 2    min(shift[1][G], shift[0][C])\n        \n                          A G C T\n                          . = = .\n                                3 3    min(shift[1][A], shift[0][G])\n        \n                                A G C T\n                                = = = =\n\n\n* k=2 m=4 pat=AGCT\n\n        k  A C G T *\n        -------------\n        0: 4 2 3 1 5\n        1: 3 1 2 4 4\n        2: 2 3 1 3 3\n        \n        T T A A C G T A A T G C A G C T A\n        \n        A G C T\n        . . .\n            2 3 2   min(shift[2][A], shift[1][A], shift[0][C])\n        \n            A G C T\n            = . 
= .\n        \n\n* k=3 m=4 pat=AGCT\n\n        k  A C G T *\n        -------------\n        0: 4 2 3 1 5\n        1: 3 1 2 4 4\n        2: 2 3 1 3 3\n        3: 1 2 2 2 2\n        \n        T T A A C G T A A T G C A G C T A\n        \n        A G C T\n        . . . .\n          2 2 3 2    min(shift[3][T], shift[2][A], shift[1][A], shift[0][C])\n        \n            A G C T\n            = . = .\n        \n\n* k=0 m=8 pat=GCAGAGAG\n\n        k  A C G T *\n        -------------\n        0: 2 7 1 9 9\n\n        G C A T C G C A G A G C G T A T G C A G A G A G\n\n        G C A G A G A G\n        = = = .\n                        1\n          G C A G A G A G\n          .\n                          2\n              G C A G A G A G\n              .\n                              7\n                            G C A G A G A G\n                            = = .\n                                            2\n                                G C A G A G A G\n                                = .\n                                                2\n                                    G C A G A G A G\n                                    .\n                                                    2\n                                        G C A G A G A G\n                                        = = = = = = = =\n\n\n* k=1 m=8 pat=GCAGAGAG\n\n        k  A C G T *\n        -------------\n        0: 2 7 1 9 9\n        1: 1 6 2 8 8\n        \n        G C A T C G C A G A G C G T A T G C A G A G A G\n        \n        G C A G A G A G\n        = = = . .\n                      1 1\n          G C A G A G A G\n          . .\n                        2 2\n              G C A G A G A G\n              . = .\n                            2 7\n                  G C A G A G A G\n                  = = = = = = . =\n\n\nReferences\n----------\n\n\"A very fast substring search algorithm\";  \nDaniel M. 
Sunday; Communications of the ACM; August 1990;  \n\u003chttps://csclub.uwaterloo.ca/~pbarfuss/p132-sunday.pdf\u003e\n\n\"Approximate Boyer-Moore String Matching\";  \nJorma Tarhio and Esko Ukkonen; 1990;  \n\u003chttps://www.cs.hut.fi/u/tarhio/papers/abm.pdf\u003e\n\n\"Approximate Boyer-Moore String Matching\" Explained;  \nPresentation by Kuei-hao Chen;  \n\u003chttp://t2.ecp168.net/webs@73/cyberhood/Approximate_String_Matching/BHM_approximate_string_Algorithm.ppt\u003e\n\n\"Exact String Matching Algorithms\";  \nThierry Lecroq;  \n\u003chttp://www-igm.univ-mlv.fr/~lecroq/string/index.html\u003e\n\nHorspool on Wikipedia;  \n\u003chttps://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm\u003e\n\nHorspool Explained;  \nPresentation by Kuei-hao Chen;  \n\u003chttp://alg.csie.ncnu.edu.tw/course/StringMatching/Horspool.ppt\u003e\n\n[DMS90]: https://csclub.uwaterloo.ca/~pbarfuss/p132-sunday.pdf\n\n[TU90]: https://www.cs.hut.fi/u/tarhio/papers/abm.pdf\n\n[KHC1]: http://t2.ecp168.net/webs@73/cyberhood/Approximate_String_Matching/BHM_approximate_string_Algorithm.ppt\n\n[KHC2]: http://alg.csie.ncnu.edu.tw/course/StringMatching/Horspool.ppt\n\n[LECROQ]: http://www-igm.univ-mlv.fr/~lecroq/string/index.html\n\n[HORSPOOL]: http://www-igm.univ-mlv.fr/~lecroq/string/node18.html\n\n[SUNQS]: http://www-igm.univ-mlv.fr/~lecroq/string/node19.html\n\n[WKHORS]: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm\n\n\nCopyright\n---------\n\nCopyright 2017 by Anthony Howe.  
All rights reserved.\n\n\nMIT License\n-----------\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsirwumpus%2Ferlang-fgrep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsirwumpus%2Ferlang-fgrep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsirwumpus%2Ferlang-fgrep/lists"}