{"id":29538691,"url":"https://github.com/hhf112/moore-search","last_synced_at":"2025-07-17T05:09:57.870Z","repository":{"id":297950634,"uuid":"998389743","full_name":"hhf112/moore-search","owner":"hhf112","description":"A parallelized header-only implementation of the Boyre Moore exact string searching algorithm in C++17. ","archived":false,"fork":false,"pushed_at":"2025-07-11T22:02:19.000Z","size":158,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-11T22:19:00.618Z","etag":null,"topics":["algorithm","cpp17","multithreading"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hhf112.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-08T14:08:02.000Z","updated_at":"2025-07-11T22:02:23.000Z","dependencies_parsed_at":"2025-07-11T22:11:27.368Z","dependency_job_id":"4542bcb4-6caa-4ded-a1ba-1e9655946df9","html_url":"https://github.com/hhf112/moore-search","commit_stats":null,"previous_names":["hhf112/moore-search"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hhf112/moore-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hhf112%2Fmoore-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hhf112%2Fmoore-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hhf112%2Fmoore-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hhf
112%2Fmoore-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hhf112","download_url":"https://codeload.github.com/hhf112/moore-search/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hhf112%2Fmoore-search/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265566783,"owners_count":23789342,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","cpp17","multithreading"],"created_at":"2025-07-17T05:09:55.486Z","updated_at":"2025-07-17T05:09:57.865Z","avatar_url":"https://github.com/hhf112.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"Under development. Tests to be added. \u003cbr\u003e\nsquashed and organised commit history: 12-07-2025::02:39\n\n# Moore Search \u003cimg src = \"https://img.shields.io/github/actions/workflow/status/hhf112/moore-search/c-cpp.yml\" alt=\"build status\"\u003e\nA header-only implementation of a parallelized Boyer-Moore exact string searching algorithm, compatible with C++17.\n\n# Benchmarks\nto be added.\nCurrently ~38% faster than single-threaded Boyer-Moore.\n\n# Test run \n1. clone the repo and `cd` into it\n2. run `sh build`\n3. 
run: `./search \u003cfilename\u003e \u003cpattern\u003e \u003cmax search count\u003e`\n\n#### Note\nbuild system to be added.\n\n### Sample output\n```console\nhrsh $(LAPTOP-HK58DTQE):~/dev/moore$🌙 ./srch 800mb.txt example 10000000\nclassical search function find: 1446 ms.\nfound: 10000000\nparallel search function pfind: 1119 ms.\nfound: 10000004\n```\n# Public API\n## getters and setters \n```cpp\n  inline void set_search_count(size_t n) { m_search_count = n; }\n  inline int get_search_count() { return m_search_count; }\n  inline void set_chunk_size(size_t n) { m_chunk_size = n; }\n\n  inline std::string \u0026getBuf() { return m_buffer; }\n  inline const std::string \u0026getBufconst() { return m_buffer; }\n  inline std::string getPath() { return m_path; }\n```\n## Nested types\n### PatternData struct\n```cpp\n  struct PatternData {\n    std::vector\u003csize_t\u003e shift;\n    std::vector\u003csize_t\u003e bpos;\n    std::vector\u003cindex_t\u003e badchars;\n\n    PatternData() = default;\n    PatternData(int nchars, size_t patternlen) {\n      shift.resize(patternlen + 1);\n      bpos.resize(patternlen + 1);\n      badchars.resize(nchars, -1);\n    }\n  };\n```\n#### References\nhttps://www.geeksforgeeks.org/boyer-moore-algorithm-good-suffix-heuristic/\nhttps://www.geeksforgeeks.org/boyer-moore-algorithm-for-pattern-searching/\n\n\n## Members\n\n### patternCache\n```cpp\nstd::unordered_map\u003cstd::string, PatternData\u003e patternCache;\n```\n#### Note\n A better implementation possibly a round robin hashmap to be added.\n## Functions\n### find\n```cpp\ntemplate \u003ctypename OutputItStart\u003e\ninline std::optional\u003cOutputItStart\u003e find(const std::string \u0026path,\n                                  const std::string \u0026pattern,\n                                  OutputItStart beg,\n                                  int matches = MAX_MATCHES);\n\n```\n```yaml\nParams\npath       file path to search in\npattern    pattern to search for\nbeg    
    output iterator into the container that matches are written to\nmatches    maximum number of matches to look for [optional]\n\nReturn    \nsuccess    beg advanced by the number of matches appended\nfail       {}\n```\nRuns `search` on every chunk of the file. Appends matches to the container through `beg` until `matches` matches have been found or end of file is reached.\n#### Notes\nMay return repeated indexes, because chunks overlap so that matches are not missed.\n\n### pfind: threaded find\n\n```cpp\ntemplate \u003ctypename OutputItStart\u003e\ninline std::optional\u003cOutputItStart\u003e pfind(const std::string \u0026path,\n                                        const std::string \u0026pattern,\n                                        OutputItStart beg,\n                                        int matches = MAX_MATCHES);\n\n\n```\n```yaml\nParams\npath       file path to search in\npattern    pattern to search for\nbeg        output iterator into the container that matches are written to\nmatches    maximum number of matches to look for [optional]\n\nReturn     \nsuccess   beg advanced by the number of matches appended\nfail      {}\n```\nRuns `parallelSearch` on every chunk of the file. Appends matches to the container through `beg` until `matches` matches have been found or end of file is reached.\n#### Notes\n1. May return indexes out of order, since the match total is counted across threads working on different parts of the chunk\n2. 
May return repeated indexes, because both the search ranges of individual threads and the chunks themselves overlap to avoid missed matches\n\n### search\n\n```cpp\ntemplate \u003ctypename OutputItStart\u003e  inline int search(const std::string \u0026text,\n                                                    const std::string \u0026pat,\n                                                    size_t startPos,\n                                                    size_t endPos,\n                                                    size_t startIndex,\n                                                    OutputItStart beg,\n                                                    int matches = MAX_MATCHES);\n```\n```yaml\nParams\ntext           text to search in\npat            pattern to search for\nstartPos       search start index, inclusive\nendPos         search end index, exclusive\nstartIndex     index to append on every find\nbeg            output iterator into the container that matches are written to\nmatches        maximum number of matches to look for [optional]\n\nReturn          number of matches\n```\nPerforms a classical Boyer-Moore search, shifting by the maximum of the good-suffix and bad-character heuristics.\nAppends matches to the container through `beg` until `matches` matches have been found or the end of the input is reached. 
An atomic counter\nis polled between iterations to check the search count.\n\n### parallelSearch: threaded search\n```cpp\ntemplate \u003ctypename OutputItStart\u003e inline std::optional\u003cOutputItStart\u003e \nparallelSearch(const std::string \u0026text,\n                const std::string \u0026pattern,\n                size_t startIndex,\n                OutputItStart beg,\n                int matches = MAX_MATCHES);\n```\n```yaml\nParams\ntext           text to search in\npattern        pattern to search for\nstartIndex     index to append on every find\nbeg            output iterator into the container that matches are written to\nmatches        maximum number of matches to look for [optional]\n\nReturn          \nsuccess         beg advanced by the number of matches found\nfail            {}\n```\nPartitions `text` among threads that each run `search`. Each thread first collects its matches in a thread-local container; these are then\nmerged into the container through `beg` until `matches` matches have been found or the end of the input is reached. Match counting is atomic.\n\n### preprocess_pattern\n```cpp\ninline void preprocess_pattern(int nchars, const std::string \u0026pattern);\n```\n```yaml\nParams\nnchars    number of bad characters (size of the bad-character table)\npattern   pattern to search for\n```\nAdds the preprocessed tables for `pattern` to `patternCache` if they are not already cached.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhhf112%2Fmoore-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhhf112%2Fmoore-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhhf112%2Fmoore-search/lists"}