{"id":17549192,"url":"https://github.com/yhirose/cpp-searchlib","last_synced_at":"2025-04-24T02:10:37.685Z","repository":{"id":138075793,"uuid":"242804373","full_name":"yhirose/cpp-searchlib","owner":"yhirose","description":"A C++17 full-text search engine library","archived":false,"fork":false,"pushed_at":"2021-10-18T02:47:41.000Z","size":3236,"stargazers_count":33,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-18T09:34:33.399Z","etag":null,"topics":["cpp","search-engine"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yhirose.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-24T17:55:49.000Z","updated_at":"2025-02-17T09:39:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"b21a07c5-379c-45ee-952a-1d718250f1bc","html_url":"https://github.com/yhirose/cpp-searchlib","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-searchlib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-searchlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-searchlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-searchlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yhirose","download_url":"https://codeload.github.com/yhirose/cpp-searchlib/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250546086,"owners_count":21448260,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","search-engine"],"created_at":"2024-10-21T02:50:14.176Z","updated_at":"2025-04-24T02:10:37.679Z","avatar_url":"https://github.com/yhirose.png","language":"C++","readme":"# cpp-searchlib\n\nC++17 full-text search engine library (WIP. Far from release...)\n\nTODO:\n- [ ] Save/load index to/from storage\n- [ ] Posting list compression\n- [ ] Search scope (document, section, paragraph)\n\n```cpp\nusing namespace searchlib;\n\nstd::vector\u003cstd::string\u003e documents = {\n  \"This is the first document.\",\n  \"This is the second document.\",\n  \"This is the third document. This is the second sentence in the third document.\",\n  \"This is not the first document.\"\n};\n\n// Indexing...\nauto normalizer = [](auto str) { return unicode::to_lowercase(str); };\n\nauto index = make_in_memory_index\u003cTextRange\u003e(normalizer, [\u0026](auto \u0026indexer) {\n  size_t document_id = 0;\n  for (const auto \u0026doc : documents) {\n    indexer.index_document(document_id, UTF8PlainTextTokenizer(doc));\n    document_id++;\n  }\n});\n\n// Search...\nauto expr = parse_query(*index, normalizer, R\"( first not | \"the second sentence\" )\");\n\nauto result = perform_search(*index, *expr);\nresult-\u003esize(); // 2\n\nresult-\u003edocument_id(0); // 2\nresult-\u003esearch_hit_count(0); // 1\n\n  // 'the second sentence'\n  result-\u003eterm_position(0, 0); // 7\n  result-\u003eterm_length(0, 0); // 3\n  auto [pos1, len1] = index-\u003etext_range(*result, 0, 0); // 36, 19\n\nresult-\u003edocument_id(1); // 3\nresult-\u003esearch_hit_count(1); // 2\n\n  // 'not'\n  result-\u003eterm_position(1, 0); // 2\n  result-\u003eterm_length(1, 0); // 1\n  auto [pos2, len2] = index-\u003etext_range(*result, 1, 0); // 8, 3\n\n  // 'first'\n  result-\u003eterm_position(1, 1); // 4\n  result-\u003eterm_length(1, 1); // 1\n  auto [pos3, len3] = index-\u003etext_range(*result, 1, 1); // 16, 5\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhirose%2Fcpp-searchlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyhirose%2Fcpp-searchlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhirose%2Fcpp-searchlib/lists"}