{"id":21368207,"url":"https://github.com/chrisvilches/partial-text-search","last_synced_at":"2026-05-20T03:32:24.804Z","repository":{"id":44693812,"uuid":"453541453","full_name":"ChrisVilches/Partial-Text-Search","owner":"ChrisVilches","description":"A JavaScript library that finds string patterns in a collection of documents. It efficiently finds matches even if the words in each document do not begin with the query pattern.","archived":false,"fork":false,"pushed_at":"2022-02-02T01:04:21.000Z","size":990,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-24T04:03:21.927Z","etag":null,"topics":["javascript","string-matching","suffix-array","text-search"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/partial-text-search","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChrisVilches.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-29T23:15:35.000Z","updated_at":"2023-06-27T16:23:56.000Z","dependencies_parsed_at":"2022-09-12T15:14:05.832Z","dependency_job_id":null,"html_url":"https://github.com/ChrisVilches/Partial-Text-Search","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChrisVilches%2FPartial-Text-Search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChrisVilches%2FPartial-Text-Search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChrisVilches%2FPartial-Text-Search/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/ChrisVilches%2FPartial-Text-Search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChrisVilches","download_url":"https://codeload.github.com/ChrisVilches/Partial-Text-Search/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243841217,"owners_count":20356446,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","string-matching","suffix-array","text-search"],"created_at":"2024-11-22T07:23:15.012Z","updated_at":"2026-05-20T03:32:24.760Z","avatar_url":"https://github.com/ChrisVilches.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Partial Text Search\n\nA JavaScript library that finds string patterns in a collection of documents. It efficiently finds matches even if the words in each document do not begin with the query pattern.\n\nThe result of each query is a set containing the document indices where the query pattern is contained.\n\nIt uses the suffix array data structure to achieve high performance queries.\n\n## Basic usage\n\n```javascript\nconst PartialTextSearch = require('partial-text-search')\n\nconst docs = [\n  { title: 'the beatles', summary: 'the beatles were an english rock band formed in liverpool in 1960.' },\n  { title: 'blackpink', summary: 'blackpink is a south korean girl group formed by yg entertainment, consisting of members jisoo, jennie, rose, and lisa.' 
}\n]\n\nconst partialTextSearch = new PartialTextSearch(docs)\n\npartialTextSearch.search('li')\n// Set { 1, 0 }\n\npartialTextSearch.search('liv')\n// Set { 0 }\n```\n\n## Install\n\n```\nnpm install partial-text-search\n```\n\n## Advanced\n\n### .searchRanked method\n\nInstead of a document index set, you can get an object that maps each document index to the number of occurrences found.\n\n```javascript\npartialTextSearch.searchRanked('a')\n/*\n{\n  '0': 42,\n  '1': 311,\n  '2': 23\n}\n*/\n```\n\n### Limit the results\n\nAdd the `limit` option to get fewer results:\n\n```javascript\npartialTextSearch.search('aaa')\n// Set { 0, 1, 2, 3, 4, 5 }\n\npartialTextSearch.search('aaa', { limit: 3 })\n// Set { 0, 1, 2 }\n```\n\nThis option is only available for the `.search` method, and not for `.searchRanked`.\n\nWhich matching documents are returned and which are omitted is arbitrary.\n\n### Ways to index each document\n\nIn order for the suffix array to work properly, it's necessary to reduce each document to a single string before indexing it.\n\nBy default this library will examine each document and extract (from the first level of nesting only) all strings and numbers (converted to strings) and concatenate them to create a single string.\n\n```javascript\nconst doc = {\n  title: 'hello world',\n  body: 'document content',\n  info: {\n    year: 2000\n  }\n}\n```\n\nIn this example, the resulting string to be indexed for this document will be `hello world|document content`. Note that the `info` field was ignored.\n\nAlso note that a separator `|` was added between the two fields. 
Read more about the [separator](#separator).\n\nThere are more ways to convert documents to strings, which are described next.\n\n#### Index only certain fields from the document\n\nExtract only certain fields from each document:\n\n```javascript\npartialTextSearch = new PartialTextSearch(docs, { docToString: ['summary', 'anotherField'] })\n```\n\n#### Custom function to convert a document to a string\n\nYou can fully customize the way a document is indexed by providing a function, for example:\n\n```javascript\n// myDocConversion :: Object -\u003e String\nconst myDocConversion = doc =\u003e (doc.age * 2) + '||' + doc.name + '||' + doc.surname\n\npartialTextSearch = new PartialTextSearch(docs, { docToString: myDocConversion })\n```\n\nIn this case you must add a separator between fields yourself if you need one.\n\n### Separator\n\nBefore indexing the document list, it's necessary to convert each document to a single string, where some or all fields are concatenated. In order to improve search accuracy, a separator can be added (by default a pipe character, or `|`) so that it's possible to clearly differentiate one document field from another. This avoids matching a substring that only exists because of the concatenation of two fields, but not in any individual field of the document. 
Take a look at the following example:\n\n```javascript\nconst docList = [\n  { text: 'bana', about: 'na' }\n]\n```\n\nWhen indexing this document, the document needs to be reduced to a single string, and if the resulting string has no separators, then the query `banana` would match this document, even though the word was not present in any individual field.\n\nThe workaround used by this library to avoid this problem is to insert a separator between document fields.\n\nNote that not using a separator (or not configuring it properly) doesn't necessarily lead to severe harmful outcomes, but it's nevertheless recommended to configure it.\n\nIf you want to use a character different from `|`, you can configure a different separator for combining fields into a single string:\n\n```javascript\npartialTextSearch = new PartialTextSearch(docs, { separator: '/' })\n```\n\nWhat if the query patterns and/or the document strings contain the separator being used? The separator is only used as a way to improve accuracy, but it's not part of the actual text (since it's inserted by the library), therefore it shouldn't be used for pattern matching. One way to deal with this problem is to remove the separator from both the document's text (at the time of indexing) and from each query (before calling the search methods). This way, the separator character will only ever appear as a separator, and in no other context:\n\n```javascript\nconst removePipe = x =\u003e x.replace(/\\|/g, '')\n\nconst myDocConversion = doc =\u003e removePipe(doc.name) + '|' + removePipe(doc.surname)\n\nconst partialTextSearch = new PartialTextSearch(docs, {\n  docToString: myDocConversion\n})\n\nconst myQuery = 'I love the pipe | symbol'\n\npartialTextSearch.search(removePipe(myQuery))\n```\n\n### Case insensitive support\n\nSearch is case sensitive by design, but there are a few ways to support case insensitive search. The recommended way is to:\n\n1. 
At indexing time, convert the strings to lowercase (don't modify the original documents, simply modify the string to index).\n2. Lowercase the query before executing the search.\n\n```javascript\nconst myDocConversion = doc =\u003e (doc.title + '|' + doc.text).toLowerCase()\n\npartialTextSearch = new PartialTextSearch(docs, { docToString: myDocConversion })\n\nconst someQuery = 'I hAve mIXED cAsEs'\n\npartialTextSearch.search(someQuery.toLowerCase())\n```\n\nThis trick can also be used to remove characters determined to be \"useless\" like dots, commas, extra whitespace, etc. Remember to apply the same pre-processing to both the documents and the query patterns, otherwise they will not match.\n\n## Contribution\n\nContributions are always welcome and appreciated. Here are the ways you can contribute to this project.\n\n1. **Report a bug:** If you think you have encountered a bug, and I should know about it, feel free to report it in the issues section and I will take care of it.\n2. **Request a feature:** You can request a feature in the issues section, and if it's viable, it will be added to the development backlog.\n3. **Create a pull request:** Your pull request will be appreciated by the community. You can get started by picking up any open issue and making a pull request.\n\n## License\n\nThis library is available as open source under the terms of the MIT License.\n\n## Development\n\nTests:\n\n```\nnpm run test\n```\n\nFormat:\n\n```\nnpm run format\n```\n\nBenchmarks:\n\n```\nnode benchmarks/benchmark.js\nnode benchmarks/benchmark2.js 100000\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisvilches%2Fpartial-text-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchrisvilches%2Fpartial-text-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisvilches%2Fpartial-text-search/lists"}