{"id":15641460,"url":"https://github.com/hexagon/thinker-fts","last_synced_at":"2025-10-05T02:23:48.618Z","repository":{"id":62422591,"uuid":"46585046","full_name":"Hexagon/thinker-fts","owner":"Hexagon","description":"Fast and extendible Node.js/Javascript fulltext search engine.","archived":false,"fork":false,"pushed_at":"2022-03-29T22:38:46.000Z","size":377,"stargazers_count":72,"open_issues_count":2,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-30T00:15:38.161Z","etag":null,"topics":["factor","fts","full-text-search","ranker","search-engine","stemmer","suggestions","thinker","wordforms"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Hexagon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["hexagon"]}},"created_at":"2015-11-20T20:28:14.000Z","updated_at":"2025-04-22T00:17:42.000Z","dependencies_parsed_at":"2022-11-01T17:32:49.173Z","dependency_job_id":null,"html_url":"https://github.com/Hexagon/thinker-fts","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/Hexagon/thinker-fts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hexagon%2Fthinker-fts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hexagon%2Fthinker-fts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hexagon%2Fthinker-fts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hexagon%2Fthinker-fts/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/Hexagon","download_url":"https://codeload.github.com/Hexagon/thinker-fts/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hexagon%2Fthinker-fts/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271448397,"owners_count":24761438,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["factor","fts","full-text-search","ranker","search-engine","stemmer","suggestions","thinker","wordforms"],"created_at":"2024-10-03T11:42:41.412Z","updated_at":"2025-10-05T02:23:43.594Z","avatar_url":"https://github.com/Hexagon.png","language":"JavaScript","funding_links":["https://github.com/sponsors/hexagon"],"categories":[],"sub_categories":[],"readme":"# thinker\n\n![Node.js CI](https://github.com/Hexagon/thinker-fts/workflows/Node.js%20CI/badge.svg?branch=master) [![npm version](https://badge.fury.io/js/thinker-fts.svg)](https://badge.fury.io/js/thinker-fts) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/84d7dc1fc1074d619f06546a409fdd79)](https://www.codacy.com/gh/Hexagon/thinker-fts/dashboard?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=Hexagon/thinker-fts\u0026amp;utm_campaign=Badge_Grade) [![MIT 
License](https://img.shields.io/badge/license-MIT-blue.svg)](https://img.shields.io/badge/license-MIT-blue.svg)\n\nFast, extendible and stand alone pure JavaScript full text search engine. Node. Deno. Browser.\n\n## Features\n\n*   In-memory operation\n*   Highly optimized, will give a ranked resultset within 10 ms on a 5000 (average wikipedia sized) document dataset.\n*   Few external dependencies\n*   Natural language search\n*   Partial matching\n*   Expression correction / suggestions\n*   Weighted ranker (configurable weights for each field, all-expression-match-factor, partial vs exact factor etc.)\n*   Search modifiers (+ require, - exclude, \"searchword\" precise match which excepts wordprocessors)\n*   Result filters (hard filters)\n*   Result reduction (soft filters)\n*   Metadata collection (example: collect metadata tags from all results, including those removed by reduction)\n*   Field preprocessors\n\t*   HTML-Stripper\n\t*   Word preprocessors\n\t\t*   [Stemmers](https://en.wikipedia.org/wiki/Stemming)\n\t    *   Swedish\n\t    *   English\n\t*   [Stop words](https://en.wikipedia.org/wiki/Stop_words)\n\t*   Word forms\n\t*   [Soundex](https://en.wikipedia.org/wiki/Soundex)\n\t*   Stripper for repeated characters\n*   Works in Node.js \u003e=4.0 (both require and import).\n*   Works in Deno \u003e=1.16.\n*   Works in browsers as standalone, UMD or ES-module.\n\n## Installation\n\n### Node.js\n\n```npm install thinker-fts --save```\n\nJavaScript ESM\n\n```javascript\nimport Thinker from \"thinker-fts\";\n\nconst thinker = Thinker();\n```\n\nJavaScript CommonJS\n```javascript\nconst Thinker = require(\"thinker-fts\");\n\nconst thinker = Thinker();\n```\n\nTypeScript\n\n*Note that only default export is available in Node.js TypeScript, as the commonjs module is used internally.*\n\n```typescript\nimport Thinker from \"thinker-fts\";\nconst thinker = Thinker();\n```\n\n### Deno\n\nJavaScript\n\n```javascript\nimport Thinker from 
\"https://cdn.jsdelivr.net/gh/hexagon/thinker-fts@2/dist/thinker.min.mjs\";\nconst thinker = Thinker();\n```\n\nor\n\n```javascript\nimport Thinker from \"https://deno.land/x/thinker/dist/thinker.min.mjs\";\nconst thinker = Thinker();\n```\n\n### Browser \n\n#### Manual\n\n*   Download latest [zipball](https://github.com/Hexagon/thinker-fts/archive/refs/heads/master.zip)\n*   Unpack\n*   Grab ```thinker.min.js``` (UMD and standalone) or ```thinker.min.mjs``` (ES-module) from the [dist/](/dist) folder\n\n#### CDN\n\nTo use as a [UMD](https://github.com/umdjs/umd)-module (stand alone, [RequireJS](https://requirejs.org/) etc.)\n\n```html\n\u003cscript src=\"https://cdn.jsdelivr.net/npm/thinker-fts/dist/thinker.min.js\"\u003e\u003c/script\u003e\n```\n\nTo use as a [ES-module](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules)\n\n```html\n\u003cscript type=\"module\"\u003e\n\timport Thinker from \"https://cdn.jsdelivr.net/npm/thinker-fts/dist/thinker.min.mjs\";\n\tconst thinker = Thinker();\n\t// ... 
see usage section ...\n\u003c/script\u003e\n```\n\n## Quick-start\n\nA simple setup with feeding and searching would look something like the snippet below\n\n```javascript\n// See installation section for exact procedure depending on environment, this is Node.js/CommonJS\nconst Thinker = require('thinker-fts');\n\nconst thinker = Thinker();\n\n// Connect standard ranker\nthinker.ranker = Thinker.rankers.standard();\n\n// Feed thinker with an array of documents formatted like { id: id, fields: [textfield, textfield] }\nthinker.feed([\n\t{ id: 1, fields: ['Lorem', 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.'] },\n\t{ id: 2, fields: ['Ipsum', 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.'] }\n]);\n\n// Search for text\nvar result = thinker.find('ut in');\n\n// Show result\nconsole.log(result);\n```\n\nResults:\n```json\n{ \n\texpressions: [ \n\t\t{ \n\t\t\toriginal: 'ut',\n\t\t\tinterpretation: [Object],\n\t\t\tsuggestion: undefined,\n\t\t\tmodifier: undefined,\n\t\t\texactMode: false \n\t\t},\n\t\t{\n\t\t\toriginal: 'in',\n\t\t\tinterpretation: [Object],\n\t\t\tsuggestion: undefined,\n\t\t\tmodifier: undefined,\n\t\t\texactMode: false \n\t\t}\n\t],\n\tperformance: { \n\t\tfind: 1.107075,\n\t\trank: 0.598558,\n\t\tsort: 0.688598,\n\t\tfilter: 0.060182,\n\t\ttotal: 2.639159 \n\t},\n\tdocuments: [\n\t\t{ id: 2, weight: 1.5, expressions: [Object] },\n\t\t{ id: 1, weight: 1.5, expressions: [Object] } \n\t],\n\ttotalHits: 2,\n\treturnedHits: 2 \n}\n\n```\n\nPlease note that you _have to_ connect a ranker, else find won't provide a result set. 
The ranker builds the result set.\n\n## Basic configuration\n\nThinker's default configuration is overridden by supplying an options object to Thinker's constructor.\n\n```javascript\n\n// Options only available at initialization\nvar thinker = Thinker({\n\tcharacters: /([a-zA-Z0-9]*)/g,\n\tcaseSensitive: false,\n\tminWildcardWordLen: 3,\n\tmaxWildcardWordLen: 32,\n\tminWordLen: 2,\n\tmaxWordLen: 32,\n\tsuggestionMinWordCount: 6,\n\tenableSuggestions: false,\n\toptionalPlusFromExpressions: 1,\n\tconcatenateWords: 1\n});\n\n```\n\n### opts.characters\n\nRegular expression stating which characters to pick up as words. If you (as an example) want to use Thinker with Swedish characters, the setting would be\n\n```javascript\n{ characters: /([a-zA-Z0-9åäöÅÄÖ]*)/g }\n```\n\n### opts.caseSensitive\n\nSelf explanatory, true or false\n\n### opts.minWildcardWordLen\n\nThinker always does partial matching; minWildcardWordLen sets how short the parts of words that get indexed may be. The default setting is 4 which matches 'xpre' to 'expression', but not 'pre'. Setting this too short could give an unnecessary amount of bogus matches and could affect performance if used with a heavy ranker.\n\n### opts.maxWildcardWordLen\n\nSame as above, but max.\n\n### opts.minWordLen\n\nThe shortest word to index, default is 2 which adds 'ex' to the index, but not 'e'\n\n### opts.maxWordLen\n\nSame as above, but max.\n\n### opts.suggestionMinWordCount\n\nSet how many times a word has to exist in the index to be used for suggestions. Defaults to 6.\n\n### opts.enableSuggestions\n\nIf this is enabled, thinker will use unprocessed words from the inputted texts to give suggestions when expressions don't give a direct match.\n\nThis is what results.expressions[n] will look like when you search for 'exression' (missing p)\n\n### opts.optionalPlusFromExpressions\n\nWill be renamed, I promise. \n\nThis is how many words there should be in the expression before all words become optional. 
Defaults to 1 (disabled).\n\nIf you set this to 4, and search for a three word expression, all words will need to exist in the document to give a match. In the background ```what you want``` becomes ```+what +you +want```.\nIf you give a four word expression, all words become optional as usual.\n\n### opts.concatenateWords\n\nWhen this property is set to greater than one, augmented words will be inserted into the index, consisting of the current and following words. If this property is set to 3 and the field is \"i want cookies today\", a search for ```iwantcookies```, ```wantcookiestoday``` or ```wantcookies``` will give a match.\n\n```javascript \n{\n\tinterpretation: {\n\t\toriginal: 'expression',\n\t\t...\n\t},\n\t...\n\tsuggestion: 'expression',\n\t...\n}\n```\n\n## 'Standard' ranker options\n\nThe ranker is configured by passing an options object to its constructor.\n\n```javascript\nvar thinker = Thinker(),\n\tranker = Thinker.rankers.standard({\n\t\tdirectHit: 1,\n\t\tpartialHit: 0.5,\n\t\teachPartialExpressionFactor: 1.5,\n\t\teachDirectExpressionFactor: 2,\n\t\tfields: {\n\t\t\t1: { weight: 4, boostPercentage: false },\n\t\t\t2: { weight: 2, boostPercentage: false }\n\t\t}\n\t});\n\nthinker.ranker = ranker;\n```\n\n### directHit / partialHit\n\nFactor to weight when an expression matches a word directly or partially, respectively.\n\n### eachPartialExpressionFactor\n\nFactor which is applied to a document's total weight when an expression gives a partial match. 
If the query consists of three expressions that all match partially, this factor will be applied three times.\n\n### eachDirectExpressionFactor\n\nSame as above, but with direct hits.\n\n### fields\n\nObject defining a different base weight for a match in each field of a document. If your documents look like\n\n```javascript\nvar docs = [\n\t{ id: 1, fields: [\"This is the title\", \"This is the ingress\", \"This is the text\"] },\n\t...\n];\n```\n\nand your field weights look like\n\n```javascript\nfields: {\n\t0: { weight: 4, boostPercentage: true },\n\t1: { weight: 2, boostPercentage: false },\n\t2: { weight: 2, boostPercentage: false }\n}\n```\n\nMatches in the title field would get a weight of four, matches in the ingress field would get a weight of two etc. \n\nAdditionally, as boostPercentage is set to true for title, that weight can get up to its double if the match is the only word in the title. \n\nFor example, if the title is 'This is the stuff', and we search for 'stuff', the base weight is four, and that is multiplied by a calculated factor \n\n1 word matched, 4 words totally\n\n1+1/4\n\n1+0.25\n\ngives 1.25 in boostPercentage factor\n\n## Field processors\n\nField processors are functions that are applied to each and every field that thinker is fed with, before the indexing is done.\n\n### stripHtml\n\nStrips HTML, leaving links (a href=\"*\") and image descriptions (img alt=\"*\") in the returned result.\n\nExample setting up thinker with standard ranker and html-stripping\n\n```javascript\nvar\n\tthinker = Thinker(),\n\tranker = Thinker.rankers.standard(),\n\tstripHtml = Thinker.processors.stripHtml();\n\nthinker.addFieldProcessor(stripHtml);\n\nthinker.ranker = ranker;\n\n```\n\n## Word processors\n\nWord processors are functions that are applied to each and every word that thinker is fed with. 
They are applied the same way both when indexing and when querying.\n\nWord processors are handled in the same order they are configured; keep that in mind when setting things up. If you for example stem the word before applying wordforms, you need to use stemmed words in the wordforms list.\n\n### Wordforms\n\nReplaces chosen words with others, effectively making synonyms equal each other.\n\nExample setting up thinker with standard ranker and wordforms\n\n```javascript\nvar thinker   = Thinker(),\n\tranker \t  = Thinker.rankers.standard(),\n\twordforms = Thinker.processors.wordforms({\n\t\t\"we\": \"us\",\n\t\t\"they\": \"them\",\n\t\t\"github\": \"repository\"\n\t});\n\nthinker.addWordProcessor(wordforms);\n\nthinker.ranker = ranker;\n```\n\n### Stop words\n\nRemoves words that don't give better precision, normally stuff like 'and', 'I', 'they', 'we', 'can'. Adding the most common words here can speed up the queries a bit, and save some RAM.\n\nExample setting up thinker with standard ranker and stop words\n\n```javascript\nvar thinker   = Thinker(),\n\tranker \t  = Thinker.rankers.standard(),\n\tstopwords = Thinker.processors.stopwords({\n\t\t\"artikel\": true,\n\t\t\"bemötande\": true\n\t});\n\nthinker.addWordProcessor(stopwords);\n\nthinker.ranker = ranker;\n```\n\n### Stemmers\n\nFinds the stem of each word that is indexed, 'computers' will become 'computer', 'organized' will become 'organize' etc. 
This greatly improves accuracy of the matches and weighting.\n\nAn optional feature of the stemmers is to supply a list of words that you don't want to stem down.\n\nCurrently there are two stemmers available, Swedish through a custom version of the Snowball algorithm, and English through the Porter algorithm.\n\nExample setting up thinker with standard ranker, English stemming and some stemmer stopwords.\n\n```javascript\nvar\n\tthinker \t= Thinker(),\n\tranker \t\t= Thinker.rankers.standard(),\n\tstemmer \t= Thinker.processors.stemmers.english({\n\t\t\"stemmer\": true,\n\t\t\"stemming\": true,\n\t\t\"dontstemthiseither\": true,\n\t\t\"leonardo\": true,\n\t\t\"anders\": true\n\t});\n\nthinker.addWordProcessor(stemmer);\n\nthinker.ranker = ranker;\n\n```\n\n\nExample setting up thinker with standard ranker, Swedish stemming, and stemmer stop words\n\n```javascript\nvar\n\tthinker \t= Thinker(),\n\tranker \t\t= Thinker.rankers.standard(),\n\tstemmer \t= Thinker.processors.stemmers.swedish({\n\t\t\"berta\": true,\n\t\t\"jonas\": true,\n\t\t\"leonardo\": true,\n\t\t\"anders\": true\n\t});\n\nthinker.addWordProcessor(stemmer);\n\nthinker.ranker = ranker;\n```\n\n### Soundex\n\nSoundex preprocesses the words in such a way that words that sound alike match each other.\n\nExample setting up thinker with Soundex processing.\n\n```javascript\nvar\n\tthinker \t= Thinker(),\n\tranker \t\t= Thinker.rankers.standard(),\n\tsoundex \t= Thinker.processors.soundex();\n\nthinker.addWordProcessor(soundex);\n\nthinker.ranker = ranker;\n```\n\n\n## Dependencies\n\nNote: For normal usage, all needed dependencies are bundled\n\n## Development dependencies\n\n  [fast-levenshtein](https://github.com/hiddentao/fast-levenshtein) (https://github.com/hiddentao/fast-levenshtein)\n\n  [stemmer](https://github.com/wooorm/stemmer) (https://github.com/wooorm/stemmer)\n\n  [node-soundex](https://github.com/LouisT/node-soundex) (https://github.com/LouisT/node-soundex)\n\n  
[mocha](https://github.com/mochajs/mocha) (https://github.com/mochajs/mocha)\n\n  [should](https://github.com/shouldjs/should.js) (https://github.com/shouldjs/should.js)\n\n## Credits\n   \n  [Hexagon](https://github.com/hexagon/)\n   \n  [Pehr Boman](https://github.com/unkelpehr/)\n\n## Licence\n\nLicensed under the [MIT License](http://opensource.org/licenses/MIT)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhexagon%2Fthinker-fts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhexagon%2Fthinker-fts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhexagon%2Fthinker-fts/lists"}