Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/localvoid/ndx
:mag: Full text indexing and searching library
full-text-search inverted-index javascript search-engine typescript
- Host: GitHub
- URL: https://github.com/localvoid/ndx
- Owner: localvoid
- License: mit
- Created: 2016-10-21T11:13:53.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-03-15T07:40:24.000Z (over 1 year ago)
- Last Synced: 2024-11-06T20:52:08.790Z (6 days ago)
- Topics: full-text-search, inverted-index, javascript, search-engine, typescript
- Language: TypeScript
- Homepage:
- Size: 1.06 MB
- Stars: 152
- Watchers: 9
- Forks: 11
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-tiny-js - ndx - Similar to js-search, differs in [ranking](https://kmwllc.com/index.php/2020/03/20/understanding-tf-idf-and-bm-25/) and is less strict for multi-word queries [(compare)](https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=js-search,ndx,Wade&search=twilight%20sag). Supports field weights. (Text Search / Reactive Programming)
README
# [ndx](https://github.com/ndx-search/ndx) · [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/ndx-search/ndx/blob/master/LICENSE)
Lightweight Full-Text Indexing and Searching Library.
This library was designed for a specific use case in which all documents are
stored on disk (IndexedDB) and can be dynamically added to or removed from an
index.

The query function supports only disjunction operators. A query like `one two`
works as `"one" or "two"`.

The Inverted Index doesn't store term locations, so the query function can't
search for phrases like `"Super Mario"`.

There are many [alternative solutions](https://github.com/leeoniya/uFuzzy#benchmark)
with different tradeoffs that may better suit your particular use case. For a
simple document search with a static dataset, I would recommend using something
like [fst](https://github.com/BurntSushi/fst) and deploying it as an edge
function (wasm).

## Features
- Multiple fields full-text indexing and searching.
- Per-field score boosting.
- [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function to rank
matching documents.
- [Trie](https://en.wikipedia.org/wiki/Trie) based dynamic
[Inverted Index](https://en.wikipedia.org/wiki/Inverted_index).
- Configurable tokenizer and term filter.
- Free text queries with query expansion.

## Example
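To give a feel for the BM25 ranking listed among the features, here is a minimal standalone sketch of the scoring formula. It is not ndx's internal implementation: the `bm25Scores` function, the shape of its `docs` argument, and the choice of the Lucene-style IDF variant are all assumptions made for illustration.

```javascript
// Standalone BM25 sketch (hypothetical helper, not part of ndx).
// `docs` maps a document key to the array of terms in that document.
function bm25Scores(docs, queryTerms, k1 = 1.2, b = 0.75) {
  const keys = Object.keys(docs);
  const n = keys.length;
  const avgLen = keys.reduce((sum, k) => sum + docs[k].length, 0) / n;

  const scores = {};
  for (const term of queryTerms) {
    // Document frequency: how many documents contain the term.
    const df = keys.filter((k) => docs[k].includes(term)).length;
    if (df === 0) continue;
    // Lucene-style IDF variant (assumption); it is never negative.
    const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
    for (const k of keys) {
      const tf = docs[k].filter((t) => t === term).length;
      if (tf === 0) continue;
      // `b` controls how strongly document length normalizes tf.
      const norm = 1 - b + b * (docs[k].length / avgLen);
      // `k1` controls how quickly repeated occurrences saturate.
      scores[k] = (scores[k] || 0) + (idf * tf * (k1 + 1)) / (tf + k1 * norm);
    }
  }
  return Object.entries(scores)
    .map(([key, score]) => ({ key, score }))
    .sort((x, y) => y.score - x.score);
}

const ranked = bm25Scores(
  { 1: ["lorem", "ipsum", "dolor"], 2: ["lorem", "ipsum"] },
  ["lorem"],
);
console.log(ranked.map((r) => r.key)); // the shorter document "2" comes first
```

With `b` set to `0`, length normalization is switched off and both documents receive the same score. The ndx example that follows passes the same `k1` (1.2) and `b` (0.75) constants to `indexQuery()`.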
```js
// `indexRemove` and `indexVacuum` are assumed to be exported from "ndx"
// alongside `indexAdd`.
import { createIndex, indexAdd, indexRemove, indexVacuum } from "ndx";
import { indexQuery } from "ndx/query";

const termFilter = (term) => term.toLowerCase();

function createDocumentIndex(fields) {
  // `createIndex()` creates an index data structure.
  // First argument specifies how many different fields we want to index.
  const index = createIndex(
    fields.length,
    // Tokenizer is a function that breaks text into words, phrases, symbols,
    // or other meaningful elements called tokens.
    (s) => s.split(" "),
    // Filter is a function that processes tokens and returns terms. Terms are
    // used in the Inverted Index to index documents.
    termFilter,
  );
  // `fieldGetters` is an array of functions that will be used to retrieve
  // data from different fields.
  const fieldGetters = fields.map((f) => (doc) => doc[f.name]);
  // `fieldBoostFactors` is an array of boost factors for each field. In this
  // example all fields have identical weight.
  const fieldBoostFactors = fields.map(() => 1);
  // `removed` holds the keys of documents that have been marked as removed
  // but not yet vacuumed from the index.
  const removed = new Set();

  return {
    index,
    // `add()` will add documents to the index.
    add(doc) {
      indexAdd(
        index,
        fieldGetters,
        // Document key. It can be a unique document id or a reference to a
        // document if you want to store all documents in memory.
        doc.id,
        // Document.
        doc,
      );
    },
    // `remove()` will remove documents from the index.
    remove(id) {
      // When a document is removed we are just marking its id as being
      // removed. The index data structure still contains references to the
      // removed document.
      indexRemove(index, removed, id);
      if (removed.size > 10) {
        // `indexVacuum()` removes all references to removed documents from the
        // index.
        indexVacuum(index, removed);
      }
    },
    // `search()` will be used to perform queries.
    search(q) {
      return indexQuery(
        index,
        fieldBoostFactors,
        // BM25 ranking function constants:
        // BM25 k1 constant, controls non-linear term frequency normalization
        // (saturation).
        1.2,
        // BM25 b constant, controls to what degree document length normalizes
        // tf values.
        0.75,
        q,
      );
    },
  };
}

// Create a document index that will index the `content` field.
const index = createDocumentIndex([{ name: "content" }]);

const docs = [
  {
    "id": "1",
    "content": "Lorem ipsum dolor",
  },
  {
    "id": "2",
    "content": "Lorem ipsum",
  },
];

// Add documents to the index.
docs.forEach((d) => { index.add(d); });

// Perform a search query.
index.search("Lorem");
// => [{ key: "2", score: ... }, { key: "1", score: ... }]
//
// The document with id `"2"` is ranked higher because its `"content"` field
// has fewer terms than that of the document with id `"1"`.

index.search("dolor");
// => [{ key: "1", score: ... }]
```

### Tokenizers and Filters
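The example above tokenizes with `s.split(" ")` and only lowercases terms, so punctuation stays attached to tokens and repeated spaces produce empty tokens. As a rough sketch of a slightly more forgiving pair (the names `tokenizer`, `foldingFilter`, and `toTerms` are illustrative and not part of ndx):

```javascript
// Hypothetical helpers, not provided by ndx.

// Split on any run of characters that is neither a letter nor a digit, then
// drop the empty strings left by leading or trailing separators.
const tokenizer = (s) => s.split(/[^\p{L}\p{N}]+/u).filter((t) => t !== "");

// Decompose accented characters, strip the combining marks, and lowercase,
// so that "Café" and "cafe" map to the same term.
const foldingFilter = (term) =>
  term.normalize("NFKD").replace(/\p{M}+/gu, "").toLowerCase();

// Convenience wrapper to inspect the terms a string would produce.
const toTerms = (s) => tokenizer(s).map(foldingFilter);

console.log(toTerms("Crème brûlée, please!")); // ["creme", "brulee", "please"]
```

Functions of this shape could be passed as the second and third arguments to `createIndex()` in place of the ones used in the example.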
The `ndx` library doesn't provide any tokenizers or filters. Other libraries
implement them; for example,
[Natural](https://github.com/NaturalNode/natural/) has a good collection of
tokenizers and stemmers.

## License
[MIT](http://opensource.org/licenses/MIT)