Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vsm/vsm-dictionary-cacher
VSM-dictionary wrapper that manages a cache for various get-functions
- Host: GitHub
- URL: https://github.com/vsm/vsm-dictionary-cacher
- Owner: vsm
- License: agpl-3.0
- Created: 2018-04-09T19:13:47.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-01-03T16:30:50.000Z (about 2 years ago)
- Last Synced: 2024-05-27T20:41:03.394Z (8 months ago)
- Topics: cache, dictionary, ontology-search, vsm
- Language: JavaScript
- Size: 1.09 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# vsm-dictionary-cacher
## Summary
`vsm-dictionary-cacher` augments a given VSM-dictionary with a layer of
caching functionality. This speeds up requests for string-matches, refTerms,
and dictInfos, as well as fixedTerms-preloading.
## Use in Node.js
Install it (after also installing a `vsm-dictionary-...` of choice) with:

```
npm install vsm-dictionary-cacher
```

Then use it like:
```js
const Dictionary = require('vsm-dictionary-local');  // ...or any other VsmDictionary implementation.
const cacher = require('vsm-dictionary-cacher');

const CachedDictionary = cacher(Dictionary);  // This makes a cache-enabled subclass.
var dict = new CachedDictionary();            // This makes an instance.

// This will query the Dictionary as normal (a cache miss), and store the result.
dict.getMatchesForString('abc', {filter: {dictID: ['Foo']}}, (err, res) => {
  console.dir(res);

  // This will get the result from the cache, instead of re-running the query.
  dict.getMatchesForString('abc', {filter: {dictID: ['Foo']}}, (err, res) => {
    console.dir(res);
  });

  // These will *not* get their result from the cache.
  dict.getMatchesForString('abc', {filter: {dictID: ['BAR']}}, (err, res) => {});
  dict.getMatchesForString('QQQ', {filter: {dictID: ['Foo']}}, (err, res) => {});

  // And there is similar behavior for the other three cached functions:
  // - dict.getRefTerms({}, cb)
  // - dict.getDictInfos({}, cb)
  // - dict.loadFixedTerms([], {}, cb)
});
```

Specify options like:
```js
const CachedDictionary = cacher(Dictionary, { maxItems: 1000 });
```
## Use in the browser
Load the browser build via a `<script>` tag:

```
<script src="..."></script>
```

after which it is accessible as the global variable `VsmDictionaryCacher`.
Then it can be wrapped around a VsmDictionary, e.g. a `VsmDictionaryLocal`, like:

```
...
var dict = new (VsmDictionaryCacher(VsmDictionaryLocal)) (options);
dict.getMatchesForString(...);
```
## Details
This package provides a factory function that accepts any VsmDictionary
(sub)class, and returns a (further) subclass of it, which inserts
cache-handling code into several of its functions, as detailed below.
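First, a minimal sketch of this wrapping pattern. This is illustrative only,
not the package's actual code, and the internal field names (`_cache`,
`_queue`, etc.) are made up here; the later sketches in this section reuse
them:

```js
// Sketch of the subclass factory: it extends whatever VsmDictionary
// (sub)class it is given, and overrides the functions that should
// consult a cache before querying the underlying datastore.
function cacher (VsmDictionary, options = {}) {
  return class CachedDictionary extends VsmDictionary {
    constructor (...args) {
      super(...args);
      this.maxItems = options.maxItems || 0;  // 0 = unlimited (see Options).
      this._cache = new Map();  // String-match results, keyed on string+options.
      this._queue = new Map();  // Callbacks parked behind an ongoing, identical query.
    }
    // Overridden here (sketched per function further down): getMatchesForString(),
    // getEntryMatchesForString(), getRefTerms(), getDictInfos(), loadFixedTerms();
    // plus one new function: clearCache().
  };
}
```

In more detail: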
+ It speeds up requests for string-matches, to `getMatchesForString()`, in
  three ways (see the sketch after this item):
  - It stores results from requests to this function in a cache.
    These results are returned for subsequent requests that use the same
    search-string & options, instead of re-running the query.
    > This helps e.g. `vsm-autocomplete` avoid making duplicate requests to an
    > online dictionary server.
    > It creates a more responsive autocomplete when the user
    > types, and then backspaces.
  - It also prevents sending a second (or further) query to the underlying
    datastore, if that query has the same search-string & options as an
    ongoing query, whose results haven't arrived yet.
    Instead, it puts such identical queries in a queue, and when the first
    one's result comes in, it shares that error+result with the queued ones,
    almost immediately (i.e. on individual, next event-loops).
    > This helps e.g. `vsm-autocomplete` avoid making duplicate requests
    > when a user types and backspaces quickly, before any results could
    > come in.

    Note: when a query on the underlying storage fails, then no item is added
    to the cache, and the queued ones make no attempt to re-query.
    This also means that on error, the same error is returned by all queued
    requests.
  - It can also remember for which strings there were no 'normal entry'-type
    matches.
    Then for subsequently queried strings that start with such a 'no matches'
    string, it can immediately return an empty list for the 'entry-matches'
    too (but it still checks for refTerm/number/etc.-type matches).
    > This helps e.g. `vsm-autocomplete` avoid making unnecessary requests for
    > search-strings for which a substring already returned no entry-matches.
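The first two behaviors could look like this sketch of the overridden method
(hedged pseudocode, not the package's actual code; the real implementation
also applies `maxItems` eviction, described under Options):

```js
// Sketch of cache + queue: serve repeats from the cache, park duplicates
// of an in-flight query, and fan the first result out to all of them.
getMatchesForString (str, options, cb) {
  const key = str + '|' + JSON.stringify(options);  // Cache key: string + options.

  if (this._cache.has(key)) {                       // 1. Cache hit:
    return setTimeout(() => cb(null, this._cache.get(key)), 0);  // answer directly.
  }
  if (this._queue.has(key)) {                       // 2. Identical query ongoing:
    return this._queue.get(key).push(cb);           //    just park this callback.
  }

  this._queue.set(key, []);                         // 3. Cache miss: mark as ongoing.
  super.getMatchesForString(str, options, (err, res) => {
    if (!err)  this._cache.set(key, res);           // Cache successful results only.
    const queued = this._queue.get(key);
    this._queue.delete(key);
    cb(err, res);                                   // Answer the original caller, then
    queued.forEach(f =>                             // share err+result with each parked
      setTimeout(() => f(err, res), 0));            // one, on its own event-loop tick.
  });
}
```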
+ It maintains a cache for `getRefTerms()` (see the sketch after this item).
  - There are usually only a small number of refTerms, and they are just
    Strings. Therefore, at the first call for anything, it queries all of
    them, and puts them into a cache that is used for all further lookups.
    > This makes e.g. `vsm-autocomplete` not launch two queries per
    > search-string (i.e. one for entries, and one for refTerms).
    > And for cacheEmpty-hits, it will even serve all data from either cache
    > or computation.
  - It also catches and prevents any concurrent calls, until this cache data
    is received.
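A sketch of this load-all-once approach; `_refTermsCache`, `_refTermsQueue`
(an array, initialized in the constructor) and the `_filterRefTerms()`
helper, which would apply `options.filter` and paging locally, are all
hypothetical names:

```js
// Sketch: the first getRefTerms() call fetches *all* refTerms in one
// unpaginated query; concurrent calls wait in a queue, and all later
// calls are answered from the in-memory list.
getRefTerms (options, cb) {
  if (this._refTermsCache) {                     // Cache filled: serve locally.
    return setTimeout(() => cb(null, this._filterRefTerms(options)), 0);
  }
  this._refTermsQueue.push([options, cb]);       // Park this call.
  if (this._refTermsQueue.length > 1)  return;   // A load is already underway.

  super.getRefTerms({ perPage: Number.MAX_VALUE }, (err, res) => {
    if (!err)  this._refTermsCache = res.items;  // One query got all refTerms.
    const queued = this._refTermsQueue;
    this._refTermsQueue = [];
    queued.forEach(([opt, f]) => setTimeout(     // Answer every parked call.
      () => f(err, err ? null : this._filterRefTerms(opt)), 0));
  });
}
```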
+ It partially maintains a cache for `getDictInfos()` (see the sketch after
  this item).
  - It caches all dictInfo-objects that are returned from any queries to the
    underlying datastore, no matter what `options` were used.
  - Then it may use this cache:
    - It uses it _only_ for requests that filter for a list of dictIDs
      (i.e. that have an `options.filter.id`),
      _or_ for requests that ask for all dictInfos (i.e. that have no
      `options.filter`).
    - Then:
      • it collects all dictInfos with a cache-hit in the dictInfos-cache,
      • it sends a query _only_ for the dictInfos that had a cache-miss,
        _and_ that are not marked as being-queried by another concurrent
        `getDictInfos()` call,
      • after marking these as being-queried now too.
      + Note: queries to the underlying datastore are made explicitly
        unpaginated (i.e. with `options = { perPage: Number.MAX_VALUE, ... }`).
    - When all dictInfos that it depends on have come in (possibly as partial
      results from several other concurrent calls), then it finally returns
      its own, complete, assembled result.
    - The implementation of this is not trivial. But it ensures that for a
      large number of concurrent requests (which may be launched when an app
      starts up), no dictID is queried twice.
      And it works even if `clearCache()` (see below) is called during this
      process.
      > A `vsm-autocomplete` needs a corresponding dictInfo for each of its
      > string-matches.
      > So if its string-matches already came from cache-hits, then the above
      > makes all its other data also come only from cache.
      > If a `vsm-box` with a template, or several `vsm-box`es loaded on the
      > same page, would launch multiple concurrent requests for dictInfos,
      > then this caching may result in far fewer queries to the underlying
      > datastore.
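A simplified sketch of the dictID-filtered case. The names `_dictInfoCache`
(a Map) and `_pendingDictIDs` (a Set) are illustrative, and unlike the real
code, this version does not wait for IDs that other concurrent calls are
still fetching:

```js
// Sketch: answer a dictID-filtered getDictInfos() from the cache where
// possible, and query the underlying datastore only for the missing IDs.
getDictInfos (options, cb) {
  const ids    = options.filter.id;
  const misses = ids.filter(id => !this._dictInfoCache.has(id) &&
                                  !this._pendingDictIDs.has(id));
  misses.forEach(id => this._pendingDictIDs.add(id));  // Mark as being-queried.

  const finish = () => {        // Assemble the complete result from the cache.
    const items = ids.map(id => this._dictInfoCache.get(id)).filter(Boolean);
    cb(null, { items });
  };

  if (!misses.length)  return setTimeout(finish, 0);   // All were cache-hits.

  super.getDictInfos(           // Explicitly unpaginated, and only for misses.
    { filter: { id: misses }, perPage: Number.MAX_VALUE },
    (err, res) => {
      misses.forEach(id => this._pendingDictIDs.delete(id));
      if (err)  return cb(err);
      res.items.forEach(di => this._dictInfoCache.set(di.id, di));  // Cache each.
      finish();
    }
  );
}
```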
+ It enhances the cache management of `loadFixedTerms()` (see the sketch
  after this item).
  - Note: this function (of the VsmDictionary parent class) queries
    a VsmDictionary subclass's `getEntries()`, and puts the processed
    results in its own simple cache (in the VsmDictionary parent class).
  - But if a web page contains multiple `vsm-box`es based on a template,
    then each of them may call the (shared) VsmDictionary's `loadFixedTerms()`
    and launch a query. This would query and add results to the
    VsmDictionary's `fixedTermsCache`, no matter whether these results were
    in there already.
  - Therefore:
    • it maintains a list of fixedTerms (`idts`) that have ever been queried,
    • it removes, from a request's `idts`-argument, any that were queried
      before,
    • and then launches the query only for the remaining `idts`,
    • after marking them as 'pending/having-been-queried-now',
    • so that concurrent calls can be prevented from requesting anything
      twice.

    Note: the `options.z`, for z-object-pruning, is not taken into account
    here, because VsmDictionary's fixedTerms-cache does not take it into
    account either.
    > This prevents `loadFixedTerms()` from being called multiple times for
    > the same data, when loading multiple `vsm-box`es with the same template.
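A sketch of that filtering step. A fixedTerm identifier `idt` is an object
(an `id`, optionally with a `str`), so a serialized form is used as set key
here; `_queriedIdts` (a Set) is an illustrative name:

```js
// Sketch: drop fixedTerms that were queried before (or are pending), and
// forward only the remainder; the parent class's loadFixedTerms() then
// fills its own fixedTermsCache with the results.
loadFixedTerms (idts, options, cb) {
  const newIdts = idts.filter(
    idt => !this._queriedIdts.has(JSON.stringify(idt)));
  newIdts.forEach(                      // Mark them *before* querying, so that
    idt => this._queriedIdts.add(JSON.stringify(idt)));  // concurrent calls skip them.

  if (!newIdts.length)  return setTimeout(() => cb(null), 0);  // Nothing new to load.
  super.loadFixedTerms(newIdts, options, cb);
}
```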
## Options
An options object can be given as the second argument to the factory function
(see the example above), with these optional properties:
- `maxItems`: {Number}:
  This limits the number of items kept in the string-match cache (only).
  One item equals the result of one `getMatchesForString()` query (which is
  often a list of match-objects).
  When adding a new item to a full cache, the least recently added or
  accessed item gets removed first (see the sketch after this item).
  The default is 0, which means unlimited storage.
  + Note: this pertains to _string-match-objects_ only;
    so not to refTerms (they are small and few), not to dictInfos (same
    reason, and because the result of one query is spread out over multiple
    cache-items, which is hard to manage), and not to fixedTerms (same).
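Since a JavaScript `Map` iterates its keys in insertion order, this 'least
recently added or accessed' policy can be sketched with two hypothetical
helpers around the `_cache` Map:

```js
// Sketch: re-inserting a key on every access keeps the Map ordered from
// least to most recently added-or-accessed, so its first key is the one
// to evict when the cache is full (maxItems = 0 means unlimited).
_cacheSet (key, value) {
  if (this.maxItems > 0 && !this._cache.has(key) &&
      this._cache.size >= this.maxItems) {
    this._cache.delete(this._cache.keys().next().value);  // Evict the oldest.
  }
  this._cache.set(key, value);
}

_cacheGet (key) {
  if (!this._cache.has(key))  return undefined;
  const value = this._cache.get(key);
  this._cache.delete(key);        // Delete + re-insert moves this key to the
  this._cache.set(key, value);    // 'most recently used' end of the Map.
  return value;
}
```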
- `predictEmpties`: {Boolean}:
  If `true`, then it keeps a list of strings (per options-object) for which
  `getEntryMatchesForString()` returned no results (i.e. `{ items: [] }`).
  Then for any subsequent query (with the same options) for a string that
  starts with any such empty-returning string, it can assume that no results
  will be returned either.
  E.g. if a search for 'ab' returned no matching entries, then neither will
  'abc'. So it can avoid running that query, and immediately return the empty
  `{ items: [] }` for 'abc' (see the sketch after this list).
  The default is `true`.
  - Note: `maxItems` does not apply to this collection of strings either.
    But the collection gets cleared, like everything else, by a call
    to `clearCache()` (see below).
  - Note: this is handled in `getEntryMatchesForString()`, not in
    `getMatchesForString()`, because the latter may still add 'extra' matches
    (refTerm/number/fixedTerm), other than the 'entry'-type matches.
    - Example 1: after a call for "i" gives no entry-matches (and "i" ends up
      in the 'cacheEmpties'), a subsequent call for "it" should still return
      "it" as a refTerm-match.
      (Note that a refTerm only matches for a full, not partial, string
      match.)
    - Example 2: after a call for "1e" gave no results, a subsequent call
      for the valid number-string "1e5" should still return it as a result.
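A sketch of this prefix check; `_cacheEmpties` (a plain object that maps a
serialized options-object to an array of empty-returning strings) is an
illustrative name:

```js
// Sketch: remember strings that returned zero entry-matches, and answer any
// later string that extends one of them with an empty result right away.
getEntryMatchesForString (str, options, cb) {
  const optKey  = JSON.stringify(options);        // One list per options-object.
  const empties = this._cacheEmpties[optKey] || [];

  if (empties.some(s => str.startsWith(s))) {     // E.g. 'ab' empty => so is 'abc'.
    return setTimeout(() => cb(null, { items: [] }), 0);
  }
  super.getEntryMatchesForString(str, options, (err, res) => {
    if (!err && !res.items.length) {              // Remember this empty result.
      (this._cacheEmpties[optKey] = empties).push(str);
    }
    cb(err, res);
  });
}
```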
## Functions
An extra function is added to the VsmDictionary subclass:
- `clearCache()`:
This removes all data from the cache layer, including e.g. the list used by
`predictEmpties`.
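For example, an application might call it when the underlying datastore's
content may have changed:

```js
dict.clearCache();  // Removes all data from the cache layer, so subsequent
                    // calls query the underlying datastore afresh.
```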
## License
This project is licensed under the AGPL license - see [LICENSE.md](LICENSE.md).