Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sile/cl-dawg
Direct Acyclic Word Graph
https://github.com/sile/cl-dawg
common-lisp dawg double-array
Last synced: about 1 month ago
JSON representation
Direct Acyclic Word Graph
- Host: GitHub
- URL: https://github.com/sile/cl-dawg
- Owner: sile
- License: mit
- Created: 2010-10-15T04:43:57.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2018-01-02T09:24:55.000Z (about 7 years ago)
- Last Synced: 2024-11-07T03:46:01.375Z (3 months ago)
- Topics: common-lisp, dawg, double-array
- Language: Common Lisp
- Homepage:
- Size: 68.4 KB
- Stars: 5
- Watchers: 4
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: COPYING
Awesome Lists containing this project
README
cl-dawg
========- A map implementation based on DAWG (Directed Acyclic Word Graph)
- Maps are serialized to sequences of bytes using double array trie format
- This takes static key set, and assigns the unique identifiers for the each keys
- Note that the input keys must be unique and lexically ordered
- The identifier assigned to a key is the zero-origin index of the key in the input sequence
- This library aims to provide a handy way for building maps which have tens of millions of elements in Common LispBuild
-----```lisp
(require :asdf)
(push *default-pathname-defaults* asdf:*central-registry*)
(asdf:load-system :dawg)
```Example
-------```lisp
(dawg:build :input "/usr/share/dict/words" :output "words.dawg")
(defparameter *dawg* (dawg:load "words.dawg"))(dawg:member? "hello" *dawg*)
T(dawg:get-id "hello" *dawg*)
50195(dawg:each-common-prefix (id end) ("hello" *dawg*)
(print (list id (subseq "hello" 0 end))))
(49012 "h")
(49845 "he")
(50183 "hell")
(50195 "hello")
```API
---### `dawg:build (&key input output byte-order show-progress) => t`
Builds a DAWG index file from the input key set.
- `input`:
- The pathname of a key set file or a list of keys
- "key set file" is line delimitered plain text file (a line represents a key)
- Restrictions:
- The input keys must be unique and lexically ordered
- A key cannot contain null characters
- Type: `(or string pathname list)`
- `output`:
- The pathname of the resulting DAWG index file
- Type: `(or string pathname)`
- `byte-order`:
- The endianness of the output file
- Type: `(member :native :little :big)`
- Default: `:native`
- `show-progress`:
- Indicates whether or not to show the progress
- Type: `boolean`
- Default: `nil`### `dawg:load (index-path &key byte-order) => dawg:dawg`
Loads the DAWG map from the specified index file.
- `index-path`:
- The pathname of an index file that built via `dawg:build` function
- Type: `(or string pathname file-stream)`
- `byte-order`:
- The endianness of the input file
- Type: `(member :native :little :big)`
- Default: `:native`### `dawg:member? (key dawg &key start end) => boolean`
Returns `t` if `dawg` contains the given key, otherwise `nil`.
- `key`:
- Type: `(simple-array character)`
- `dawg`:
- Type: `dawg:dawg`
- `start`:
- The start position in `key`
- Type: `positive-fixnum`
- Default: `0`
- `end`:
- The end position in `key`
- Type: `positive-fixnum`
- Default: `(length key)`### `dawg:get-id (key dawg &key start end) => (or null fixnum)`
Returns the identifier assigned to the given key.
If the key does not exist in `dawg`, this function will return `nil`.- `key`:
- Type: `(simple-array character)`
- `dawg`:
- Type: `dawg:dawg`
- `start`:
- The start position in `key`
- Type: `positive-fixnum`
- Default: `0`
- `end`:
- The end position in `key`
- Type: `positive-fixnum`
- Default: `(length key)`
### `dawg:each-common-prefix ((match-id match-end) (key dawg &key start end) &body body)`Executes common-prefix search for the given key.
For each key in `dawg` that matches the prefix part of `key`, `match-id` and `match-end` are bound then `body` is exeucted.
By using the `return` function, it is possible to break the loop halfway.
- `match-id`:
- The identifier of the key matched with the prefix part of the input `key`
- Type: `positive-fixnum`
- `match-end`:
- The end position of the matched part in the input `key` (i.e., the length of the matched key)
- Type: `positive-fixnum`
- `key`:
- Type: `(simple-array character)`
- `dawg`:
- Type: `dawg:dawg`
- `start`:
- The start position in `key`
- Type: `positive-fixnum`
- Default: `0`
- `end`:
- The end position in `key`
- Type: `positive-fixnum`
- Default: `(length key)`
- `body`:
- The expression to be executed in each iteration### `dawg:each-predictive ((match-id) (key dawg &key start end) &body body)`
A simplified version of `dawg:each-common-prefix` that does not bind the `match-end` variable in each iteration.