https://github.com/zot/microfts

Small and fast FTS (full text search)

* Microfts
A small full-text indexing and search tool focused on speed and
space. Initial tests suggest that the database takes about
twice as much space as the files it indexes.

Microfts implements a trigram GIN (generalized inverted index) and
stores it in [[http://www.lmdb.tech/doc/index.html][LMDB]], an open source, embedded, NoSQL key-value store
library (so it's linked into microfts, not run as an external
service). It uses [[https://github.com/AskAlexSharov/lmdb-go/lmdb][AskAlexSharov's fork]] of [[https://github.com/bmatsuo/lmdb-go][bmatsuo's lmdb-go package]] to
connect to it.
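
To make the idea of a trigram inverted index concrete, here is a minimal in-memory sketch in Go. It is not microfts's code: the names =trigrams=, =index=, =add= and =candidates= are hypothetical, the index lives in a map rather than LMDB, and the lowercasing rule is an assumption. Each chunk's text is split into 3-character windows, each trigram maps to the list of chunk IDs containing it, and a query is answered by intersecting those lists.

```go
package main

import (
	"fmt"
	"strings"
)

// trigrams returns the distinct 3-character windows of s after lowercasing.
// The normalization is an assumption; microfts may treat case, whitespace
// and punctuation differently.
func trigrams(s string) []string {
	r := []rune(strings.ToLower(s))
	seen := map[string]bool{}
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		g := string(r[i : i+3])
		if !seen[g] {
			seen[g] = true
			out = append(out, g)
		}
	}
	return out
}

// index maps each trigram to the ascending list of chunk IDs containing it.
type index map[string][]int

// add records every trigram of text under id. IDs must be added in
// ascending order so that each list stays sorted.
func (ix index) add(id int, text string) {
	for _, g := range trigrams(text) {
		ix[g] = append(ix[g], id)
	}
}

// candidates returns the IDs of chunks that contain every trigram of query.
// These are candidates only: a chunk can contain all the trigrams without
// containing the query as a contiguous string.
func (ix index) candidates(query string) []int {
	grams := trigrams(query)
	if len(grams) == 0 {
		return nil
	}
	result := ix[grams[0]]
	for _, g := range grams[1:] {
		result = intersect(result, ix[g])
	}
	return result
}

// intersect merges two ascending lists, keeping the IDs present in both.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	ix := index{}
	ix.add(1, "full text search")
	ix.add(2, "fast searching")
	ix.add(3, "text files")
	fmt.Println(ix.candidates("search")) // [1 2]
	fmt.Println(ix.candidates("text"))   // [1 3]
}
```

Queries shorter than three characters produce no trigrams, so this sketch returns nothing for them.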

* LICENSE

Microfts is MIT licensed, (c) 2020 Bill Burdick. All rights reserved.

* Building
Note that building may generate warning messages from lmdb-go's compilation of the LMDB C code.
#+begin_src sh
go build -o microfts
#+end_src

* Examples
** Creating a database
#+begin_src sh
./microfts create /tmp/bubba
#+end_src
** Adding Text
This adds /tmp/tst to the database in /tmp/bubba:
#+begin_src sh
rm -rf /tmp/bubba
./microfts create /tmp/bubba
cat > /tmp/tst <
#+end_src
*** Grams: GRAM -> BLOCK
GRAM is a 2-byte value
|----------|
| OID LIST |
|----------|
*** OID LISTS
Nine lists of OIDs, one per encoded length (1 to 9 bytes): [9][]byte.

Note: this is probably too ornate; a simple byte array and a
count might have the same performance and space.
|---------------|
| # 1-byte OIDS |
| # 2-byte OIDS |
| # 3-byte OIDS |
| # 4-byte OIDS |
| # 5-byte OIDS |
| # 6-byte OIDS |
| # 7-byte OIDS |
| # 8-byte OIDS |
| # 9-byte OIDS |
| OIDS |
|---------------|
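
As a rough illustration of the layout above, the following Go sketch buckets OIDs by encoded length. The actual integer compression microfts uses is not described here, so the sketch assumes an SQLite-style big-endian varint (7 bits per byte for the first eight bytes, 8 bits in the ninth), which happens to produce 1 to 9 bytes. The names =oidLists=, =encode=, =decode= and =encodedLen= are hypothetical.

```go
package main

import "fmt"

// encodedLen returns how many bytes the assumed varint needs for v (1 to 9).
func encodedLen(v uint64) int {
	for n := 1; n <= 8; n++ {
		if v < uint64(1)<<uint(7*n) {
			return n
		}
	}
	return 9
}

// encode writes v as a big-endian varint: continuation bit 0x80 on every
// byte except the last, and a full 8-bit final byte in the 9-byte form.
func encode(v uint64) []byte {
	n := encodedLen(v)
	buf := make([]byte, n)
	last := n - 1
	if n == 9 {
		buf[8] = byte(v)
		v >>= 8
		last = 7
	}
	for i := last; i >= 0; i-- {
		buf[i] = byte(v & 0x7f)
		if n == 9 || i != n-1 {
			buf[i] |= 0x80
		}
		v >>= 7
	}
	return buf
}

// decode reverses encode for a slice holding exactly one encoded value.
func decode(b []byte) uint64 {
	var v uint64
	for i, c := range b {
		if i == 8 {
			return v<<8 | uint64(c)
		}
		v = v<<7 | uint64(c&0x7f)
	}
	return v
}

// oidLists holds one byte slice per encoded length; index n-1 holds the
// n-byte OIDs back to back.
type oidLists [9][]byte

// add appends oid to the list that matches its encoded length.
func (l *oidLists) add(oid uint64) {
	b := encode(oid)
	(*l)[len(b)-1] = append((*l)[len(b)-1], b...)
}

// count returns the number of n-byte OIDs, derived from the list's length.
func (l *oidLists) count(n int) int {
	return len((*l)[n-1]) / n
}

func main() {
	var l oidLists
	for _, oid := range []uint64{5, 300, 1 << 60} {
		l.add(oid)
	}
	for n := 1; n <= 9; n++ {
		fmt.Printf("%d-byte OIDs: %d\n", n, l.count(n))
	}
	fmt.Println(decode(l[1][:2])) // 300
}
```

Because every entry in a given list has the same width, the per-length counts in the table can be recovered from the list lengths, and the k-th OID of a list can be located without scanning.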
*** Gram 0 holds the info since 0 is not a legal gram
|-----------------|
| next unused oid |
| next unused gid |
| free oids |
| free gids |
|-----------------|
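
One plausible reading of the "next unused" and "free" fields is an allocator that reuses freed IDs before handing out new ones. The Go sketch below illustrates that reading for OIDs only. The struct, its field names, and the choice to start numbering at 1 are assumptions, not taken from the microfts source.

```go
package main

import "fmt"

// info mirrors the four fields of the gram-0 record (hypothetical names).
type info struct {
	nextOID  uint64
	nextGID  uint64
	freeOIDs []uint64
	freeGIDs []uint64
}

// allocOID reuses the most recently freed OID if there is one, otherwise
// it hands out the next unused OID and advances the counter.
func (in *info) allocOID() uint64 {
	if n := len(in.freeOIDs); n > 0 {
		oid := in.freeOIDs[n-1]
		in.freeOIDs = in.freeOIDs[:n-1]
		return oid
	}
	oid := in.nextOID
	in.nextOID++
	return oid
}

// freeOID returns an OID to the free list so it can be reused.
func (in *info) freeOID(oid uint64) {
	in.freeOIDs = append(in.freeOIDs, oid)
}

func main() {
	st := info{nextOID: 1, nextGID: 1}
	a, b, c := st.allocOID(), st.allocOID(), st.allocOID()
	fmt.Println(a, b, c) // 1 2 3
	st.freeOID(b)
	fmt.Println(st.allocOID()) // 2 (reused)
	fmt.Println(st.allocOID()) // 4 (fresh)
}
```

Reusing freed IDs keeps the numbers small, which matters when IDs are stored as variable-length integers.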
*** Chunks: OID -> BLOCK
OIDS are compressed integers
|-------------------------|
| GID |
| data (e.g. line number) |
| gram count |
|-------------------------|
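
A chunk record like the one above could be serialized along the following lines. This is only a sketch under stated assumptions: it uses Go's standard =encoding/binary= uvarint as a stand-in for whatever integer compression microfts uses, and it assumes the data field is an opaque, length-prefixed byte string. The =chunk= type and =decodeChunk= function are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// chunk holds the three fields of the block (hypothetical Go types).
type chunk struct {
	gid       uint64 // group the chunk belongs to
	data      []byte // opaque locator, e.g. a line number
	gramCount uint64 // number of grams in the chunk
}

var errTruncated = errors.New("truncated chunk record")

// encode lays the record out as: gid, len(data), data, gramCount.
func (c chunk) encode() []byte {
	buf := binary.AppendUvarint(nil, c.gid)
	buf = binary.AppendUvarint(buf, uint64(len(c.data)))
	buf = append(buf, c.data...)
	return binary.AppendUvarint(buf, c.gramCount)
}

// decodeChunk parses a record produced by encode.
func decodeChunk(b []byte) (chunk, error) {
	var c chunk
	var n int
	if c.gid, n = binary.Uvarint(b); n <= 0 {
		return c, errTruncated
	}
	b = b[n:]
	size, n := binary.Uvarint(b)
	if n <= 0 || uint64(len(b)-n) < size {
		return c, errTruncated
	}
	b = b[n:]
	c.data = append([]byte(nil), b[:size]...)
	b = b[size:]
	if c.gramCount, n = binary.Uvarint(b); n <= 0 {
		return c, errTruncated
	}
	return c, nil
}

func main() {
	c := chunk{gid: 7, data: []byte("42"), gramCount: 300}
	b := c.encode()
	back, err := decodeChunk(b)
	fmt.Println(len(b), back.gid, string(back.data), back.gramCount, err)
	// 6 7 42 300 <nil>
}
```

With small IDs and counts, the whole record fits in a handful of bytes, which is the point of storing these fields as compressed integers.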
*** Groups: GID -> BLOCK
GIDS are compressed integers
|-----------------------------------|
| NAME |
| oid count |
| last changed timestamp |
| validity (valid = 0, deleted = 1) |
| org flag (whether -org was used) |
|-----------------------------------|
*** Group Names: NAME->GID