https://github.com/zot/microfts
Small and fast FTS (full text search)
- Host: GitHub
- URL: https://github.com/zot/microfts
- Owner: zot
- License: mit
- Created: 2020-12-11T20:34:47.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-10-30T14:21:32.000Z (over 2 years ago)
- Last Synced: 2025-03-23T03:32:11.396Z (about 1 year ago)
- Language: Go
- Size: 843 KB
- Stars: 32
- Watchers: 2
- Forks: 3
- Open Issues: 8
Metadata Files:
- Readme: README.org
- License: LICENSE
README
* Microfts
A small full text indexing and search tool focusing on speed and
space. Initial tests seem to indicate that the database takes about
twice as much space as the files it indexes.
Microfts implements a trigram GIN (generalized inverted index). It
stores the index in [[http://www.lmdb.tech/doc/index.html][LMDB]], an open source, embedded, NoSQL key-value
store library, so the store is linked into microfts rather than run as
an external service. Microfts connects to LMDB through [[https://github.com/AskAlexSharov/lmdb-go/lmdb][AskAlexSharov's fork]]
of [[https://github.com/bmatsuo/lmdb-goto][bmatsuo's lmdb-go package]].
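The idea of a trigram inverted index can be sketched in a few lines of Go. This is only an illustration, not microfts's code: it keeps the index in a map instead of LMDB, and the names =trigrams=, =Index=, =Add=, and =Candidates= are hypothetical. It also assumes each chunk ID is added exactly once.
#+begin_src go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigrams returns the distinct 3-byte substrings of the lowercased text.
func trigrams(text string) []string {
	text = strings.ToLower(text)
	seen := map[string]bool{}
	var grams []string
	for i := 0; i+3 <= len(text); i++ {
		g := text[i : i+3]
		if !seen[g] {
			seen[g] = true
			grams = append(grams, g)
		}
	}
	return grams
}

// Index maps each trigram to the IDs of the chunks that contain it.
type Index map[string][]uint64

// Add records every distinct trigram of text under the chunk ID.
func (ix Index) Add(id uint64, text string) {
	for _, g := range trigrams(text) {
		ix[g] = append(ix[g], id)
	}
}

// Candidates returns the sorted IDs of chunks containing every trigram
// of query. These are candidates only: a chunk can hold all of the
// trigrams without holding the query as one contiguous string.
func (ix Index) Candidates(query string) []uint64 {
	grams := trigrams(query)
	if len(grams) == 0 {
		return nil
	}
	counts := map[uint64]int{}
	for _, g := range grams {
		for _, id := range ix[g] {
			counts[id]++
		}
	}
	var ids []uint64
	for id, n := range counts {
		if n == len(grams) {
			ids = append(ids, id)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	ix := Index{}
	ix.Add(1, "one two three")
	ix.Add(2, "two three four")
	ix.Add(3, "five six")
	fmt.Println(ix.Candidates("three")) // prints [1 2]
}
#+end_src
Queries shorter than three bytes produce no trigrams, so this sketch returns no candidates for them.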
* LICENSE
Microfts is MIT licensed, (c) 2020 Bill Burdick. All rights reserved.
* Building
Note that building may generate warning messages from lmdb-go's compilation of the LMDB C code.
#+begin_src sh
go build -o microfts
#+end_src
* Examples
** Creating a database
#+begin_src sh
./microfts create /tmp/bubba
#+end_src
** Adding Text
This adds the file /tmp/tst to the database in /tmp/bubba.
#+begin_src sh
rm -rf /tmp/bubba
./microfts create /tmp/bubba
cat > /tmp/tst < ...
#+end_src
*** Grams: GRAM -> BLOCK
GRAM is a 2-byte value
|----------|
| OID LIST |
|----------|
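One way a three-character gram can fit in 2 bytes is to restrict the alphabet. The sketch below assumes a 37-symbol alphabet (one separator symbol, 26 case-folded letters, 10 digits); that alphabet and the names =code= and =pack= are assumptions for illustration, not confirmed details of microfts.
#+begin_src go
package main

import "fmt"

// code maps a byte to a digit in 0..36. The alphabet is an assumption:
// 0 for anything that is not a letter or digit, 1..26 for letters
// (case folded), 27..36 for digits.
func code(c byte) uint16 {
	switch {
	case c >= 'a' && c <= 'z':
		return uint16(c-'a') + 1
	case c >= 'A' && c <= 'Z':
		return uint16(c-'A') + 1
	case c >= '0' && c <= '9':
		return uint16(c-'0') + 27
	}
	return 0
}

// pack combines three digits into one base-37 number. The largest
// value is 37*37*37-1 = 50652, which fits in a uint16.
func pack(a, b, c byte) uint16 {
	return (code(a)*37+code(b))*37 + code(c)
}

func main() {
	fmt.Println(pack('a', 'b', 'c')) // prints 1446
	fmt.Println(pack(' ', ' ', ' ')) // prints 0
}
#+end_src
Under this assumed alphabet only a gram made of three separators packs to 0, so a scheme like this could reserve key 0 for bookkeeping.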
*** OID LISTS
9 lists of OIDs, grouped by encoded length (1 to 9 bytes): [9][]byte.
Note -- this is probably too ornate and a simple byte array and a
count might have the same performance and space.
|---------------|
| # 1-byte OIDS |
| # 2-byte OIDS |
| # 3-byte OIDS |
| # 4-byte OIDS |
| # 5-byte OIDS |
| # 6-byte OIDS |
| # 7-byte OIDS |
| # 8-byte OIDS |
| # 9-byte OIDS |
| OIDS          |
|---------------|
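The layout above (nine counts followed by the OIDs) can be sketched as follows. The integer encoding is an assumption: this sketch uses Go's uvarint from encoding/binary, which needs at most 9 bytes for values below 1<<63, and microfts may well use a different compression. =OIDLists=, =Add=, =Bytes=, and =Decode= are hypothetical names.
#+begin_src go
package main

import (
	"encoding/binary"
	"fmt"
)

// OIDLists holds encoded OIDs grouped by encoded length:
// index 0 holds 1-byte encodings, index 8 holds 9-byte encodings.
type OIDLists [9][]byte

// Add appends oid to the list matching its encoded length.
// Uvarint encoding is assumed; values of 1<<63 and above would
// need 10 bytes and are rejected.
func (l *OIDLists) Add(oid uint64) error {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], oid)
	if n > len(l) {
		return fmt.Errorf("oid %d needs %d bytes", oid, n)
	}
	l[n-1] = append(l[n-1], buf[:n]...)
	return nil
}

// Bytes lays the block out as nine counts followed by the OIDs.
func (l *OIDLists) Bytes() []byte {
	var out []byte
	for i, list := range l {
		out = binary.AppendUvarint(out, uint64(len(list)/(i+1)))
	}
	for _, list := range l {
		out = append(out, list...)
	}
	return out
}

// Decode reads the OIDs back out of a block produced by Bytes.
func Decode(block []byte) ([]uint64, error) {
	var counts [9]uint64
	for i := range counts {
		c, n := binary.Uvarint(block)
		if n <= 0 {
			return nil, fmt.Errorf("bad count %d", i)
		}
		counts[i], block = c, block[n:]
	}
	var oids []uint64
	for i, c := range counts {
		for ; c > 0; c-- {
			oid, n := binary.Uvarint(block)
			if n != i+1 {
				return nil, fmt.Errorf("bad %d-byte oid", i+1)
			}
			oids, block = append(oids, oid), block[n:]
		}
	}
	return oids, nil
}

func main() {
	var l OIDLists
	for _, oid := range []uint64{70000, 1, 300, 2} {
		if err := l.Add(oid); err != nil {
			panic(err)
		}
	}
	oids, err := Decode(l.Bytes())
	fmt.Println(oids, err) // prints [1 2 300 70000] <nil>
}
#+end_src
Note that decoding returns the OIDs grouped by length, not in insertion order.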
*** Gram 0 holds the info since 0 is not a legal gram
|-----------------|
| next unused oid |
| next unused gid |
| free oids       |
| free gids       |
|-----------------|
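A record like this is enough to drive ID allocation. The sketch below assumes that freed IDs are reused before new ones are minted, which the "free oids" and "free gids" fields suggest but the README does not state; =Info=, =alloc=, =NewOID=, =NewGID=, and =FreeOID= are hypothetical names.
#+begin_src go
package main

import "fmt"

// Info mirrors the four fields kept under gram 0.
type Info struct {
	NextOID, NextGID   uint64
	FreeOIDs, FreeGIDs []uint64
}

// alloc pops a recycled ID if one exists (an assumed policy);
// otherwise it takes *next and advances it.
func alloc(next *uint64, free *[]uint64) uint64 {
	if n := len(*free); n > 0 {
		id := (*free)[n-1]
		*free = (*free)[:n-1]
		return id
	}
	id := *next
	*next++
	return id
}

// NewOID and NewGID hand out chunk and group IDs.
func (in *Info) NewOID() uint64 { return alloc(&in.NextOID, &in.FreeOIDs) }
func (in *Info) NewGID() uint64 { return alloc(&in.NextGID, &in.FreeGIDs) }

// FreeOID returns a chunk ID to the free list.
func (in *Info) FreeOID(oid uint64) { in.FreeOIDs = append(in.FreeOIDs, oid) }

func main() {
	info := Info{NextOID: 1, NextGID: 1}
	a := info.NewOID()
	b := info.NewOID()
	info.FreeOID(a)
	c := info.NewOID()
	fmt.Println(a, b, c, info.NextOID) // prints 1 2 1 3
}
#+end_src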
*** Chunks: OID -> BLOCK
OIDS are compressed integers
|-------------------------|
| GID                     |
| data (e.g. line number) |
| gram count              |
|-------------------------|
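A chunk block with these three fields could be serialized as in the following sketch. The byte layout is an assumption (uvarint GID, length-prefixed data, uvarint gram count), and =Chunk=, =Encode=, and =DecodeChunk= are hypothetical names, not microfts's API.
#+begin_src go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Chunk mirrors the three fields of a chunk block.
type Chunk struct {
	GID       uint64 // group that owns the chunk
	Data      []byte // opaque payload, e.g. an encoded line number
	GramCount uint64
}

// Encode writes GID, a length-prefixed Data, and GramCount.
// The length prefix is an assumption of this sketch.
func (c Chunk) Encode() []byte {
	out := binary.AppendUvarint(nil, c.GID)
	out = binary.AppendUvarint(out, uint64(len(c.Data)))
	out = append(out, c.Data...)
	return binary.AppendUvarint(out, c.GramCount)
}

// DecodeChunk reverses Encode.
func DecodeChunk(b []byte) (Chunk, error) {
	var c Chunk
	var n int
	if c.GID, n = binary.Uvarint(b); n <= 0 {
		return c, errors.New("bad gid")
	}
	b = b[n:]
	size, n := binary.Uvarint(b)
	if n <= 0 || uint64(len(b)-n) < size {
		return c, errors.New("bad data")
	}
	c.Data = append([]byte(nil), b[n:n+int(size)]...)
	b = b[n+int(size):]
	if c.GramCount, n = binary.Uvarint(b); n <= 0 {
		return c, errors.New("bad gram count")
	}
	return c, nil
}

func main() {
	in := Chunk{GID: 7, Data: binary.AppendUvarint(nil, 42), GramCount: 19}
	out, err := DecodeChunk(in.Encode())
	if err != nil {
		panic(err)
	}
	line, _ := binary.Uvarint(out.Data)
	fmt.Println(out.GID, line, out.GramCount) // prints 7 42 19
}
#+end_src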
*** Groups: GID -> BLOCK
GIDS are compressed integers
|-----------------------------------|
| NAME                              |
| oid count                         |
| last changed timestamp            |
| validity (valid = 0, deleted = 1) |
| org flag (whether -org was used)  |
|-----------------------------------|
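As an in-memory view, a group record might look like the struct below. The =Stale= check shows one plausible use of the timestamp (comparing it with a file's modification time); that use is an assumption, and =Group= and =Stale= are hypothetical names.
#+begin_src go
package main

import (
	"fmt"
	"time"
)

// Group mirrors the five fields of a group block.
type Group struct {
	Name        string
	OIDCount    uint64
	LastChanged time.Time
	Deleted     bool // validity: false = valid (0), true = deleted (1)
	Org         bool // whether -org was used
}

// Stale reports whether a valid group's source was modified after the
// recorded timestamp. How microfts uses the timestamp is assumed here.
func (g Group) Stale(modTime time.Time) bool {
	return !g.Deleted && modTime.After(g.LastChanged)
}

func main() {
	indexed := time.Date(2020, 12, 11, 0, 0, 0, 0, time.UTC)
	g := Group{Name: "/tmp/tst", OIDCount: 3, LastChanged: indexed}
	fmt.Println(g.Stale(indexed.Add(time.Hour))) // prints true
	fmt.Println(g.Stale(indexed))                // prints false
}
#+end_src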
*** Group Names: NAME -> GID