https://github.com/rqlite/rqlite-fts5
Building a highly-available search engine using rqlite
- Host: GitHub
- URL: https://github.com/rqlite/rqlite-fts5
- Owner: rqlite
- License: mit
- Created: 2022-08-20T15:02:14.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-10-14T15:28:48.000Z (over 1 year ago)
- Last Synced: 2025-04-02T21:42:37.887Z (10 months ago)
- Language: Python
- Homepage: http://www.rqlite.io
- Size: 66.4 KB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# rqlite-fts5
Building a highly-available search engine using rqlite. Check out [this blog post](https://www.philipotoole.com/building-a-highly-available-search-engine-using-sqlite/) for full details.
## Test data
You can download the test data set with the following command (tested on Linux):
```bash
curl https://storage.googleapis.com/bucket-vallified/rqlite/access-5million.log.gz >access.log.gz
```
Decompress the data set as follows:
```bash
gunzip access.log.gz
```
The result is `access.log`, an Apache web server access log file containing 5 million entries.
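Before indexing, it can be worth checking that the download and decompression worked. The following is a minimal sketch, not part of this repository, that reads the first few lines of a text file and counts its lines without loading the whole file into memory. The function name `preview` is hypothetical.

```python
import itertools


def preview(path, n=3):
    """Return the first n lines of a text file and its total line count.

    The file is streamed line by line, so a multi-gigabyte log is fine.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        # Take the first n lines, then keep counting the remainder.
        head = [line.rstrip("\n") for line in itertools.islice(f, n)]
        total = len(head) + sum(1 for _ in f)
    return head, total
```

Calling `preview("access.log")` returns the sample lines and the line count. If the file is intact, the count should match the 5 million entries described above.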
## Indexing the log data
Use the Python program in this repository to index the data. You must have at least one rqlite node up and running (see the [_Quick Start_](https://rqlite.io/docs/quick-start/) guide to get rqlite up and running). The indexing program assumes rqlite is available at `127.0.0.1:4001`, but you can override this via command-line options.
```bash
python indexer.py access.log
```
Pass `-h` to the program to see full options. Depending on the hardware you use for your rqlite system, it could take a few minutes to index all the log data.
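To give a rough idea of what an indexing step like this involves, here is a minimal sketch that is not the code in `indexer.py`. It assumes each log line is stored as a single text column in an SQLite FTS5 virtual table, and that batches of parameterized `INSERT` statements are sent to rqlite's `/db/execute` HTTP endpoint. The table name `logs`, the column name `entry`, the batch size, and all function names are assumptions made for illustration.

```python
import json
import urllib.request

# All of these values are illustrative assumptions, not taken from indexer.py.
RQLITE_URL = "http://127.0.0.1:4001"
TABLE = "logs"
BATCH_SIZE = 1000


def create_table_payload():
    """Statement list that creates an FTS5 virtual table with one text column."""
    return [f"CREATE VIRTUAL TABLE IF NOT EXISTS {TABLE} USING fts5(entry)"]


def insert_payload(lines):
    """Build one parameterized INSERT per log line for the /db/execute endpoint."""
    return [[f"INSERT INTO {TABLE}(entry) VALUES(?)", line] for line in lines]


def batches(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def execute(statements, base_url=RQLITE_URL):
    """POST a JSON array of statements to rqlite and return the decoded reply."""
    req = urllib.request.Request(
        f"{base_url}/db/execute",
        data=json.dumps(statements).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def index_file(path, base_url=RQLITE_URL):
    """Create the table, then insert every non-empty line of the file in batches."""
    execute(create_table_payload(), base_url)
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = (line.rstrip("\n") for line in f)
        for batch in batches((l for l in lines if l), BATCH_SIZE):
            execute(insert_payload(batch), base_url)
```

Sending many statements per request cuts the number of round trips to the cluster, which is likely to matter when loading millions of rows. The real program may batch, parse, or structure the table differently.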