https://github.com/pseitz/veloci

High performance fulltext search engine

Host: GitHub
URL: https://github.com/pseitz/veloci
Owner: PSeitz
License: mit
Created: 2017-02-12T22:42:23.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2024-10-07T07:31:16.000Z (almost 2 years ago)
Last Synced: 2025-04-11T21:14:47.532Z (over 1 year ago)
Language: Rust
Homepage:
Size: 4.42 MB
Stars: 10
Watchers: 3
Forks: 2
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE

# Veloci ![Veloci Tests](https://github.com/PSeitz/veloci/workflows/Veloci%20Tests/badge.svg) [![codecov](https://codecov.io/gh/PSeitz/veloci/branch/master/graph/badge.svg)](https://codecov.io/gh/PSeitz/veloci) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

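To rerun the tests with trace logging whenever a file under `src` changes (requires `cargo-watch`):

`CARGO_INCREMENTAL=1 RUST_BACKTRACE=full RUST_TEST_THREADS=1 RUST_LOG=veloci=trace,measure_time=info cargo watch -w src -x 'test -- --nocapture'`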

## Features

- Optional Schema
- Fuzzy Search
- Query Boosting
- Term Boosting
- Phrase Boosting
- Boost by Indexed Data
- Boost Parts of Query
- Boost by Text-Locality (multi-hit in same text)
- Facets
- Filters
- WhyFound
- Stopword Lists (EN, DE)
- Query Parser
- Compressed Docstore
- Support for In-Memory and Disk-based (MMap) Indices
- Speed
- Love💖

### Goals

- Super easy indexing and searching on data
- Ultrahigh performance

### Non-Goals (Currently)

- Delta update in indices

## Creating Indices

Use the tool in `veloci_bins/src/bin/create_index.rs` to create indices on your data.

Currently, the data must be stored as line-delimited JSON, with one JSON object per line:

```json
{"text": "my first object", "sub_objects": [{"description": "this works"}]}
{"text": "my second object"}
```

If your JSON is not in this format, you can convert it with the tool in `veloci_bins/src/bin/convert_json_to_line_delimited.rs`.
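Before indexing, it can help to confirm that a file really has one object per line. The following is a minimal sketch using only the standard library; `suspicious_lines` is a hypothetical helper and not part of veloci. It is a shape heuristic only (each non-empty line should start with `{` and end with `}`) and does not parse or validate the JSON itself.

```rust
// Hypothetical pre-flight check, not part of veloci: return the 1-based
// numbers of lines that do not look like a standalone JSON object.
// This is a shape heuristic only; it does not parse or validate JSON.
fn suspicious_lines(input: &str) -> Vec<usize> {
    input
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let line = line.trim();
            // Blank lines are ignored; everything else must look like `{...}`.
            !line.is_empty() && !(line.starts_with('{') && line.ends_with('}'))
        })
        .map(|(index, _)| index + 1)
        .collect()
}

fn main() {
    // One object per line: the expected input format.
    let line_delimited =
        "{\"text\": \"my first object\"}\n{\"text\": \"my second object\"}\n";
    // A pretty-printed object spread over several lines: needs converting first.
    let pretty_printed = "{\n    \"text\": \"my first object\"\n}\n";

    println!("{:?}", suspicious_lines(line_delimited));
    println!("{:?}", suspicious_lines(pretty_printed));
}
```

Any line number reported here suggests the file is pretty-printed or wrapped in an outer array, in which case the conversion tool mentioned above is the safer route.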

## Addressing fields

```json
{
    "text": "my first object",
    "sub_objects": [
        {"description": "this works", "deeper": ["tag1", "tag2"]}
    ],
    "structured": {
        "name": "a"
    }
}
```

The fields are addressed like this:

- `text`
- `sub_objects[].description`
- `sub_objects[].deeper[]`
- `structured.name`
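The addressing scheme can be read as two rules: object keys are joined with `.`, and every array level appends `[]`. The sketch below derives the paths for the example document under that reading. The `Value` enum and the `field_paths` function are hypothetical stand-ins written for this illustration; they are not veloci types, and veloci's own path handling may differ.

```rust
// Hypothetical miniature JSON model, used only for this illustration;
// it is not a type from veloci.
enum Value {
    Leaf,
    Array(Vec<Value>),
    Object(Vec<(&'static str, Value)>),
}

// Collect the path of every leaf value: object keys are joined with `.`,
// and each array level appends `[]`. Duplicate paths are recorded once.
fn collect_paths(value: &Value, prefix: &str, out: &mut Vec<String>) {
    match value {
        Value::Leaf => {
            if !out.iter().any(|p| p == prefix) {
                out.push(prefix.to_string());
            }
        }
        Value::Array(items) => {
            let path = format!("{}[]", prefix);
            for item in items {
                collect_paths(item, &path, out);
            }
        }
        Value::Object(fields) => {
            for (key, child) in fields {
                let path = if prefix.is_empty() {
                    key.to_string()
                } else {
                    format!("{}.{}", prefix, key)
                };
                collect_paths(child, &path, out);
            }
        }
    }
}

fn field_paths(doc: &Value) -> Vec<String> {
    let mut out = Vec::new();
    collect_paths(doc, "", &mut out);
    out
}

// The example document from this section, with leaf contents omitted.
fn example_doc() -> Value {
    Value::Object(vec![
        ("text", Value::Leaf),
        ("sub_objects", Value::Array(vec![Value::Object(vec![
            ("description", Value::Leaf),
            ("deeper", Value::Array(vec![Value::Leaf, Value::Leaf])),
        ])])),
        ("structured", Value::Object(vec![("name", Value::Leaf)])),
    ])
}

fn main() {
    for path in field_paths(&example_doc()) {
        println!("{}", path);
    }
}
```

For the example document this yields the same four paths listed in this section, in document order.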

## Boosting

Boost the score based on values in the data. For example, given two products with the same name, the more common one should be ranked higher.

```json
{ "commonness": 10, "name": "product" }
{ "commonness": 99, "name": "product" }
```

Create a column index for the data. Note the `boost` suffix on the field name.

```toml
[commonness.boost]
boost_type = 'f32'
```

For search, create a boost query that adjusts the score with the following formula:

`hit.score *= (boost_value + boost_param).log10();`

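```rust
use serde_json::json;

// `json!` yields a `serde_json::Value`, so it is converted here.
// Assumes `search::Request` implements `serde::Deserialize`.
// The boost multiplies each hit's score by log10(commonness + param).
let req: search::Request = serde_json::from_value(json!({
    "search_req": { "search": {
        "terms": ["product"],
        "path": "name",
        "levenshtein_distance": 0
    }},
    "boost": [{
        "path": "commonness",
        "boost_fun": "Log10",
        "param": 1
    }]
}))
.expect("request JSON should match search::Request");
```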
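To see what the `Log10` formula does to the two example products, here is a small standalone sketch. `apply_log10_boost` is a hypothetical helper that only restates the formula quoted above; it is not a veloci function, and the base score of 1.0 is an assumption made so both hits start out equal.

```rust
// Restates the formula quoted above:
//     hit.score *= (boost_value + boost_param).log10();
// `apply_log10_boost` is a hypothetical helper, not a veloci function.
fn apply_log10_boost(score: f32, boost_value: f32, boost_param: f32) -> f32 {
    score * (boost_value + boost_param).log10()
}

fn main() {
    let base_score = 1.0_f32; // assumed: both hits have the same text score
    let param = 1.0_f32; // the "param": 1 from the request

    // The two `commonness` values from the example data.
    for &commonness in &[10.0_f32, 99.0] {
        let boosted = apply_log10_boost(base_score, commonness, param);
        println!("commonness {:>4}: score {:.3}", commonness, boosted);
    }
}
```

With `param` set to 1, the multipliers are log10(11) ≈ 1.04 and log10(100) = 2, so the product with `commonness` 99 ends up with roughly twice the score of the other. Taking the formula as quoted, a boost value of 0 with `param` 1 gives log10(1) = 0, which would zero the score, so the choice of `param` matters for small values.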

## Webserver

To install the search engine bundled with the webserver, execute the following in the `server` folder:

`cd server;cargo install`

To start the server and load search indices inside the jmdict folder:

`ROCKET_ENV=stage RUST_BACKTRACE=1 RUST_LOG=veloci=info ROCKET_PORT=3000 rocket_server jmdict`
