https://github.com/buda-base/autocomplete-prototype
prototype for an autocomplete service for BDRC
- Host: GitHub
- URL: https://github.com/buda-base/autocomplete-prototype
- Owner: buda-base
- License: MIT
- Created: 2024-02-15T11:34:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-11-17T12:48:44.000Z (7 months ago)
- Last Synced: 2026-01-19T19:43:56.709Z (5 months ago)
- Language: Python
- Size: 283 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# autocomplete-prototype
### Assumptions
- auto-complete matches only from the beginning of a result (no result is returned where the query string starts in the middle of the result)
- some prefixes should be ignored
- there will be separate indexes (for titles, person names, everything, and different languages)
- the result should be a list of X (name + category), sorted using a ranking algorithm
- the source data for building an auto-complete index is a list of strings, each with an associated category and score; the score for each string is computed separately, based on the frequency of the string and possibly the entity score
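
The assumptions above can be illustrated with a minimal sketch. It is not the repository's implementation: the `Entry` and `PrefixIndex` names, the `IGNORED_PREFIXES` values, and the lowercase normalization are all hypothetical choices made for illustration. It keeps normalized names in a sorted list, finds the block of entries starting with the query by binary search, and ranks them by their precomputed score.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    category: str
    score: float  # precomputed elsewhere (e.g. frequency, entity score)


# Hypothetical ignorable prefixes; the real list is not given in this README.
IGNORED_PREFIXES = ("the ", "a ")


def normalize(text: str) -> str:
    """Lowercase the text and drop one leading ignorable prefix, if any."""
    text = text.strip().lower()
    for prefix in IGNORED_PREFIXES:
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


class PrefixIndex:
    """Sorted-list index: entries sharing a prefix are contiguous."""

    def __init__(self, entries):
        pairs = sorted(((normalize(e.name), e) for e in entries),
                       key=lambda pair: pair[0])
        self._keys = [key for key, _ in pairs]
        self._entries = [entry for _, entry in pairs]

    def complete(self, query: str, limit: int = 10):
        """Return up to `limit` entries whose normalized name starts with the query."""
        q = normalize(query)
        if not q:
            return []
        # First position whose key is >= q; matches follow contiguously.
        start = bisect.bisect_left(self._keys, q)
        matches = []
        for i in range(start, len(self._keys)):
            if not self._keys[i].startswith(q):
                break
            matches.append(self._entries[i])
        # Rank by score, highest first.
        matches.sort(key=lambda e: -e.score)
        return matches[:limit]


index = PrefixIndex([
    Entry("The Words of My Perfect Teacher", "title", 5.0),
    Entry("Words and Meanings", "title", 9.0),
    Entry("Wordsworth", "person", 1.0),
    Entry("Sword", "title", 7.0),
])
results = index.complete("wor")
```

Here `results` holds the three entries whose normalized names begin with "wor", ordered by score; "Sword" is excluded because the query only occurs in the middle of it. Separate indexes (titles, person names, per language) would simply be separate `PrefixIndex` instances in this sketch.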
### Vocabulary
Tokenization:
ཀུན་བཟང་བླ་མའི་ཞལ་ལུང -> [""]
- a full token is a token that we know is complete in the user query
- a partial token can appear as the last token of the user query if the user hasn't fully typed it
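
The distinction between full and partial tokens can be sketched as follows. This is an illustrative assumption, not the project's tokenizer: `split_query` and `matches` are hypothetical helpers, and tokens are assumed to be separated by a single delimiter character (a space by default). A trailing delimiter is taken to mean that the last token is complete.

```python
def split_query(query: str, delimiter: str = " "):
    """Split a user query into (full_tokens, partial_token).

    The last piece is treated as partial unless the query ends with the
    delimiter, in which case there is no partial token (None).
    """
    pieces = query.split(delimiter)
    last = pieces.pop()  # "" when the query ends with the delimiter
    full_tokens = [p for p in pieces if p]
    partial_token = last or None
    return full_tokens, partial_token


def matches(query: str, candidate: str, delimiter: str = " ") -> bool:
    """True if the candidate starts with the full tokens, followed by a
    token that begins with the partial token (when there is one)."""
    full_tokens, partial_token = split_query(query, delimiter)
    tokens = [t for t in candidate.split(delimiter) if t]
    if tokens[:len(full_tokens)] != full_tokens:
        return False
    if partial_token is None:
        return True
    next_index = len(full_tokens)
    return next_index < len(tokens) and tokens[next_index].startswith(partial_token)


full, partial = split_query("perfect tea")
ok = matches("perfect tea", "perfect teacher")
```

Full tokens must match exactly, while only the partial token is matched as a prefix, so "perf teacher" would not match "perfect teacher" in this sketch. For Tibetan input, passing a different `delimiter` is one possible adaptation, but whether the project tokenizes at that level is not stated here.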