https://github.com/ddmitov/fupi


Fupi
--------------------------------------------------------------------------------

Fupi is a serverless multilingual semantic search system based on [LanceDB](https://github.com/lancedb/lancedb).



Once upon a time [a giraffe calf was orphaned during a severe drought and was saved thanks to the kindness and efforts of the local community and a knowledgeable animal whisperer](https://science.sandiegozoo.org/science-blog/lekiji-fupi-and-reteti). [Named Fupi, he became so attached to his rescuer that he visited him frequently long after he recuperated and was set free](https://www.theguardian.com/artanddesign/2023/jan/11/fupi-orphaned-giraffe-whisperer-ami-vitales-best-photograph).

Today we use complex machine learning technologies thanks to the knowledge, persistence and efforts of many people. Just like the small Fupi, we should always be thankful to them for their goodwill and contributions!

## Design Objectives

* **1.** Multilingual (cross-language) semantic search:
the ability to query in one language and find matching texts written in another
* **2.** Usability in serverless or scale-to-zero applications for low operational costs
* **3.** Adaptability to different cloud environments or on-premise systems
* **4.** No dependency on AI as a service (AIaaS) for production of embeddings
* **5.** No dependency on software as a service (SaaS) for storage of embeddings
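
The first objective rests on one idea: a multilingual embedding model maps texts with similar meaning to nearby vectors regardless of language, so a nearest-neighbour lookup over those vectors works across languages. The sketch below illustrates only that idea and is not Fupi's code. The three-dimensional vectors are invented by hand, and `DOCUMENTS`, `cosine_similarity` and `search` are hypothetical names. In a real deployment the vectors would presumably come from a locally run model (the Citations section points to BGE-M3) and would be stored in a LanceDB table rather than a Python list.

```python
import math

# Toy "embeddings" invented for illustration only. A real multilingual
# model produces vectors with hundreds of dimensions.
DOCUMENTS = [
    {"lang": "de", "text": "Die Giraffe wurde während der Dürre gerettet.",
     "vector": [0.9, 0.1, 0.0]},
    {"lang": "es", "text": "El banco central subió los tipos de interés.",
     "vector": [0.0, 0.2, 0.9]},
    {"lang": "fr", "text": "La girafe orpheline a été sauvée.",
     "vector": [0.8, 0.3, 0.1]},
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, documents, limit=2):
    """Return the `limit` documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, doc["vector"]), doc)
              for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]


# Hypothetical embedding of the English query "rescued giraffe".
query_vector = [1.0, 0.2, 0.0]
results = search(query_vector, DOCUMENTS)

for doc in results:
    print(doc["lang"], doc["text"])
```

With these toy vectors the German and French giraffe sentences rank above the Spanish sentence about interest rates, even though the query is imagined to be in English. A vector database replaces the linear scan in `search` with an indexed lookup, but the ranking principle is the same.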

## [Thanks and Credits](./CREDITS.md)

## Citations

```bibtex
@misc{bge-m3,
title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
year={2024},
eprint={2402.03216},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

```bibtex
@misc{fan2020englishcentric,
title={Beyond English-Centric Multilingual Machine Translation},
author={Angela Fan and Shruti Bhosale and Holger Schwenk and Zhiyi Ma and Ahmed El-Kishky and Siddharth Goyal and Mandeep Baines and Onur Celebi and Guillaume Wenzek and Vishrav Chaudhary and Naman Goyal and Tom Birch and Vitaliy Liptchinsky and Sergey Edunov and Edouard Grave and Michael Auli and Armand Joulin},
year={2020},
eprint={2010.11125},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## [License](./LICENSE)

This program is licensed under the terms of the Apache License 2.0.

## Authors

[Dimitar D. Mitov](https://www.linkedin.com/in/dimitar-mitov-12388982/), 2024,
[Adam Fauzi](https://www.linkedin.com/in/adam-fauzi-95a9322a1/), 2024