https://github.com/linkml/linkml-store
wrapper for multiple linkml storage engines
- Host: GitHub
- URL: https://github.com/linkml/linkml-store
- Owner: linkml
- License: mit
- Created: 2024-04-06T01:28:43.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-24T02:12:14.000Z (5 months ago)
- Last Synced: 2024-08-24T21:36:27.077Z (5 months ago)
- Topics: data-lake, data-stack, database-wrapper, duckdb, hdf5, linkml, mongo, mongodb, nosql-database, rdf, semweb, triplestore, vector-database
- Language: Python
- Homepage: https://linkml.io/linkml-store
- Size: 9.33 MB
- Stars: 13
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# linkml-store
An AI-ready data management and integration platform. LinkML-Store
provides an abstraction layer over multiple different backends
(including DuckDB, MongoDB, Neo4j, and local filesystems), allowing for
common query, index, and storage operations.

For full documentation, see [https://linkml.io/linkml-store/](https://linkml.io/linkml-store/).
See [these slides](https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000) for a high-level overview.
__Warning__: LinkML-Store is still undergoing changes and refactoring;
APIs and command-line options are subject to change!

## Quick Start
Install, add data, query it:
```
pip install linkml-store[all]
linkml-store -d duckdb:///db/my.db -c persons insert data/*.json
linkml-store -d duckdb:///db/my.db -c persons query -w "occupation: Bricklayer"
```

Index it, search it:
```
linkml-store -d duckdb:///db/my.db -c persons index -t llm
linkml-store -d duckdb:///db/my.db -c persons search "all persons employed in construction"
```

Validate it:
```
linkml-store -d duckdb:///db/my.db -c persons validate
```

## Basic usage
* [Command Line](https://linkml.io/linkml-store/tutorials/Command-Line-Tutorial.html)
* [Python](https://linkml.io/linkml-store/tutorials/Python-Tutorial.html)
* API
* Streamlit applications

## The CRUDSI pattern
Most database APIs implement the **CRUD** pattern: Create, Read, Update, Delete.
LinkML-Store adds **Search** and **Inference** to this pattern, making it **CRUDSI**.

The notions of "Search" and "Inference" are intended to be flexible and extensible,
including:

* Search
  * Traditional keyword search
  * Search using LLM vector embeddings (*without* a dedicated vector database)
  * Pluggable specialized search, e.g. genomic sequence (not yet implemented)
* Inference (encompassing *validation*, *repair*, and inference of missing data)
  * Classic rule-based inference
  * Inference using LLM Retrieval Augmented Generation (RAG)
  * Statistical/ML inference

## Features
### Multiple Adapters
LinkML-Store is designed to work with multiple backends, giving a common abstraction layer:
* [MongoDB](https://linkml.io/linkml-store/how-to/Use-MongoDB.html)
* [DuckDB](https://linkml.io/linkml-store/tutorials/Python-Tutorial.html)
* [Solr](https://linkml.io/linkml-store/how-to/Query-Solr-using-CLI.html)
* [Neo4j](https://linkml.io/linkml-store/how-to/Use-Neo4j.html)
* Filesystem

Coming soon: any RDBMS, any triplestore, Neo4J, HDF5-based stores, ChromaDB/Vector dbs ...
The intent is to give a union of all features of each backend. For
example, analytic faceted queries are provided for *all* backends, not
just Solr.

### Composable indexes
Many backends come with their own indexing and search
schemes. Classically these were Lucene-based indexes; now it is semantic
search using LLM embeddings.

LinkML-Store treats indexing as an orthogonal concern: you can
compose different indexing schemes with different backends. You don't
need to have a vector database to run embedding search!

See [How to Use Semantic Search](https://linkml.io/linkml-store/how-to/Use-Semantic-Search.html).
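To make the idea of an index that is independent of the storage backend concrete, here is a minimal sketch in plain Python. It is not LinkML-Store's implementation or API: `ToyIndex`, `toy_embed`, and the sample rows are hypothetical names invented for illustration, and a bag-of-words count vector stands in for a real LLM embedding. The point is only that the index works on rows from any source and ranks them by brute-force cosine similarity, so no vector database is involved.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for an LLM embedding (assumption): a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical index kept separate from whatever backend stores the rows."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.vectors: List[Tuple[dict, Vector]] = []

    def build(self, rows: Iterable[dict]) -> None:
        # Embed a text rendering of each row; the rows could come from any backend.
        self.vectors = [
            (row, self.embed(" ".join(str(v) for v in row.values()))) for row in rows
        ]

    def search(self, query: str, limit: int = 3) -> List[Tuple[float, dict]]:
        # Brute-force scan: score every row against the query, best first.
        q = self.embed(query)
        scored = [(cosine(q, vec), row) for row, vec in self.vectors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


rows = [
    {"name": "Ann", "occupation": "Bricklayer"},
    {"name": "Bob", "occupation": "Welder"},
    {"name": "Cy", "occupation": "Bricklayer apprentice"},
]
index = ToyIndex(toy_embed)
index.build(rows)
results = index.search("bricklayer", limit=2)
```

Swapping `toy_embed` for a function that calls a real embedding model would change the quality of the ranking but not the structure: the storage layer stays untouched.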
### Use with LLMs
TODO - docs
### Validation
LinkML-Store is backed by [LinkML](https://linkml.io), which allows
for powerful, expressive structural and semantic constraints.

See [Indexing JSON](https://linkml.io/linkml-store/how-to/Index-Phenopackets.html)
and [Referential Integrity](https://linkml.io/linkml-store/how-to/Check-Referential-Integrity.html).
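As a rough illustration of what a referential integrity check looks for, the sketch below finds rows whose reference points at no existing record. It is plain Python, not LinkML-Store's validator: the function `find_dangling_references` and the field names (`id`, `employer`) are assumptions made up for this example. In practice the constraints would come from a LinkML schema rather than hand-written code.

```python
from typing import List


def find_dangling_references(
    rows: List[dict], ref_field: str, targets: List[dict], key_field: str = "id"
) -> List[dict]:
    """Return rows whose ref_field names a key that no target row has."""
    known = {t[key_field] for t in targets}
    return [
        r for r in rows
        if r.get(ref_field) is not None and r[ref_field] not in known
    ]


# Hypothetical sample data: persons refer to organizations by id.
organizations = [{"id": "org:1", "name": "Acme Construction"}]
persons = [
    {"id": "p:1", "name": "Ann", "employer": "org:1"},
    {"id": "p:2", "name": "Bob", "employer": "org:9"},  # no such organization
    {"id": "p:3", "name": "Cy"},  # no employer recorded; not treated as a violation here
]
dangling = find_dangling_references(persons, "employer", organizations)
```

Whether a missing reference (as for `p:3`) counts as a violation is a schema decision, for example whether the field is required; this sketch treats it as acceptable.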
## Web API
There is a preliminary API following HATEOAS principles implemented using FastAPI.
To start you should first create a config file, e.g. `db/conf.yaml`:
Then run:
```
export LINKML_STORE_CONFIG=./db/conf.yaml
make api
```

The API returns links as well as data objects. It's recommended to use a Chrome plugin for JSON viewing
when exploring the API. TODO: add docs here.

The main endpoints are:
* `http://localhost:8000/` - the root of the API
* `http://localhost:8000/pages/` - browse the API via HTML
* `http://localhost:8000/docs` - the Swagger UI

## Streamlit app
```
make app
```

## Background

See [these slides](https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000) for more details.