https://github.com/cthoyt/embeddingdb

A database for storing and comparing entity embeddings

Embedding Database |zenodo|
===========================
This package provides a database schema and Python wrapper
for storing the embeddings generated through various representation
learning packages.

Currently, this package focuses on using a SQL database with SQLAlchemy,
but might be extended to use a NoSQL database as an alternative.

Installation
------------
Install ``embeddingdb`` from `PyPI <https://pypi.org/project/embeddingdb>`_ with:

.. code-block:: sh

    $ pip install embeddingdb

Alternatively, install the latest development version of ``embeddingdb`` directly
from GitHub with:

.. code-block:: sh

    $ pip install git+https://github.com/cthoyt/embeddingdb

For developers, install ``embeddingdb`` in development mode from GitHub with:

.. code-block:: sh

    $ git clone https://github.com/cthoyt/embeddingdb.git
    $ cd embeddingdb
    $ pip install -e .

Set the environment variable ``EMBEDDINGDB_CONNECTION`` to a valid
SQLAlchemy connection string for a PostgreSQL instance, as this package uses
the PostgreSQL-specific ``ARRAY`` type.
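
As a minimal sketch of a sanity check (not part of ``embeddingdb`` itself), the
snippet below reads ``EMBEDDINGDB_CONNECTION`` and verifies that it names a
PostgreSQL dialect before anything tries to connect. The helper name
``get_connection_string`` is hypothetical and only serves as an illustration.

.. code-block:: python

    import os
    from urllib.parse import urlsplit


    def get_connection_string(environ=os.environ):
        """Return the connection string, or raise if it is missing or not PostgreSQL.

        Hypothetical helper for illustration; not part of embeddingdb.
        """
        url = environ.get("EMBEDDINGDB_CONNECTION")
        if not url:
            raise RuntimeError("EMBEDDINGDB_CONNECTION is not set")
        # SQLAlchemy URLs look like dialect+driver://user:password@host:port/database,
        # so the dialect is the part of the scheme before any "+".
        dialect = urlsplit(url).scheme.split("+")[0]
        if dialect != "postgresql":
            raise ValueError(f"expected a PostgreSQL connection string, got {dialect!r}")
        return url

A string such as ``postgresql+psycopg2://user:password@localhost:5432/embeddings``
(placeholder credentials and database name) passes this check, while a
``sqlite://`` URL is rejected.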

Command Line Interface
----------------------
This package installs an entry point, ``embeddingdb``, that can be used directly from
the shell.

Uploading Entity Embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entity embeddings produced by various types of representation learning can be stored,
including network representation learning, knowledge graph embedding, and text-based
representation learning.

Upload embeddings generated by ``word2vec`` by specifying the file path with:

.. code-block:: sh

    $ embeddingdb upload --fmt word2vec --path ~/path/to/file.txt
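
For orientation, the ``word2vec`` text format starts with a header line giving
the number of entities and the vector dimension, followed by one line per entity
with its name and its space-separated vector components. The sketch below parses
such a file into a dictionary. It illustrates the file format only and is not
the loader used by ``embeddingdb``; the function name ``parse_word2vec_text``
and the sample entity names are made up for this example.

.. code-block:: python

    def parse_word2vec_text(lines):
        """Parse word2vec text-format lines into a dict of entity -> vector.

        Hypothetical helper for illustration; not part of embeddingdb.
        """
        rows = iter(lines)
        # The header line holds the number of entities and the vector dimension.
        count, dim = (int(part) for part in next(rows).split())
        embeddings = {}
        for row in rows:
            parts = row.split()
            if not parts:
                continue  # tolerate blank lines, e.g. at the end of the file
            entity, values = parts[0], [float(value) for value in parts[1:]]
            if len(values) != dim:
                raise ValueError(f"{entity}: expected {dim} values, got {len(values)}")
            embeddings[entity] = values
        if len(embeddings) != count:
            raise ValueError(f"expected {count} entities, got {len(embeddings)}")
        return embeddings


    sample = [
        "2 3",
        "entity_a 0.1 0.2 0.3",
        "entity_b -1.0 0.5 2.0",
    ]
    vectors = parse_word2vec_text(sample)

Because the function accepts any iterable of lines, an open file handle can be
passed in place of ``sample``.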

Upload embeddings generated by ``pykeen`` by specifying the output directory
with:

.. code-block:: sh

    $ embeddingdb upload --fmt keen --path ~/path/to/directory/

Listing Entity Embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~
After uploading, the collections can be listed with:

.. code-block:: sh

    $ embeddingdb ls

Analyzing Entity Embeddings' Correlations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the motivations for building this repository was to provide a convenient way to
compare the embeddings for entities generated through orthogonal embedding techniques.
For example, we wanted to know to what extent the embeddings for proteins generated from
their sequences with ``ratvec`` contained the same information as the embeddings generated
from protein-protein interaction networks with ``pykeen`` or ``nrl``.

Pass the database identifiers of the two collections to compare as positional arguments:

.. code-block:: sh

    $ embeddingdb analyze 1 2
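
To make the idea of comparing two collections concrete, here is one generic
approach, sketched under the assumption that each collection is available as a
dictionary from entity name to vector. It is not necessarily what
``embeddingdb analyze`` computes. Since the two embedding spaces may have
different dimensions, the sketch compares their similarity structure instead of
the raw vectors: it computes the cosine similarity for every pair of shared
entities in each collection and then takes the Pearson correlation of the two
resulting lists. All function names are hypothetical.

.. code-block:: python

    import math
    from itertools import combinations


    def cosine(u, v):
        """Cosine similarity of two non-zero vectors of equal length."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm


    def pearson(xs, ys):
        """Pearson correlation of two equally long, non-constant sequences."""
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)


    def similarity_structure_correlation(first, second):
        """Correlate pairwise cosine similarities over the shared entities."""
        shared = sorted(set(first) & set(second))
        if len(shared) < 3:
            raise ValueError("need at least three shared entities")
        pairs = list(combinations(shared, 2))
        sims_first = [cosine(first[a], first[b]) for a, b in pairs]
        sims_second = [cosine(second[a], second[b]) for a, b in pairs]
        return pearson(sims_first, sims_second)

A value near 1 indicates that entities that are close in one embedding space
also tend to be close in the other; a value near 0 indicates that the two spaces
organize the shared entities differently.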

Running with Docker
-------------------
After installing Docker, the entire web application can be instantiated with:

.. code-block:: sh

    $ docker-compose up

Send a GET request to the ``/test`` endpoint to instantiate the database and add a test collection.
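
The request can be issued from Python with the standard library, as in the
following sketch. The base URL is an assumption: the host and port depend on
what the ``docker-compose.yml`` of the repository exposes, so
``http://localhost:5000`` below is only a placeholder. Both function names are
hypothetical.

.. code-block:: python

    from urllib.parse import urljoin
    from urllib.request import urlopen


    def build_test_url(base_url):
        """Return the URL of the ``/test`` endpoint for the given base URL."""
        return urljoin(base_url.rstrip("/") + "/", "test")


    def call_test_endpoint(base_url, timeout=10):
        """Send a GET request to ``/test`` and return the HTTP status code."""
        with urlopen(build_test_url(base_url), timeout=timeout) as response:
            return response.status


    # Placeholder base URL (an assumption); adjust it to your deployment.
    test_url = build_test_url("http://localhost:5000")

Calling ``call_test_endpoint("http://localhost:5000")`` while the containers are
running performs the actual request.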

.. |zenodo| image:: https://zenodo.org/badge/192898201.svg
    :target: https://zenodo.org/badge/latestdoi/192898201