Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/philterd/entitydb
A unified means for storing and querying entities
- Host: GitHub
- URL: https://github.com/philterd/entitydb
- Owner: philterd
- License: apache-2.0
- Created: 2022-08-07T18:09:51.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-29T13:06:01.000Z (4 months ago)
- Last Synced: 2024-10-29T15:56:28.946Z (4 months ago)
- Topics: database, entity, nlp, query
- Language: Java
- Homepage: https://www.philterd.io
- Size: 1.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# EntityDB
EntityDB is an application that integrates several components to provide a unified means for storing and querying entities (people, places, and things). The project includes the Entity Query Language (EQL), which lets you query entities across the various underlying databases through a single query language.
## Architecture
Entities are stored in an underlying database. Supported databases are MySQL, MongoDB, Cassandra, and DynamoDB. Entities are indexed in Elasticsearch for fast querying. A cache stores recently ingested and accessed entities to improve performance. A separate database, the data store, manages data such as users, groups, queries, and other information.
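The flow described above can be illustrated with a minimal in-memory sketch. This is not EntityDB's implementation: the class `ArchitectureSketch`, its method names, and the use of plain maps in place of a real database, Elasticsearch, and cache are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the entity store, search index, and cache.
class ArchitectureSketch {

    private final Map<String, String> entityStore = new HashMap<>();       // entity id -> entity text
    private final Map<String, List<String>> searchIndex = new HashMap<>(); // entity text -> entity ids
    private final Map<String, String> cache;                               // recently used entities

    ArchitectureSketch(final int cacheSize) {
        // An access-ordered LinkedHashMap that evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    // Stores, indexes, and caches an entity. Returns false if the id already exists.
    boolean ingest(String id, String text) {
        if (entityStore.putIfAbsent(id, text) != null) {
            return false;
        }
        searchIndex.computeIfAbsent(text, key -> new ArrayList<>()).add(id);
        cache.put(id, text);
        return true;
    }

    // Queries are answered from the index, not from the entity store.
    List<String> queryByText(String text) {
        return List.copyOf(searchIndex.getOrDefault(text, List.of()));
    }

    // Reads check the cache first and fall back to the entity store.
    String get(String id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String stored = entityStore.get(id);
        if (stored != null) {
            cache.put(id, stored);
        }
        return stored;
    }
}
```

In this sketch a second ingest with an existing id is rejected, which mirrors the idea of an entity store whose records are never overwritten.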
## Features
The following are brief high-level descriptions of EntityDB's main features. Refer to the wiki for more detailed descriptions and information on how to configure and use the features.
### API
The API is built on REST and JSON. The API allows for entity ingestion, status and health monitoring, and entity querying through the Entity Query Language (EQL).
### Entity Store
The entity store is the master dataset of entities. It is an immutable data store. EntityDB provides a choice of MySQL, MongoDB, Cassandra, and DynamoDB for the underlying entity store. You are free to choose the database that best satisfies your use-case requirements.
### Search Index
As entities are ingested they are indexed in a search engine. All queries are performed against the search engine. Currently, the only supported search index is Elasticsearch.
### Entity Access Control
Each ingested entity is assigned an ACL. The ACL determines the entity's visibility to users and groups of the system.
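As a rough illustration of how an ACL might gate visibility, the sketch below grants access when the user is listed directly or belongs to a listed group. The class `AclSketch` and the user-or-group rule are assumptions for illustration; EntityDB's actual ACL model may differ.

```java
import java.util.Set;

// Hypothetical ACL: an entity is visible to listed users and to members of listed groups.
class AclSketch {

    private final Set<String> users;
    private final Set<String> groups;

    AclSketch(Set<String> users, Set<String> groups) {
        // Defensive immutable copies so the ACL cannot change behind the entity's back.
        this.users = Set.copyOf(users);
        this.groups = Set.copyOf(groups);
    }

    // Returns true if the user is named in the ACL or shares at least one group with it.
    boolean isVisibleTo(String user, Set<String> userGroups) {
        if (users.contains(user)) {
            return true;
        }
        for (String group : userGroups) {
            if (groups.contains(group)) {
                return true;
            }
        }
        return false;
    }
}
```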
### Audit
Various actions that occur in EntityDB are emitted as audit events. Audited events include entity ingests, entities returned through queries, and entity ACL modifications.
### Continuous Queries
Entities received through the API are evaluated against the continuous queries. A continuous query is an EQL query that generates a notification when an ingested entity meets the query's conditions. Continuous queries are designed to be fast and efficient and to promote a low time-to-alert (TTA).
For example, the continuous query `select * from entities where text = 'George'` will generate a notification when an entity having the text "George" is ingested.
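The evaluation step can be sketched as follows. This sketch does not parse EQL: each continuous query is represented by a hand-written predicate standing in for its `where` clause, and the class `ContinuousQuerySketch` and its methods are hypothetical names, not part of EntityDB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical evaluator: every registered query is tested against each ingested entity.
class ContinuousQuerySketch {

    // Query name -> condition on the entity text (a stand-in for a parsed EQL query).
    private final Map<String, Predicate<String>> queries = new LinkedHashMap<>();

    void register(String name, Predicate<String> condition) {
        queries.put(name, condition);
    }

    // Returns the names of the queries whose conditions the ingested entity satisfies.
    List<String> onIngest(String entityText) {
        List<String> notifications = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> query : queries.entrySet()) {
            if (query.getValue().test(entityText)) {
                notifications.add(query.getKey());
            }
        }
        return notifications;
    }
}
```

Registering a predicate such as `text -> text.equals("George")` plays the role of the `where text = 'George'` condition in the example query.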
### Rules Engine
Similar to continuous queries, the rules engine is executed for each ingested entity. Rules are user-defined and can be created to take a specific action on entities that are found to match one or more conditions. However, unlike continuous queries, rules can contain complex logic and actions and are designed to be executed when time-to-alert is not critical.
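A minimal sketch of the condition-plus-action idea is shown below. The class `RulesEngineSketch`, the use of a single combined predicate per rule, and the callback-style action are assumptions for illustration and do not reflect how EntityDB defines or executes rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical rules engine: each rule pairs a condition with an action.
class RulesEngineSketch {

    private static final class Rule {
        final Predicate<String> condition;
        final Consumer<String> action;

        Rule(Predicate<String> condition, Consumer<String> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Conditions can be combined by the caller, for example with && or Predicate.and().
    void addRule(Predicate<String> condition, Consumer<String> action) {
        rules.add(new Rule(condition, action));
    }

    // Runs every matching rule's action and returns how many rules fired.
    int evaluate(String entityText) {
        int fired = 0;
        for (Rule rule : rules) {
            if (rule.condition.test(entityText)) {
                rule.action.accept(entityText);
                fired++;
            }
        }
        return fired;
    }
}
```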
### Metric Reporting
EntityDB can report metrics to AWS CloudWatch, InfluxDB, or the console. These metrics report values such as how long an entity is in the ingest queue before being ingested, how long continuous queries are taking to evaluate, and the counts of stored and indexed entities. These metrics provide a comprehensive overview of EntityDB's performance and statistics.
### Scalability
EntityDB scales horizontally because all of its components are distributed. Stand up a new EntityDB instance to increase throughput and performance. EntityDB's sample AWS CloudFormation template creates an EntityDB autoscaling group behind an Elastic Load Balancer. The autoscaling group can be set to scale based on metrics such as the size of the ingest SQS queue, any of the EntityDB reported metrics, or any EC2 instance metrics.
## Building EntityDB
The build runs EntityDB's tests. Some of the unit tests behave more like integration tests, which is an area for improvement.
```
mvn clean install
```
## Running
After a successful build, `entitydb.jar` is located under `entitydb-app/target`. It is a runnable jar that can be started with `java -jar entitydb.jar`. By default, all components use internal implementations, but this can be changed in `entitydb.properties`. See the [Documentation](https://github.com/mtnfog/entitydb/blob/master/documentation.md) for details on configuring `entitydb.properties`.
### Ingesting Entities
#### Via the REST API
Entities can be ingested through the API. Look under the `scripts/` directory for sample cURL scripts. Entities must be in the format defined in [entity-model](https://github.com/mtnfog/entity-model). Ingested entities are immutable.
#### Via the Internal API
When EntityDB is integrated directly with your application, entities can be ingested through the queues, bypassing the REST API. Ingesting without queuing is not recommended, because queuing protects against entity loss due to capacity or network issues.
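The reasoning behind queuing can be sketched with a bounded in-memory queue. The class `QueuedIngestSketch` and its methods are hypothetical and do not represent EntityDB's internal API; the point is only that a full queue reports back to the producer, so the producer can retry instead of silently losing the entity.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical ingest queue sitting between an application and the ingest workers.
class QueuedIngestSketch {

    private final BlockingQueue<String> ingestQueue;

    QueuedIngestSketch(int capacity) {
        this.ingestQueue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: returns false when the queue is full so the caller can retry later.
    boolean enqueue(String entityJson) {
        return ingestQueue.offer(entityJson);
    }

    // Consumer side: moves all queued entities into the given store and returns the count.
    int drainInto(List<String> store) {
        return ingestQueue.drainTo(store);
    }
}
```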
Copyright © 2024 Philterd, LLC.