Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davidbakereffendi/providentia
All the project code for my 2019 honours project. This is a web application that measures the query response times of 3 databases on queries and data analysis similar to those found in the real world.
- Host: GitHub
- URL: https://github.com/davidbakereffendi/providentia
- Owner: DavidBakerEffendi
- License: gpl-3.0
- Created: 2019-05-13T13:04:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-19T16:26:43.000Z (over 4 years ago)
- Last Synced: 2024-10-19T10:42:57.477Z (2 months ago)
- Topics: databases, janusgraph, nltk, postgis, spatio-temporal-data, tigergraph, yelp-challenge-dataset
- Language: TSQL
- Homepage:
- Size: 61 MB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Providentia
A web-based benchmarking tool for testing query speeds of graph databases versus relational databases in the context of spatio-temporal data, mainly using the Yelp challenge dataset.

## Project Structure
![Providentia Structure](providentia-docs/assets/providentia-architecture.png "Providentia Structure")
* The client is based on Angular 7+.
* The REST API is a Flask-based backend. It serves the client and performs analysis using NLTK and other statistical techniques to simulate 'real-world' analysis.
* The Flask backend communicates with the databases.

## Databases
The following databases are being benchmarked:
* JanusGraph with Cassandra and ElasticSearch
* PostgreSQL with PostGIS
* TigerGraph

## Hardware Requirements
Not all databases need to be run at the same time, but once each has the dataset imported along with the NLTK classifiers, the memory requirements add up. To counter this, many of the configuration files contain percentage modifiers controlling how much of the data is loaded.

The following are my minimum hardware requirements for a full production run:
* Intel i5 / AMD Ryzen 5
* 32 GB RAM
* 18 GB storage for training data and the Yelp challenge dataset

## Setting Up
The following steps prepare and run Providentia:
* First, go to `providentia-db` and follow the steps there to launch each database, then preprocess and import the dataset.
* Go to `providentia-flask` to configure the Flask backend.
* Go to `providentia-ng` to configure the Angular frontend.
* Once all of the databases are ready and the web application is configured, run `docker-compose up` in this directory to start the web application.

## Project Documentation and Presentation
The report, poster, and presentation source files can be found under `providentia-docs`.
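The core measurement Providentia performs, timing queries against each backend and summarising the response times, can be sketched in plain Python. This is an illustrative sketch only, not the project's actual code: `time_query` and the sleep-based stand-in query are hypothetical names introduced here for the example.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Time a query callable over several runs and summarise the results.

    `run_query` stands in for any function that executes one benchmark
    query against a backend (e.g. JanusGraph, PostgreSQL, or TigerGraph).
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return {
        "mean_ms": statistics.mean(samples),
        "stdev_ms": statistics.stdev(samples) if repeats > 1 else 0.0,
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Example with a stand-in "query" that just sleeps briefly.
result = time_query(lambda: time.sleep(0.01), repeats=3)
```

In the real project, `run_query` would presumably wrap a call issued by the Flask backend to one of the three databases, so that the same summary statistics can be compared across backends.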