Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davidbakereffendi/providentia
All the project code for my 2019 honours project. This is a web application that measures the query response times of 3 databases on queries and data analysis similar to those found in the real world.
- Host: GitHub
- URL: https://github.com/davidbakereffendi/providentia
- Owner: DavidBakerEffendi
- License: gpl-3.0
- Created: 2019-05-13T13:04:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-19T16:26:43.000Z (over 4 years ago)
- Last Synced: 2024-10-19T10:42:57.477Z (2 months ago)
- Topics: databases, janusgraph, nltk, postgis, spatio-temporal-data, tigergraph, yelp-challenge-dataset
- Language: TSQL
- Homepage:
- Size: 61 MB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Providentia
A web-based benchmarking tool for testing query speeds of graph databases versus relational databases in the context of spatio-temporal data, mainly using the Yelp challenge dataset.

## Project Structure
![Providentia Structure](providentia-docs/assets/providentia-architecture.png "Providentia Structure")
* The client is based on Angular 7+.
* The REST API is a Flask-based backend. It serves the client and performs analysis using NLTK and other statistical techniques to simulate 'real-world' analysis.
* The Flask backend communicates with the databases.

## Databases
The following databases are being benchmarked:
* JanusGraph with Cassandra and ElasticSearch
* PostgreSQL with PostGIS
* TigerGraph

## Hardware Requirements
Not all databases need to be run at the same time, but once each has the dataset imported along with the NLTK classifiers, the memory requirements add up. To counter this, many of the configuration files contain percentage modifiers controlling how much of the data is loaded.

The following are my minimum hardware requirements for a full production run:
* Intel i5 / AMD Ryzen 5
* 32 GB RAM
* 18 GB storage for training data and the Yelp challenge dataset

## Setting Up
The following steps prepare and run Providentia:
* First, go to `providentia-db` and follow the steps there to launch each database, then preprocess and import the dataset.
* Go to `providentia-flask` to configure the Flask backend.
* Go to `providentia-ng` to configure the Angular frontend.
* Once all of the databases are ready and the web application is configured, run `docker-compose up` in this directory to start the web application.

## Project Documentation and Presentation
The report, poster, and presentation source files can be found under `providentia-docs`.
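The core measurement Providentia performs, timing queries against each backend and summarising the response times, can be sketched in plain Python. This is an illustrative sketch only, not the project's actual code: `time_query` and the sleep-based stand-in query are hypothetical names introduced here for the example.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Time a query callable over several runs and summarise the results.

    `run_query` stands in for any function that executes one benchmark
    query against a backend (e.g. JanusGraph, PostgreSQL, or TigerGraph).
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return {
        "mean_ms": statistics.mean(samples),
        "stdev_ms": statistics.stdev(samples) if repeats > 1 else 0.0,
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Example with a stand-in "query" that just sleeps briefly.
result = time_query(lambda: time.sleep(0.01), repeats=3)
```

In the real project, `run_query` would presumably wrap a call issued by the Flask backend to one of the three databases, so that the same summary statistics can be compared across backends.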