https://github.com/ptaconet/r_sql_datawarehouse
Deployment, management and exploitation of PostgreSQL/PostGIS-based data warehouses using the R programming language
- Host: GitHub
- URL: https://github.com/ptaconet/r_sql_datawarehouse
- Owner: ptaconet
- Created: 2018-12-08T17:39:10.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-08T10:48:30.000Z (over 6 years ago)
- Last Synced: 2024-08-13T07:12:50.289Z (8 months ago)
- Language: R
- Size: 3.09 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Deployment, management and exploitation of data warehouses with the R programming language
## Project
This repository gathers the scripts and documentation for a methodology to deploy, load, update and exploit a data warehouse (DW), implemented as a PostgreSQL database, through the R programming language. The R scripts make it possible to i) deploy the physical model of the database, ii) manage the extract-transform-load (ETL) process, iii) access the data in R, and iv) apply the FAIR principles to the data stored in the DW, to improve their management (discovery, access, processing, information / visualization). The inputs of these scripts are mainly simple CSV files. The proposed data warehouse model has the following characteristics:
- Flexibility: the user can adapt the facts and dimensions of the data warehouse to their data;
- Ability to store multiple reference datasets, including spatial reference data (managed through the PostGIS extension of PostgreSQL);
- Inclusion of a table dedicated to the metadata associated with each dataset loaded in the data warehouse.

The methodology was initially developed to implement the [Global Tuna Atlas](https://bluebridge.d4science.org/group/fao_tunaatlas) and the [French Tropical Tuna Atlas](https://bluebridge.d4science.org/web/frenchtropicaltunaatlas) data warehouses and catalogues.
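
The three characteristics listed above (user-defined facts and dimensions, spatial reference data, a dedicated metadata table) can be illustrated with a minimal sketch that generates the corresponding SQL from R. All table and column names here (`metadata`, `species`, `area`, `catch`, `geom`, and so on) are hypothetical and are not taken from the repository's SQL scripts; the actual physical model may differ.

```r
# Hypothetical names throughout: the tables and columns below are
# illustrative and are not taken from the repository's SQL scripts.

# Build the DDL for one dimension (reference) table.
# Set spatial = TRUE to add a PostGIS geometry column.
dimension_ddl <- function(name, spatial = FALSE) {
  cols <- c(
    sprintf("id_%s SERIAL PRIMARY KEY", name),
    "code TEXT NOT NULL UNIQUE",
    "label TEXT"
  )
  if (spatial) {
    cols <- c(cols, "geom geometry(MultiPolygon, 4326)")
  }
  sprintf("CREATE TABLE %s (\n  %s\n);", name, paste(cols, collapse = ",\n  "))
}

# Build the DDL for a fact table that references the metadata table
# and every dimension chosen by the user.
fact_ddl <- function(name, dimensions, measure = "value") {
  fk <- sprintf(
    "id_%s INTEGER NOT NULL REFERENCES %s (id_%s)",
    dimensions, dimensions, dimensions
  )
  cols <- c(
    "id_metadata INTEGER NOT NULL REFERENCES metadata (id_metadata)",
    fk,
    sprintf("%s NUMERIC NOT NULL", measure)
  )
  sprintf("CREATE TABLE %s (\n  %s\n);", name, paste(cols, collapse = ",\n  "))
}

# One row per dataset loaded in the warehouse.
metadata_ddl <- paste0(
  "CREATE TABLE metadata (\n",
  "  id_metadata SERIAL PRIMARY KEY,\n",
  "  identifier TEXT NOT NULL UNIQUE,\n",
  "  title TEXT,\n",
  "  source TEXT\n",
  ");"
)

dims <- c("species", "area")
statements <- c(
  "CREATE EXTENSION IF NOT EXISTS postgis;",
  metadata_ddl,
  dimension_ddl("species"),
  dimension_ddl("area", spatial = TRUE),
  fact_ddl("catch", dims)
)
cat(statements, sep = "\n\n")
```

Adapting the model to other data then amounts to changing `dims` and the fact table name. Assuming a DBI-compatible PostgreSQL driver is used, each statement could be sent to the database with `DBI::dbExecute()`; the repository's own deployment scripts may proceed differently.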
## Funding
French Research Institute for Sustainable Development (IRD, www.ird.fr)
This work has received funding from the European Union's Horizon 2020 research and innovation programme under the BlueBRIDGE project (Grant agreement No 675680).
## Repository organization
- figures: figures used in the documentation
- r_scripts_datawarehouse_management: R scripts to manage the data warehouse, including i) scripts to deploy the physical model of the data warehouse, ii) functions to load datasets into the DW, iii) functions to retrieve datasets stored in the DW
- sql_scripts_datawarehouse_creation: SQL queries used by the R scripts to deploy the physical model of the data warehouse
- documentation_datawarehouse_SQL_R.pdf: documentation of the project
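
To give an idea of what the transformation step of a CSV-based load might involve, here is a minimal base R sketch that maps the codes of an input file to dimension identifiers before insertion into a fact table. It is not the code of the repository's loading functions: the data, the column names and the helper `to_fact_rows()` are all hypothetical.

```r
# Hypothetical input: in practice `raw` would be read from a CSV file
# with read.csv(); inline text is used here to keep the sketch runnable.
raw <- read.csv(text = "species,area,value
YFT,A1,12.5
SKJ,A2,3
XXX,A1,7", stringsAsFactors = FALSE)

# Hypothetical dimension tables, as they might be retrieved from the DW.
species_dim <- data.frame(id_species = 1:2, code = c("YFT", "SKJ"),
                          stringsAsFactors = FALSE)
area_dim <- data.frame(id_area = 1:2, code = c("A1", "A2"),
                       stringsAsFactors = FALSE)

# Replace each code by the identifier of the matching dimension row.
# Rows with a code absent from a dimension are set aside, not loaded.
to_fact_rows <- function(raw, species_dim, area_dim) {
  out <- data.frame(
    id_species = species_dim$id_species[match(raw$species, species_dim$code)],
    id_area = area_dim$id_area[match(raw$area, area_dim$code)],
    value = raw$value
  )
  unmatched <- !complete.cases(out)
  list(
    rows = out[!unmatched, , drop = FALSE],
    rejected = raw[unmatched, , drop = FALSE]
  )
}

result <- to_fact_rows(raw, species_dim, area_dim)
print(result$rows)      # rows ready to be inserted in the fact table
print(result$rejected)  # input rows whose codes are unknown to the DW
```

Keeping the rejected rows separate, rather than dropping them silently, makes it easier to see which reference data must be added to the warehouse before a dataset can be loaded in full.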