https://github.com/dbiir/paraflow
A real-time analytical system for ID-associated data
hadoop kafka orc parquet presto spark-sql
- Host: GitHub
- URL: https://github.com/dbiir/paraflow
- Owner: dbiir
- License: apache-2.0
- Created: 2017-07-27T01:22:38.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2022-06-17T01:59:59.000Z (almost 3 years ago)
- Last Synced: 2023-10-20T19:15:13.360Z (over 1 year ago)
- Topics: hadoop, kafka, orc, parquet, presto, spark-sql
- Language: Java
- Homepage: https://dbiir.github.io/paraflow/
- Size: 19.1 MB
- Stars: 39
- Watchers: 5
- Forks: 30
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ParaFlow
ParaFlow is an interactive analysis system for OLAP developed at [DBIIR Lab @ RUC](http://iir.ruc.edu.cn).
#### Install & Deploy
##### Hadoop
The Hadoop file system (HDFS) is required.
##### Zookeeper-3.4.13
ZooKeeper is required by Kafka.
Deploying it only requires configuring the cluster IP addresses and ports.
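As an illustration of that IP-and-port configuration, the sketch below prints a minimal `zoo.cfg` for a list of hosts. The host names and the `dataDir` path are hypothetical placeholders. The ports are ZooKeeper's conventional defaults (2181 for clients, 2888 for quorum traffic, 3888 for leader election), not values taken from the Paraflow project.

```shell
#!/bin/sh
# Print a minimal zoo.cfg for the hosts given as arguments.
# Assumptions: host names and dataDir are placeholders; ports are the
# conventional ZooKeeper defaults.
zoo_cfg() {
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=/var/lib/zookeeper"
    echo "clientPort=2181"
    id=1
    for host in "$@"; do
        # server.<id>=<host>:<quorum port>:<leader-election port>
        echo "server.${id}=${host}:2888:3888"
        id=$((id + 1))
    done
}

zoo_cfg node1 node2 node3
```

Each server in a multi-node ensemble also needs a `myid` file in its `dataDir` containing its own `<id>`.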
##### Kafka-2.11_1.11
##### PostgreSQL-9.5
##### Presto-0.192
##### Paraflow
1. MetaServer (one node)
2. Loader [cn.edu.ruc.iir.paraflow.example.loader.BasicLoader]
Configure `./paraflow-loader.sh`, then run:
`./sbin/paraflow-loader.sh deploy`
3. Collector [cn.edu.ruc.iir.paraflow.example.loader.BasicCollector]
Configure `./paraflow-collector.sh`, then run:
`./sbin/paraflow-collector.sh deploy`
4. Presto connector

#### Configuration
##### Initialization
1. Create a user and a database in PostgreSQL for the metadata:
`CREATE USER paraflow WITH PASSWORD 'paraflow';`
`CREATE DATABASE paraflowmeta;`
`GRANT ALL ON DATABASE paraflowmeta TO paraflow;`

#### Startup
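Before the first startup, the metadata user and database from the Initialization step must already exist. As a minimal sketch, the three statements can be kept in a small helper so they can be reviewed and then replayed. The function name `paraflow_meta_sql` is hypothetical and not part of Paraflow.

```shell
#!/bin/sh
# Print the metadata initialization statements from the Initialization step.
# The function name is an illustrative assumption, not a Paraflow script.
paraflow_meta_sql() {
    cat <<'EOF'
CREATE USER paraflow WITH PASSWORD 'paraflow';
CREATE DATABASE paraflowmeta;
GRANT ALL ON DATABASE paraflowmeta TO paraflow;
EOF
}

# Print the statements for review; nothing is sent to the database here.
paraflow_meta_sql
```

To apply them, the output could be piped to `psql`, for example `paraflow_meta_sql | psql -U postgres`, assuming a superuser role named `postgres` exists on the PostgreSQL host.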
1. Start the ZooKeeper cluster
2. Start Kafka
3. Start PostgreSQL
4. Start Paraflow MetaServer
`./bin/paraflow-metaserver-start.sh [-daemon]`
5. Start Paraflow Loader
`./sbin/paraflow-loader.sh start`
6. Start Paraflow Collector
`./sbin/paraflow-collector.sh start`
7. Start a Presto cluster or a single Presto node to execute queries.
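The Paraflow-specific steps (4 to 6) can be wrapped in a small script that preserves their order and stops at the first failure. This is only a sketch: `PARAFLOW_HOME`, `DRY_RUN`, `run_step`, and `start_paraflow` are hypothetical names introduced here, and ZooKeeper, Kafka, and PostgreSQL are assumed to be running already. By default it only prints the commands.

```shell
#!/bin/sh
# Run the Paraflow startup steps in the documented order.
# PARAFLOW_HOME and DRY_RUN are hypothetical variables for this sketch.
PARAFLOW_HOME="${PARAFLOW_HOME:-.}"
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "step failed: $*" >&2; return 1; }
    fi
}

start_paraflow() {
    # MetaServer first, then Loader, then Collector.
    run_step "$PARAFLOW_HOME/bin/paraflow-metaserver-start.sh" -daemon &&
    run_step "$PARAFLOW_HOME/sbin/paraflow-loader.sh" start &&
    run_step "$PARAFLOW_HOME/sbin/paraflow-collector.sh" start
}

start_paraflow
```

Setting `DRY_RUN=0` executes the commands instead of printing them. The sketch does not wait for the MetaServer to become ready before starting the Loader, so a real deployment may need a readiness check between steps.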