{"id":21236192,"url":"https://github.com/dbiir/paraflow","last_synced_at":"2026-03-13T18:48:51.322Z","repository":{"id":23228287,"uuid":"98480042","full_name":"dbiir/paraflow","owner":"dbiir","description":"A real-time analytical system for ID-associated data","archived":false,"fork":false,"pushed_at":"2022-06-17T01:59:59.000Z","size":19981,"stargazers_count":38,"open_issues_count":11,"forks_count":24,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-05T15:51:13.383Z","etag":null,"topics":["hadoop","kafka","orc","parquet","presto","spark-sql"],"latest_commit_sha":null,"homepage":"https://dbiir.github.io/paraflow/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dbiir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-27T01:22:38.000Z","updated_at":"2025-04-05T01:27:35.000Z","dependencies_parsed_at":"2022-08-31T19:00:55.237Z","dependency_job_id":null,"html_url":"https://github.com/dbiir/paraflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dbiir/paraflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbiir%2Fparaflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbiir%2Fparaflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbiir%2Fparaflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbiir%2Fparaflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dbiir","download_url":"https://codeload.github.com/dbiir/paraflow/tar.gz/refs/heads/master","
sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbiir%2Fparaflow/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264619207,"owners_count":23638414,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hadoop","kafka","orc","parquet","presto","spark-sql"],"created_at":"2024-11-21T00:07:52.936Z","updated_at":"2026-03-13T18:48:46.297Z","avatar_url":"https://github.com/dbiir.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ParaFlow\n\nParaFlow is an interactive analysis system for OLAP developed at [DBIIR Lab @ RUC](http://iir.ruc.edu.cn).\n\n#### Install \u0026 Deploy\n##### Hadoop\nA Hadoop file system is required.\n##### Zookeeper-3.4.13\nZooKeeper is required by Kafka.\nDeployment only requires configuring the cluster IP and port.\n##### Kafka-2.11_1.11\n##### Postgresql-9.5\n##### Presto-0.192\n##### Paraflow\n1. MetaServer (one node)\n2. Loader [cn.edu.ruc.iir.paraflow.example.loader.BasicLoader]\n\n    Configure ./paraflow-loader.sh, then run:\n\n    `./sbin/paraflow-loader.sh deploy`\n3. Collector [cn.edu.ruc.iir.paraflow.example.loader.BasicCollector]\n\n    Configure ./paraflow-collector.sh, then run:\n\n    `./sbin/paraflow-collector.sh deploy`\n4. Presto connector\n\n#### Configuration\n##### Initialization\n1. Create a user and a database in PostgreSQL for the metadata:\n\n`CREATE USER paraflow WITH PASSWORD 'paraflow';`\n`CREATE DATABASE paraflowmeta;`\n`GRANT ALL ON DATABASE paraflowmeta TO paraflow;`\n\n#### Startup\n1. Start the ZooKeeper cluster\n2. Start Kafka\n3. Start PostgreSQL\n4. 
Start Paraflow MetaServer\n`./bin/paraflow-metaserver-start.sh [-daemon]`\n5. Start Paraflow Loader\n`./sbin/paraflow-loader.sh start`\n6. Start Paraflow Collector\n`./sbin/paraflow-collector.sh start`\n7. Start a Presto cluster or a single Presto node to execute queries.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbiir%2Fparaflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdbiir%2Fparaflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbiir%2Fparaflow/lists"}