https://github.com/chaokunyang/bigdata-examples
bigdata examples about spark and flink
bigdata flink hadoop monitor python samples spark spark-sql sparkml
- Host: GitHub
- URL: https://github.com/chaokunyang/bigdata-examples
- Owner: chaokunyang
- License: apache-2.0
- Created: 2018-02-01T10:34:10.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-23T04:08:18.000Z (over 7 years ago)
- Last Synced: 2025-04-03T07:42:59.576Z (11 months ago)
- Topics: bigdata, flink, hadoop, monitor, python, samples, spark, spark-sql, sparkml
- Language: Scala
- Homepage:
- Size: 50.8 KB
- Stars: 11
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Awesome Bigdata Samples
A curated list of awesome big data samples covering applications, deployment, operations, and monitoring.
## Environment
- Java: 1.8
- Scala: 2.11
- Python: 2.7
- ZooKeeper: 3.4.6
- HBase: 1.0.3
- Kafka: 0.10.0.1
- Redis: 3.2.6
- Hadoop: 2.6.5
- Spark: 2.2.1
- Flink: 1.4.0
## Applications
- Spark Application
- Flink Application
## Deploying
Operating a server cluster is not easy, and a few scripts can ease operations significantly. Here are some simple tools for this:
- `sync.sh`: recursively synchronizes the files of the current directory, or of a specified directory and its subdirectories, to the same directory on all servers listed in the hosts file.
- `del.sh`: deletes the current directory, or a specified directory, on all servers listed in the hosts file.
- `dist_run.sh`: runs a command on all servers listed in the hosts file.
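
The README does not show how these scripts are implemented, so the following is only a minimal Python 3 sketch of the general idea: read a hosts file, then build one command per server. The function names (`read_hosts`, `build_ssh_commands`, `build_rsync_commands`), the hosts file format (one host per line, `#` comments allowed), and the use of `ssh` and `rsync -a` are all assumptions for illustration, not the repository's actual behaviour.

```python
import shlex


def read_hosts(text):
    """Return host names from hosts-file text, skipping blanks and '#' comments.

    Assumed format: one host per line (hypothetical, not taken from the repo).
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def build_ssh_commands(hosts, cmd):
    """Build one argv list per host, in the spirit of dist_run.sh."""
    return [["ssh", host, cmd] for host in hosts]


def build_rsync_commands(hosts, directory):
    """Build one rsync argv list per host, in the spirit of sync.sh.

    The trailing slash copies the directory contents to the same path remotely.
    """
    src = directory.rstrip("/") + "/"
    return [["rsync", "-a", src, "%s:%s" % (host, src)] for host in hosts]


if __name__ == "__main__":
    sample = "node1\nnode2  # worker\n\n"
    for argv in build_ssh_commands(read_hosts(sample), "uptime"):
        # Print instead of executing, so the sketch is safe to run anywhere.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each argv list could be handed to `subprocess.run` to execute it; the sketch only prints the commands so that it has no side effects.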
## Operations
The scripts in `awesome-bigdata-samples/bin` provide some small, useful operations tools for managing small and medium-sized server clusters. The details are as follows:
- `zk_admin.sh`: start or stop the ZooKeeper cluster.
  - Start the ZooKeeper cluster: ```./zk_admin.sh start```
  - Stop the ZooKeeper cluster: ```./zk_admin.sh stop```
- `kafka_admin.sh`: start or stop the Kafka broker cluster.
  - Start the Kafka broker cluster: ```./kafka_admin.sh start```
  - Stop the Kafka broker cluster: ```./kafka_admin.sh stop```
- `rerun.py`: sometimes offline compute tasks need to be rerun for a range of days, and rerunning them one by one is tedious. `rerun.py` handles this case. For example: ```python rerun.py -start 2017/11/21 -end 2017/12/01 -task dayJob.sh```
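
The core of a date-range rerun like the `rerun.py` example above can be sketched as follows. This is a minimal Python 3 illustration, not the repository's code: the helper names `days_between` and `build_rerun_commands` are hypothetical, and it assumes that the end date is inclusive and that the task script receives the day as its first argument.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y/%m/%d"  # matches the dates used in the README example


def days_between(start, end):
    """Return every date from start to end, both inclusive (an assumption)."""
    first = datetime.strptime(start, DATE_FORMAT).date()
    last = datetime.strptime(end, DATE_FORMAT).date()
    if last < first:
        raise ValueError("end date is before start date")
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def build_rerun_commands(start, end, task):
    """Build one argv list per day; passing the day as an argument is assumed."""
    return [["sh", task, day.strftime(DATE_FORMAT)]
            for day in days_between(start, end)]


if __name__ == "__main__":
    for argv in build_rerun_commands("2017/11/21", "2017/12/01", "dayJob.sh"):
        # Print instead of executing, so the sketch has no side effects.
        print(" ".join(argv))
```

Building the full command list before running anything makes it easy to review the plan, or to stop at the first failing day when the commands are later executed in order.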
## Monitoring
`monitor.py` in `awesome-bigdata-samples/bin` provides monitoring, automatic recovery, and alerting. The details are as follows:
- YarnChecker: monitor ResourceManager and NodeManagers
- HDFSChecker: monitor NameNode and DataNodes
- ZookeeperChecker: monitor zookeeper nodes
- KafkaChecker: monitor kafka brokers
- HBaseChecker: monitor HMaster and HRegionServer
- RedisChecker: monitor redis server
- YarnAppChecker: monitor YARN applications. Useful for monitoring Spark Streaming and Flink streaming applications.
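
The check, recover, and alert cycle that the checkers above share can be sketched roughly as below. This is a hedged Python 3 illustration, not the code in `monitor.py`: the `PortChecker` class, its liveness rule (a TCP port accepting connections), the `recover` callback, and the `run_checks` function are all hypothetical names and assumptions.

```python
import socket


class PortChecker:
    """Hypothetical checker: a service is alive if its TCP port accepts connections."""

    def __init__(self, name, host, port, recover=None, timeout=2.0):
        self.name = name
        self.host = host
        self.port = port
        self.recover = recover  # optional callable that tries to restart the service
        self.timeout = timeout

    def is_alive(self):
        try:
            with socket.create_connection((self.host, self.port), timeout=self.timeout):
                return True
        except OSError:
            return False


def run_checks(checkers, alert):
    """Check each service once; attempt recovery, then alert if it is still down.

    Returns the names of the services that remained down.
    """
    failed = []
    for checker in checkers:
        if checker.is_alive():
            continue
        if checker.recover is not None:
            checker.recover()
            if checker.is_alive():
                continue
        alert("%s is down" % checker.name)
        failed.append(checker.name)
    return failed


if __name__ == "__main__":
    # Host and port are placeholders; 2181 is the default ZooKeeper client port.
    checkers = [PortChecker("zookeeper", "127.0.0.1", 2181)]
    print(run_checks(checkers, alert=print))
```

Alerting only after a failed recovery attempt keeps notifications limited to problems that need a human; a real monitor would typically run this loop on a schedule.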
## Style
- Scala: The Scala code follows the [Databricks Scala style guide](https://github.com/databricks/scala-style-guide), which is integrated into the Maven build lifecycle using [scalastyle-maven-plugin](http://www.scalastyle.org/).
- Java: The Java code follows the [Apache Beam](https://github.com/apache/beam/blob/master/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml) Checkstyle rules, which are integrated into the Maven build lifecycle using maven-checkstyle-plugin.
## Run
Flink jobs containing Java 8 lambdas with generics cannot be compiled with IntelliJ IDEA at the moment. Instead, build the project on the command line using `mvn compile` with the **Eclipse JDT compiler**. Once the program has been built via Maven, you can also run it from within IntelliJ.
## Build
```shell
mvn clean package -DskipTests -Pbuild-jar
```
## Contribute
- Source Code: https://github.com/chaokunyang/awesome-bigdata-samples
- Issue Tracker: https://github.com/chaokunyang/awesome-bigdata-samples/issues
## LICENSE
This project is licensed under the Apache License 2.0.