{"id":21026757,"url":"https://github.com/chaokunyang/bigdata-examples","last_synced_at":"2025-05-15T10:31:50.030Z","repository":{"id":113392516,"uuid":"119821054","full_name":"chaokunyang/bigdata-examples","owner":"chaokunyang","description":"bigdata examples about spark and flink","archived":false,"fork":false,"pushed_at":"2018-08-23T04:08:18.000Z","size":52,"stargazers_count":11,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-03T07:42:59.576Z","etag":null,"topics":["bigdata","flink","hadoop","monitor","python","samples","spark","spark-sql","sparkml"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chaokunyang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-01T10:34:10.000Z","updated_at":"2022-05-20T08:00:32.000Z","dependencies_parsed_at":"2023-03-13T13:19:52.190Z","dependency_job_id":null,"html_url":"https://github.com/chaokunyang/bigdata-examples","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chaokunyang%2Fbigdata-examples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chaokunyang%2Fbigdata-examples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chaokunyang%2Fbigdata-examples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chaokunyang%2Fbigdata-examples/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chaokunyang","download_url":"https://codeload.github.com/chaokunyang/bigdata-examples/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254322945,"owners_count":22051688,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","flink","hadoop","monitor","python","samples","spark","spark-sql","sparkml"],"created_at":"2024-11-19T11:46:08.246Z","updated_at":"2025-05-15T10:31:50.020Z","avatar_url":"https://github.com/chaokunyang.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Awesome Bigdata Samples\n\nA curated list of awesome big data samples covering applications, deployment, operations, and monitoring.\n\n## Environment\n- Java: 1.8\n- Scala: 2.11\n- Python: 2.7\n- ZooKeeper: 3.4.6\n- HBase: 1.0.3\n- Kafka: 0.10.0.1\n- Redis: 3.2.6\n- Hadoop: 2.6.5\n- Spark: 2.2.1\n- Flink: 1.4.0\n\n## Applications\n- Spark Application\n- Flink Application\n\n## Deploying\nOperating a server cluster is not easy, and a few scripts can ease operations significantly. 
Here are some simple tools for this:\n- `sync.sh`: recursively synchronizes the current directory (or a specified directory), including its subdirectories, to the same directory on every server listed in the hosts file.\n- `del.sh`: deletes the current directory (or a specified directory) on every server listed in the hosts file.\n- `dist_run.sh`: runs a command on every server listed in the hosts file.\n\n## Operations\nThe scripts in `awesome-bigdata-samples/bin` provide small operations tools for managing small and medium-sized server clusters:\n- `zk_admin.sh`: starts or stops the ZooKeeper cluster.\n    - start the ZooKeeper cluster: `./zk_admin.sh start`\n    - stop the ZooKeeper cluster: `./zk_admin.sh stop`\n- `kafka_admin.sh`: starts or stops the Kafka broker cluster.\n    - start the Kafka broker cluster: `./kafka_admin.sh start`\n    - stop the Kafka broker cluster: `./kafka_admin.sh stop`\n- `rerun.py`: offline batch tasks sometimes need to be rerun for a range of days, and rerunning them one day at a time is tedious. `rerun.py` automates this, for example: `python rerun.py -start 2017/11/21 -end 2017/12/01 -task dayJob.sh`\n\n## Monitoring\n`monitor.py` in `awesome-bigdata-samples/bin` provides monitoring, automatic recovery, and alerting. It includes the following checkers:\n- YarnChecker: monitors the ResourceManager and NodeManagers\n- HDFSChecker: monitors the NameNode and DataNodes\n- ZookeeperChecker: monitors ZooKeeper nodes\n- KafkaChecker: monitors Kafka brokers\n- HBaseChecker: monitors the HMaster and HRegionServers\n- RedisChecker: monitors the Redis server\n- YarnAppChecker: monitors YARN applications. 
It is useful for monitoring Spark Streaming and Flink streaming applications.\n\n## Style\n- Scala: Scala code follows the [Databricks Scala style guide](https://github.com/databricks/scala-style-guide), enforced in the Maven build lifecycle by [scalastyle-maven-plugin](http://www.scalastyle.org/).\n- Java: Java code follows the [Apache Beam checkstyle configuration](https://github.com/apache/beam/blob/master/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml), enforced in the Maven build lifecycle by maven-checkstyle-plugin.\n\n## Run\nFlink jobs containing Java 8 lambdas with generics cannot currently be compiled with IntelliJ IDEA. Instead, build the project from the command line with `mvn compile` using the **Eclipse JDT compiler**. Once the program has been built with Maven, you can also run it from within IntelliJ.\n\n## Build\n```shell\nmvn clean package -DskipTests -Pbuild-jar\n```\n\n## Contribute\n- Source Code: https://github.com/chaokunyang/awesome-bigdata-samples\n- Issue Tracker: https://github.com/chaokunyang/awesome-bigdata-samples/issues\n\n## License\nThis project is licensed under the Apache License 2.0.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchaokunyang%2Fbigdata-examples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchaokunyang%2Fbigdata-examples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchaokunyang%2Fbigdata-examples/lists"}