{"id":15208893,"url":"https://github.com/chabane/bigdata-playground","last_synced_at":"2025-04-13T08:25:12.442Z","repository":{"id":146751375,"uuid":"114044038","full_name":"Chabane/bigdata-playground","owner":"Chabane","description":"A complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL","archived":false,"fork":false,"pushed_at":"2019-02-01T10:08:13.000Z","size":3230,"stargazers_count":209,"open_issues_count":7,"forks_count":74,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-03-17T12:08:13.442Z","etag":null,"topics":["angular","apache-flink","apache-spark","avro","big-data","docker","graphql","hadoop","hbase","kafka","kops","machine-learning","mongodb","nodejs","parquet","python","scala","spark-sql","spark-streaming","twitter-api"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Chabane.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-12-12T22:06:34.000Z","updated_at":"2025-03-08T08:42:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"8bab4b46-891e-4c21-804e-29619b65e4f9","html_url":"https://github.com/Chabane/bigdata-playground","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chabane%2Fbigdata-playground","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/Chabane%2Fbigdata-playground/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chabane%2Fbigdata-playground/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chabane%2Fbigdata-playground/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Chabane","download_url":"https://codeload.github.com/Chabane/bigdata-playground/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245755674,"owners_count":20667027,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["angular","apache-flink","apache-spark","avro","big-data","docker","graphql","hadoop","hbase","kafka","kops","machine-learning","mongodb","nodejs","parquet","python","scala","spark-sql","spark-streaming","twitter-api"],"created_at":"2024-09-28T07:03:20.758Z","updated_at":"2025-03-27T00:11:18.169Z","avatar_url":"https://github.com/Chabane.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bigdata Playground\n\nThe aim is to create a Batch/Streaming/ML/WebApp stack where you can test your jobs locally or submit them to the YARN resource manager. We use Docker to build the environment and Docker Compose to provision it with the required components (next step: Kubernetes). Along with the infrastructure, we check that it works with 4 projects that simply probe that everything is working as expected. 
The boilerplate is based on a sample flight-search web application.\n\n## Installation\nIf you are on macOS, you can use a package manager such as `brew` to install `sbt`:\n\n```bash\n$ brew install sbt\n```\n\nFor other systems, refer to the manual installation instructions on the `sbt` website: http://www.scala-sbt.org/0.13/tutorial/Manual-Installation.html. \n\nIf you are on macOS, you can use a package manager such as `brew` to install `maven`:\n```bash\n$ brew install maven\n```\nFor other systems, refer to the installation instructions on the `maven` website: https://maven.apache.org/install.html. \n\nInstall Docker by following the instructions for \u003ca href='https://docs.docker.com/mac/step_one/'\u003emac\u003c/a\u003e, \u003ca href='https://docs.docker.com/linux/step_one/'\u003elinux\u003c/a\u003e, or \u003ca href='https://docs.docker.com/windows/step_one/'\u003ewindows\u003c/a\u003e.\n\n```\ndocker network create vnet\nnpm install yarn -g\ncd webapp \u0026\u0026 yarn \u0026\u0026 cd client \u0026\u0026 yarn \u0026\u0026 cd ../server \u0026\u0026 yarn \u0026\u0026 cd ../ \u0026\u0026 npm run build:dev \u0026\u0026 cd ../\ncd batch/spark \u0026\u0026 sbt clean package assembly \u0026\u0026 cd ../..\n\ncd batch/hadoop \u0026\u0026 mvn clean package \u0026\u0026 cd ../..\ncd streaming/spark \u0026\u0026 sbt clean assembly \u0026\u0026 cd ../..\ncd streaming/flink \u0026\u0026 sbt clean assembly \u0026\u0026 cd ../..\ncd streaming/storm \u0026\u0026 mvn clean package \u0026\u0026 cd ../..\ncd docker\ndocker-compose -f mongo.yml -f zookeeper.yml -f kafka.yml -f hadoop-hbase.yml -f flink.yml up -d\ndocker-compose -f dev/webapp.yml up -d\ndocker-compose -f dev/batch-spark.yml up -d\ndocker-compose -f dev/batch-hadoop.yml up -d\ndocker-compose -f dev/streaming-spark.yml up -d\ndocker-compose -f dev/streaming-flink.yml up -d\ndocker-compose -f dev/streaming-storm.yml up -d\n```\nCreate your Twitter app on https://apps.twitter.com\n```\nexport 
TWITTER_CONSUMER_KEY=\u003cTWITTER_CONSUMER_KEY\u003e\nexport TWITTER_CONSUMER_SECRET=\u003cTWITTER_CONSUMER_SECRET\u003e\nexport TWITTER_CONSUMER_ACCESS_TOKEN=\u003cTWITTER_CONSUMER_ACCESS_TOKEN\u003e\nexport TWITTER_CONSUMER_ACCESS_TOKEN_SECRET=\u003cTWITTER_CONSUMER_ACCESS_TOKEN_SECRET\u003e\ndocker-compose -f dev/ml-spark.yml up -d\n```\n\n## Interactions / OnGoing\n\u003cimg src='https://image.ibb.co/eOuL5H/search_flight_simple_v4.png'/\u003e\n\n## Contributing\n`Pull requests` are welcome.\n\n## Support\nPlease raise tickets for issues and improvements at https://github.com/Chabane/bigdata-playground/issues\n\n## License\nThis example is released under version 2.0 of the [Apache License](LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchabane%2Fbigdata-playground","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchabane%2Fbigdata-playground","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchabane%2Fbigdata-playground/lists"}