{"id":20710421,"url":"https://github.com/mobiletelesystems/hadoop-docker","last_synced_at":"2025-10-15T02:21:04.766Z","repository":{"id":154167464,"uuid":"630924016","full_name":"MobileTeleSystems/hadoop-docker","owner":"MobileTeleSystems","description":"Docker image with Hadoop cluster","archived":false,"fork":false,"pushed_at":"2025-06-30T19:16:47.000Z","size":94,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-30T20:26:35.118Z","etag":null,"topics":["docker-compose-template","docker-image","hadoop"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MobileTeleSystems.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-21T13:27:10.000Z","updated_at":"2025-06-30T19:16:50.000Z","dependencies_parsed_at":"2023-12-25T21:09:49.049Z","dependency_job_id":"edf274af-c8be-4f68-8c19-c3686d691d97","html_url":"https://github.com/MobileTeleSystems/hadoop-docker","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/MobileTeleSystems/hadoop-docker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fhadoop-docker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fhadoop-docker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fhadoop-docker/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fhadoop-docker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MobileTeleSystems","download_url":"https://codeload.github.com/MobileTeleSystems/hadoop-docker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MobileTeleSystems%2Fhadoop-docker/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265967094,"owners_count":23857065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker-compose-template","docker-image","hadoop"],"created_at":"2024-11-17T02:11:54.480Z","updated_at":"2025-10-15T02:20:59.733Z","avatar_url":"https://github.com/MobileTeleSystems.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [Hadoop docker image](https://github.com/MobileTeleSystems/hadoop-docker)\n\n[![Repo Status](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![Build Status](https://github.com/MobileTeleSystems/hadoop-docker/workflows/Test%20build/badge.svg)](https://github.com/MobileTeleSystems/hadoop-docker/actions)\n[![Docker Pulls](https://img.shields.io/docker/pulls/mtsrus/hadoop)](https://hub.docker.com/r/mtsrus/hadoop)\n\n**Test purpose only!**\n\n## HDFS\n\nAll-in-one HDFS container with:\n\n* HDFS namenode\n* HDFS secondary namenode\n* HDFS datanode\n\n### Versions\n\n* `mtsrus/hadoop:hadoop2.7.3-hdfs`\n* `mtsrus/hadoop:hadoop2-hdfs` - same as above\n\n* `mtsrus/hadoop:hadoop3.3.6-hdfs`\n* 
`mtsrus/hadoop:hadoop3-hdfs` - same as above\n\n### Prerequisites\n\nMinimum resources to start with:\n\n* 200m CPU\n* 700Mb RAM\n* 1Gb storage\n\n### Examples\n\nSee [docker-compose.yml](hdfs/docker-compose.yml).\n\n### Port numbers\n\nNOTE: Hadoop 2 image uses the same port numbers as Hadoop 3:\n\n- `9820:9820` - HDFS IPC\n- `9870:9870` - WebHDFS\n\n### Configuration\n\n#### `/var/hadoop/conf/*.xml` files\n\n##### Defaults\n\n* [/var/hadoop/conf/core-site.xml](hdfs/conf/hadoop/core-site.xml)\n* [/var/hadoop/conf/hdfs-site.xml](hdfs/conf/hadoop/hdfs-site.xml)\n\nYou can mount custom config files to `/var/hadoop/conf` directory inside container to override default Hadoop configuration.\n\n##### Substitutions\n\nThe following substitutions are replaced with proper values:\n\n* `{{hostname}}` - current hostname\n\n#### Container env variables\n\n* `WAIT_TIMEOUT_SECONDS=120` - timeout in seconds after starting each service to check if it is alive\n\n#### `/var/hadoop/conf/hadoop-env.sh` environment variables\n\n##### Defaults\n\n* `export HADOOP_HEAPSIZE=512` - max JVM memory in megabytes, applied to all Hadoop components (if no overrides)\n\nIf container fails with `OutOfMemory`, you should increase this value, e.g. 
up to `1024` or `2048`.\n\n##### Variables per service\n\n* `export HADOOP_NAMENODE_OPTS=-Xmx2048m` - max JVM memory for Namenode\n* `export HADOOP_SECONDARYNAMENODE_OPTS=-Xmx2048m` - max JVM memory for Secondary Namenode\n* `export HADOOP_DATANODE_OPTS=-Xmx1024m` - max JVM memory for Datanode\n\nSee https://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons\n\n## Yarn\n\nAll-in-one Yarn container with:\n\n* HDFS namenode\n* HDFS secondary namenode\n* HDFS datanode\n* Yarn ResourceManager\n* Yarn NodeManager\n* MapReduce JobHistory server (if `WITH_JOBHISTORY_SERVER=true`)\n\n### Versions\n\n* `mtsrus/hadoop:hadoop2.7.3-yarn`\n* `mtsrus/hadoop:hadoop2-yarn` - same as above\n\n* `mtsrus/hadoop:hadoop3.3.6-yarn`\n* `mtsrus/hadoop:hadoop3-yarn` - same as above\n\n### Prerequisites\n\nMinimum resources to start with:\n\n* 400m CPU\n* 1Gb RAM\n* 1Gb storage\n\n### Examples\n\nSee [docker-compose.yml](yarn/docker-compose.yml).\n\n### Port numbers\n\nNOTE: Hadoop 2 image uses the same port numbers as Hadoop 3:\n\n- `9820:9820` - HDFS IPC\n- `9870:9870` - HDFS WebHDFS\n- `8042:8042` - NodeManager UI\n- `8088:8088` - Yarn UI\n\nif `WITH_JOBHISTORY_SERVER=true`:\n- `10020:10020` - MapReduce JobServer\n- `19888:19888` - MapReduce JobServer History\n\n### Configuration\n\n#### `/var/hadoop/conf/*.xml` files\n\n##### Defaults\n\n* [/var/hadoop/conf/core-site.xml](hdfs/conf/hadoop/core-site.xml)\n* [/var/hadoop/conf/hdfs-site.xml](hdfs/conf/hadoop/hdfs-site.xml)\n* [/var/hadoop/conf/yarn-site.xml](yarn/conf/hadoop/yarn-site.xml)\n* [/var/hadoop/conf/capacity-scheduler.xml](yarn/conf/hadoop/capacity-scheduler.xml)\n* [/var/hadoop/conf/mapred-site.xml](yarn/conf/hadoop/mapred-site.xml)\n\nYou can mount custom config files to `/var/hadoop/conf` directory inside container to override default Hadoop configuration.\n\n##### Substitutions\n\nThe following substitutions are replaced with proper values:\n\n* 
`{{hostname}}` - current hostname\n\n#### Container env variables\n\n* `WAIT_TIMEOUT_SECONDS=120` - timeout in seconds after starting each service to check if it is alive\n* `WITH_JOBHISTORY_SERVER=false` - set to `true` to start MapReduce JobHistory server\n\n#### `/var/hadoop/conf/hadoop-env.sh` environment variables\n\n\nSee HDFS image documentation.\n\n#### `/var/hadoop/conf/yarn-env.sh` environment variables\n\n* `export YARN_RESOURCEMANAGER_OPTS=-Xmx1024m` - max JVM memory for Yarn ResourceManager\n* `export YARN_NODEMANAGER_OPTS=-Xmx1024m` - max JVM memory for NodeManager\n* `export HADOOP_JOB_HISTORYSERVER_OPTS=-Xmx1024m` - max JVM memory for MapReduce JobHistory server\n\nSee https://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons\n\n\n## Hive\n\nAll-in-one Hive container with:\n\n* HDFS namenode\n* HDFS secondary namenode\n* HDFS datanode\n* Yarn ResourceManager\n* Yarn NodeManager\n* MapReduce JobHistory server\n* Hive server\n* Hive Metastore server\n\n### Versions\n\n* `mtsrus/hadoop:hadoop2.7.3-hive2.3.10`\n* `mtsrus/hadoop:hadoop2-hive` - same as above\n\n* `mtsrus/hadoop:hadoop3.3.6-hive3.1.3`\n* `mtsrus/hadoop:hadoop3-hive` - same as above\n\n### Prerequisites\n\nMinimum resources to start with:\n\n* 500m CPU\n* 2Gb RAM\n* 1Gb storage\n* Running RDBMS (e.g. 
Postgres) instance to operate Metastore\n\n### Examples\n\nSee [docker-compose.yml](hive/docker-compose.yml).\n\n### Port numbers\n\nNOTE: Hadoop 2 image uses the same port numbers as Hadoop 3:\n\n- `9820:9820` - HDFS IPC\n- `9870:9870` - HDFS WebHDFS\n\nif `WITH_HIVE_SERVER=true`:\n  - `8042:8042` - NodeManager UI\n  - `8088:8088` - Yarn UI\n  - `19888:19888` - MapReduce JobServer History\n  - `10000:10000` - Hive server\n  - `10002:10002` - Hive Admin UI\n\nif `WITH_HIVE_METASTORE_SERVER=true`:\n  - `9083:9083` - Hive Metastore server\n\n### Configuration\n\n#### `/var/hive/conf/*.xml` and `/var/hadoop/conf/*.xml` files\n\n##### Defaults\n\n* [/var/hive/conf/hive-site.xml](hive/conf/hive/hive-site.xml)\n\nYou can mount custom config files to `/var/hive/conf` directory inside container to override default Hive configuration.\n\nHDFS and Yarn configs can still be passed to `/var/hadoop/conf` directory.\n\n##### Substitutions\n\nThe following substitutions are replaced with proper values:\n\n* `{{hostname}}` - current hostname\n* `{{HIVE_METASTORE_DB_URL}}` - `HIVE_METASTORE_DB_URL` env variable (default `jdbc:postgresql://postgres:5432/metastore`)\n* `{{HIVE_METASTORE_DB_DRIVER}}` - `HIVE_METASTORE_DB_DRIVER` env variable (default `org.postgresql.Driver`)\n* `{{HIVE_METASTORE_DB_USER}}` - `HIVE_METASTORE_DB_USER` env variable (default `hive`)\n* `{{HIVE_METASTORE_DB_PASSWORD}}` - `HIVE_METASTORE_DB_PASSWORD` env variable (default `hive`)\n\n#### Metastore database\n\nHive stores metadata in `{{HIVE_METASTORE_DB_URL}}` using driver from `{{HIVE_METASTORE_DB_DRIVER}}`. By default, Postgres is used.\n\nYou can change URL components by setting environment variables mentioned above, or replace the entire URL by updating the `/var/hive/conf/hive-site.xml` file.\n\nYou can also use any other supported RDBMS, like MySQL, by changing connection URL and embedding/mounting JDBC driver to `/opt/hive/lib/drivername.jar` path inside container. 
Postgres JDBC driver is already embedded into image.\n\n#### Container env variables\n\n* `WAIT_TIMEOUT_SECONDS=120` - timeout in seconds after starting each service to check if it is alive\n* `WITH_HIVE_SERVER=true` - set to `false` to disable Hive server\n* `WITH_HIVE_METASTORE_SERVER=true` - set to `false` to disable Hive metastore server\n\n#### `/var/hadoop/conf/hadoop-env.sh` environment variables\n\nSee HDFS image documentation.\n\n#### `/var/hadoop/conf/yarn-env.sh` environment variables\n\nSee Yarn image documentation.\n\n#### `/var/hive/conf/hive-env.sh` environment variables\n\n* `export HIVE_SERVER2_HEAPSIZE=256` - max JVM memory in megabytes for Hive server\n* `export HIVE_METASTORE_HEAPSIZE=256` - max JVM memory in megabytes for Hive metastore server\n\nSee https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/modify-the-memory-parameters-of-hive\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmobiletelesystems%2Fhadoop-docker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmobiletelesystems%2Fhadoop-docker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmobiletelesystems%2Fhadoop-docker/lists"}