{"id":20129544,"url":"https://github.com/databricks/docker-spark-iceberg","last_synced_at":"2025-04-08T08:17:43.568Z","repository":{"id":40380788,"uuid":"417250991","full_name":"databricks/docker-spark-iceberg","owner":"databricks","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-30T11:34:16.000Z","size":18640,"stargazers_count":296,"open_issues_count":14,"forks_count":155,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-04-01T07:51:16.373Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-14T19:03:28.000Z","updated_at":"2025-03-30T00:48:34.000Z","dependencies_parsed_at":"2023-12-10T13:26:35.787Z","dependency_job_id":"81548838-c7d9-49b3-a680-9e718d0b8413","html_url":"https://github.com/databricks/docker-spark-iceberg","commit_stats":null,"previous_names":["databricks/docker-spark-iceberg","tabular-io/docker-spark-iceberg"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdocker-spark-iceberg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdocker-spark-iceberg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdocker-spark-iceberg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdocker-spark-iceberg/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks","download_url":"https://codeload.github.com/databricks/docker-spark-iceberg/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247801175,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T20:34:59.841Z","updated_at":"2025-04-08T08:17:43.528Z","avatar_url":"https://github.com/databricks.png","language":"Jupyter Notebook","readme":"\u003c!--\n Licensed to the Apache Software Foundation (ASF) under one\n or more contributor license agreements.  See the NOTICE file\n distributed with this work for additional information\n regarding copyright ownership.  The ASF licenses this file\n to you under the Apache License, Version 2.0 (the\n \"License\"); you may not use this file except in compliance\n with the License.  You may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\n Unless required by applicable law or agreed to in writing,\n software distributed under the License is distributed on an\n \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n KIND, either express or implied.  
See the License for the\n specific language governing permissions and limitations\n under the License.\n--\u003e\n\n# Spark + Iceberg Quickstart Image\n\nThis is a Docker Compose environment for quickly getting up and running with Spark, a local REST\ncatalog, and MinIO as a storage backend.\n\n**Note**: If you don't have Docker installed, you can head over to the [Get Docker](https://docs.docker.com/get-docker/)\npage for installation instructions.\n\n## Usage\nStart up the notebook server by running the following.\n```\ndocker-compose up\n```\n\nThe notebook server will then be available at http://localhost:8888\n\nWhile the notebook server is running, you can use any of the following commands if you prefer to use spark-shell, spark-sql, or pyspark.\n```\ndocker exec -it spark-iceberg spark-shell\n```\n```\ndocker exec -it spark-iceberg spark-sql\n```\n```\ndocker exec -it spark-iceberg pyspark\n```\n\nTo stop everything, just run `docker-compose down`.\n\n## Troubleshooting \u0026 Maintenance\n\n### Refreshing Docker Image\n\nThe prebuilt Spark image is uploaded to Docker Hub. For convenience, the image tag defaults to `latest`.\n\nIf you have an older version of the image, you might need to remove it to upgrade.\n```bash\ndocker image rm tabulario/spark-iceberg \u0026\u0026 docker-compose pull\n```\n\n### Building the Docker Image locally\n\nIf you want to make changes to the local files and test them out, you can build the image locally and use that instead:\n\n```bash\ndocker image rm tabulario/spark-iceberg \u0026\u0026 docker-compose build\n```\n\n### Use `Dockerfile` In This Repo\n\nTo directly use the Dockerfile in this repo (as opposed to pulling the prebuilt `tabulario/spark-iceberg` image), you can use `docker-compose build`.\n\n### Deploying Changes\n\nTo deploy changes to the hosted Docker image `tabulario/spark-iceberg`, run the following. 
(Requires access to the tabulario Docker Hub account.)\n\n```sh\ncd spark\ndocker buildx build -t tabulario/spark-iceberg --platform=linux/amd64,linux/arm64 . --push\n```\n\n---\n\nFor more information on getting started with Iceberg, check out\nthe [Quickstart](https://iceberg.apache.org/spark-quickstart/) guide in the official docs.\n\nThe repository for the Docker image is [located on Docker Hub](https://hub.docker.com/r/tabulario/spark-iceberg).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fdocker-spark-iceberg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks%2Fdocker-spark-iceberg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fdocker-spark-iceberg/lists"}