{"id":19705540,"url":"https://github.com/treeverse/lakefs-samples","last_synced_at":"2025-04-29T15:30:50.870Z","repository":{"id":49387017,"uuid":"462471122","full_name":"treeverse/lakeFS-samples","owner":"treeverse","description":"lakefs-samples repository","archived":false,"fork":false,"pushed_at":"2024-05-22T21:13:37.000Z","size":221758,"stargazers_count":64,"open_issues_count":3,"forks_count":22,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-05-22T21:34:23.844Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/treeverse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-22T20:54:30.000Z","updated_at":"2024-06-05T04:39:03.136Z","dependencies_parsed_at":"2023-10-20T16:22:37.465Z","dependency_job_id":"fb6391cf-c4e3-45a0-902c-cc2333e1d69e","html_url":"https://github.com/treeverse/lakeFS-samples","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treeverse%2FlakeFS-samples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treeverse%2FlakeFS-samples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treeverse%2FlakeFS-samples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treeverse%2FlakeFS-samples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/treeverse","download_ur
l":"https://codeload.github.com/treeverse/lakeFS-samples/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224178266,"owners_count":17268852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T21:28:50.052Z","updated_at":"2024-11-11T21:28:50.790Z","avatar_url":"https://github.com/treeverse.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lakefs-samples\n\n[![Check notebooks](https://github.com/treeverse/lakeFS-samples/actions/workflows/check-notebooks.yml/badge.svg?branch=main)](https://github.com/treeverse/lakeFS-samples/actions/workflows/check-notebooks.yml?query=branch:main)\n\n_Incorporating the Docker Compose formerly known as **Everything Bagel**._\n\n![lakeFS logo](images/logo.png)\n\n**This sample repository captures a collection of notebooks, dockerized applications and code snippets that demonstrate how to use lakeFS.**\n\n_lakeFS is a popular open-source solution for managing data. It provides a consistent and scalable data management layer on top of cloud storage, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. It allows users to create and manage data in a version-controlled and immutable manner, and offers features such as data governance, data lineage, and data access controls. 
lakeFS is compatible with a wide range of data processing frameworks and tools._\n\n### **Go to [lakefs_enterprise](./02_lakefs_enterprise/) folder if you want to use [lakeFS Enterprise](https://docs.lakefs.io/understand/enterprise/) instead of lakeFS open source**\n\n\n## Let's Get Started 👩🏻‍💻\n\nClone this repository\n\n```bash\ngit clone https://github.com/treeverse/lakeFS-samples.git\ncd lakeFS-samples\n```\n\nYou now have two options: \n\n### **Run a Notebook server with your existing lakeFS Server**\n\nIf you have already [installed lakeFS](https://docs.lakefs.io/deploy/) or are utilizing [lakeFS cloud](https://lakefs.cloud/), all you need to run is the Jupyter notebook server:\n\n```bash\ndocker compose up\n```\n\nOnce the stack's up and running, open the Jupyter Notebook (http://localhost:8888) and check out the [catalog of sample notebooks](./00_notebooks/00_index.ipynb) to explore lakeFS. \n\nOnce you've finished, run the following to remove all the containers: \n\n```bash\ndocker compose down\n```\n\n### **Don't have a lakeFS Server or Object Store?**\n\nIf you want to provision a lakeFS server, MinIO as your object store, and Jupyter, bring up the full stack:\n\n```bash\ndocker compose --profile local-lakefs up\n```\n\nAs above, open the Jupyter Notebook (http://localhost:8888) and peruse the [catalog of sample notebooks](./00_notebooks/00_index.ipynb) to explore lakeFS. \n\n\n## Environment Details\n\n* **Jupyter Notebook** is based on the [Jupyter PySpark notebook](https://hub.docker.com/r/jupyter/pyspark-notebook/) and provides an interactive environment in which to explore lakeFS using Python and PySpark. \n* **lakeFS** can be provisioned as part of this environment, or provided by [lakeFS cloud](https://lakefs.cloud/) or your [own installation](https://docs.lakefs.io/deploy/).\n* If you run lakeFS as part of this environment, **MinIO** is provided as an S3-compatible object store. 
If you run lakeFS yourself, you can use other S3-compatible object stores, including S3 and GCS, as well as MinIO.\n\n### Containers\n\n![](images/containers.excalidraw.png)\n\n### URLs and login details\n\n* Jupyter http://localhost:8888/\n\nIf you've brought up the full stack you'll also have: \n\n* lakeFS http://localhost:8000/ (`AKIAIOSFOLKFSSAMPLES` / `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`)\n* MinIO http://localhost:9001/ (`minioadmin`/`minioadmin`)\n* Spark UI http://localhost:4040/\n\n## Other Examples\n\nUnder the [standalone_examples](./01_standalone_examples/) folder is a set of examples that need to be run on their own. Some use the repository's Docker Compose file and extend it, and others are self-contained and use their own Dockerfile. \n\n* [Airflow (1)](./01_standalone_examples/airflow-01/) - Examples of using lakeFS with Airflow: \n    * Versioning DAGs and running pipeline from hooks using a configurable version of DAGs \n    * Isolating Airflow job run and atomic promotion to production\n    * Integration of lakeFS with Airflow via Hooks\n    * Troubleshooting production issues\n    * Integration of lakeFS with Airflow and Databricks\n    * Integration of lakeFS with Airflow and Iceberg\n* [Airflow (2)](./01_standalone_examples/airflow-02/) - lakeFS + Airflow\n* [Azure Databricks](./01_standalone_examples/azure-databricks/)\n* [AWS Databricks](./01_standalone_examples/aws-databricks/)\n* [Databricks CI/CD](./01_standalone_examples/databricks-ci-cd/)\n* [AWS Glue and Athena](./01_standalone_examples/aws-glue-athena/)\n* [AWS Glue and Trino](./01_standalone_examples/aws-glue-trino/)\n* [AWS Glue and Iceberg](./01_standalone_examples/aws-glue-iceberg/)\n* [lakeFS + Dagster](./01_standalone_examples/dagster-integration/)\n* [lakeFS + Prefect](./01_standalone_examples/prefect-integration/)\n* [Image Segmentation Demo: ML Data Version Control and Reproducibility at Scale](./01_standalone_examples/image-segmentation/)\n* [Labelbox 
integration](./01_standalone_examples/labelbox-integration/)\n* [Kafka integration](./01_standalone_examples/kafka/)\n* [Flink integration](./01_standalone_examples/flink/)\n* [How to migrate or clone a repo](./01_standalone_examples/migrate-or-clone-repo/)\n* [Running lakeFS with PostgreSQL as K/V store](./01_standalone_examples/docker-compose-with-postgres/)\n\n## Got Questions or Want to Chat?\n\n👉🏻 Join the lakeFS Slack group - https://lakefs.io/slack\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftreeverse%2Flakefs-samples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftreeverse%2Flakefs-samples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftreeverse%2Flakefs-samples/lists"}