{"id":20971109,"url":"https://github.com/datalayer/examples","last_synced_at":"2025-05-14T11:33:50.181Z","repository":{"id":206213152,"uuid":"705234667","full_name":"datalayer/examples","owner":"datalayer","description":"Ξ Examples for Datalayer.","archived":false,"fork":false,"pushed_at":"2025-03-23T14:05:12.000Z","size":16484,"stargazers_count":7,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-02T17:38:32.467Z","etag":null,"topics":["datalayer","examples"],"latest_commit_sha":null,"homepage":"https://datalayer.io","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datalayer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["datalayer"]}},"created_at":"2023-10-15T12:51:46.000Z","updated_at":"2025-03-23T14:05:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"eeb539a5-c26d-4306-bdab-37212fb43436","html_url":"https://github.com/datalayer/examples","commit_stats":null,"previous_names":["datalayer/examples"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalayer%2Fexamples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalayer%2Fexamples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalayer%2Fexamples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datalayer%2Fexamples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datalay
er","download_url":"https://codeload.github.com/datalayer/examples/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254131888,"owners_count":22020035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datalayer","examples"],"created_at":"2024-11-19T04:00:52.597Z","updated_at":"2025-05-14T11:33:45.148Z","avatar_url":"https://github.com/datalayer.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/datalayer"],"categories":[],"sub_categories":[],"readme":"[![Datalayer](https://assets.datalayer.tech/datalayer-25.svg)](https://datalayer.io)\n\n[![Become a Sponsor](https://img.shields.io/static/v1?label=Become%20a%20Sponsor\u0026message=%E2%9D%A4\u0026logo=GitHub\u0026style=flat\u0026color=1ABC9C)](https://github.com/sponsors/datalayer)\n\n# Ξ Datalayer Examples\n\nThis repository contains Jupyter notebook examples showcasing scenarios where [Datalayer](https://datalayer.io) proves highly beneficial. Datalayer allows you to **scale Jupyter Kernels** from your local JupyterLab or CLI to the cloud, providing the capability to run your code on **powerful GPU(s) and CPU(s)**. 
🚀\n\nThe [Technical validation](#technical-validation) section delves into system checks and performance benchmarks to ensure optimal GPU and CPU utilization, while the [Use cases](#use-cases) section explores typical AI scenarios where scaling proves essential.\n\n💡 Note that you can use any notebook within Datalayer without requiring any code changes.\n\n## Getting started \n\n```bash\npip install datalayer jupyterlab\ngit clone https://github.com/datalayer/examples.git datalayer-examples\ncd datalayer-examples\njupyter lab\n```\n\nRead the [documentation website](https://docs.datalayer.io) to learn how to set up Datalayer. Don't worry, it is easy 👍 \u003cbr /\u003eYou just need to install the package, open JupyterLab, click on the `Jupyter kernels` tile in the JupyterLab launcher, create an account, wait a bit for your Kernels to be ready, and then assign a Remote Kernel from any Notebook kernel picker.\n\n\u003cimg alt=\"Notebook remote execution\" src=\"https://datalayer-assets.s3.us-west-2.amazonaws.com/examples/user-flow-1.png\" width=\"900\" /\u003e\n\n## Technical validation\n\n1. [GPU sanity checks](#gpu-sanity-checks)\n1. [Performance comparison of CPU and GPU serial and parallel execution](#performance-comparison-of-cpu-and-gpu-serial-and-parallel-execution)\n\n### 1. [GPU sanity checks](https://github.com/datalayer/examples/tree/main/gpu-check)\n\nThis notebook contains scripts and tests to perform GPU sanity checks using PyTorch and CUDA. The primary goal of these checks is to **ensure** that the **GPU resources meet the expected requirements**.\n\n### 2. [Performance comparison of CPU and GPU serial and parallel execution](https://github.com/datalayer/examples/tree/main/parallel-comparison)\n\nThis notebook explores the performance **differences between serial and parallel execution on CPU and GPU** using PyTorch. 
We'll compare the execution times of **intensive computational tasks** performed sequentially on CPU and GPU, as well as in parallel configurations.\n\n## Use cases\n\n1. [Face detection on YouTube video with OpenCV](#opencv-face-detection)\n1. [Image classification model training with fast.ai](#image-classifier-with-fastai)\n1. ['Personalized' text-to-image model creation with Dreambooth](#dreambooth)\n1. [Text generation using the Transformers library](#text-generation-with-transformers)\n1. [Instruction tuning for Mistral 7B on Alpaca dataset](#mistral-instruction-tuning)\n\n### 1. [OpenCV Face Detection](https://github.com/datalayer/examples/tree/main/opencv-face-detection)\n\nThis example utilizes **OpenCV** for **detecting faces** in YouTube videos. It uses a traditional Haar Cascade model, which may have limitations in accuracy compared to modern deep learning-based models. It also utilizes **parallel computing across multiple CPUs** to accelerate face detection and video processing tasks, optimizing performance and efficiency. Datalayer further enhances this capability by enabling seamless scaling across multiple CPUs.\n\n\u003cdiv style=\"display: flex;\"\u003e\n    \u003cimg src=\"https://datalayer-assets.s3.us-west-2.amazonaws.com/examples/rick-ashley-1.png\" style=\"width: 20%;\"\u003e\n    \u003cimg src=\"https://datalayer-assets.s3.us-west-2.amazonaws.com/examples/rick-ashley-2.png\" style=\"width: 20%;\"\u003e\n\u003c/div\u003e\n\n### 2. [Image Classifier with Fast.ai](https://github.com/datalayer/examples/tree/main/fastai-classifier)\n\nThis example demonstrates how to build a model that **distinguishes cats from dogs** in pictures using the fast.ai library. Due to the computational demands of training a model, a **GPU is required**. \n\n\u003cimg src=\"https://miro.medium.com/v2/resize:fit:1400/format:webp/1*rAbCk0T4rksShBcPQjWC0A.gif\" width=\"400\"/\u003e\n\n### 3. 
[Dreambooth](https://github.com/datalayer/examples/tree/main/dreambooth)\n\nThis example uses the Dreambooth method, which takes as input a few images (typically 3-5 images suffice) of a subject (e.g., a specific dog) and the corresponding class name (e.g., \"dog\"), and returns a **fine-tuned/'personalized' text-to-image model** (source: [Dreambooth](https://dreambooth.github.io/)). For this fine-tuning process, a **GPU is required**.\n\n\u003cimg src=\"https://dreambooth.github.io/DreamBooth_files/accessories.png\" width=\"500\"/\u003e\n\n### 4. [Text Generation with Transformers](https://github.com/datalayer/examples/tree/main/transformers-text-generation)\n\nThese notebook examples demonstrate how to leverage Datalayer's **GPU kernels** to accelerate text generation using the **Gemma** model and the HuggingFace Transformers library.\n\n\u003cimg src=\"https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo-with-title.png\" width=\"200\"/\u003e\n\n#### [Transformers Text Generation](https://github.com/datalayer/examples/tree/main/transformers-text-generation/transformers-text-generation.ipynb)\n\nThis notebook uses Gemma-7b and Gemma-7b-it, which is the instruct fine-tuned version of Gemma-7b.\n\n#### [Sentiment Analysis with Gemma](https://github.com/datalayer/examples/tree/main/transformers-text-generation/gemma-sentiment-analysis.ipynb)\n\nThis example demonstrates how you can leverage Datalayer's [**Cell Kernels**](https://github.com/datalayer/examples?tab=readme-ov-file#cell-kernel) feature on JupyterLab to **offload specific tasks**, such as sentiment analysis, **to a remote GPU** while keeping the rest of your code running locally. By selectively using remote resources, you can **optimize both performance and cost**. This hybrid approach is well suited to tasks like sentiment analysis via an LLM, where some parts of the code require more computational resources than others. 
For a detailed explanation and step-by-step guide on using Cell Kernels, check out our [blog post](https://datalayer.blog/2024/08/23/cell-kernels) on this specific example.\n\n### 5. [Mistral Instruction Tuning](https://github.com/datalayer/examples/tree/main/mistral-instruct-tuning)\n\n**Mistral 7B** is a large language model (LLM) that contains 7.3 billion parameters and is one of the most powerful models for its size. However, this base model is not instruction-tuned, meaning it may struggle to follow instructions and perform specific tasks. By fine-tuning Mistral 7B on the Alpaca dataset using [**torchtune**](https://github.com/pytorch/torchtune), the model will significantly improve its capabilities to perform tasks such as conversation and answering questions accurately. Due to the computational demands of fine-tuning a model, a **GPU is required**.\n\n\u003cimg src=\"https://assets.datalayer.tech/examples/llm-fine-tuning.png\" width=\"500\"/\u003e\n\n## Datalayer Advanced Features\n\n### CLI Execution\n\nDatalayer supports the remote execution of code using the **CLI**. Refer to this [page](https://docs.datalayer.io/cli/) for more information.\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ci\u003eCLI Remote Execution\u003c/i\u003e\u003c/summary\u003e\n\n\u003cimg alt=\"CLI remote execution\" src=\"https://datalayer-assets.s3.us-west-2.amazonaws.com/examples/CLI.png\" width=\"800\" /\u003e\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ci\u003eSharing State between Notebook and CLI\u003c/i\u003e\u003c/summary\u003e\n\n\u003cimg alt=\"Remote Notebook Execution\" src=\"https://datalayer-assets.s3.us-west-2.amazonaws.com/examples/SharingState.png\" width=\"800\" /\u003e\n\nWhen using the same Kernel, variables defined in a notebook can be used in the CLI and vice versa. 
This also holds true, for example, when multiple notebooks are connected to the same kernel.\n\n\u003c/details\u003e\n\n### Cell Kernel\n\nDatalayer offers the possibility to use **cell-specific Kernels**, allowing you to execute specific cells with different kernels. This feature **optimizes costs** by enabling you to, for example, use the local CPU for data preparation and reserve the powerful (and often more expensive) GPU resources for intensive computations. \n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ci\u003eCell Kernel execution\u003c/i\u003e\u003c/summary\u003e\n\n\u003cimg alt=\"Cell Kernel Execution\" src=\"https://assets.datalayer.tech/examples/cell-picker.gif\" width=\"800\" /\u003e\n\nThe remote GPU Kernel is utilized only for the duration of the cell computation, minimizing costs.\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatalayer%2Fexamples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatalayer%2Fexamples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatalayer%2Fexamples/lists"}