{"id":18429087,"url":"https://github.com/lancedb/lance-deeplearning-recipes","last_synced_at":"2025-04-07T17:32:35.164Z","repository":{"id":232170828,"uuid":"777215598","full_name":"lancedb/lance-deeplearning-recipes","owner":"lancedb","description":"Deep Learning how-to's using Lance file format","archived":false,"fork":false,"pushed_at":"2024-09-18T11:11:57.000Z","size":932,"stargazers_count":15,"open_issues_count":3,"forks_count":5,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-22T21:51:09.599Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancedb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-25T12:36:49.000Z","updated_at":"2025-01-02T07:51:20.000Z","dependencies_parsed_at":"2024-04-15T17:43:22.636Z","dependency_job_id":"b1342080-f4f3-4303-a31e-a2cc7190c910","html_url":"https://github.com/lancedb/lance-deeplearning-recipes","commit_stats":null,"previous_names":["lancedb/lance-deeplearning-recipes"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-deeplearning-recipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-deeplearning-recipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-deeplearning-recipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flanc
e-deeplearning-recipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lancedb","download_url":"https://codeload.github.com/lancedb/lance-deeplearning-recipes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247697926,"owners_count":20981273,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T05:15:47.230Z","updated_at":"2025-04-07T17:32:34.801Z","avatar_url":"https://github.com/lancedb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Lance Deep Learning - recipes\n\u003cbr /\u003e\nDive into building deep learning pipelines using Lance datasets!\nThis repository contains examples to help you use Lance datasets for your deep learning projects.\n\n- These are built using Lance, a free, open-source, columnar data format that **requires no setup**.\n\n- High-performance random access: More than **1000x faster** than Parquet.\n\n- Zero-copy, automatic versioning: manage versions of your data automatically, and reduce redundancy with zero-copy logic built-in.\n![318060905-d284accb-24b9-4404-8605-56483160e579](https://github.com/lancedb/lance-deeplearning-recipes/assets/15766192/8b350bf9-726e-45b8-ba23-dc8f2043c8aa)\n\n\u003cbr /\u003e\nJoin our community for support - \u003ca href=\"https://discord.gg/zMM32dvNtd\"\u003eDiscord\u003c/a\u003e •\n\u003ca href=\"https://twitter.com/lancedb\"\u003eTwitter\u003c/a\u003e\n\n---\n\u003ch3\u003e Why Lance \u003c/h3\u003e\n\u003cb\u003eConvenience\u003c/b\u003e \u003cbr 
/\u003e\nThe Lance columnar file format is designed for large-scale DL workloads. Its columnar layout lets you easily and efficiently manage complex, unstructured multi-modal datasets. Updates, filtering and zero-copy versioning allow you to iterate faster on large datasets. It’s designed to be used with images, videos, 3D point clouds, audio and of course tabular data. It supports any POSIX file system, as well as cloud storage such as AWS S3 and Google Cloud Storage.\n\n\u003cbr /\u003e\u003cb\u003e Performance \u003c/b\u003e \u003cbr /\u003e\nThe Lance format supports fast reads and writes, making training-time data loading significantly faster.\n\n## Dataset Examples\nExamples of how to convert existing datasets to Lance format.\n\n| Example \u0026nbsp; | Scripts \u0026nbsp; | Read The Blog!\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;|\n|-------- | ------------- | -------------   |\n| [Creating text dataset for LLM pre-training](/examples/wikitext-llm-dataset/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/wikitext-llm-dataset/wikitext-llm-dataset.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge\u0026logo=ghost\u0026logoColor=%23F7DF1E)](https://blog.lancedb.com/custom-dataset-for-llm-training-using-lance/)|\n| [Creating Instruction dataset for LLM fine-tuning](/examples/alpaca-dataset/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/alpaca-dataset/alpaca-dataset.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [Creating Image Captioning Dataset for Multi-Modal Model Training](/examples/flickr8k-dataset/) | \u003ca 
href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/flickr8k-dataset/flickr8k-dataset.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n\n\n## Training Examples\nPractical examples showcasing how to adapt your Lance dataset to popular deep learning projects. \n\n| Example \u0026nbsp; | Notebook \u0026 Scripts \u0026nbsp; |\n|-------- | ------------- |\n| [PEFT Supervised Fine-tuning of Gemma using Huggingface Trainer](/examples/sft-gemma-hindi/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/sft-gemma-hindi/sft_gemma_hindi.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [LLM pre-training](/examples/llm-pretraining/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/llm-pretraining/llm-pretraining.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [COCO Image segmentation](/examples/image-segmentation/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/image-segmentation/image-segmentation.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [FSDP LLM pre-training](/examples/fsdp-llm-pretraining/) |\n| [Wikiart Diffusion Training](/examples/diffusion-training/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/diffusion-training/diffusion-training.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [CLIP Training](/examples/clip-training/) | \u003ca 
href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/clip-training/clip-training.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [Image Classification](/examples/image-classification/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/image-classification/image-classification.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n| [Training a Variational AutoEncoder from scratch with Lance file format](/examples/variational-autoencoder/) | \u003ca href=\"https://colab.research.google.com/github/lancedb/lance-deeplearning-recipes/blob/main/examples/variational-autoencoder/vae-training.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e |\n\n## Contributing Examples\nIf you're working on some cool deep learning examples using Lance that you'd like to add to this repo, please open a PR! More detailed instructions on contributing can be found on the [CONTRIBUTING.md](https://github.com/lancedb/lance-deeplearning-recipes/blob/main/CONTRIBUTING.md) page.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Flance-deeplearning-recipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancedb%2Flance-deeplearning-recipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Flance-deeplearning-recipes/lists"}