https://github.com/lancedb/lance-deeplearning-recipes

Deep Learning how-to's using Lance file format

# Lance Deep Learning - recipes


Dive into building deep learning pipelines with Lance datasets!
This repository contains examples to help you use Lance datasets in your deep learning projects.

- These are built using Lance, a free, open-source, columnar data format that **requires no setup**.

- High-performance random access: More than **1000x faster** than Parquet.

- Zero-copy, automatic versioning: manage versions of your data automatically, and reduce redundancy with zero-copy logic built-in.

![318060905-d284accb-24b9-4404-8605-56483160e579](https://github.com/lancedb/lance-deeplearning-recipes/assets/15766192/8b350bf9-726e-45b8-ba23-dc8f2043c8aa)



Join our community for support on Discord and Twitter.

---

## Why Lance

### Convenience

The Lance columnar file format is designed for large-scale DL workloads. A columnar layout lets you manage complex, unstructured multi-modal datasets easily and efficiently. Updates, filtering, and zero-copy versioning let you iterate faster on large datasets. It is designed to be used with images, videos, 3D point clouds, audio, and of course tabular data. It supports any POSIX file system, as well as cloud storage such as AWS S3 and Google Cloud Storage.


### Performance

The Lance format supports fast reads and writes, which makes training-time data loading significantly faster.

## Dataset Examples
Examples on how to convert existing datasets to Lance format.

| Example   | Scripts   | Read The Blog!       |
|-------- | ------------- | ------------- |
| [Creating text dataset for LLM pre-training](/examples/wikitext-llm-dataset/) | Open In Colab | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/custom-dataset-for-llm-training-using-lance/)|
| [Creating Instruction dataset for LLM fine-tuning](/examples/alpaca-dataset/) | Open In Colab | |
| [Creating Image Captioning Dataset for Multi-Modal Model Training](/examples/flickr8k-dataset/) | Open In Colab | |

## Training Examples
Practical examples showcasing how to adapt your Lance dataset to popular deep learning projects.

| Example   | Notebook & Scripts   |
|-------- | ------------- |
| [PEFT Supervised Fine-tuning of Gemma using Huggingface Trainer](/examples/sft-gemma-hindi/) | Open In Colab |
| [LLM pre-training](/examples/llm-pretraining/) | Open In Colab |
| [COCO Image segmentation](/examples/image-segmentation/) | Open In Colab |
| [FSDP LLM pre-training](/examples/fsdp-llm-pretraining/) | |
| [Wikiart Diffusion Training](/examples/diffusion-training/) | Open In Colab |
| [CLIP Training](/examples/clip-training/) | Open In Colab |
| [Image Classification](/examples/image-classification/) | Open In Colab |
| [Training a Variational AutoEncoder from scratch with Lance file format](/examples/variational-autoencoder/) | Open In Colab |

## Contributing Examples
If you're working on some cool deep learning examples using Lance that you'd like to add to this repo, please open a PR! More detailed instructions on contributing can be found on the [CONTRIBUTING.md](https://github.com/lancedb/lance-deeplearning-recipes/blob/main/CONTRIBUTING.md) page.