{"id":24446879,"url":"https://github.com/mvinyard/lightning-tutorial","last_synced_at":"2025-04-30T11:49:25.247Z","repository":{"id":59315452,"uuid":"533099826","full_name":"mvinyard/lightning-tutorial","owner":"mvinyard","description":"PyTorch-Lightning Tutorial","archived":false,"fork":false,"pushed_at":"2022-09-16T14:32:32.000Z","size":40216,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-01T03:22:56.380Z","etag":null,"topics":["ai","deep-learning","deep-neural-networks","machine-learning","pytorch","pytorch-lightning","tutorial"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mvinyard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-06T00:11:22.000Z","updated_at":"2022-10-24T02:18:48.000Z","dependencies_parsed_at":"2023-01-18T10:17:41.902Z","dependency_job_id":null,"html_url":"https://github.com/mvinyard/lightning-tutorial","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2Flightning-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2Flightning-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2Flightning-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2Flightning-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mvinyard","download_url":"https://codel
oad.github.com/mvinyard/lightning-tutorial/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243505898,"owners_count":20301619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","deep-learning","deep-neural-networks","machine-learning","pytorch","pytorch-lightning","tutorial"],"created_at":"2025-01-21T00:01:25.334Z","updated_at":"2025-03-14T01:19:12.182Z","avatar_url":"https://github.com/mvinyard.png","language":"Jupyter Notebook","readme":"# ⚡ lightning-tutorial\n\n[![PyPI pyversions](https://img.shields.io/pypi/pyversions/lightning-tutorial.svg)](https://pypi.python.org/pypi/lightning-tutorial/)\n[![PyPI version](https://badge.fury.io/py/lightning-tutorial.svg)](https://badge.fury.io/py/lightning-tutorial)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n### Installation of the partner package\n\n```BASH\npip install lightning-tutorial\n```\n\n### Table of contents\n\n* [PyTorch Datasets and DataLoaders](#pytorch-datasets-and-dataloaders)\n    * Key module: `torch.utils.data.Dataset`\n    * Key module: `torch.utils.data.DataLoader`\n    * Other essential functions\n    \n* [Single-cell data structures meet pytorch: `torch-adata`](#single-cell-data-structures-meet-pytorch-torch-adata)\n* [Lightning basics and the `LightningModule`](#lightning-basics-and-the-lightningmodule)\n* [`LightningDataModule`](#lightningdatamodule)\n\n\n## PyTorch Datasets and DataLoaders\n\n### Key module: `torch.utils.data.Dataset`\n\nThe `Dataset` module is an 
extensible Python base class. You can customize it freely as long as you implement the following three methods:\n1. `__init__`\n2. `__len__`\n3. `__getitem__`\n\n`torch` calls these methods by name under the hood (for example, from a `DataLoader`) when feeding data to a model.\n\n```python\nfrom torch.utils.data import Dataset\n\nclass TurtleData(Dataset):\n    def __init__(self):\n        \"\"\"\n        Here we should pass the requisite arguments\n        that enable __len__() and __getitem__()\n        \"\"\"\n        \n    def __len__(self):\n        \"\"\"\n        Returns the length/size/# of samples in the dataset.\n        e.g., a 20,000 cell dataset would return `20_000`.\n        \"\"\"\n        return # len\n    \n    def __getitem__(self, idx):\n        \"\"\"\n        Return the sample(s) at `idx`.\n        \n        With the default `DataLoader` settings, `idx` is a single\n        sample index and the loader assembles the batch.\n        Valid indices run from 0 to `self.__len__() - 1`.\n        \"\"\"\n        return # sampled data\n```\n\n* [Fantastic PyTorch `Dataset` tutorial from Stanford](https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel)\n\n* **Try it for yourself!** [**Colab `Dataset` tutorial notebook**](https://colab.research.google.com/github/mvinyard/lightning-tutorial/blob/main/notebooks/tutorial_nb.01.pytorch_datasets.ipynb)\n\n\n### Key module: `torch.utils.data.DataLoader`\n\nSimilar to the usefulness of `AnnData`, the `Dataset` module creates a base unit for distributing and handling data. 
We can then take advantage of several torch built-ins to make data processing not only more organized but also faster.\n\n```python\nfrom torch.utils.data import DataLoader\n\ndataset = TurtleData()\ndata_size = len(dataset)\nprint(data_size)\n\n# wrap the dataset in a DataLoader (batch_size is an illustrative choice)\nloader = DataLoader(dataset, batch_size=64, shuffle=True)\n```\n```\n20000\n```\n\n### Other essential functions\n\n```python\nfrom torch.utils.data import random_split\n\n# split sizes must sum to len(dataset)\ntrain_dataset, val_dataset = random_split(dataset, [18_000, 2_000])\n\n# each split can then be fed to a DataLoader, as above\ntrain_loader = DataLoader(train_dataset)\nval_loader = DataLoader(val_dataset)\n```\n\n### Useful tutorials and documentation\n\n* **Parent module**: [`torch.utils.data`](https://pytorch.org/docs/stable/data.html)\n* **[Datasets and DataLoaders tutorial](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)**\n\n[☝️ back to table of contents](#table-of-contents)\n\n## Single-cell data structures meet pytorch: `torch-adata`\n# ![torch-adata-logo](https://github.com/mvinyard/torch-adata/blob/main/docs/imgs/torch-adata.logo.large.svg)\n\n*Create PyTorch Datasets from* [`AnnData`](https://anndata.readthedocs.io/en/latest/)\n\n### Installation\n- **Note**: This is already done for you if you've installed this tutorial's associated package.\n```\npip install torch-adata\n```\n\n\u003ca href=\"https://github.com/mvinyard/torch-adata/\" \u003e\u003cimg alt=\"torch-adata-concept-overview\" src=\"https://github.com/mvinyard/torch-adata/blob/main/docs/imgs/torch-adata.concept_overview.svg\" width=\"600\" /\u003e\u003c/a\u003e\n\n### Example use of the base class\n\nThe base class, `AnnDataset`, is a subclass of the widely used `torch.utils.data.Dataset`. 
\n\n```python\nimport anndata as a\nimport torch_adata\n\nadata = a.read_h5ad(\"/path/to/data.h5ad\")\ndataset = torch_adata.AnnDataset(adata)\n```\n\nIndexing the dataset returns the sampled data, `X_batch`, as a `torch.Tensor`.\n```python\nimport numpy as np\n\n# create a dummy index of 5 randomly chosen samples\nidx = np.random.choice(range(len(dataset)), 5)\nX_batch = dataset[idx]\n```\n\n#### `TimeResolvedAnnDataset`\n\nA specialized class for time-resolved datasets and a subclass of `AnnDataset`.\n\n```python\nimport anndata as a\nimport torch_adata as ta\n\nadata = a.read_h5ad(\"/path/to/data.h5ad\")\ndataset = ta.TimeResolvedAnnDataset(adata, time_key=\"Time point\")\n```\n\n[☝️ back to table of contents](#table-of-contents)\n\n\n## Lightning basics and the [`LightningModule`](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html)\n\n\n```python\nimport torch\nfrom pytorch_lightning import LightningModule\n\n# placeholder loss; swap in your own\nLossFunc = torch.nn.MSELoss(reduction=\"none\")\n\nclass YourSOTAModel(LightningModule):\n    def __init__(self,\n                 net,\n                 optimizer_kwargs={\"lr\":1e-3},\n                 scheduler_kwargs={\"step_size\":10},  # StepLR requires step_size; 10 is illustrative\n                ):\n        super().__init__()\n        \n        self.net = net\n        self.optimizer_kwargs = optimizer_kwargs\n        self.scheduler_kwargs = scheduler_kwargs\n        \n        \n    def forward(self, batch):\n        \n        x, y = batch\n        \n        y_hat = self.net(x)\n        loss  = LossFunc(y_hat, y)\n        \n        return y_hat, loss\n        \n    def training_step(self, batch, batch_idx):\n        \n        y_hat, loss = self.forward(batch)\n        \n        return loss.sum()\n    \n    def validation_step(self, batch, batch_idx):\n        \n        y_hat, loss = self.forward(batch)\n        \n        return loss.sum()\n    \n    def test_step(self, batch, batch_idx):\n        \n        y_hat, loss = self.forward(batch)\n        \n        return loss.sum()\n    \n    def configure_optimizers(self):\n        optimizer = torch.optim.Adam(self.parameters(), 
**self.optimizer_kwargs)\n        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, **self.scheduler_kwargs)\n        \n        # Lightning accepts a list of optimizers and a list of schedulers\n        return [optimizer], [scheduler]\n```\n\n* **Try it for yourself!** [**Lightning Classifier tutorial notebook**](https://colab.research.google.com/github/mvinyard/lightning-tutorial/blob/main/notebooks/tutorial_nb.02.LightningClassifier.ipynb)\n\n\n#### Additional useful documentation and standalone tutorials\n\n* [Lightning in 15 minutes](https://pytorch-lightning.readthedocs.io/en/stable/starter/introduction.html)\n* [Logging metrics at each epoch](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#train-epoch-level-metrics)\n\n[☝️ back to table of contents](#table-of-contents)\n\n\n## [`LightningDataModule`](https://pytorch-lightning.readthedocs.io/en/stable/notebooks/lightning_examples/datamodules.html)\n\n**Purpose**: Make your model independent of a given dataset while making your dataset reproducible and, perhaps just as important, **easily shareable**.\n\n```python\nfrom pytorch_lightning import LightningDataModule\nfrom torch.utils.data import DataLoader\n\nclass YourDataModule(LightningDataModule):\n    \n    def __init__(self, batch_size=32, num_workers=0):\n        # define any setup computations (default values are illustrative)\n        super().__init__()\n        self.batch_size = batch_size\n        self.num_workers = num_workers\n        \n    def prepare_data(self):        \n        # download data if applicable\n        pass\n        \n    def setup(self, stage):\n        # assign data to `Dataset`(s), e.g. self.train_dataset, self.val_dataset, self.test_dataset\n        pass\n        \n    def train_dataloader(self):\n        return DataLoader(self.train_dataset, batch_size=self.batch_size, num_workers=self.num_workers)\n        \n    def val_dataloader(self):\n        return DataLoader(self.val_dataset, batch_size=self.batch_size, num_workers=self.num_workers)\n        \n    def test_dataloader(self):\n        return DataLoader(self.test_dataset, batch_size=self.batch_size, num_workers=self.num_workers)\n        \n```\n\n* **Try it for yourself!** [**LightningDataModule tutorial 
notebook**](https://colab.research.google.com/github/mvinyard/lightning-tutorial/blob/main/notebooks/tutorial_nb.03.LightningDataModule.ipynb)\n\nIn practice, using a `LightningDataModule` looks something like the following:\n\n```python\nfrom pytorch_lightning import Trainer\n\n# Init the LightningDataModule as well as the LightningModule\ndata = YourDataModule()\nmodel = YourSOTAModel(net)  # `net` is any torch.nn.Module you have defined\n\n# Define trainer\ntrainer = Trainer(accelerator=\"auto\", devices=1)\n\n# Both the model and the data are passed as args to trainer.fit\ntrainer.fit(model, data)\n```\n\n* **Try it for yourself!** [**LightningGAN tutorial notebook**](https://colab.research.google.com/github/mvinyard/lightning-tutorial/blob/main/notebooks/tutorial_nb.04.LightningGAN.ipynb)\n\n* [Official `LightningDataModule` documentation](https://pytorch-lightning.readthedocs.io/en/stable/notebooks/lightning_examples/datamodules.html)\n\n\nHere's an example of a `LightningDataModule` implemented in practice, using the LARRY single-cell dataset: [**link**](https://github.com/mvinyard/LARRY-dataset). Initial downloading and formatting occur only once but take several minutes, so we leave them outside the scope of this tutorial.\n\n[☝️ back to table of contents](#table-of-contents)\n\n## Questions or suggestions?\n\nI'd love to get in touch. Send me an [**email**](mailto:vinyard@g.harvard.edu) or open an [**issue**](https://github.com/mvinyard/lightning-tutorial/issues/new)!\n\n⚡\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmvinyard%2Flightning-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmvinyard%2Flightning-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmvinyard%2Flightning-tutorial/lists"}