{"id":13641492,"url":"https://github.com/DavidLandup0/deepvision","last_synced_at":"2025-04-20T11:31:05.489Z","repository":{"id":65828924,"uuid":"595609147","full_name":"DavidLandup0/deepvision","owner":"DavidLandup0","description":"PyTorch and TensorFlow/Keras image models with automatic weight conversions and equal API/implementations - Vision Transformer (ViT), ResNetV2, EfficientNetV2, NeRF, SegFormer, MixTransformer, (planned...) DeepLabV3+, ConvNeXtV2, YOLO, etc.","archived":false,"fork":false,"pushed_at":"2023-07-01T13:02:32.000Z","size":10172,"stargazers_count":33,"open_issues_count":32,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-09T12:32:35.214Z","etag":null,"topics":["cnn-classification","computer-vision","deep-learning","efficientnetv2","evaluation","image-classification","keras","nerf","neural-radiance-fields","pretrained-models","pytorch","pytorch-lightning","resnet","segformer","semantic-segmentation","tensorflow","vision-transformer","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DavidLandup0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-01-31T12:49:51.000Z","updated_at":"2024-08-20T11:09:22.000Z","dependencies_parsed_at":"2023-06-19T18:36:39.780Z","dependency_job_id":"da2f2594-6a37-4de5-8fe0-6d8d3708cffd","html_url":"https://github.com/DavidLandup0/deepvision","commit_stats":{"total_commits":95,"total_committers":2,"mean_commits":47.5,"dds":0.04210526315789476,"last_synced_commit":"f45e928a45d208c8f3026ba598b67459213cdb29"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidLandup0%2Fdeepvision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidLandup0%2Fdeepvision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidLandup0%2Fdeepvision/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidLandup0%2Fdeepvision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DavidLandup0","download_url":"https://codeload.github.com/DavidLandup0/deepvision/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248974662,"owners_count":21192186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn-classification","computer-vision","deep-learning","efficientnetv2","evaluation","image-classification","keras","nerf","neural-radiance-fields","pretrained-models","pytorch","pytorch-lightning","resnet","segformer","semantic-segmentation","tensorflow","vision-transformer","visualization"],"created_at":"2024-08-02T01:01:21.194Z","updated_at":"2025-04-20T11:31:05.449Z","avatar_url":"https://github.com/DavidLandup0.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\r\n\r\n# DeepVision - Unifying Computer Vision\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003ca href=\"https://github.com/DavidLandup0/deepvision/tree/main/examples\"\u003eExamples\u003c/a\u003e •\r\n  \u003ca href=\"https://github.com/DavidLandup0/deepvision/blob/main/LICENSE.md\"\u003eLicense\u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n[![PyPI Version](https://badge.fury.io/py/deepvision-toolkit.svg)](https://badge.fury.io/py/deepvision-toolkit)\r\n[![Downloads](https://static.pepy.tech/badge/deepvision-toolkit/month)](https://pepy.tech/project/deepvision-toolkit)\r\n\r\n______________________________________________________________________\r\n\r\n```console\r\n$ pip install deepvision-toolkit\r\n```\r\n\r\n\u003c/div\u003e\r\n\r\n- ✔️ TensorFlow **and** PyTorch implementations\r\n- ✔️ Pure `tf.keras.Model` and `torch.nn.Module`s, as well as PyTorch Lightning modules ready for training pipelines\r\n- ✔️ Automatic weight conversion between DeepVision models (train and fine-tune `.h5` and `.pt` checkpoints interchangeably in either framework)\r\n- ✔️ Explainability and analysis modules\r\n- ✔️ TensorFlow/PyTorch duality on multiple levels (model-level and component-level are backend agnostic and weights are transferable on model-level and component-level)\r\n- ✔️ Identical, readable implementations, with the **same API**, code structure and style\r\n- ✔️ Layered API with exposed building blocks (`TransformerEncoder`, `MBConv`, etc.)\r\n- ✔️ Image classification, semantic segmentation, NeRFs (object detection, instance/panoptic segmentation, etc. coming soon)\r\n- ✔️ Mixed-precision, TPU and XLA training support\r\n\r\n______________________________________________________________________\r\n\r\n\r\n### Introduction\r\n\r\nDeepVision is a (yet another) computer vision library, aimed at bringing Deep Learning to the hands of the masses. Why another library?\r\n\r\nThe computer vision engineering toolkit is segmented. Amazing libraries exist, but a practicioner oftentimes needs to make decisions on which ones to use based on their compatabilities.\r\n\r\n\u003e DeepVision tries to bridge the compatability issues, allowing you to focus on *what matters* - engineering, and seamlessly switching between ecosystems and backends.\r\n\r\nDeepVision:\r\n\r\n- ❤️ KerasCV and how readable and well-structured it is.\r\n- ❤️ `timm` and how up-to-date it is.\r\n- ❤️ HuggingFace and how diverse it is.\r\n- ❤️ Kornia and how practical it is.\r\n\r\nTo that end, DeepVision takes cues, API and structure inspiration from these libraries. A huge kudos and acknowledgement goes to every contributor in their respective repositories. At the same time, DeepVision provides the *same API* across the board, so you no longer have to switch between APIs and styles.\r\n\r\n\u003e Different teams and projects use different tech stacks, and nobody likes switching from their preferred library for a new project. Furthermore, different libraries implement models in different ways. Whether it's code conventions, code structure or model flavors. When it comes to foundational models like ResNets, some libraries default to flavors such as ResNet 1.5, some default to ResNet-B, etc.\r\n\r\nWith DeepVision, you don't need to switch the library - you just change the backend with a single argument. Additionally, all implementations will strive to be *as equal as possible* between supported backends, providing the same number of parameters, through the same coding style and structure to enhance readability.\r\n\r\n## Basic Usage\r\n\r\nDeepVision is deeply integrated with TensorFlow and PyTorch. You can switch between backends by specifying the backend during initialization:\r\n\r\n```python\r\nimport deepvision\r\n\r\n# TF-Based ViTB16 operating on `tf.Tensor`s\r\ntf_model = deepvision.models.ViTB16(include_top=True,\r\n                                    classes=10,\r\n                                    input_shape=(224, 224, 3),\r\n                                    backend='tensorflow')\r\n                                     \r\n# PyTorch-Based ViTB16 operating on `torch.Tensor`s\r\npt_model = deepvision.models.ViTB16(include_top=True,\r\n                                    classes=10,\r\n                                    input_shape=(3, 224, 224),\r\n                                    backend='pytorch')\r\n```\r\n\r\n**All models will share the same API, regardless of the backend**. With DeepVision, you can rest assured that training performance between PyTorch and TensorFlow models isn't due to the specific implementation.\r\n\r\n### TensorFlow Training Pipeline Example\r\n\r\nAny model returned as a TensorFlow model is a `tf.keras.Model`, making it fit for use out-of-the-box, with a straightforward compatability with `tf.data` and training on `tf.data.Dataset`s:\r\n\r\n```python\r\nimport deepvision\r\nimport tensorflow as tf\r\nimport tensorflow_datasets as tfds\r\n\r\n(train_set, test_set), info = tfds.load(\"imagenette\", \r\n                                           split=[\"train\", \"validation\"],\r\n                                           as_supervised=True, with_info=True)\r\n                                           \r\nn_classes = info.features[\"label\"].num_classes\r\n\r\ndef preprocess_img(img, label):\r\n    img = tf.image.resize(img, (224, 224))\r\n    return img, label\r\n\r\ntrain_set = train_set.map(preprocess_img).batch(32).prefetch(tf.data.AUTOTUNE)\r\ntest_set = test_set.map(preprocess_img).batch(32).prefetch(tf.data.AUTOTUNE)\r\n\r\ntf_model = deepvision.models.ResNet18V2(include_top=True,\r\n                                        classes=n_classes,\r\n                                        input_shape=(224, 224, 3),\r\n                                        backend='tensorflow')\r\n\r\ntf_model.compile(\r\n  loss=tf.keras.losses.SparseCategoricalCrossentropy(),\r\n  optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),\r\n  metrics=['accuracy']\r\n)\r\n\r\nhistory = tf_model.fit(train_set, epochs=1, validation_data=test_set)\r\n```\r\n\r\n### PyTorch Training Pipeline Example\r\n\r\nAny model returned as a PyTorch model is a `pl.LightningModule`, which is a `torch.nn.Module`. You may decide to use it manually, as you'd use any `torch.nn.Module`:\r\n\r\n```python\r\npt_model = deepvision.models.ResNet50V2(include_top=True,\r\n                                        classes=10,\r\n                                        input_shape=(3, 224, 224),\r\n                                        backend='pytorch')\r\n# Optimizer, loss function, etc.\r\nfor epoch in epochs:\r\n    for batch in train_loader:\r\n        optimizer.zero_grad()\r\n        \r\n        inputs, labels = batch\r\n        outputs = model(inputs)\r\n        loss = criterion(outputs, labels)\r\n        loss.backward()\r\n        optimizer.step()\r\n        # ...\r\n```\r\n\r\nOr you may `compile()` a model, and use the PyTorch Lightning `Trainer` given a dataset:\r\n\r\n```python\r\nimport deepvision\r\nimport torch\r\n\r\nfrom torchvision import transforms\r\nfrom torchvision.datasets import CIFAR10\r\nfrom torch.utils.data import DataLoader\r\nimport pytorch_lightning as pl\r\n\r\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\r\n\r\ntransform=transforms.Compose([transforms.ToTensor(),\r\n                              transforms.Resize([224, 224])])\r\n\r\ncifar_train = CIFAR10('cifar10', train=True, download=True, transform=transform)\r\ncifar_test = CIFAR10('cifar10', train=False, download=True, transform=transform)\r\n\r\ntrain_dataloader = DataLoader(cifar_train, batch_size=32)\r\nval_dataloader = DataLoader(cifar_test, batch_size=32)\r\n\r\npt_model = deepvision.models.ResNet18V2(include_top=True,\r\n                                        classes=10,\r\n                                        input_shape=(3, 224, 224),\r\n                                        backend='pytorch')\r\n\r\nloss = torch.nn.CrossEntropyLoss()\r\noptimizer = torch.optim.Adam(pt_model.parameters(), 1e-4)\r\n\r\npt_model.compile(loss=loss, optimizer=optimizer)\r\n\r\ntrainer = pl.Trainer(accelerator=device, max_epochs=1)\r\ntrainer.fit(pt_model, train_dataloader, val_dataloader)\r\n```\r\n\r\n## Automatic PyTorch-TensorFlow Weight Conversion with DeepVision\r\n\r\nAs models between PyTorch and TensorFlow implementations are equal and to encourage cross-framework collaboration - DeepVision provides you with the option of *porting weights* between the frameworks. This means that *Person 1* can train a model with a *TensorFlow pipeline*, and *Person 2* can then take that checkpoint and fine-tune it with a *PyTorch pipeline*, **and vice-versa**.\r\n\r\nWhile still in beta, the feature will come for each model, and currently works for EfficientNets. \r\n\r\n\u003e For end-to-end examples, take a look at the [_\"Automatic Weight Conversion with DeepVision\"_](https://colab.research.google.com/drive/1_nUpqsjg8sOW5eylyedGsGQZjNmHA6GY#scrollTo=fcyT9KNwclfB)\r\n\r\n#### TensorFlow-to-PyTorch Automatic Weight Conversion\r\n\r\n```python\r\ndummy_input_tf = tf.ones([1, 224, 224, 3])\r\ndummy_input_torch = torch.ones(1, 3, 224, 224)\r\n\r\ntf_model = deepvision.models.EfficientNetV2B0(include_top=False,\r\n                                          pooling='avg',\r\n                                          input_shape=(224, 224, 3),\r\n                                          backend='tensorflow')\r\n\r\ntf_model.save('effnet.h5')\r\n\r\nfrom deepvision.models.classification.efficientnet import efficientnet_weight_mapper\r\npt_model = efficientnet_weight_mapper.load_tf_to_pt(filepath='effnet.h5', dummy_input=dummy_input_tf)\r\n\r\nprint(tf_model(dummy_input_tf)['output'].numpy())\r\nprint(pt_model(dummy_input_torch).detach().cpu().numpy())\r\n# True\r\nnp.allclose(tf_model(dummy_input_tf)['output'].numpy(), pt_model(dummy_input_torch).detach().cpu().numpy())\r\n```\r\n\r\n#### PyTorch-to-TensorFlow Automatic Weight Conversion\r\n\r\n```python\r\npt_model = deepvision.models.EfficientNetV2B0(include_top=False,\r\n                                          pooling='avg',\r\n                                          input_shape=(3, 224, 224),\r\n                                          backend='pytorch')\r\ntorch.save(pt_model.state_dict(), 'effnet.pt')\r\n\r\nfrom deepvision.models.classification.efficientnet import efficientnet_weight_mapper\r\n\r\nkwargs = {'include_top': False, 'pooling':'avg', 'input_shape':(3, 224, 224)}\r\ntf_model = efficientnet_weight_mapper.load_pt_to_tf(filepath='effnet.pt',\r\n                                architecture='EfficientNetV2B0',\r\n                                kwargs=kwargs,\r\n                                dummy_input=dummy_input_torch)\r\n\r\n\r\npt_model.eval()\r\nprint(pt_model(dummy_input_torch).detach().cpu().numpy())\r\nprint(tf_model(dummy_input_tf)['output'].numpy())\r\n# True\r\nnp.allclose(tf_model(dummy_input_tf)['output'].numpy(), pt_model(dummy_input_torch).detach().cpu().numpy())\r\n```\r\n\r\n#### Component-Level Weight Conversion\r\n\r\nEach distinct block that offers a public API, such as the commonly used `MBConv` and `FusedMBConv` blocks also offer weight porting between them:\r\n\r\n```python\r\ndummy_input_tf = tf.ones([1, 224, 224, 3])\r\ndummy_input_torch = torch.ones(1, 3, 224, 224)\r\n\r\nlayer = deepvision.layers.FusedMBConv(3, 32, expand_ratio=2, se_ratio=0.25, backend='tensorflow')\r\nlayer(dummy_input_tf);\r\n\r\npt_layer = deepvision.layers.fused_mbconv.tf_to_pt(layer)\r\npt_layer.eval();\r\n\r\nlayer(dummy_input_tf).numpy()[0][0][0]\r\n\"\"\"\r\narray([ 0.07588673, -0.00770299, -0.03178375, -0.06809437, -0.02139765,\r\n        0.06691956,  0.05638139, -0.00669611, -0.01785627,  0.08565219,\r\n       -0.11967321,  0.01648926, -0.01665686, -0.07395031, -0.05677428,\r\n       -0.13836852,  0.10357075,  0.00552578, -0.02682608,  0.10316402,\r\n       -0.05773047,  0.08470275,  0.02989118, -0.11372866,  0.07361417,\r\n        0.04321364, -0.06806802,  0.06685358,  0.10110974,  0.03804607,\r\n        0.04943493, -0.03414273], dtype=float32)\r\n\"\"\"\r\n\r\n# Reshape so the outputs are easily comparable\r\npt_layer(dummy_input_torch).detach().cpu().numpy().transpose(0, 2, 3, 1)[0][0][0]\r\n\"\"\"\r\narray([ 0.07595398, -0.00769612, -0.03179125, -0.06815705, -0.021454  ,\r\n        0.06697321,  0.05642046, -0.00668627, -0.01784784,  0.08573981,\r\n       -0.11977906,  0.01648908, -0.01665735, -0.07405862, -0.05680554,\r\n       -0.13849407,  0.10368796,  0.00552754, -0.02683712,  0.10324436,\r\n       -0.0578215 ,  0.08479469,  0.0299269 , -0.11383523,  0.07365884,\r\n        0.04328319, -0.06810313,  0.06690993,  0.10120884,  0.03805522,\r\n        0.04951007, -0.03417065], dtype=float32)\r\n\"\"\"\r\n```\r\n\r\n## DeepVision as an Evaluation Library\r\n\r\nWe want DeepVision to host a suite of visualization and explainability tools, from activation maps, to learned feature analysis through clustering algorithms:\r\n\r\n- `FeatureAnalyzer` - a class used to analyze the learned features of a model, and evaluate the predictions\r\n- `ActivationMaps` - a class used to plot activation maps for Convolutional Neural Networks, based on the GradCam++ algorithm.\r\n- ...\r\n\r\n### Learned Feature Analysis - PCA and t-SNE with `FeatureAnalyzer`\r\n\r\nAlready trained a model and you want to evaluate it? Whether it's a DeepVision model, or a model from another library, as long as a model is either a `tf.keras.Model` or `torch.nn.Module` that can produce an output vector, be it the fully connected top layers or exposed feature maps - you can explore the learned feature space using DeepVision:\r\n\r\n```python\r\nimport deepvision\r\n\r\ntf_model = deepvision.models.ViTTiny16(include_top=True,\r\n                                       classes=10,\r\n                                       input_shape=(224, 224, 3),\r\n                                       backend='tensorflow')\r\n                                       \r\n# Train...\r\n\r\nfeature_analysis = deepvision.evaluation.FeatureAnalyzer(tf_model,               # DeepVision TF Model\r\n                                                         train_set,              # `tf.data.Dataset` returning (img, label)\r\n                                                         limit_batches=500,      # Limit the number of batches to go over in the dataset\r\n                                                         classnames=class_names, # Optionally supply classnames for plotting\r\n                                                         backend='tensorflow')   # Specify backend\r\n\r\nfeature_analysis.extract_features()\r\nfeature_analysis.feature_analysis(components=2)\r\n```\r\n\r\n![image](https://user-images.githubusercontent.com/60978046/216820223-2a674edb-90ca-4a27-8701-2f9904bad0f6.png)\r\n\r\n**Note:** All TensorFlow-based DeepVision models are *Functional Subclassing* models - i.e. have a *dictionary output*, which contains `1..n` keys, and the standard output contains an `output` key that corresponds to the `tf.Tensor` output value. The `FeatureAnalyzer` accepts any TensorFlow-based model that can produce a `tf.Tensor` output *or* produces a dictionary output with an `'output':tf.Tensor` key-value pair.\r\n\r\nThe `FeatureAnalyzer` class iterates over the supplied dataset, extracting the features (outputs) of the supplied model, when `extract_features()` is called. This expensive operation is called only once, and all subsequent calls, until a new `extract_features()` call, re-use the same features. The `feature_analysis()` method performs _Principal Component Analysis (PCA)_ and _t-distributed Stochastic Neighbor Embeddings (t-SNE)_ on the extracted features, and visualizes them using Matplotlib. The `components` parameter is the `n_components` used for PCA and t-SNE transformations, and naturally has to be in the range of `[2..3]` for 2D and 3D plots respectively.\r\n\r\n```python\r\nimport deepvision\r\n\r\npt_model = deepvision.models.ResNet18V2(include_top=True,\r\n                                        classes=10,\r\n                                        input_shape=(3, 224, 224),\r\n                                        backend='pytorch')\r\n# Train...\r\n                                       \r\nclassnames = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\r\nfeature_analysis = deepvision.evaluation.FeatureAnalyzer(pt_model,              # DeepVision PT Model\r\n                                                         train_dataloader,      # `torch.utils.Dataloader` returning (img, label)\r\n                                                         limit_batches=500,     # Limit the number of batches to go over in the dataset \r\n                                                         classnames=classnames, # Optionally supply classnames for plotting\r\n                                                         backend='pytorch')    # Specify backend\r\n                                                         \r\nfeature_analysis.extract_features()\r\nfeature_analysis.feature_analysis(components=3, figsize=(20, 20))\r\n```\r\n\r\n![image](https://user-images.githubusercontent.com/60978046/216826476-65911f69-cbc4-4428-97a5-4892f6125978.png)\r\n\r\n\u003e For more, take a look at the [_\"DeepVision Training and Feature Analysis\"_](https://colab.research.google.com/drive/1j8g0Urtko6pbRDKmU02cKnkyANGA0qdH#scrollTo=K5nW6HjgaKwZ) notebook.\r\n\r\n\r\n## DeepVision as a Model Zoo\r\n\r\nWe want DeepVision to host a model zoo across a wide variety of domains:\r\n\r\n- Image Classification and Backbones\r\n- Object Detection\r\n- Semantic, Instance and Panoptic Segmentation\r\n- Object Tracking and MOT\r\n- 3D Reconstruction\r\n- Image Restoration\r\n\r\nCurrently, these models are supported (parameter counts are *equal* between backends):\r\n\r\n- EfficientNetV2 Family\r\n\r\n| Architecture     | Parameters  | FLOPs | Size (MB) |\r\n|------------------|-------------|-------|-----------|\r\n| EfficientNetV2B0 | 7,200,312   |       |           |\r\n| EfficientNetV2B1 | 8,212,124   |       |           |\r\n| EfficientNetV2B2 | 10,178,374  |       |           |\r\n| EfficientNetV2B3 | 14,486,374  |       |           |\r\n| EfficientNetV2S  | 21,612,360  |       |           |\r\n| EfficientNetV2M  | 54,431,388  |       |           |\r\n| EfficientNetV2L  | 119,027,848 |       |           |\r\n\r\n- Vision Transformer (ViT) Family\r\n\r\n| Architecture | Parameters  | FLOPs | Size (MB) |\r\n|--------------|-------------|-------|-----------|\r\n| ViTTiny16    | 5,717,416   |       |           |\r\n| ViTS16       | 22,050,664  |       |           |\r\n| ViTB16       | 86,567,656  |       |           |\r\n| ViTL16       | 304,326,632 |       |           |\r\n| ViTTiny32    | 6,131,560   |       |           |\r\n| ViTS32       | 22,878,952  |       |           |\r\n| ViTB32       | 88,224,232  |       |           |\r\n| ViTL32       | 306,535,400 |       |           |\r\n\r\n- ResNetV2 Family\r\n\r\n| Architecture | Parameters | FLOPs | Size (MB) |\r\n|--------------|------------|-------|-----------|\r\n| ResNet18V2   | 11,696,488 |       |           |\r\n| ResNet34V2   | 21,812,072 |       |           |\r\n| ResNet50V2   | 25,613,800 |       |           |\r\n| ResNet101V2  | 44,675,560 |       |           |\r\n| ResNet152V2  | 60,380,648 |       |           |\r\n\r\n- SegFormer Family\r\n\r\n| Architecture | Parameters | FLOPs | Size (MB) |\r\n|--------------|------------|-------|-----------|\r\n| SegFormerB0  | 3,714,915  |       |           |\r\n| SegFormerB1  | 13,678,019 |       |           |\r\n| SegFormerB2  | 27,348,931 |       |           |\r\n| SegFormerB3  | 47,224,771 |       |           |\r\n| SegFormerB4  | 63,995,331 |       |           |\r\n| SegFormerB5  | 84,595,651 |       |           |\r\n\r\n- Mix-Transformer (MiT) Family:\r\n\r\n| Architecture | Parameters | FLOPs | Size (MB) |\r\n|--------------|------------|-------|-----------|\r\n| MiTB0        | 3,321,962  |       |           |\r\n| MiTB1        | 13,156,554 |       |           |\r\n| MiTB2        | 24,201,418 |       |           |\r\n| MiTB3        | 44,077,258 |       |           |\r\n| MiTB4        | 60,847,818 |       |           |\r\n| MiTB5        | 81,448,138 |       |           |\r\n\r\n#### PyTorch-Only Models\r\n\r\n| Architecture | Parameters  | FLOPs | Size (MB) |\r\n|--------------|-------------|-------|-----------|\r\n| SAM_B        | 93,735,472  |       |           |\r\n| SAM_L        | 312,342,832 |       |           |\r\n| SAM_H        | 641,090,608 |       |           |\r\n\r\n## DeepVision as a Components Provider\r\n\r\nModels and architectures are built on top of each other. VGGNets begat ResNets, which begat a plethora of other architectures, with incremental improvements, small changes and new ideas building on top of already accepted ideas to bring about new advances. To make architectures more approachable, as well as easily buildable, more readable and to make experimentation and building new architectures simpler - we want to expose as many internal building blocks as possible, as part of the general DeepVision API. If an architecture uses a certain block repeatedly, it's likely going to be exposed as part of the public API.\r\n\r\n**Most importantly, all blocks share the same API, and are agnostic to the backend, with an identical implementation.**\r\n\r\nYou can prototype and debug in PyTorch, and then move onto TensorFlow or vice versa to build a model. For instance, a generic `TransformerEncoder` deals with the same arguments, in the same order, and performs the same operation on both backends:\r\n\r\n```python\r\ntensor = torch.rand(1, 197, 1024)\r\ntrans_encoded = deepvision.layers.TransformerEncoder(project_dim=1024,\r\n                                                     mlp_dim=3072,\r\n                                                     num_heads=8,\r\n                                                     backend='pytorch')(tensor)\r\nprint(trans_encoded.shape) # torch.Size([1, 197, 1024])\r\n\r\ntensor = tf.random.normal([1, 197, 1024])\r\ntrans_encoded = deepvision.layers.TransformerEncoder(project_dim=1024,\r\n                                                     mlp_dim=3072,\r\n                                                     num_heads=8,\r\n                                                     backend='tensorflow')(tensor)\r\nprint(trans_encoded.shape) # TensorShape([1, 197, 1024])\r\n```\r\n\r\nSimilarly, you can create something funky with the building blocks! Say, pass an image through an `MBConv` block (MobileNet and EfficientNet style), and through a `PatchingAndEmbedding`/`TransformerEncoder` (ViT style) duo, and add the results together:\r\n\r\n```python\r\ninputs = torch.rand(1, 3, 224, 224)\r\n\r\nx = deepvision.layers.MBConv(input_filters=3, \r\n                             output_filters=32, \r\n                             backend='pytorch')(inputs)\r\n\r\ny = deepvision.layers.PatchingAndEmbedding(project_dim=32,\r\n                                           patch_size=16,\r\n                                           input_shape=(3, 224, 224),\r\n                                           backend='pytorch')(inputs)\r\n\r\ny = deepvision.layers.TransformerEncoder(project_dim=32,\r\n                                         num_heads=8,\r\n                                         mlp_dim = 64,\r\n                                         backend='pytorch')(y)\r\ny = y.mean(1)\r\ny = y.reshape(y.shape[0], y.shape[1], 1, 1)\r\n\r\nadd = x+y\r\n\r\nprint(add.shape) # torch.Size([1, 32, 224, 224])\r\n```\r\n\r\nWould this make sense in an architecture? Maybe. Maybe not. Your imagination is your limit.\r\n\r\n## DeepVision as a Dataset Library\r\n\r\nWe want DeepVision to host a suite of datasets and data loading utilities that can be easily used in production, as well as to host datasets that are suited for use with DeepVision models as well as vanilla PyTorch and vanilla TensorFlow models, in an attempt to lower the barrier to entry for some domains of computer vision:\r\n\r\nFor instance, you can easily load the Tiny NeRF dataset used to train Neural Radiance Fields with DeepVision, as both a `tf.data.Dataset` or `torch.utils.data.Dataset`:\r\n```python\r\nimport deepvision\r\n\r\ntrain_ds, valid_ds = deepvision.datasets.load_tiny_nerf(save_path='tiny_nerf.npz',\r\n                                                        validation_split=0.2,\r\n                                                        backend='tensorflow')\r\n\r\nprint('Train dataset length:', len(train_ds)) # Train dataset length: 84\r\ntrain_ds # \u003cZipDataset element_spec=(TensorSpec(shape=(100, 100, 3), dtype=tf.float32, name=None), \r\n#                                   (TensorSpec(shape=(320000, 99), dtype=tf.float32, name=None), TensorSpec(shape=(100, 100, 32), dtype=tf.float32, name=None)))\u003e\r\n\r\nprint('Valid dataset length:', len(valid_ds)) # Valid dataset length: 22\r\nvalid_ds # \u003cZipDataset element_spec=(TensorSpec(shape=(100, 100, 3), dtype=tf.float32, name=None), \r\n#                                   (TensorSpec(shape=(320000, 99), dtype=tf.float32, name=None), TensorSpec(shape=(100, 100, 32), dtype=tf.float32, name=None)))\u003e\r\n```\r\n\r\n```python\r\nimport torch\r\ntrain_ds, valid_ds = deepvision.datasets.load_tiny_nerf(save_path='tiny_nerf.npz',\r\n                                                        validation_split=0.2,\r\n                                                        backend='pytorch')\r\n\r\ntrain_loader = torch.utils.data.DataLoader(train_ds, batch_size=16, drop_last=True)\r\nvalid_loader = torch.utils.data.DataLoader(valid_ds, batch_size=16, drop_last=True)\r\n\r\nprint('Train dataset length:', len(train_ds)) # Train dataset length: 84\r\ntrain_ds # \u003cdeepvision.datasets.tiny_nerf.tiny_nerf_pt.TinyNerfDataset at 0x25e97f4dfd0\u003e\r\n\r\nprint('Valid dataset length:', len(valid_ds)) # Valid dataset length: 22\r\nvalid_ds # \u003cdeepvision.datasets.tiny_nerf.tiny_nerf_pt.TinyNerfDataset at 0x25e94939080\u003e\r\n```\r\n\r\n\u003e If you'd like to take a look at an example of training NeRFs with PyTorch and TensorFlow, take a look at the [_\"Training Neural Radiance Field (NeRF) Models with DeepVision\"_](https://colab.research.google.com/drive/1RbpsfUj0tTbx6hS1xcPd6XaF7tkdWjeF) notebook.\r\n\r\n\r\n## DeepVision as a Training Library\r\n\r\nWe want DeepVision to host a suite of training frameworks, from classic supervised, to weakly-supervised and unsupervised learning. These frameworks would serve as a high-level API that you can optionally use, while still focusing on non-proprietary classes and architectures _you're used to_, such as pure `tf.keras.Model`s and `torch.nn.Module`s.\r\n\r\n## DeepVision as a Utility Library\r\n\r\nWe want DeepVision to host easy backend-agnostic image operations (resizing, colorspace conversion, etc) and data augmentation layers, losses and metrics.\r\n\r\n## Citing DeepVision\r\n\r\nIf DeepVision plays a part of your research, we'd really appreciate a citation!\r\n\r\n```\r\n@misc{landup2023deepvision,\r\n  title={DeepVision},\r\n  author={David Landup},\r\n  year={2023},\r\n  howpublished={\\url{https://github.com/DavidLandup0/deepvision/}},\r\n}\r\n```\r\n","funding_links":[],"categories":["Other Versions of YOLO"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDavidLandup0%2Fdeepvision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDavidLandup0%2Fdeepvision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDavidLandup0%2Fdeepvision/lists"}