{"id":13737641,"url":"https://github.com/victorca25/augmennt","last_synced_at":"2025-05-08T15:30:41.610Z","repository":{"id":80038347,"uuid":"257937810","full_name":"victorca25/augmennt","owner":"victorca25","description":"Augmentations for Neural Networks. Implementation of Torchvision's transforms using OpenCV and additional augmentations for super-resolution,  restoration and image to image translation.","archived":false,"fork":false,"pushed_at":"2021-08-14T09:31:59.000Z","size":2096,"stargazers_count":25,"open_issues_count":0,"forks_count":7,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-08-04T03:10:06.852Z","etag":null,"topics":["anisotropic","bsrgan","computer-vision","data-augmentation","deblur","denoise","image-processing","opencv","pytorch","real-esrgan","super-resolution","superpixels","unprocessing-images"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/victorca25.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-04-22T15:12:24.000Z","updated_at":"2024-07-29T16:58:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"868f9077-6cfe-4b00-8502-43eae7a45963","html_url":"https://github.com/victorca25/augmennt","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorca25%2Faugmennt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorca25%2Faugmennt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vic
torca25%2Faugmennt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorca25%2Faugmennt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/victorca25","download_url":"https://codeload.github.com/victorca25/augmennt/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224742098,"owners_count":17362228,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anisotropic","bsrgan","computer-vision","data-augmentation","deblur","denoise","image-processing","opencv","pytorch","real-esrgan","super-resolution","superpixels","unprocessing-images"],"created_at":"2024-08-03T03:01:55.904Z","updated_at":"2024-11-15T06:30:42.024Z","avatar_url":"https://github.com/victorca25.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# augmeNNt\n\nThis repository is intended first as a faster drop-in replacement of Pytorch's Torchvision default augmentations in the \"transforms\" [package](https://github.com/pytorch/vision/tree/master/torchvision/transforms), based on NumPy and OpenCV (PIL-free) for computer vision pipelines. Additionally, many useful functions and augmentations for image to image translation, super-resolution and restoration (deblur, denoise, etc) are also available.\n\n## Supported Augmentations\n\nMost functions from the original Torchvision transforms are reimplemented, with some considerations:\n1.  ToPILImage is not implemented or needed, we use OpenCV instead (ToCVImage). 
However, the original ToPILImage in ~transforms can be used to save the tensor as a PIL image if required. Once transformed into tensor format, images have RGB channel order in both cases.\n2.  OpenCV images are NumPy arrays. OpenCV supports uint8, int8, uint16, int16, int32, float32, float64. Certain operations (like `cv.CvtColor()`) require converting the arrays to an OpenCV type (with `cv.fromarray()`).\n3.  The affine transform in the original only has 5 degrees of freedom. YU-Zhiyang implemented an affine transform with 6 degrees of freedom called `RandomAffine6` (found in [transforms.py](augmennt/transforms.py)). The original method `RandomAffine` is also available and reimplemented with OpenCV.\n4.  The rotate function is clockwise, whereas the original one is anticlockwise.\n5.  Some new augmentations have been added in comparison to Torchvision's; refer to the list below.\n6.  **The outputs of the OpenCV versions are almost the same as the original ones (this can be tested by running [test.py](/test.py) directly with test images)**.\n\nThese are the basic transforms, equivalent to torchvision's:\n\n-   `Compose`, `ToTensor`, `ToCVImage`, `Normalize`,\n-   `Resize`, `CenterCrop`, `Pad`,\n-   `Lambda` (see [note](#attention)),\n-   `RandomApply`, `RandomOrder`, `RandomChoice`, `RandomCrop`,\n-   `RandomHorizontalFlip`, `RandomVerticalFlip`, `RandomResizedCrop`,\n-   `FiveCrop`, `TenCrop`, `LinearTransformation`, `ColorJitter`,\n-   `RandomRotation`, `RandomAffine`,\n-   `Grayscale`, `RandomGrayscale`, `RandomErasing`,\n\nThe additional transforms can be used to train models such as [Noise2Noise](https://arxiv.org/pdf/1803.04189.pdf), [BSRGAN](https://arxiv.org/pdf/2103.14006v1.pdf), [Real-ESRGAN](https://arxiv.org/pdf/2107.10833.pdf), [White-box Cartoonization](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_Learning_to_Cartoonize_Using_White-Box_Cartoon_Representations_CVPR_2020_paper.pdf) and 
[EdgeConnect](https://openaccess.thecvf.com/content_ICCVW_2019/papers/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.pdf), among others. There are some general augmentations:\n-   `RandomAffine6`, `Cutout`, `RandomPerspective`,\n\nNoise augmentations, with options for artificial noises and realistic noise generation:\n-   `RandomGaussianNoise`, `RandomPoissonNoise`, `RandomSPNoise`,\n-   `RandomSpeckleNoise`, `RandomCompression`,\n-   `BayerDitherNoise`, `FSDitherNoise`, `AverageBWDitherNoise`, `BayerBWDitherNoise`,\n-   `BinBWDitherNoise`, `FSBWDitherNoise`, `RandomBWDitherNoise`,\n-   `RandomCameraNoise`, `RandomChromaticAberration`\n\nBlurs and different kinds of kernel generation and use, with standard blurs, isotropic and anisotropic Gaussian filters, and simple and complex motion blur kernels:\n-   `RandomAverageBlur`, `RandomBilateralBlur`, `RandomBoxBlur`,\n-   `RandomGaussianBlur`, `RandomMedianBlur`,\n-   `RandomMotionBlur`, `RandomComplexMotionBlur`,\n-   `RandomAnIsoBlur`, `AlignedDownsample`, `ApplyKernel`,\n-   `RandomSincBlur`\n\nFilters to modify the images, including color quantization, superpixel segmentation and CLAHE:\n-   `FilterMaxRGB`, `FilterColorBalance`, `FilterUnsharp`,\n-   `SimpleQuantize`, `RandomQuantize`, `RandomQuantizeSOM`,\n-   `CLAHE`, `RandomGamma`, `Superpixels`\n\nEdge filters:\n-   `FilterCanny`,\n\n## Requirements\n\n-   python \u003e= 3.5.2\n-   numpy \u003e= 1.10 ('@' operator may not be overloaded before this version)\n-   pytorch \u003e= 0.4.1\n-   A working installation of OpenCV. 
**Tested with OpenCV versions 3.4.2 and 4.1.0**\n-   Tested on Windows 10 and Ubuntu 18.04.\n\n## Optional requirements\n\n-   torchvision \u003e= 0.2.1\n\nIn order to use the additional Superpixel options (skimage SLIC and Felzenszwalb algorithms), segments reduction algorithms (selective search and RAG merging), the Menon demosaicing algorithm and the sinc filter, there are additional requirements:\n-   scikit-image \u003e= 0.17.2\n-   scipy \u003e= 1.6.2\n\n## Usage\n\n1.  git clone \u003chttps://github.com/victorca25/augmennt.git\u003e .\n2.  Add `augmennt` to your python path.\n3.  Add `from augmennt import augmennt as transforms` in your python file.\n4.  From here, almost everything should work exactly as with the original `transforms`.\n\n### Example: Image resizing\n\n```python\nimport numpy as np\nfrom augmennt import augmennt as transforms\n\n# Random uint8 test image (H, W, C); uint8 is one of the dtypes OpenCV supports,\n# unlike NumPy's default integer dtype.\nimage = np.random.randint(low=0, high=256, size=(1024, 2048, 3), dtype=np.uint8)\nresize = transforms.Resize(size=(256, 256))\nimage = resize(image)\n```\n\nThis should be 1.5 to 10 times faster than PIL. See the [benchmarks](#performance).\n\n### Example: Composing transformations\n\n```py\nfrom augmennt import augmennt as transforms\n\n# Chain several transforms; each one receives the output of the previous one.\ntransform = transforms.Compose([\n   transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 0)),\n   transforms.Resize(size=(350, 350), interpolation=\"BILINEAR\"),\n   transforms.ToTensor(),\n   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),\n])\n```\n\nMore examples can be found in the official Pytorch [tutorials](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html).\n\n## Attention\n\nThe multiprocessing used in Pytorch's dataloader may have issues with lambda functions (using `Lambda` in [transforms.py](torchvision/transforms/transforms.py)) in Windows, as lambda functions can't be pickled (\u003chttps://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled\u003e). 
This issue also happens with Torchvision's `Lambda` function.\n\nThese issues happen when using `num_workers \u003e 0` in a Pytorch `DataLoader` class when the transformations are initialized in the class init. The issue can be prevented either by using proper functions (not lambda) when composing the transformations or by initializing them in the `DataLoader` call instead.\n\n## Performance\n\nThe following are the performance tests as executed by jbohnslav.\n\n-   Most transformations are between 1.5X and ~4X faster in OpenCV. Large image resizes are up to 10 times faster in OpenCV.\n-   To reproduce the following benchmarks, download the [Cityscapes dataset](https://www.cityscapes-dataset.com/).\n-   An example benchmarking file that jbohnslav used can be found in the notebook **benchmarking_v2.ipynb**, where the Cityscapes default directories are wrapped with an HDF5 file for even faster reading (Note: this file has not been updated or tested for a very long time, but can serve as a reference).\n\n![resize](benchmarks/benchmarking_Resize.png)\n![random crop](benchmarks/benchmarking_Random_crop_quarter_size.png)\n![change brightness](benchmarks/benchmarking_Color_brightness_only.png)\n![change brightness and contrast](benchmarks/benchmarking_Color_constrast_and_brightness.png)\n![change contrast only](benchmarks/benchmarking_Color_contrast_only.png)\n![random horizontal flips](benchmarks/benchmarking_Random_horizontal_flip.png)\n\nThe changes start to add up when you compose multiple transformations together.\n![composed transformations](benchmarks/benchmarking_Resize_flip_brightness_contrast_rotate.png)\n\ncv2 is around three times faster than regular Pillow (PIL), as shown in this [article](https://www.kaggle.com/vfdev5/pil-vs-opencv).\n\nAdditionally, the [Albumentations project](https://github.com/albumentations-team/albumentations), mostly based on NumPy and OpenCV, has also shown better performance than other options, including torchvision with a 
fast Pillow-SIMD backend.\n\nHowever, Pillow-SIMD can be faster in some cases, as tested in this [article](https://python-pillow.org/pillow-perf/).\n\n## Alternatives\n\nThere are multiple image augmentation and manipulation frameworks available, each with its own strengths and limitations. Some of these alternatives are:\n\n-   [Torchvision](https://github.com/pytorch/vision): Based on [Pillow (default)](https://python-pillow.org/), [Pillow-SIMD](https://github.com/uploadcare/pillow-simd), [accimage](https://github.com/pytorch/accimage), [libpng](http://www.libpng.org/pub/png/libpng.html), [libjpeg](http://ijg.org/) or [libjpeg-turbo](https://libjpeg-turbo.org/)\n-   [Kornia](https://github.com/kornia/kornia): Inspired by OpenCV, for differentiable tensor image functions\n-   [Albumentations](https://github.com/albumentations-team/albumentations): Based on pure NumPy, [OpenCV](https://github.com/opencv/opencv) and [imgaug](https://github.com/aleju/imgaug), with a large variety of transformations\n-   [Rising](https://github.com/PhoenixDL/rising): For differentiable 2D and 3D image functions\n-   [TorchIO](https://github.com/fepegar/torchio): For 3D medical imaging\n\n## Postscript\n-   This repository originally merged the [jbohnslav](https://github.com/jbohnslav/opencv_transforms) and [YU-Zhiyang](https://github.com/YU-Zhiyang/opencv_transforms_torchvision) repositories (which had the same purpose) and my own OpenCV-based augmentations from [BasicSR](https://github.com/victorca25/BasicSR), in order to refactor the project's data flow and streamline it around Torchvision's API as a standard. 
This enables changing or combining different base frameworks (OpenCV, Pillow/Pillow-SIMD, etc.) to add more augmentations only by modifying the imported library, and also makes it easy to switch to other replacements like [Kornia](https://github.com/kornia/kornia), [Albumentations](https://github.com/albumentations-team/albumentations), or [Rising](https://github.com/PhoenixDL/rising), based on the user's needs. An example with a backend change is DinJerr's [fork](https://github.com/DinJerr/BasicSR), using [wand](https://github.com/emcconville/wand)+[ImageMagick](https://imagemagick.org/) for augmentations.\n-   Each backend has its pros and cons, but important points to consider when choosing are: available augmentation types, performance, external dependencies, features (for example, Kornia's differentiable augmentations) and user preference (all previous points being equal).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictorca25%2Faugmennt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvictorca25%2Faugmennt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictorca25%2Faugmennt/lists"}