{"id":15651568,"url":"https://github.com/graykode/vision-tutorial","last_synced_at":"2025-07-31T16:06:58.419Z","repository":{"id":110173300,"uuid":"171792276","full_name":"graykode/vision-tutorial","owner":"graykode","description":"Computer Vision Tutorial for Deep Learning Researchers","archived":false,"fork":false,"pushed_at":"2019-08-14T03:31:04.000Z","size":989,"stargazers_count":33,"open_issues_count":0,"forks_count":9,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-30T20:03:36.888Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graykode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-21T03:18:43.000Z","updated_at":"2023-06-09T16:52:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"c90b0815-08bd-437b-96db-31b2e129b19e","html_url":"https://github.com/graykode/vision-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fvision-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fvision-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fvision-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fvision-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graykode","download_url":"https://codeload
.github.com/graykode/vision-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251774895,"owners_count":21641731,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T12:39:08.170Z","updated_at":"2025-04-30T20:05:13.354Z","avatar_url":"https://github.com/graykode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## vision-tutorial\n\n\u003cp align=\"center\"\u003e\u003cimg width=\"100\" src=\"https://media-thumbs.golden.com/OLqzmrmwAzY1P7Sl29k2T9WjJdM=/200x200/smart/golden-storage-production.s3.amazonaws.com/topic_images/e08914afa10a4179893eeb07cb5e4713.png\" /\u003e\u003cimg width=\"100\" src=\"https://keras.io/img/keras-logo-small-wb.png\" /\u003e\u003c/p\u003e\n\n`vision-tutorial` is a tutorial for those studying `Computer Vision Basic Architectures` with **PyTorch** and **Keras**. Most of the vision models are implemented in fewer than **100 lines** of code (excluding comments and blank lines). The papers on this list were recommended by Professor [Sung Kim](https://github.com/hunkim).\n\n- The data is deliberately minimal so that each model overfits it, which is enough to show that the model learns: [one image of a cat or a dog](https://github.com/graykode/vision-tutorial/tree/master/data)\n\n- Model accuracy is not the point of this project, since it depends mostly on the data. Instead, I recommend that you **focus on the model structure, the number of parameters, the learning process, and the implementation details from each paper.**\n\n  \n\n## SOTA Basic Vision Models - Introduction\n\n- How to handle images in PyTorch and Keras\n\n  - Image resizing and cropping\n\n- Introduction to CNNs (Convolutional Neural Networks) in PyTorch and Keras\n\n  - How do the number of channels, the filter (kernel) size, the grid, and the padding affect convolution?\n\n  - Paper : [Object Recognition with Gradient-Based Learning](http://yann.lecun.com/exdb/publis/pdf/lecun-99.pdf)\n\n- AlexNet(2012.09)\n\n  - Paper : [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)\n\n  - Model\n\n    ![](2.AlexNet/model.jpg)\n\n- ZFNet(2013.11)\n\n  - Paper : [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901)\n\n- VGG16(2014.09)\n\n  - Paper : [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)\n\n- Inception.v1(a.k.a. GoogLeNet)(2014.09)\n\n  - Paper : [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)\n\n- Inception.v2, v3(2015.12)\n\n  - Paper : [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567)\n\n- ResNet(2015.12)\n\n  - Paper : [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)\n  - Model\n    ![](7.ResNet/model.jpeg)\n\n- Inception.v4(2016.02)\n\n  - Paper : [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261)\n\n- DenseNet(2016.08)\n\n  - Paper : [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)\n  - Model\n    ![](9.DenseNet/model.jpg)\n\n- Xception(2016.10)\n\n  - Paper : [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)\n\n- MobileNet(2017.04)\n\n  - Paper : [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)\n\n- SENet(2017.09)\n\n  - Paper 
: [Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)\n\n\n\n## To Be Continued: Implementation in Another Repository\n\n#### v Semantic Segmentation\n\n- FCN(2014.11) : [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038)\n- U-Net(2015.05) : [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)\n- SegNet(2015.11) : [SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation](https://arxiv.org/abs/1511.00561)\n- DeepLab(2016.06) : [DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs](https://arxiv.org/abs/1606.00915)\n- ENet(2016.07) : [ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation](https://arxiv.org/abs/1606.02147)\n- PSPNet(2016.12) : [Pyramid Scene Parsing Network](https://arxiv.org/abs/1612.01105)\n- ICNet(2017.04) : [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)\n\n\n\n#### v Generative Adversarial Networks\n\n- GAN(2014.06) : [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661)\n- DCGAN(2015.11) : [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434)\n- Pix2Pix(2016.11) : [Image-to-Image Translation with Conditional Adversarial Networks](https://arxiv.org/abs/1611.07004)\n- WGAN(2017.01) : [Wasserstein GAN](https://arxiv.org/abs/1701.07875)\n- CycleGAN(2017.05) : [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)\n\n\n\n#### v Object Detection\n\n- RCNN(2013.11) : [Rich feature hierarchies for accurate object detection and semantic segmentation](https://arxiv.org/abs/1311.2524)\n- Fast-RCNN(2015.04) : [Fast R-CNN](https://arxiv.org/abs/1504.08083)\n- Faster-RCNN(2015.06) : [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)\n- YOLO(2015.06) : [You Only Look Once: Unified, Real-Time Object Detection](https://arxiv.org/abs/1506.02640)\n- SSD(2015.12) : [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)\n- YOLO9000(2016.12) : [YOLO9000: Better, Faster, Stronger](https://arxiv.org/abs/1612.08242)\n- Mask R-CNN(2017.05) : [Mask R-CNN](https://arxiv.org/abs/1703.06870)\n- RetinaNet(2017.08) : [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)\n\n\n\n## Author\n\n- Tae Hwan Jung (Jeff Jung) @graykode\n- Author Email : [nlkey2022@gmail.com](mailto:nlkey2022@gmail.com)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fvision-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraykode%2Fvision-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fvision-tutorial/lists"}