{"id":19178973,"url":"https://github.com/drsleep/tensorflow-deeplab-lfov","last_synced_at":"2025-05-10T02:33:09.979Z","repository":{"id":79073781,"uuid":"75184486","full_name":"DrSleep/tensorflow-deeplab-lfov","owner":"DrSleep","description":"DeepLab-LargeFOV implemented in tensorflow","archived":true,"fork":false,"pushed_at":"2021-10-11T04:54:28.000Z","size":766,"stargazers_count":220,"open_issues_count":15,"forks_count":80,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-03-22T01:26:46.438Z","etag":null,"topics":["deeplab-largefov","deeplab-tensorflow","pascal-voc","semantic-segmentation","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DrSleep.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-11-30T12:23:15.000Z","updated_at":"2024-09-16T08:51:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"cde37d76-61a1-4e04-ba22-cd6ef43a273f","html_url":"https://github.com/DrSleep/tensorflow-deeplab-lfov","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrSleep%2Ftensorflow-deeplab-lfov","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrSleep%2Ftensorflow-deeplab-lfov/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrSleep%2Ftensorflow-deeplab-lfov/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrSleep%2Ftensorflow-deeplab-lfov/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/DrSleep","download_url":"https://codeload.github.com/DrSleep/tensorflow-deeplab-lfov/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253354489,"owners_count":21895436,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deeplab-largefov","deeplab-tensorflow","pascal-voc","semantic-segmentation","tensorflow"],"created_at":"2024-11-09T10:41:38.612Z","updated_at":"2025-05-10T02:33:09.138Z","avatar_url":"https://github.com/DrSleep.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepLab-TensorFlow\n\nThis is an implementation of [DeepLab-LargeFOV](http://ccvl.stat.ucla.edu/deeplab-models/deeplab-largefov/) in TensorFlow for semantic image segmentation on [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/).\n\n## Model Description\n\nThe DeepLab-LargeFOV is built on a fully convolutional variant of the [VGG-16 net](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) with several modifications: first, it exploits [atrous (dilated) convolutions](https://github.com/fyu/dilation) to increase the field-of-view; second, the number of filters in the last layers is reduced from \u003ccode\u003e4096\u003c/code\u003e to \u003ccode\u003e1024\u003c/code\u003e in order to decrease the memory consumption and the time spent on performing one forward-backward pass; third, it omits the last pooling layers to keep the downsampling ratio of \u003ccode\u003e8\u003c/code\u003e.\n\nThe model is trained on a mini-batch of images and corresponding ground truth masks with the 
softmax classifier on top. During training, the masks are downsampled to match the size of the output from the network; during inference, bilinear upsampling is applied to obtain an output of the same size as the input. The final segmentation mask is obtained by taking the argmax over the unnormalised log scores from the network.\nOptionally, a fully-connected probabilistic graphical model, namely, CRF, can be applied to refine the final predictions.\nOn the test set of PASCAL VOC, the model achieves \u003ccode\u003e70.3%\u003c/code\u003e mean intersection-over-union.\n\nFor more details on the underlying model please refer to the following paper:\n\n\n    @article{CP2016Deeplab,\n      title={DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs},\n      author={Liang-Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille},\n      journal={arXiv:1606.00915},\n      year={2016}\n    }\n\n\n\n## Requirements\n\nTensorFlow needs to be installed before running the scripts.\nTensorFlow\u003e=0.11 is supported.\n\nTo install the required Python packages (except TensorFlow), run\n```bash\npip install -r requirements.txt\n```\nor for a local installation\n```bash\npip install --user -r requirements.txt\n```\n\n## Caffe to TensorFlow conversion\n\nTo imitate the structure of the model, we have used `.caffemodel` files provided by the [authors](http://ccvl.stat.ucla.edu/deeplab-models/deeplab-largefov/). The `.util/extract_params.py` script saves the structure of the network, i.e. the names of the parameters with their corresponding shapes (in TF 'HNWC' format), as well as the weights of those parameters (again, in the TF format). These weights can be used to initialise the variables in the model; otherwise, the filters will be initialised using the Xavier initialisation scheme, and biases will be initialised as 0s. 
\nTo use this script you will need to install [Caffe](https://github.com/bvlc/caffe). It is optional, and you can download two already converted models (`model.ckpt-init` and `model.ckpt-pretrained`) [here](https://drive.google.com/drive/folders/0B_rootXHuswsTF90M1NWQmFYelU?resourcekey=0-b0RbHoejk01RS2dVS1UbPg).\n\n## Dataset\n\nTo train the network, we use the augmented PASCAL VOC 2012 dataset with \u003ccode\u003e10582\u003c/code\u003e images for training and \u003ccode\u003e1449\u003c/code\u003e images for validation. \n\n## Training\n\nWe initialised the network from the `.caffemodel` file provided by the authors. In that model, the last classification layer is randomly initialised using the Xavier scheme with biases set to zeros. The loss function is the pixel-wise softmax loss, and it is optimised using Adam. No weight decay is used. \n\nThe `train.py` script provides an ability to monitor model performance by snapshotting current results:\n\u003cimg src=\"images/train.png\"\u003e\u003c/img\u003e\nBesides that, one can change the input size and augment data with random scaling.\n\nTo see the documentation on each of the training settings run the following:\n```bash\npython train.py --help\n```\n## Evaluation\n\nAfter the training, the model shows \u003ccode\u003e57%\u003c/code\u003e mIoU on the Pascal VOC 2012 validation dataset. The model initialised from the pre-trained `.caffemodel` shows \u003ccode\u003e67%\u003c/code\u003e mIoU on the same dataset. 
Note that in the original DeepLab each image is padded so that the input is of size \u003ccode\u003e513x513\u003c/code\u003e and CRF is used, which may be one of the reasons our score is lower than the original (\u003ccode\u003e~70.3%\u003c/code\u003e mIoU).\n\nTo see the documentation on each of the evaluation settings run the following:\n```bash\npython evaluate.py --help\n```\n\n## Inference\n\nTo perform inference over your own images, use the following command:\n```bash\npython inference.py /path/to/your/image /path/to/ckpt/file\n```\nThis will run the forward pass and save the resulting mask with this colour map:\n\u003cimg src=\"images/colour_scheme.png\" height=\"75\"\u003e\u003c/img\u003e\n\u003cimg src=\"images/mask.png\"\u003e\u003c/img\u003e\n\n## Missing features\n\nAt the moment, the post-processing step with CRF is not implemented. Weight decay is missing as well.\n\n## Other implementations\n* [DeepLab-ResNet in TensorFlow](https://github.com/DrSleep/tensorflow-deeplab-resnet)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrsleep%2Ftensorflow-deeplab-lfov","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrsleep%2Ftensorflow-deeplab-lfov","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrsleep%2Ftensorflow-deeplab-lfov/lists"}