# Slack
*Join the Slack team and enter the CRF-as-RNN channel to discuss.*

https://deep-learning-geeks-slack.herokuapp.com/

# Intro and Clarifications

This repo is meant to:

- [X] show you how to train [CRF as RNN](https://github.com/torrvision/crfasrnn) on the PASCAL VOC dataset (20 classes + background)

- [X] be a well-maintained place to discuss this method

- [X] try to rewrite CRF as RNN with Caffe2 (join the Slack team and let's discuss it together)


# Step by Step


## Pick Up a Machine

1. Single GPU

    An AWS `p2.xlarge` instance (spot price ~$0.2/hour, one Tesla K80 GPU with 12 GB of memory) is enough for training.

    An equivalent setup may also work.

2. Multiple GPUs

    This requires some changes, and I haven't managed to get it working yet (I tried 3 GPUs with 18 GB of memory in total and failed). I will update this section if I succeed.


## Prepare Environment

Please refer to this [repo](https://github.com/KleinYuan/easy-yolo#b-environment-gpu) for all the details and commands. It may take you around half an hour to an hour.

- [X] OpenCV (you may not need OpenCV)

- [X] NVIDIA Driver

- [X] CUDA

- [X] cuDNN

## Build CRF as RNN Caffe


1. Get the CRF as RNN version of Caffe and check out the correct branch:

    ```
    git clone https://github.com/torrvision/caffe.git
    cd caffe
    git checkout crfrnn
    ```

    This creates a folder called `caffe` in the directory where you ran the commands.


2. Change some source code to reduce memory consumption (IMPORTANT!)

    If you just start building, you will probably hit this [issue](https://github.com/torrvision/crfasrnn/issues/79), which sucks.

    To avoid it, apply the Caffe source changes from this [PR](https://github.com/BVLC/caffe/pull/2016).

    Details:

    - [X] In `caffe/src/caffe/layers/base_conv_layer.cpp`, make lines 12 and 13 read:

        ```
        Blob<Dtype> BaseConvolutionLayer<Dtype>::col_buffer_;
        template <typename Dtype>
        ```

    - [X] In `caffe/include/caffe/layers/base_conv_layer.hpp`, make line 168 read:

        ```
        static Blob<Dtype> col_buffer_;
        ```

    Making `col_buffer_` static lets the convolution layers share one buffer, which reduces GPU memory consumption. There is a known but ignorable [bug](https://github.com/BVLC/caffe/pull/2016#issuecomment-77509575).


3. Configure Make

    In the root folder of `caffe`, there's a file called `Makefile.config.example`.

    Copy it to `Makefile.config`, for example by running `cp Makefile.config.example Makefile.config`. Then:

    - [X] If you installed OpenCV separately, uncomment [`USE_PKG_CONFIG := 1`](https://github.com/torrvision/caffe/blob/crfrnn/Makefile.config.example#L107)

    - [X] If you want to train with multiple GPUs, uncomment [`USE_NCCL := 1`](https://github.com/torrvision/caffe/blob/crfrnn/Makefile.config.example#L103) and install [NCCL](https://github.com/KleinYuan/train-crfasrnn#trial-on-train-with-multiple-gpu)

    - [X] If you want to use OpenBLAS instead of ATLAS (it works more efficiently with multiple CPUs), change [`BLAS := atlas`](https://github.com/torrvision/caffe/blob/crfrnn/Makefile.config.example#L50) to `BLAS := open`. I don't think this is necessary.

    - [X] Comment out all the `60` and `61` [arch options](https://github.com/torrvision/caffe/blob/crfrnn/Makefile.config.example#L42). Your setup probably does not support them, because building for them requires CUDA 8.0 or newer.


    Then, in the root folder of `caffe`, run:

    ```
    make all
    ```

    If something goes wrong, run

    ```
    make clean
    ```

    to clean everything, then run `make all` again.


    `make` may take a while (around 10 minutes).


## Prepare DataSets

The idea is that we need to:

- [X] Download the PASCAL VOC dataset (which is very large)

- [X] Label the images

- [X] Create LMDB databases so Caffe can access the data easily

This [repo](https://github.com/remz1337/train-CRF-RNN) does a good job on this step, so you need to:

1. Clone this [repo](https://github.com/remz1337/train-CRF-RNN)

2. Prepare

    Follow the steps from [Prepare dataset for training](https://github.com/remz1337/train-CRF-RNN#prepare-dataset-for-training) to [Create LMDB database](https://github.com/remz1337/train-CRF-RNN#create-lmdb-database), and stop there.

    You will likely have trouble executing the last [step](https://github.com/remz1337/train-CRF-RNN#create-lmdb-database), because this script needs a small piece of Caffe [functionality](https://github.com/remz1337/train-CRF-RNN/blob/master/data2lmdb.py#L14) to [dump](https://github.com/remz1337/train-CRF-RNN/blob/master/data2lmdb.py#L156) images into Caffe datums.

    You have two options:

    a. In the root folder of `train-CRF-RNN`, clone another Caffe (any version) and build it as above, but without the source changes. Basically:

    - [X] Clone Caffe
    - [X] Inside the cloned `caffe` folder, run:
        ```
        cp Makefile.config.example Makefile.config
        make all
        make pycaffe  # IMPORTANT: we skipped this above because it wasn't needed there, but you need it here
        ```

    b. Go to the Caffe we built above, build pycaffe, and point the script at it:

    - [X] Run `make pycaffe`
    - [X] Add the Caffe root, like in [this example](https://github.com/remz1337/train-CRF-RNN/blob/master/crfasrnn.py#L6), right after [this line](https://github.com/remz1337/train-CRF-RNN/blob/master/data2lmdb.py#L6), using the actual relative or absolute (recommended) path to Caffe

    After doing either of the above, you should be able to finish the labeling step. The `train-CRF-RNN` folder will then contain these folders:

    - [X] train_images_20_lmdb
    - [X] train_labels_20_lmdb
    - [X] test_images_20_lmdb
    - [X] test_labels_20_lmdb


## Prepare Training Protobuf Files

1. Clone this repo in a different place

    ```
    git clone https://github.com/KleinYuan/train-crfasrnn.git
    ```

2. Edit `trainKit/CRFRNN_train.prototxt`

    Replace `${PATH}` on lines 7, 19, 31 and 41 with the actual *absolute* paths of the folders listed above.

3. Edit the Makefile of this repo

    Replace `${CAFFE_PATH}` with the root path of the Caffe we built above, and replace `${TRAIN_CRF_RNN_PATH}` with the root path of this repo.


## Download Pre-trained Models

If you check the Makefile, you will see four choices:

- [X] Train-single-gpu-from-0

- [X] Train-multiple-gpus-from-0

- [X] Train-single-gpu-fine-tuning

- [X] Train-multiple-gpus-fine-tuning

To train a model from scratch (the `from-0` targets), download the FCN-8s model into the folder containing the Makefile:

```
wget http://dl.caffe.berkeleyvision.org/fcn-8s-pascal.caffemodel
```

To train from the pre-trained model (the `fine-tuning` targets), download `TVG_CRFRNN_COCO_VOC.caffemodel`. Be aware of this model's LICENSE: it is not free for commercial use.

```
wget http://goo.gl/j7PrPZ -O TVG_CRFRNN_COCO_VOC.caffemodel
```


## Train!

Finally, train the model with the Makefile by running whichever of the following commands fits your purpose:

```
make Train-single-gpu-from-0
```

or

```
make Train-multiple-gpus-from-0
```

or

```
make Train-single-gpu-fine-tuning
```

or

```
make Train-multiple-gpus-fine-tuning
```

If you want to use more than two GPUs, keep appending `2, 3, 4...` to the `0,1` list after the `-gpu` flag (for example `-gpu 0,1,2,3`).

## Trial on Train with Multiple GPU

Caffe is very picky about the environment for multi-GPU training.

These dependencies/changes are necessary:

- [X] [NCCL](https://github.com/NVIDIA/nccl), which depends on your CUDA version, so be careful to use the matching branch

- [X] In Caffe's `Makefile.config`, uncomment [`USE_NCCL := 1`](https://github.com/torrvision/caffe/blob/crfrnn/Makefile.config.example#L103)


## Debug and Validation

Problems you may run into:

1. `Check failed: error == cudaSuccess (2 vs. 0) out of memory`: it means what it says, you don't have enough GPU memory.

2. `error==cudaSuccess (77 vs. 0) an illegal memory access was encountered`: either a shape is incorrect or your CUDA version is wrong; check [here](https://github.com/BVLC/caffe/issues/4169).

3. Stuck at `iteration 0` for a long time: this is normal, just relax and drink a coffee.
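The three problems listed under Debug and Validation can be spotted in a saved training log with a small script. Below is a minimal sketch. `diagnose` is a hypothetical helper, not part of Caffe or this repo. It only matches the error strings quoted in that list, and it assumes the solver writes its usual `Iteration N, ...` lines.

```python
import re


def diagnose(log_text: str) -> list[str]:
    """Return hints for the known problems found in a Caffe training log (sketch)."""
    hints = []
    # Problem 1: CUDA error 2 is reported as "out of memory".
    if "cudaSuccess (2 vs. 0)" in log_text and "out of memory" in log_text:
        hints.append("out of memory: the GPU does not have enough memory")
    # Problem 2: in older CUDA releases, error 77 is "an illegal memory access was encountered".
    if "cudaSuccess (77 vs. 0)" in log_text:
        hints.append("illegal memory access: check blob shapes and your CUDA version")
    # Problem 3: the solver logs "Iteration N, ..."; if N never got past 0, just wait.
    iterations = [int(n) for n in re.findall(r"Iteration (\d+)", log_text)]
    if iterations and max(iterations) == 0:
        hints.append("still at iteration 0: normal for this network, keep waiting")
    return hints


if __name__ == "__main__":
    import sys
    # Usage: python diagnose.py < train.log
    for hint in diagnose(sys.stdin.read()):
        print(hint)
```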
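The static `col_buffer_` change in the build step saves memory because, without it, every convolution layer keeps its own im2col buffer. For one image, each buffer holds `in_channels × kernel² × out_h × out_w` values. A shared static buffer only has to be as large as the biggest single request. The rough estimate below shows the effect. The layer shapes are assumptions (a 500×500 input with FCN-style padding of 100 on `conv1_1`), not values read from the CRF as RNN prototxt, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    name: str
    in_channels: int  # channels entering the convolution
    kernel: int       # square kernel size
    out_h: int        # output feature-map height
    out_w: int        # output feature-map width

    def col_buffer_bytes(self, bytes_per_value: int = 4) -> int:
        # im2col unrolls every receptive field into one column, giving a
        # (in_channels * kernel * kernel) x (out_h * out_w) matrix per image.
        return self.in_channels * self.kernel ** 2 * self.out_h * self.out_w * bytes_per_value


def col_buffer_usage(layers: list[ConvLayer]) -> tuple[int, int]:
    """Return (bytes with one buffer per layer, bytes with one shared buffer)."""
    sizes = [layer.col_buffer_bytes() for layer in layers]
    # A shared static buffer only grows to the largest size any layer requests.
    return sum(sizes), max(sizes)


# Hypothetical shapes: 500x500 input, conv1_1 padded by 100 (FCN-style), so the
# first block produces 698x698 maps and pool1 halves them to 349x349.
LAYERS = [
    ConvLayer("conv1_1", 3, 3, 698, 698),
    ConvLayer("conv1_2", 64, 3, 698, 698),
    ConvLayer("conv2_1", 64, 3, 349, 349),
]
separate, shared = col_buffer_usage(LAYERS)

if __name__ == "__main__":
    gib = 2 ** 30
    print(f"one buffer per layer: {separate / gib:.2f} GiB, shared: {shared / gib:.2f} GiB")
```

Only three layers are listed. Adding the rest of the network increases the per-layer total, while the shared size stays at the largest single buffer.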
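The `Makefile.config` edits from the Configure Make step can also be scripted. This sketch assumes the example file contains the stock lines `# USE_PKG_CONFIG := 1`, `# USE_NCCL := 1` and `BLAS := atlas`, plus a backslash-continued `CUDA_ARCH := ...` block. The README says to comment out the `60`/`61` arch lines, but this sketch deletes them instead and re-attaches the line continuations, which should leave the same `CUDA_ARCH` value. All function names are hypothetical.

```python
import re


def uncomment(text: str, key: str) -> str:
    """Turn '# KEY := 1' into 'KEY := 1'; [ \t] is used so newlines are never eaten."""
    pattern = rf"^#[ \t]*({re.escape(key)}[ \t]*:=[ \t]*1)[ \t]*$"
    return re.sub(pattern, r"\1", text, flags=re.MULTILINE)


def drop_cuda_arches(text: str, arches: tuple[str, ...] = ("60", "61")) -> str:
    """Delete the -gencode lines for the given compute capabilities from CUDA_ARCH."""
    out: list[str] = []
    block: list[str] = []  # lines of the backslash-continued CUDA_ARCH assignment
    for line in text.split("\n"):
        if not block and not line.startswith("CUDA_ARCH :="):
            out.append(line)
            continue
        block.append(line.rstrip())
        if block[-1].endswith("\\"):
            continue  # the assignment continues on the next line
        kept = [block[0]] + [
            entry for entry in block[1:]
            if not any(f"compute_{arch}," in entry for arch in arches)
        ]
        # Re-attach continuations so that only the new last line lacks a backslash.
        body = [entry.rstrip("\\").rstrip() for entry in kept]
        out.extend(entry + " \\" for entry in body[:-1])
        out.append(body[-1])
        block = []
    out.extend(block)  # unterminated assignment at end of file: keep its lines
    return "\n".join(out)


def configure(text: str, pkg_config: bool = False, nccl: bool = False,
              openblas: bool = False) -> str:
    """Apply the Configure Make edits to the text of Makefile.config.example."""
    if pkg_config:
        text = uncomment(text, "USE_PKG_CONFIG")
    if nccl:
        text = uncomment(text, "USE_NCCL")
    if openblas:
        text = re.sub(r"^BLAS[ \t]*:=[ \t]*atlas\b", "BLAS := open", text,
                      flags=re.MULTILINE)
    return drop_cuda_arches(text)


if __name__ == "__main__":
    from pathlib import Path
    # Run inside the caffe root; set the flags that apply to your setup.
    example = Path("Makefile.config.example").read_text()
    Path("Makefile.config").write_text(configure(example, pkg_config=True))
```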
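Before editing `trainKit/CRFRNN_train.prototxt`, it can help to confirm that the four LMDB folders from the Prepare DataSets step actually exist. Below is a minimal sketch. The `data.mdb` check assumes the default LMDB layout, where each database is a directory containing `data.mdb` and `lock.mdb`. `missing_lmdbs` is a hypothetical name.

```python
from pathlib import Path

# Folder names as listed in the Prepare DataSets step.
EXPECTED_LMDBS = (
    "train_images_20_lmdb",
    "train_labels_20_lmdb",
    "test_images_20_lmdb",
    "test_labels_20_lmdb",
)


def missing_lmdbs(root: str) -> list[str]:
    """Return the expected LMDB folders under `root` that are missing or have no data.mdb."""
    base = Path(root)
    return [name for name in EXPECTED_LMDBS if not (base / name / "data.mdb").is_file()]


if __name__ == "__main__":
    import sys
    # Usage: python check_lmdbs.py /absolute/path/to/train-CRF-RNN
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_lmdbs(root)
    if missing:
        sys.exit("missing LMDB folders: " + ", ".join(missing))
    print("all four LMDB folders found under", Path(root).resolve())
```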
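Filling in `${CAFFE_PATH}` and `${TRAIN_CRF_RNN_PATH}` in the Makefile (and `${PATH}` in the prototxt) is a plain text substitution. The sketch below uses `str.replace` rather than `string.Template`, because `Template` would also turn Make's `$$` escapes into a single `$`. It assumes every occurrence of a placeholder takes the same value. That should hold for the Makefile, but check your copy of the prototxt before applying it to `${PATH}`. The helper names and example paths are hypothetical.

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def fill_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace each ${NAME} in `text` with the absolute path given for NAME."""
    for name, path in values.items():
        if not os.path.isabs(path):
            raise ValueError(f"{name} should be an absolute path, got {path!r}")
        # normpath drops a trailing slash, so "${CAFFE_PATH}/build" stays clean.
        text = text.replace("${" + name + "}", os.path.normpath(path))
    return text


def leftover_placeholders(text: str) -> set[str]:
    """Names of ${...} tokens still present (some may be genuine Make variables)."""
    return set(PLACEHOLDER.findall(text))


if __name__ == "__main__":
    from pathlib import Path
    makefile = Path("Makefile")
    filled = fill_placeholders(makefile.read_text(), {
        "CAFFE_PATH": "/home/ubuntu/caffe",                   # hypothetical location
        "TRAIN_CRF_RNN_PATH": "/home/ubuntu/train-crfasrnn",  # hypothetical location
    })
    print("still unfilled:", leftover_placeholders(filled) or "none")
    makefile.write_text(filled)
```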
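The `-gpu` note in the Train! section refers to Caffe's `train` command, which takes a comma-separated list of device ids. The sketch below builds that command for any number of GPUs. It assumes the Makefile targets call `caffe train` with `-solver`, `-weights` and `-gpu`, and that the binary sits at `build/tools/caffe` after `make all`. The solver file name is a hypothetical placeholder, so use whatever the Makefile passes.

```python
import shlex
from pathlib import Path


def train_command(caffe_root: str, solver: str, weights: str, num_gpus: int = 1) -> list[str]:
    """Build a `caffe train` command line that uses GPUs 0..num_gpus-1."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    gpu_ids = ",".join(str(i) for i in range(num_gpus))  # 4 GPUs -> "0,1,2,3"
    caffe_bin = Path(caffe_root) / "build" / "tools" / "caffe"
    return [str(caffe_bin), "train",
            "-solver", solver,
            "-weights", weights,
            "-gpu", gpu_ids]


if __name__ == "__main__":
    # "trainKit/solver.prototxt" is a hypothetical name: use the solver the Makefile passes.
    cmd = train_command("/home/ubuntu/caffe", "trainKit/solver.prototxt",
                        "fcn-8s-pascal.caffemodel", num_gpus=4)
    print(shlex.join(cmd))
```

For the fine-tuning targets, pass `TVG_CRFRNN_COCO_VOC.caffemodel` as `weights` instead of `fcn-8s-pascal.caffemodel`.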