Visual Chatbot
============
Demo for the paper (**now upgraded to PyTorch; for the Lua-Torch version please see this [commit](https://github.com/Cloud-CV/visual-chatbot/tree/f5db5a099cba044a6dee3830fa25a66ef4f1b08b)**).

**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4])<br>
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra  
arXiv link: [arxiv.org/abs/1611.08669][1]  
Live demo: http://visualchatbot.cloudcv.org

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")

Introduction
---------------
**Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, a dialog history, and a follow-up question about the image, the AI agent has to answer the question.
Putting it all together, we demonstrate the first ‘visual chatbot’!

What has changed since the last version?
---------------------------------------------------
The model-building code has been completely shifted to PyTorch; we have added a much-improved [Bottom-Up Top-Down][12] captioning model from [Pythia][10] and a Mask R-CNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.

Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see this [commit](https://github.com/Cloud-CV/visual-chatbot/tree/f5db5a099cba044a6dee3830fa25a66ef4f1b08b).

Setup and Dependencies
------------------------------
Start by installing the build essentials, [Redis Server][5] and [RabbitMQ Server][6].
```sh
sudo apt-get update

# download and install build essentials
sudo apt-get install -y git python-pip python-dev
sudo apt-get install -y autoconf automake libtool
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

# download and install redis-server and rabbitmq-server
sudo apt-get install -y redis-server rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
sudo service redis-server restart
```

#### Environment Setup

You can use Anaconda or Miniconda to set up this code base. Download and install an Anaconda or Miniconda distribution based on Python 3+ from their [downloads page][17] and proceed below.
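The demo code targets Python 3.6; as an optional sanity check (not part of the original instructions), you can confirm the interpreter version inside your environment before installing anything:

```python
# Optional sanity check: this demo's conda environment targets Python 3.6+.
import sys

def meets_minimum(version_info, minimum=(3, 6)):
    """Return True when the running interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    if not meets_minimum(sys.version_info):
        raise SystemExit("Python >= 3.6 is required for this demo")
    print("interpreter OK:", sys.version.split()[0])
```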
```sh
# clone and download submodules
git clone https://github.com/Cloud-CV/visual-chatbot.git
git submodule update --init --recursive

# create and activate new environment
conda create -n vischat python=3.6.8
conda activate vischat

# install the requirements of the chatbot and visdial-starter code
cd visual-chatbot/
pip install -r requirements.txt
```

#### Downloads
Download the BUTD, Mask R-CNN and VisDial model checkpoints and their configuration files.
```sh
sh viscap/download_models.sh
```

#### Install Submodules
Install Pythia to use the BUTD captioning model, and maskrcnn-benchmark for feature extraction.
```sh
# install fastText (dependency of pythia)
cd viscap/captioning/fastText
pip install -e .

# install pythia for using the butd model
cd ../pythia/
sed -i '/torch/d' requirements.txt
pip install -e .

# install maskrcnn-benchmark for feature extraction
cd ../vqa-maskrcnn-benchmark/
python setup.py build
python setup.py develop
cd ../../../
```
#### CUDA Installation

Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install CUDA and cuDNN from the [NVIDIA website][18].

#### NLTK
We use `PunktSentenceTokenizer` from NLTK; download its model if you haven't already.
```sh
python -c "import nltk; nltk.download('punkt')"
```

## Let's run this now!
#### Setup the database
```sh
# create the database
python manage.py makemigrations chat
python manage.py migrate
```
#### Run server and worker
Launch two separate terminals and run the worker and server code.
```sh
# run the rabbitmq worker in the first terminal
# warning: on the first run a GloVe file (~860 MB) is downloaded; this is a one-time download
python worker_viscap.py

# run the development server in the second terminal
python manage.py runserver
```
You are all set now.
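As a quick smoke test (a hypothetical helper, not part of the repo; it assumes the Django dev server is on `runserver`'s default address), you can check that the index page responds:

```python
# Smoke test: returns True if the dev server answers the index page with HTTP 200.
# Adjust `url` if you changed runserver's default host/port.
import urllib.request

def server_is_up(url="http://127.0.0.1:8000/", timeout=5):
    """Probe the demo's index page; swallow connection errors and timeouts."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```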
Visit http://127.0.0.1:8000 and you will have your demo running successfully.

## Issues
If you run into incompatibility issues, please take a look [here][7] and [here][8].

## Model Checkpoint and Features Used
Performance on `v1.0 test-std` (trained on `v1.0` train + val):

 Model   |  R@1   |  R@5   |  R@10  | MeanR  |  MRR   |  NDCG  |
 ------- | ------ | ------ | ------ | ------ | ------ | ------ |
[lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950 | 0.4863 | 0.5967 |

Extracted features from `VisDial v1.0` used to train the above model are available here:

- [features_mask_rcnn_x101_train.h5][21]: Mask R-CNN features with 100 proposals per image, train split.
- [features_mask_rcnn_x101_val.h5][22]: Mask R-CNN features with 100 proposals per image, val split.
- [features_mask_rcnn_x101_test.h5][23]: Mask R-CNN features with 100 proposals per image, test split.

*Note*: In the features above, the key `image_id` (from earlier versions) has been renamed to `image_ids`.

## Cite this work

If you find this code useful, consider citing our work:

```
@inproceedings{visdial,
  title={{V}isual {D}ialog},
  author={Abhishek Das and Satwik Kottur and Khushi Gupta and Avi Singh
    and Deshraj Yadav and Jos\'e M.F.
    Moura and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}
```

## Contributors
* [Rishabh Jain][24] (rishabhjain@gatech.edu)
* [Yash Kant][19] (ysh.kant@gmail.com)
* [Deshraj Yadav][2] (deshraj@gatech.edu)
* [Abhishek Das][3] (abhshkdz@gatech.edu)

## License

BSD

## Credits and Acknowledgements

- Visual Chatbot Image: "[Robot-clip-art-book-covers-feJCV3-clipart](https://commons.wikimedia.org/wiki/File:Robot-clip-art-book-covers-feJCV3-clipart.png)" by [Wikimedia Commons](https://commons.wikimedia.org) is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
- The beam-search implementation was borrowed as-is from [AllenNLP][15].
- The vqa-maskrcnn-benchmark code used was forked from @meetshah1995's [fork][16] of the original repository.
- The VisDial model is borrowed from [visdial-challenge-starter][14].
- The BUTD captioning model comes from the awesome [Pythia][10] repository.

[1]: https://arxiv.org/abs/1611.08669
[2]: http://deshraj.github.io
[3]: https://abhishekdas.com
[4]: http://cvpr2017.thecvf.com/
[5]: https://redis.io/
[6]: https://www.rabbitmq.com/
[7]: https://github.com/unbit/uwsgi/issues/1770
[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
[10]: https://github.com/facebookresearch/pythia/
[11]: https://github.com/facebookresearch/fastText/
[12]: https://arxiv.org/abs/1707.07998
[13]: https://github.com/facebookresearch/maskrcnn-benchmark
[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
[15]: https://www.github.com/allenai/allennlp
[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
[17]: https://conda.io/docs/user-guide/install/download.html
[18]:
https://developer.nvidia.com/cuda-downloads
[19]: https://github.com/yashkant
[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5
[24]: https://rishabhjain2018.github.io/
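Related to the note on the renamed feature key: if you have feature files from an earlier extraction that still use `image_id`, a minimal rename sketch (illustrative only, not a repo utility) looks like this. It works on any dict-like mapping; an `h5py.File` opened in `"r+"` mode behaves the same way, and there the assignment creates a hard link, so no feature data is copied.

```python
# Illustrative only: migrate the stored feature key `image_id` -> `image_ids`.
def rename_key(mapping, old="image_id", new="image_ids"):
    """Move mapping[old] to mapping[new] unless the rename already happened."""
    if old in mapping and new not in mapping:
        mapping[new] = mapping[old]
        del mapping[old]
    return mapping

# Usage on a plain dict standing in for an open HDF5 file:
feats = {"image_id": [101, 102]}
rename_key(feats)
print(sorted(feats))  # prints ['image_ids']
```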