https://github.com/paulcccccch/multimodal-categorization-of-crisis-events-in-social-media
An unofficial implementation of the CVPR 2020 paper Multimodal Categorization of Crisis Events in Social Media
- Host: GitHub
- URL: https://github.com/paulcccccch/multimodal-categorization-of-crisis-events-in-social-media
- Owner: PaulCCCCCCH
- Created: 2021-10-26T17:14:19.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-08T22:19:15.000Z (over 3 years ago)
- Last Synced: 2025-04-13T13:07:45.123Z (10 days ago)
- Topics: crisismmd, cvpr, cvpr2020, deep-learning, multimodal
- Language: Python
- Size: 72.3 KB
- Stars: 16
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
# CVPR 2020: Multimodal Categorization of Crisis Events in Social Media
This is an unofficial implementation for the CVPR 2020 paper [*Multimodal Categorization of Crisis Events in Social Media*](https://openaccess.thecvf.com/content_CVPR_2020/papers/Abavisani_Multimodal_Categorization_of_Crisis_Events_in_Social_Media_CVPR_2020_paper.pdf).
> Abavisani, Mahdi, et al. "Multimodal categorization of crisis events in social media." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
To cite the paper:
```
@inproceedings{abavisani2020multimodal,
title={Multimodal categorization of crisis events in social media},
author={Abavisani, Mahdi and Wu, Liwei and Hu, Shengli and Tetreault, Joel and Jaimes, Alejandro},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14679--14689},
year={2020}
}
```

## Note
This implementation follows the original paper wherever possible. Because we needed experiment results urgently, we have not had time to make it fully configurable or to clean up the handlers.

## To Run
- Initialize by running `bash setup.sh`
- Run the pipeline with `python main.py`

## Stats
We applied mixed-precision training, so it runs fast on GPUs with tensor cores (e.g. the T4 or V100). The default configuration consumes about 13 GB of GPU memory, and each epoch takes 3 minutes on an Amazon `g4dn.xlarge` instance (which has a T4 GPU).

**Warning: a model checkpoint is saved every epoch, which consumes about 400 MB of disk every 3 minutes. Take this into consideration.**
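For reference, here is a minimal sketch of what a mixed-precision loop with per-epoch checkpointing looks like, using PyTorch's `torch.cuda.amp`. The model, data, and hyperparameters below are placeholders, not the repository's actual configuration:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler, autocast

# Placeholder model and synthetic data; the real model and DataLoader live in main.py.
model = nn.Linear(512, 8).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()  # rescales the loss so fp16 gradients do not underflow

for epoch in range(10):
    for _ in range(100):  # stand-in for iterating over a DataLoader
        inputs = torch.randn(32, 512, device="cuda")
        labels = torch.randint(0, 8, (32,), device="cuda")

        optimizer.zero_grad()
        with autocast():                   # forward pass runs in mixed precision
            loss = criterion(model(inputs), labels)
        scaler.scale(loss).backward()      # backward pass on the scaled loss
        scaler.step(optimizer)
        scaler.update()

    # A full checkpoint is written every epoch, hence the steady disk usage.
    torch.save(model.state_dict(), f"checkpoint_epoch{epoch}.pt")
```

If disk usage is a concern, overwriting a single checkpoint path (or keeping only the best-scoring epoch) is an easy modification.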
## Points of Confusion
### Equation 4
The authors stated that $\alpha_{v_i}$ depends only on $e_i$, and that $\alpha_{e_i}$ depends only on $v_i$, while the equations say the opposite. This implementation follows the text rather than the equations.
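Concretely, the reading we follow is that each modality's attention weights are computed from the *other* modality's embedding. A minimal sketch of that interpretation (the projection layers, sigmoid gating, and dimensions here are our assumptions, not the paper's exact Equation 4):

```python
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Attention weights for one modality computed from the other,
    following the paper's text rather than its Equation 4."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # Hypothetical projections; the names and sizes are ours.
        self.w_v = nn.Linear(dim, dim)  # produces alpha_v from the text embedding e
        self.w_e = nn.Linear(dim, dim)  # produces alpha_e from the image embedding v

    def forward(self, v: torch.Tensor, e: torch.Tensor):
        alpha_v = torch.sigmoid(self.w_v(e))  # alpha_v depends only on e
        alpha_e = torch.sigmoid(self.w_e(v))  # alpha_e depends only on v
        return alpha_v * v, alpha_e * e       # gated visual / textual features

gate = CrossModalGate()
v, e = torch.randn(4, 512), torch.randn(4, 512)
v_att, e_att = gate(v, e)
```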
### Self-Attention in Fully Connected Layers
After obtaining a multimodal representation that incorporates both visual and textual information, the authors used fully-connected layers to perform classification. Here the authors wrote:

> We add self-attention in the fully-connected networks.
We assume they meant adding a fully-connected layer that acts as self-attention.
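Under that reading, the classification head looks roughly like the sketch below. The hidden sizes and the exact placement of the attention layer are our guesses:

```python
import torch
import torch.nn as nn

class AttentiveClassifier(nn.Module):
    """Fully-connected head with a learned attention gate, per our reading."""

    def __init__(self, dim: int = 1024, num_classes: int = 8):
        super().__init__()
        # The 'self-attention' fully-connected layer: a learned per-feature gate.
        self.attn = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.fc = nn.Sequential(
            nn.Linear(dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor):
        return self.fc(x * self.attn(x))  # re-weight features, then classify

head = AttentiveClassifier()
logits = head(torch.randn(4, 1024))  # -> shape (4, 8)
```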
### DenseNet
The authors did not specify which DenseNet variant (e.g. DenseNet-121 or DenseNet-161) they used.

## Todos
- Setting `num_workers > 1` deadlocks the dataloader; see the single-process workaround sketched below.
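Until that is fixed, a single-process loader avoids the deadlock (at some cost in throughput). The dataset and batch size below are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 512), torch.randint(0, 8, (100,)))  # placeholder data
# num_workers=0 keeps loading in the main process and sidesteps the deadlock.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)
```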