https://github.com/khuyentran1401/real-or-not
Kaggle competition to predict which Tweets are about real disasters and which ones are not
glove natural-language-processing neuralnetwork nlp pytorch pytorch-nlp tf-idf twitter word2vec wordembeddings
- Host: GitHub
- URL: https://github.com/khuyentran1401/real-or-not
- Owner: khuyentran1401
- Created: 2020-04-05T13:47:20.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-05T15:19:47.000Z (over 5 years ago)
- Last Synced: 2025-01-26T01:15:21.289Z (9 months ago)
- Topics: glove, natural-language-processing, neuralnetwork, nlp, pytorch, pytorch-nlp, tf-idf, twitter, word2vec, wordembeddings
- Language: Jupyter Notebook
- Size: 56.6 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# About this Project
Kaggle competition to predict which Tweets are about real disasters and which ones are not.
# Dataset
The dataset used in this repository is available on [Kaggle](https://www.kaggle.com/c/nlp-getting-started).
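
As a minimal sketch of getting the data into Python, assuming the competition's `train.csv` layout (`id`, `keyword`, `location`, `text`, `target`), the tweets and labels can be read with the standard library. The helper name `load_tweets` and the two sample rows are made up for illustration; they are not taken from the repository or from the Kaggle data.

```python
import csv
import io

# Made-up rows that mimic the assumed train.csv column layout.
sample_csv = io.StringIO(
    "id,keyword,location,text,target\n"
    "1,,,Forest fire spreading near the highway,1\n"
    "2,ablaze,London,This new album is ablaze,0\n"
)


def load_tweets(handle):
    """Return (texts, labels) from an open CSV file-like object.

    Hypothetical helper: `text` holds the tweet, `target` is 1 for a
    real disaster and 0 otherwise.
    """
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["text"])
        labels.append(int(row["target"]))
    return texts, labels


texts, labels = load_tweets(sample_csv)
print(texts)
print(labels)
```

To read the downloaded file instead, pass an open file handle, for example `open("train.csv", newline="", encoding="utf-8")`; the path is an assumption about where the file was saved.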
# Methods
* Data exploration
* Preprocessing
* Model training
  * Tf-Idf (with SelectKBest)
  * Tf-Idf with N-grams (characters and words)
  * Binary vectorizer (with SelectKBest)
  * Word2Vec (with Twitter word vectors from GloVe)
  * Combination of binary vectorizer and Word2Vec
  * Neural network with PyTorch
  * Convolutional neural network (with Word2Vec embedding)
# Result
The best F1 score is 0.8. The Tf-Idf vectorizer and the binary vectorizer perform better than the other methods.

class | precision | recall | f1-score | support
------------ | ------------- | ------------- | ------------- | -------------
0 | 0.82 | 0.85 | 0.84 | 1762
1 | 0.79 | 0.75 | 0.7 | 1284
accuracy | _ | _ | 0.81 | 3046
macro avg | 0.81 | 0.80 | 0.80 | 3046
weighted avg | 0.81 | 0.81 | 0.81 | 3046
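
The two best-performing representations, the Tf-Idf vectorizer and the binary vectorizer, differ only in how they weight a term. The sketch below illustrates that difference in plain Python for word and character n-grams. It is not the notebook's code: the function names are hypothetical, the tweets are made up, and the smoothed IDF formula and L2 normalization are assumptions chosen to match scikit-learn's documented defaults.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Split on whitespace and return word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def binary_vectors(docs, analyzer):
    """Weight 1.0 if a term occurs in the document, regardless of count."""
    return [{term: 1.0 for term in set(analyzer(doc))} for doc in docs]


def tfidf_vectors(docs, analyzer):
    """Weight each term by count * idf, then L2-normalize each document."""
    term_lists = [analyzer(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in term_lists for term in set(terms))
    # Assumed smoothing: idf = ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        weights = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Made-up example tweets, not taken from the Kaggle data.
tweets = [
    "forest fire near the town",
    "the concert was fire",
    "flood warning for the town",
]

binary = binary_vectors(tweets, word_ngrams)
tfidf = tfidf_vectors(tweets, word_ngrams)
tfidf_char = tfidf_vectors(tweets, char_ngrams)

# "the" appears in every tweet, so Tf-Idf weights it below "forest",
# while the binary vectorizer treats both the same.
print(binary[0]["the"], binary[0]["forest"])
print(tfidf[0]["the"] < tfidf[0]["forest"])
```

Feature selection (SelectKBest) and the classifier are left out here; in practice the resulting sparse vectors would be fed to both.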