https://github.com/khuyentran1401/real-or-not

Kaggle competition to predict which Tweets are about real disasters and which ones are not
glove natural-language-processing neuralnetwork nlp pytorch pytorch-nlp tf-idf twitter word2vec wordembeddings

Host: GitHub
URL: https://github.com/khuyentran1401/real-or-not
Owner: khuyentran1401
Created: 2020-04-05T13:47:20.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-04-05T15:19:47.000Z (over 5 years ago)
Last Synced: 2025-01-26T01:15:21.289Z (9 months ago)
Topics: glove, natural-language-processing, neuralnetwork, nlp, pytorch, pytorch-nlp, tf-idf, twitter, word2vec, wordembeddings
Language: Jupyter Notebook
Size: 56.6 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# About this Project

Kaggle competition to predict which Tweets are about real disasters and which ones are not

# Dataset

The dataset for this repository can be found on [Kaggle](https://www.kaggle.com/c/nlp-getting-started).

# Methods

* Data exploration

* Preprocessing

* Model training

  * Tf-Idf (with SelectKBest)

  * Tf-Idf with N-grams (characters and words)

  * Binary Vectorizer (with SelectKBest)

  * Word2Vec (with Twitter word vectors from GloVe)

  * Combination of binary vectorizer and Word2Vec

  * Neural Network with PyTorch

  * Convolutional Neural Network (with w2v embedding)
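
The vectorization ideas listed above (word and character n-grams, Tf-Idf weights, binary presence vectors) can be sketched with the standard library alone. This is only an illustration of the concepts, not the notebook's code: the function names, the plain `tf * log(N / df)` weighting, and the three sample tweets are all assumptions made up for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Split on whitespace and return contiguous word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams of the lowercased text."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs, analyzer):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    counts = [Counter(analyzer(d)) for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    return [
        {t: (k / sum(c.values())) * math.log(n_docs / df[t]) for t, k in c.items()}
        for c in counts
    ]


def binary_vector(doc_terms, vocabulary):
    """Return 1 where a vocabulary term occurs in the document, else 0."""
    present = set(doc_terms)
    return [1 if t in present else 0 for t in vocabulary]


# Hypothetical sample tweets, not taken from the competition data.
tweets = [
    "Forest fire near the village",
    "I love this fire mixtape",
    "Flood warning issued for the village",
]

word_weights = tfidf(tweets, lambda t: word_ngrams(t, 1))
char_weights = tfidf(tweets, lambda t: char_ngrams(t, 3))

vocab = sorted({w for t in tweets for w in word_ngrams(t, 1)})
binary = [binary_vector(word_ngrams(t, 1), vocab) for t in tweets]
```

In this sketch a term that appears in fewer documents ("forest") receives a higher weight than one shared across documents ("fire"), which is the property feature selection then builds on.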

# Result

The best F1 score is 0.8. The Tf-Idf vectorizer and the binary vectorizer perform better than the other methods.

 

. | precision | recall | f1-score | support
------------ | ------------- | ------------- | ------------- | -------------
0 | 0.82 | 0.85 | 0.84 | 1762
1 | 0.79 | 0.75 | 0.7 | 1284
accuracy | _ | _ | 0.81 | 3046
macro avg | 0.81 | 0.80 | 0.80 | 3046
weighted avg | 0.81 | 0.81 | 0.81 | 3046
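
For readers unfamiliar with the columns in the table above, the following is a minimal sketch of how per-class precision, recall, F1, and support, plus the accuracy, macro, and weighted averages, are usually computed. The function names and the toy labels are assumptions for illustration and do not come from the competition data or the notebook.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, tp + fn  # support = true instances of label


def report(y_true, y_pred):
    """Return per-class rows plus accuracy, macro F1, and weighted F1."""
    labels = sorted(set(y_true))
    rows = {label: per_class_metrics(y_true, y_pred, label) for label in labels}
    total = len(y_true)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    # Macro average: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(r[2] for r in rows.values()) / len(rows)
    # Weighted average: per-class F1 weighted by class support.
    weighted_f1 = sum(r[2] * r[3] for r in rows.values()) / total
    return rows, accuracy, macro_f1, weighted_f1


# Toy labels (hypothetical): 1 = real disaster, 0 = not a disaster.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

rows, accuracy, macro_f1, weighted_f1 = report(y_true, y_pred)
```

The macro average treats both classes equally, while the weighted average scales each class by its support, so the two differ whenever the classes are imbalanced.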
