https://github.com/vigneshs10/binary-sequence-classification-using-bert
Project code - StumbleUpon NLP Challenge
- Host: GitHub
- URL: https://github.com/vigneshs10/binary-sequence-classification-using-bert
- Owner: VigneshS10
- Created: 2021-03-07T16:47:24.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-10-29T13:29:50.000Z (about 3 years ago)
- Last Synced: 2025-01-16T18:36:59.678Z (12 months ago)
- Topics: nlp, nlp-keywords-extraction, nlp-machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 23.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Binary Sequence Classification using BERT
---
> This is the official repository of my submission to the [StumbleUpon Evergreen Classification Challenge](https://www.kaggle.com/competitions/stumbleupon/overview).
---
This is a binary sequence classification problem: given a paragraph of text, predict whether it
will stay relevant over time (evergreen) or not (ephemeral).
The main approach uses BERT (Bidirectional Encoder Representations from Transformers),
a pretrained transformer language model, for sequence classification in PyTorch.
## Dataset
1. The dataset can be downloaded from this [link](https://www.kaggle.com/competitions/stumbleupon/data).
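
For orientation, here is a minimal sketch of turning the downloaded data into `(text, label)` pairs. It assumes the training file is tab-separated with a `boilerplate` column holding a JSON object (with `title` and `body` keys) and a `label` column; check these names against the Kaggle data page. The helper `load_examples` and the two sample rows are hypothetical and are not taken from the notebook.

```python
import csv
import io
import json


def load_examples(tsv_file):
    """Read (text, label) pairs from an open tab-separated file.

    Assumed layout: a `boilerplate` column containing JSON with `title`
    and `body` keys, and an integer `label` column (1 = evergreen).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    examples = []
    for row in reader:
        page = json.loads(row["boilerplate"])
        # Either field may be missing or null, so fall back to "".
        title = page.get("title") or ""
        body = page.get("body") or ""
        text = f"{title} {body}".strip()
        examples.append((text, int(row["label"])))
    return examples


# Build a tiny in-memory sample with made-up rows, instead of the real file.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerow(["url", "boilerplate", "label"])
writer.writerow([
    "http://example.com/a",
    json.dumps({"title": "Easy bread recipe", "body": "Mix flour and water."}),
    "1",
])
writer.writerow([
    "http://example.com/b",
    json.dumps({"title": "Election results tonight", "body": None}),
    "0",
])
buf.seek(0)

examples = load_examples(buf)
for text, label in examples:
    print(label, text)
```

With the real data, the same function would be called on the opened file, for example `load_examples(open("train.tsv", newline="", encoding="utf-8"))`, where the file name is also an assumption.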
## Training
1. Run the `stumbleupon.ipynb` notebook.
## Evaluation
1. The model is evaluated on the validation dataset, and a classification report
with precision and recall for each class is generated.
2. Running all cells in the `stumbleupon.ipynb` notebook generates the results.
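
To make the report concrete, the sketch below computes per-class precision, recall, F1 and support from true and predicted labels in plain Python. It is an illustration only: the notebook may well rely on a library routine such as scikit-learn's `classification_report`, and the function name `per_class_report` and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Return {label: {"precision", "recall", "f1", "support"}}."""
    # Count each (true label, predicted label) combination once.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        support = sum(n for (t, _), n in pairs.items() if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


# Toy labels (0 = non evergreen, 1 = evergreen), not the real validation set.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

report = per_class_report(y_true, y_pred)
for label, row in report.items():
    print(
        f"class {label}: precision={row['precision']:.2f} "
        f"recall={row['recall']:.2f} f1={row['f1']:.2f} "
        f"support={row['support']}"
    )
```

Precision is the fraction of samples predicted as a class that truly belong to it, recall is the fraction of a class's samples that were found, and support is the number of true samples in that class.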
## Results
Class-based classification report (on the validation dataset):

| Class         | Precision | Recall | F1 score | Support (samples) |
|---------------|-----------|--------|----------|-------------------|
| Non Evergreen | 0.77      | 0.83   | 0.80     | 540               |
| Evergreen     | 0.83      | 0.77   | 0.80     | 570               |
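
As a sanity check on the reported numbers, the F1 score is the harmonic mean of precision and recall, and an approximate overall accuracy can be recovered from recall and the per-class sample counts. The sketch below redoes that arithmetic from the rounded values in the report, so the implied accuracy is only an estimate and is not a figure reported by the notebook.

```python
# Values copied from the classification report (rounded to two decimals).
rows = {
    "Non Evergreen": {"precision": 0.77, "recall": 0.83, "f1": 0.80, "support": 540},
    "Evergreen": {"precision": 0.83, "recall": 0.77, "f1": 0.80, "support": 570},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: f1_score(row["precision"], row["recall"]) for name, row in rows.items()
}
for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.2f}, reported = {rows[name]['f1']:.2f}")

# Recall * support approximates the number of correctly classified samples
# in each class; summing over classes gives an estimate of overall accuracy.
total = sum(row["support"] for row in rows.values())
approx_correct = sum(row["recall"] * row["support"] for row in rows.values())
approx_accuracy = approx_correct / total
print(f"validation samples: {total}, implied accuracy: {approx_accuracy:.2f}")
```

Both recomputed F1 scores round to the reported 0.80, and the implied accuracy over the 1110 validation samples is also roughly 0.80.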
## Contact
For any queries, feel free to contact me at vignesh.nitt10@gmail.com.