
https://github.com/daniel-furman/nlp-dataset-mixing-experiments

Data-source mixing for social media caption classification

domain-adaptation machine-learning nlp


# NLP-experiments-social-media

## Abstract
---

Domain shift often hurts the predictive performance of machine learning models where it matters most: on unseen data. Yet NLP datasets built from social media are generally vulnerable to domain shift, because they are commonly sourced solely from Twitter, which fails to capture the variation in natural language across social media platforms. Here, we examined the potential of multi-platform mixing for domain adaptation by combining Instagram captions with Tweets in equal proportion for two authorship analysis tasks. The resulting support vector machine (SVM) and logistic regression (LR) classifiers saw significant performance gains over otherwise identical models trained entirely on Tweets (a 6% average increase in F1), as measured by cross-domain testing on Facebook captions.
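The core data step described above, mixing Instagram captions with Tweets in equal proportion and scoring predictions with F1, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's code: the helper names `mix_equal` and `macro_f1`, the `(caption, label)` tuple format, the `per_source` parameter, and the use of macro-averaged F1 are all hypothetical. The SVM and LR models are left out. In a full pipeline, the mixed set would be vectorized and passed to a classifier such as scikit-learn's `LinearSVC` or `LogisticRegression`, and the predictions would then be scored on the held-out Facebook captions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record format: (caption text, author label).
Example = Tuple[str, str]


def mix_equal(
    tweets: Sequence[Example],
    instagram: Sequence[Example],
    per_source: int,
    seed: int = 0,
) -> List[Example]:
    """Draw `per_source` examples from each platform and shuffle them together.

    Equal draws give a 50/50 Twitter/Instagram training set. Matching the
    total size to a Tweet-only baseline is an assumed design choice.
    """
    if per_source > len(tweets) or per_source > len(instagram):
        raise ValueError("not enough examples in one of the sources")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    mixed = rng.sample(list(tweets), per_source) + rng.sample(list(instagram), per_source)
    rng.shuffle(mixed)
    return mixed


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy stand-ins for the Twitter and Instagram corpora.
    tweets = [(f"tweet {i}", "a" if i % 2 else "b") for i in range(10)]
    insta = [(f"insta {i}", "a" if i % 3 else "b") for i in range(10)]
    train = mix_equal(tweets, insta, per_source=4, seed=42)
    print(len(train))  # 8
    print(sum(text.startswith("insta") for text, _ in train))  # 4
    print(macro_f1(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # ≈ 0.733
```

Sampling the same number of examples from each platform keeps the class of the comparison simple: only the source composition differs from a Tweet-only baseline of the same size.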

## Background Figures
---
Figure 1: BERT embeddings EDA | Figure 2: DataFrame head
:---------------------------------:|:----------------------------------------:
![](data/Fig1.png) | ![](data/Fig3.png)

## Results
---
Figure 3: Modeling experimental results
:---------------------------------:
![](data/Fig_2.png)