https://github.com/daniel-furman/nlp-dataset-mixing-experiments
Data-source mixing for social media caption classification
- Host: GitHub
- URL: https://github.com/daniel-furman/nlp-dataset-mixing-experiments
- Owner: daniel-furman
- License: mit
- Created: 2022-04-09T02:09:28.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-09T02:26:33.000Z (about 3 years ago)
- Last Synced: 2025-01-29T19:24:31.512Z (4 months ago)
- Topics: domain-adaptation, machine-learning, nlp
- Language: Jupyter Notebook
- Homepage:
- Size: 7.37 MB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# NLP-experiments-social-media
## Abstract
---
Domain shift often hinders the predictive performance of machine learning models where it matters most: on unseen data. Social media datasets in NLP are, however, generally brittle under domain shift, as they are commonly sourced solely from Twitter, which fails to capture the variation in natural language across different social media platforms. Here, we examined the potential of multi-platform mixing for domain adaptation by combining Instagram captions with Tweets in equal proportion for two authorship analysis tasks. The resulting SVM and LR classifiers saw significant performance gains over otherwise identical models trained entirely on Tweets (a 6% average F1 increase), as measured by cross-domain testing on Facebook captions.
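
Below is a minimal sketch of the setup the abstract describes: mix Tweets and Instagram captions in equal proportion, train SVM and LR classifiers, and evaluate them out of domain on Facebook captions. The file names, column names, features, and hyperparameters here are assumptions for illustration, not the repo's actual pipeline.

```python
# Sketch of data-source mixing for cross-domain authorship classification.
# File names, column names, and hyperparameters are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

tweets = pd.read_csv("tweets.csv")        # assumed columns: text, author
instagram = pd.read_csv("instagram.csv")  # assumed columns: text, author
facebook = pd.read_csv("facebook.csv")    # held-out cross-domain test set

# Mix the two source platforms in equal proportion.
n = min(len(tweets), len(instagram))
mixed = pd.concat([tweets.sample(n, random_state=0),
                   instagram.sample(n, random_state=0)])

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X_train = vectorizer.fit_transform(mixed["text"])
X_test = vectorizer.transform(facebook["text"])

# Train the two classifier families named in the abstract and score them
# cross-domain on Facebook captions.
for name, clf in [("SVM", LinearSVC()),
                  ("LR", LogisticRegression(max_iter=1000))]:
    clf.fit(X_train, mixed["author"])
    preds = clf.predict(X_test)
    print(name, "macro F1:", f1_score(facebook["author"], preds, average="macro"))
```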
## Background Figures
---
Figure 1: BERT embeddings EDA

Figure 2: Dataframe head
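
Figure 1 is described as an EDA of BERT embeddings. A sketch of how such a plot could be produced is below; the model choice, mean pooling, and PCA projection are assumptions, not necessarily what the notebook does.

```python
# Illustrative BERT-embedding EDA: embed captions, project to 2-D, plot by platform.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

captions = ["example tweet text", "example instagram caption"]  # placeholder data
platforms = ["twitter", "instagram"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    inputs = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")
    # Mean-pool the final hidden states into one vector per caption.
    embeddings = model(**inputs).last_hidden_state.mean(dim=1).numpy()

# Project to 2-D for visual inspection of how the platforms separate.
points = PCA(n_components=2).fit_transform(embeddings)
for platform in set(platforms):
    idx = [i for i, p in enumerate(platforms) if p == platform]
    plt.scatter(points[idx, 0], points[idx, 1], label=platform)
plt.legend()
plt.show()
```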
## Results
---
Figure 3: Modeling experimental results
