https://github.com/daniel-furman/nlp-dataset-mixing-experiments
Data-source mixing for social media caption classification
- Host: GitHub
- URL: https://github.com/daniel-furman/nlp-dataset-mixing-experiments
- Owner: daniel-furman
- License: mit
- Created: 2022-04-09T02:09:28.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-09T02:26:33.000Z (about 3 years ago)
- Last Synced: 2025-01-29T19:24:31.512Z (4 months ago)
- Topics: domain-adaptation, machine-learning, nlp
- Language: Jupyter Notebook
- Homepage:
- Size: 7.37 MB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# NLP-experiments-social-media
## Abstract
---
Domain shift often hinders the predictive performance of machine learning models where it matters most: on unseen data. Social media datasets in NLP are, however, generally brittle under domain shift, as they are commonly sourced solely from Twitter, which fails to capture the variation in natural language across different social media platforms. Here, we examined the potential of multi-platform mixing for domain adaptation by combining Instagram captions with Tweets in equal proportion for two authorship analysis tasks. The resulting SVM and LR classifiers saw significant performance gains over otherwise identical models trained entirely on Tweets (a 6% average F1 increase), as measured by cross-domain testing on Facebook captions.
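
Below is a minimal sketch of the setup the abstract describes: mix Tweets and Instagram captions in equal proportion, train SVM and LR classifiers, and evaluate them out of domain on Facebook captions. The file names, column names, features, and hyperparameters here are assumptions for illustration, not the repo's actual pipeline.

```python
# Sketch of data-source mixing for cross-domain authorship classification.
# File names, column names, and hyperparameters are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

tweets = pd.read_csv("tweets.csv")        # assumed columns: text, author
instagram = pd.read_csv("instagram.csv")  # assumed columns: text, author
facebook = pd.read_csv("facebook.csv")    # held-out cross-domain test set

# Mix the two source platforms in equal proportion.
n = min(len(tweets), len(instagram))
mixed = pd.concat([tweets.sample(n, random_state=0),
                   instagram.sample(n, random_state=0)])

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X_train = vectorizer.fit_transform(mixed["text"])
X_test = vectorizer.transform(facebook["text"])

# Train the two classifier families named in the abstract and score them
# cross-domain on Facebook captions.
for name, clf in [("SVM", LinearSVC()),
                  ("LR", LogisticRegression(max_iter=1000))]:
    clf.fit(X_train, mixed["author"])
    preds = clf.predict(X_test)
    print(name, "macro F1:", f1_score(facebook["author"], preds, average="macro"))
```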
## Background Figures
---
Figure 1: BERT embeddings EDA

Figure 2: Dataframe head
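
Figure 1 is described as an EDA of BERT embeddings. A sketch of how such a plot could be produced is below; the model choice, mean pooling, and PCA projection are assumptions, not necessarily what the notebook does.

```python
# Illustrative BERT-embedding EDA: embed captions, project to 2-D, plot by platform.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

captions = ["example tweet text", "example instagram caption"]  # placeholder data
platforms = ["twitter", "instagram"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    inputs = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")
    # Mean-pool the final hidden states into one vector per caption.
    embeddings = model(**inputs).last_hidden_state.mean(dim=1).numpy()

# Project to 2-D for visual inspection of how the platforms separate.
points = PCA(n_components=2).fit_transform(embeddings)
for platform in set(platforms):
    idx = [i for i, p in enumerate(platforms) if p == platform]
    plt.scatter(points[idx, 0], points[idx, 1], label=platform)
plt.legend()
plt.show()
```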
## Results
---
Figure 3: Modeling experimental results
