https://github.com/cutwell/data-mining-casestudy
NLP and data mining for insights into Twitch and Youtube brand perception within media and on Twitter
https://github.com/cutwell/data-mining-casestudy
Last synced: about 1 year ago
JSON representation
NLP and data mining for insights into Twitch and Youtube brand perception within media and on Twitter
- Host: GitHub
- URL: https://github.com/cutwell/data-mining-casestudy
- Owner: Cutwell
- License: mit
- Created: 2022-09-06T21:45:45.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-09-06T21:47:13.000Z (almost 4 years ago)
- Last Synced: 2025-01-29T11:28:19.247Z (over 1 year ago)
- Language: R
- Size: 775 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Casestudy: Twitch/Youtube sentiment in the media and on Twitter
NLP and data mining for insights into Twitch and Youtube brand perception within media and on Twitter.
## _Hypothesis 1: There is correlation between Follower Count and Tweet engagement_

* We show that, when engagement measured by favorite count is normalized proportional to the ratio of @Youtube and @Twitch brand account followers, the @Twitch brand account clearly has significantly higher relative engagement with its followers.

* We further show that, when engagement is adjusted relative to the total followers of the @Twitch and @Youtube brand accounts, while both brands engage actively with a small percentage of their overall followers, @Twitch has higher relative engagement over @Youtube.
## _Hypothesis 2: There is correlation between media sentiment and Twitter brand sentiment_
* Tweets from a brand account will share contents with media headlines for a given time period.
* The media will reflect a similar sentiment to the brand account.

* We show that there is negligible correlation between media and Twitter sentiment.
* This is likely due to article headlines taking a more neutral tone to appear factual, whilst Tweets are more emotive to engage with an audience.
## _Hypothesis 3: There is intersection between article headline and Twitter tweet text_

* We show that there is a high intersection between article headlines and a brand's Twitter feed.
* We also show that this correlation is relevant, as the intersection correlates with peaks in article count, indicating these intersections are not accidental.
* Notable examples for the @Twitch brand account include spikes for new features as well as controversies.
* There exists an increased volume of articles (and thus, increased intersection score) in recent weeks due to the limitations of Google's news feed.
## Limitations of collecting data using Google's RSS feed

* We collected article data for our media analysis using Google's RSS feed for the topic of "Twitch".
* Relative to the available data collected from Twitter for the @Twitch brand account, we found that the RSS feed articles were not evenly distributed over time, resulting in a concentration of articles within the last few weeks.
* Whilst this limited the scope of our analysis, the data was still provably useful.