https://github.com/oulianov/finance-tweets
Exploration of a financial tweets dataset to learn about unsupervised machine learning applied to NLP.
- Host: GitHub
- URL: https://github.com/oulianov/finance-tweets
- Owner: oulianov
- Created: 2020-02-11T09:51:42.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-01-19T12:14:56.000Z (about 4 years ago)
- Last Synced: 2024-12-14T10:43:10.341Z (about 1 month ago)
- Topics: dataset, finance-tweets, financial-tweets-dataset, nlp, tweets
- Language: HTML
- Homepage:
- Size: 2.67 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# finance-tweets
Exploration of a financial tweets dataset to learn about unsupervised machine learning applied to NLP.
## Presentation
In this project, I explore [a dataset of 28K financial tweets](https://www.kaggle.com/davidwallach/financial-tweets/data), provided courtesy of David Wallach.
**My goal for this educational project is to explore the dataset.** That is, I want to understand _what kinds_ of tweets it contains.
To do so, I use Natural Language Processing (NLP) and unsupervised machine learning techniques to clean and cluster the data. I also draw on my general knowledge of financial markets to interpret the model's output in plain language.
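The clean-then-cluster step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the README does not say which vectorizer or clustering algorithm the exploration uses, so this sketch assumes TF-IDF features with spherical (cosine) k-means. The helper names (`clean`, `tfidf`, `kmeans`, `top_terms`), the regexes, the stop-word list and the sample tweets are all hypothetical. The sketch uses only the Python standard library so it runs anywhere. In practice, scikit-learn's `TfidfVectorizer` and `KMeans` would do the heavy lifting.

```python
import math
import re
from collections import Counter

# Hypothetical cleaning rules and stop-word list (not taken from the exploration).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"\$?[a-z][a-z0-9']+")  # words and $cashtags, 2+ chars
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
             "is", "are", "rt", "at", "this", "that", "with"}


def clean(tweet):
    """Lowercase, drop URLs and @mentions, keep word and $cashtag tokens."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet.lower()))
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(docs):
    """One unit-length sparse TF-IDF vector per token list (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [normalise({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in Counter(doc).items()})
            for doc in docs]


def dot(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, k, iters=20):
    """Spherical k-means (cosine similarity) with farthest-first seeding."""
    # Seed with the first vector, then repeatedly add the vector that is
    # least similar to every seed chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no tweet changed cluster
        labels = new_labels
        for j in range(k):
            summed = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    summed.update(v)
            if summed:  # an empty cluster keeps its previous centroid
                centroids[j] = normalise(summed)
    return labels, centroids


def top_terms(centroid, n=5):
    """Highest-weighted terms of a centroid: a quick human-readable label."""
    ranked = sorted(centroid.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked[:n]]


tweets = [
    "$AAPL beats earnings, iPhone sales up https://t.co/x",
    "Apple iPhone sales strong, $AAPL up after earnings",
    "Join our crypto exchange now! Free bitcoin bonus @exchange",
    "Free bitcoin bonus on our crypto exchange, register today",
]
labels, centroids = kmeans(tfidf([clean(t) for t in tweets]), k=2)
print(labels)  # [0, 0, 1, 1]
for centroid in centroids:
    print(top_terms(centroid, 3))
```

Reading each cluster's top terms is one way to describe the model's output in plain words. On the real dataset, the value of `k` and the stop-word list would need tuning.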
I find that:
- The model finds relevant tweets about news in the technology sector and tweets about market predictions.
- Data collection is flawed: the dataset contains many promotional tweets for cryptocurrency exchanges.
- The model I built is susceptible to tweets spamming the same irrelevant keywords.

I suggest ways to overcome these issues. Through this exploration, I gained valuable knowledge for building more robust data collection pipelines and NLP models.
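The spam and promotion issues listed above might be partly addressed by a pre-filter. This is one possible mitigation and not necessarily among the fixes the exploration suggests. It drops tweets stuffed with tickers or hashtags, and it collapses promotional templates that differ only in numbers, links or mentions, before clustering. The helper names (`is_keyword_spam`, `dedup_key`, `filter_tweets`), the thresholds (`max_tags=4`, `max_tag_ratio=0.5`) and the sample tweets are hypothetical assumptions.

```python
import re

TAG_RE = re.compile(r"[$#][A-Za-z][A-Za-z0-9_]*")  # $cashtags and #hashtags
URL_MENTION_DIGIT_RE = re.compile(r"https?://\S+|@\w+|\d+")


def is_keyword_spam(tweet, max_tags=4, max_tag_ratio=0.5):
    """Flag tweets stuffed with tags (hypothetical thresholds)."""
    words = tweet.split()
    if not words:
        return False
    n_tags = len(TAG_RE.findall(tweet))
    return n_tags > max_tags or n_tags / len(words) > max_tag_ratio


def dedup_key(tweet):
    """Drop URLs, mentions, digits and punctuation so templated promos collide."""
    text = URL_MENTION_DIGIT_RE.sub(" ", tweet.lower())
    return " ".join(re.findall(r"[a-z$#]+", text))


def filter_tweets(tweets):
    """Remove tag-stuffed tweets and repeated templates, keeping first occurrences."""
    seen, kept = set(), []
    for tweet in tweets:
        key = dedup_key(tweet)
        if is_keyword_spam(tweet) or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


tweets = [
    "$BTC $ETH $XRP $LTC $ADA $DOGE to the moon #crypto",
    "Apple beats earnings expectations $AAPL",
    "Sign up today and get 50 USDT free https://t.co/abc",
    "Sign up today and get 100 USDT free https://t.co/xyz",
]
print(filter_tweets(tweets))
# ['Apple beats earnings expectations $AAPL', 'Sign up today and get 50 USDT free https://t.co/abc']
```

Deduplication keeps one copy of each promotional template rather than removing promotions outright. Removing them entirely would require something more, such as filtering by source account or training a classifier, which this sketch does not attempt.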
## Instructions
The content of this exploration can be viewed online [by clicking on this link](https://htmlpreview.github.io/?https://github.com/oulianov/finance-tweets/blob/master/Finance%20tweets.html).