https://github.com/oulianov/finance-tweets

Exploration of a financial tweets dataset to learn about unsupervised machine learning applied to NLP.
dataset finance-tweets financial-tweets-dataset nlp tweets

# finance-tweets

Exploration of a financial tweets dataset to learn about unsupervised machine learning applied to NLP.

## Presentation

In this project, I explore [a dataset of 28K financial tweets](https://www.kaggle.com/davidwallach/financial-tweets/data), provided courtesy of David Wallach.

**My goal for this educational project is to explore the dataset.** That is, to understand _what kinds_ of tweets it contains.

To do that, I use Natural Language Processing (NLP) and machine learning techniques to clean and cluster the data. I also draw on my general knowledge of financial markets to interpret the model's output in plain language.
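As a rough illustration of the cleaning step, here is a minimal sketch using only the Python standard library. It is not the code from the notebook: the function name `clean_tweet`, the regular expressions, and the sample tweet are all assumptions made for this example.

```python
import re

# Hypothetical cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9$#\s]")


def clean_tweet(text: str) -> list[str]:
    """Lowercase a tweet, drop URLs and @mentions, and split it into tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep letters, digits, cashtags ($aapl) and hashtags (#stocks).
    text = NON_WORD_RE.sub(" ", text)
    return text.split()


# Made-up sample tweet, not taken from the dataset.
tokens = clean_tweet("$AAPL beats earnings! https://t.co/abc via @trader_joe #stocks")
print(tokens)
```

Tokens produced this way could then be vectorized (for example with TF-IDF) and passed to a clustering algorithm.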

I find that:
- The clustering model surfaces relevant tweets about news in the technology sector, as well as tweets about market predictions.
- The data collection is flawed: the dataset contains many promotional tweets for cryptocurrency exchanges.
- The model I built is vulnerable to tweets that spam the same irrelevant keywords.

I suggest ways to overcome these issues. Through this exploration, I gained valuable knowledge for building more robust data collection pipelines and NLP models.
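One possible mitigation for repeated promotional or keyword-spam tweets is to drop near-duplicates before clustering. The sketch below is an assumption for illustration, not necessarily one of the approaches discussed in the notebook; the helper names, the Jaccard similarity measure, and the `0.8` threshold are all hypothetical choices.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(tweets, threshold=0.8):
    """Keep a tweet only if it is not too similar to any tweet already kept."""
    kept = []
    kept_tokens = []
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(tweet)
            kept_tokens.append(tokens)
    return kept


# Made-up sample tweets, not taken from the dataset.
sample = [
    "Join our crypto exchange now bonus",
    "join our crypto exchange now BONUS",
    "Fed raises rates by 25 bps",
]
print(drop_near_duplicates(sample))
```

This pairwise comparison is quadratic in the number of tweets, so for the full 28K-tweet dataset a hashing-based method such as MinHash would likely scale better.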

## Instructions

The content of this exploration can be viewed online [by clicking on this link](https://htmlpreview.github.io/?https://github.com/oulianov/finance-tweets/blob/master/Finance%20tweets.html).