https://github.com/oulianov/finance-tweets

Exploration of a financial tweets dataset to learn about unsupervised machine learning applied to NLP.

Host: GitHub
URL: https://github.com/oulianov/finance-tweets
Owner: oulianov
Created: 2020-02-11T09:51:42.000Z
Default Branch: master
Last Pushed: 2021-01-19T12:14:56.000Z
Last Synced: 2024-12-14T10:43:10.341Z
Topics: dataset, finance-tweets, financial-tweets-dataset, nlp, tweets
Language: HTML
Homepage:
Size: 2.67 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# finance-tweets

Exploration of a financial tweets dataset to learn about unsupervised machine learning applied to NLP.

## Presentation

In this project, I explore [a dataset of 28K financial tweets](https://www.kaggle.com/davidwallach/financial-tweets/data), provided courtesy of David Wallach.

**My goal for this educational project is to explore the dataset.** That is, to understand _what kinds_ of tweets it contains.

To do that, I use Natural Language Processing (NLP) and machine learning techniques to clean and cluster the data. I also use my general knowledge of financial markets to interpret the model's output in plain language.
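To make the "clean and cluster" step concrete, here is a minimal sketch using only the Python standard library. The README does not say which techniques the notebook uses, so the choice of TF-IDF vectors and a cosine-based k-means is an assumption, as are the function names and the sample tweets (which are made up, not taken from the dataset).

```python
import math
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions and punctuation, return tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"[^a-z0-9$ ]+", " ", text)   # keep words and $cashtags
    return text.split()


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the normalised sum of its members.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    total.update(v)
            if total:
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical sample tweets, for illustration only.
tweets = [
    "Apple unveils new iPhone chip https://t.co/x",
    "$AAPL Apple chip launch beats expectations",
    "Join our crypto exchange now! Free bitcoin bonus @promo",
    "Free bitcoin bonus on the best crypto exchange",
]
labels = kmeans(tfidf_vectors([clean_tweet(t) for t in tweets]), k=2)
print(labels)
```

On this toy input the two Apple tweets fall into one cluster and the two exchange promotions into the other. On the full dataset, a library implementation would be a more practical choice than this hand-rolled version.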

I find that:
- The model finds relevant tweets about news in the technology sector, as well as tweets about market predictions.
- The data collection is flawed: the dataset contains many promotional tweets for cryptocurrency exchanges.
- The model I built is susceptible to tweets that spam the same irrelevant keywords.
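Repeated-keyword spam of the kind described in the last finding can often be reduced by filtering near-duplicate tweets before clustering. The README does not say which remedies the author proposes, so the sketch below is only one possible mitigation: the Jaccard measure, the `0.8` threshold, the function names, and the sample tweets (including the exchange name "CoinX") are all assumptions made for illustration.

```python
import re


def token_set(text):
    """Return the set of lowercase word, #hashtag and $cashtag tokens in a tweet."""
    return set(re.findall(r"[a-z0-9$#]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_near_duplicates(tweets, threshold=0.8):
    """Keep a tweet only if it is not too similar to any tweet already kept."""
    kept, kept_sets = [], []
    for tweet in tweets:
        tokens = token_set(tweet)
        if all(jaccard(tokens, seen) < threshold for seen in kept_sets):
            kept.append(tweet)
            kept_sets.append(tokens)
    return kept


# Hypothetical sample tweets, for illustration only.
sample = [
    "Buy bitcoin now on CoinX #crypto",
    "buy bitcoin NOW on CoinX #crypto!!",
    "Fed holds rates steady",
]
filtered = drop_near_duplicates(sample)
print(filtered)
```

This pairwise comparison is quadratic in the number of kept tweets, which is fine for a sketch but would likely need hashing or sampling to scale comfortably.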

I suggest ways to overcome these issues. Through this exploration, I gained valuable knowledge for building more robust data collection pipelines and NLP models.

## Instructions

The exploration can be [viewed online](https://htmlpreview.github.io/?https://github.com/oulianov/finance-tweets/blob/master/Finance%20tweets.html).