https://github.com/charlesyuan02/sentiment-analysis-stock-trader


Host: GitHub
URL: https://github.com/charlesyuan02/sentiment-analysis-stock-trader
Owner: CharlesYuan02
License: mit
Created: 2023-02-22T20:40:55.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-04-15T21:06:54.000Z (over 1 year ago)
Last Synced: 2024-10-10T19:10:07.876Z (2 months ago)
Topics: finbert, marketwatch, reddit-api, sentiment-analysis, web-scraping
Language: Python
Homepage:
Size: 682 KB
Stars: 4
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# sentiment-analysis-stock-trader

## Prerequisites
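All code was written in Python 3.7.9. The dependencies are pinned in requirements.txt and reproduced below. Note that some of the pinned versions (for example numpy 1.24.2, scikit-learn 1.2.2, and torch 2.0.0) only support Python 3.8 or later, so a newer interpreter may be needed to install them as listed.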
```
beautifulsoup4==4.12.0
pandas==1.2.3
praw==7.7.0
requests==2.28.1
snscrape==0.3.4
tqdm==4.56.2
numpy==1.24.2
scikit-learn==1.2.2
torch==2.0.0+cu117
torchaudio==2.0.1+cu117
torchvision==0.15.1+cu117
transformers==4.27.3
```

## Description of Files
### create_dataset.py
This script calls functions defined in the other files to build a preliminary, proof-of-concept dataset. It is not the final dataset that will be used for sentiment analysis.
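
As a rough illustration of what assembling such a dataset could look like, the sketch below flattens scraped text into CSV rows. The column names (`ticker`, `source`, `text`) and both helper functions are hypothetical and are not taken from create_dataset.py. It uses the standard library's `csv` module so that it runs without extra dependencies.

```python
import csv
import io

# Hypothetical column layout; the real dataset.csv may use different columns.
FIELDS = ["ticker", "source", "text"]


def build_rows(ticker, texts_by_source):
    """Flatten {source: [text, ...]} into one row dict per piece of text."""
    return [
        {"ticker": ticker, "source": source, "text": text}
        for source, texts in texts_by_source.items()
        for text in texts
    ]


def rows_to_csv_text(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example with made-up headlines.
rows = build_rows("AAPL", {"marketwatch": ["Apple rises"], "reddit": ["AAPL earnings thread"]})
print(rows_to_csv_text(rows))
```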

### dataset.csv
This is the example dataset created using create_dataset.py.

### finbert.py
This script runs the pretrained FinBERT model on the example dataset (dataset.csv).
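
A minimal sketch of running a pretrained FinBERT checkpoint through the `transformers` pipeline API is shown below. It assumes the `ProsusAI/finbert` checkpoint from the Hugging Face Hub, since the README does not say which FinBERT weights finbert.py loads. The `signed_score` and `mean_sentiment` helpers are an illustrative convention for collapsing predictions into a single number and are not part of the repository.

```python
def classify_texts(texts, model_name="ProsusAI/finbert"):
    """Return one {"label": ..., "score": ...} dict per input text.

    The checkpoint name is an assumption; substitute whichever FinBERT
    weights you actually use.
    """
    # Imported lazily because transformers (and torch) are heavy dependencies.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    return classifier(list(texts))


def signed_score(prediction):
    """Map a prediction to [-1, 1]: positive -> +score, negative -> -score, otherwise 0."""
    sign = {"positive": 1.0, "negative": -1.0}.get(prediction["label"].lower(), 0.0)
    return sign * prediction["score"]


def mean_sentiment(predictions):
    """Average the signed scores; returns 0.0 for an empty list."""
    scores = [signed_score(p) for p in predictions]
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `mean_sentiment(classify_texts(headlines))` on a list of headline strings would then give a rough overall sentiment for a ticker. The first call downloads the model weights.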

### scrape_headlines.py
This file contains functions to scrape S&P 500 stock tickers and company names from Wikipedia, and to scrape news headlines for any S&P 500 stock from Yahoo Finance and from MarketWatch.
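
The ticker-scraping step could be sketched as follows. This is not the code in scrape_headlines.py: it assumes the Wikipedia page "List of S&P 500 companies" keeps its constituents in a table with `id="constituents"` whose first two columns are the symbol and the security name, and it uses the standard library's `html.parser` rather than beautifulsoup4 so that the example runs without dependencies.

```python
from html.parser import HTMLParser

# Assumed page URL and table id; verify both against the live page.
WIKI_URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


class TableParser(HTMLParser):
    """Collect the cell text of every row in the table with the given id."""

    def __init__(self, table_id="constituents"):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []   # each row is a list of cell strings
        self._row = []
        self._cell = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self._row = []
        elif self.in_table and tag in ("td", "th"):
            self.in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th"):
            self._row.append("".join(self._cell).strip())
            self.in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.in_table and self.in_cell:
            self._cell.append(data)


def parse_sp500(html):
    """Return (symbol, name) pairs, skipping the header row."""
    parser = TableParser()
    parser.feed(html)
    return [(row[0], row[1]) for row in parser.rows[1:] if len(row) >= 2]


def fetch_sp500(url=WIKI_URL):
    """Download the page and parse it (requires network access)."""
    from urllib.request import Request, urlopen

    request = Request(url, headers={"User-Agent": "sp500-ticker-sketch/0.1"})
    with urlopen(request, timeout=30) as response:
        return parse_sp500(response.read().decode("utf-8"))
```

Keeping `parse_sp500` separate from the download makes the parsing logic testable against a saved copy of the page.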

### scrape_reddit.py
This file contains functions to scrape the titles and top comments of the top posts in a specified subreddit. It requires a file named info.txt in the same directory containing three lines, in this order: your Reddit API client ID, your Reddit API client secret, and your Reddit API user agent.
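
The info.txt layout described above can be read and passed to PRAW roughly as follows. This is a sketch rather than the repository's implementation: the function names, the `time_filter="day"` choice, and the post and comment limits are assumptions.

```python
from pathlib import Path


def read_credentials(path="info.txt"):
    """Read client ID, client secret, and user agent from the first three lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < 3:
        raise ValueError(f"{path} must contain at least three lines")
    client_id, client_secret, user_agent = (line.strip() for line in lines[:3])
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "user_agent": user_agent,
    }


def top_titles_and_comments(subreddit_name, credentials, post_limit=5, comment_limit=3):
    """Return the title and first few top-sorted comments of each top post."""
    import praw  # third-party; imported lazily so read_credentials works without it

    reddit = praw.Reddit(**credentials)
    results = []
    for submission in reddit.subreddit(subreddit_name).top(time_filter="day", limit=post_limit):
        submission.comment_sort = "top"            # must be set before comments load
        submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
        comments = [c.body for c in list(submission.comments)[:comment_limit]]
        results.append({"title": submission.title, "comments": comments})
    return results
```

For example, `top_titles_and_comments("stocks", read_credentials())` would query r/stocks. The subreddit name here is only an illustration.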

## License
This project is licensed under the MIT License - see the LICENSE file for details.