https://github.com/j-w-yun/social_media_stock_data

# social_media_stock_data
Scrape historical Subreddit submissions, comments, and Tweets related to stock symbols and company names.

Dependencies
```
pip install aiohttp==3.7.3
pip install aiohttp_socks==0.4.1
pip install async_timeout==3.0.1
pip install bs4==0.0.1
pip install elasticsearch==7.11.0
pip install fake_useragent==0.1.11
pip install geopy==2.1.0
pip install googletransx==2.4.2
pip install nltk==3.5
pip install pandas==1.2.2
pip install pysocks==1.7.1
pip install requests==2.25.1
pip install stem==1.8.0
```

Requires Tor.

Without Tor to bypass rate limits, downloading this dataset could take months.

Set Tor's SOCKS proxy on port 9050.

Set Tor's controller on port 9051.
```
# Set the controller password as an environment variable
export TOR_CONTROLLER_PW="your_password_here"
```
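
For orientation, here is a minimal sketch of how the ports and password above could be used from Python with `stem` (already in the dependency list). The function names `tor_proxy_url` and `renew_tor_identity` are hypothetical and are not taken from this repository; the scraper's actual implementation may differ.

```python
import os

SOCKS_PORT = 9050    # Tor SOCKS proxy port, as configured above
CONTROL_PORT = 9051  # Tor controller port, as configured above


def tor_proxy_url(host="127.0.0.1", port=SOCKS_PORT):
    """Build the SOCKS5 proxy URL for the local Tor proxy (hypothetical helper)."""
    return f"socks5://{host}:{port}"


def renew_tor_identity(control_port=CONTROL_PORT):
    """Ask Tor for a new circuit (hypothetical helper, needs a running Tor)."""
    # Imported lazily so the module loads even where stem is not installed.
    from stem import Signal
    from stem.control import Controller

    # Read the password exported as TOR_CONTROLLER_PW.
    password = os.environ["TOR_CONTROLLER_PW"]
    with Controller.from_port(port=control_port) as controller:
        controller.authenticate(password=password)
        controller.signal(Signal.NEWNYM)


if __name__ == "__main__":
    print(tor_proxy_url())
```

A scraper could call `renew_tor_identity()` when it hits a rate limit and route its requests through `tor_proxy_url()`. Note that Tor itself may throttle how often a `NEWNYM` signal takes effect.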

Set the Subreddit to scrape.
```
reddit = REDDIT(directory='reddit_data', subreddit='wallstreetbets')
```

Run to write/update `reddit_data`
```
python scrape_social.py -r
```

Run to write/update `twitter_data`
```
python scrape_social.py -t
```

Run to write/update both `reddit_data` and `twitter_data`
```
python scrape_social.py -a
```
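
The three flags above could be handled with `argparse` roughly as follows. This is only an illustrative sketch: the function name `parse_targets`, the help strings, and the choice to make the flags mutually exclusive are assumptions, not the actual contents of `scrape_social.py`.

```python
import argparse


def parse_targets(argv=None):
    """Map the -r / -t / -a flags to the set of datasets to update.

    Hypothetical helper; the real script may parse its flags differently.
    """
    parser = argparse.ArgumentParser(
        description="Scrape Reddit and/or Twitter data.")
    # Assumption: exactly one of the three flags is given per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-r", action="store_true",
                       help="write/update reddit_data")
    group.add_argument("-t", action="store_true",
                       help="write/update twitter_data")
    group.add_argument("-a", action="store_true",
                       help="write/update both datasets")
    args = parser.parse_args(argv)

    if args.a:
        return {"reddit", "twitter"}
    return {"reddit"} if args.r else {"twitter"}


if __name__ == "__main__":
    print(sorted(parse_targets()))
```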