https://github.com/j-w-yun/social_media_stock_data
Scrape historical Subreddit posts and Tweets related to stock symbols and company names.
- Host: GitHub
- URL: https://github.com/j-w-yun/social_media_stock_data
- Owner: j-w-yun
- Created: 2021-02-24T01:46:24.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-03-01T04:57:26.000Z (over 3 years ago)
- Last Synced: 2024-03-01T05:07:40.157Z (9 months ago)
- Language: Python
- Size: 6.12 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# social_media_stock_data
Scrape historical subreddit submissions and comments, as well as Tweets, related to stock symbols and company names.

## Dependencies
```
pip install aiohttp==3.7.3
pip install aiohttp_socks==0.4.1
pip install async_timeout==3.0.1
pip install bs4==0.0.1
pip install elasticsearch==7.11.0
pip install fake_useragent==0.1.11
pip install geopy==2.1.0
pip install googletransx==2.4.2
pip install nltk==3.5
pip install pandas==1.2.2
pip install pysocks==1.7.1
pip install requests==2.25.1
pip install stem==1.8.0
```

Requires Tor.
Without Tor to bypass rate limits, downloading this dataset could take months.
Set the Tor SOCKS proxy to port 9050 and the Tor controller to port 9051.
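
As a rough illustration of how these two ports are typically used from Python, the sketch below builds a proxy mapping for the SOCKS port and asks the controller for a new circuit through `stem`. The helper names (`tor_proxies`, `controller_password`, `renew_identity`) are hypothetical and are not taken from this repository. The sketch assumes Tor is running on `127.0.0.1` and that the controller password is available in the `TOR_CONTROLLER_PW` environment variable described in the next step.

```python
import os

SOCKS_PORT = 9050     # Tor SOCKS proxy port named in this README
CONTROL_PORT = 9051   # Tor controller port named in this README


def tor_proxies(port=SOCKS_PORT, host="127.0.0.1"):
    """Return a proxies mapping usable with requests (needs pysocks installed).

    The socks5h scheme sends DNS lookups through the proxy as well.
    """
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def controller_password(env=None):
    """Read the controller password from TOR_CONTROLLER_PW, or raise."""
    env = os.environ if env is None else env
    password = env.get("TOR_CONTROLLER_PW")
    if not password:
        raise RuntimeError("TOR_CONTROLLER_PW is not set")
    return password


def renew_identity(port=CONTROL_PORT):
    """Ask Tor for a new circuit (hypothetical helper; needs stem and a running Tor)."""
    # Imported lazily so the helpers above work without stem installed.
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=port) as controller:
        controller.authenticate(password=controller_password())
        controller.signal(Signal.NEWNYM)
```

With Tor running, `requests.get(url, proxies=tor_proxies())` would send a request through the SOCKS proxy, and `renew_identity()` would request a fresh circuit, which typically (but not always) changes the exit address.
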
```
# Set the controller password as an environment variable
export TOR_CONTROLLER_PW="your_password_here"
```

Set the subreddit to scrape:
```
reddit = REDDIT(directory='reddit_data', subreddit='wallstreetbets')
```

Run the following to write or update `reddit_data`:
```
python scrape_social.py -r
```

Run the following to write or update `twitter_data`:
```
python scrape_social.py -t
```

Run the following to write or update both `reddit_data` and `twitter_data`:
```
python scrape_social.py -a
```
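
For readers curious how three flags like these can map onto the two output directories, here is a minimal `argparse` sketch. It is only an assumption about one possible wiring; the real `scrape_social.py` may parse its arguments differently, and `parse_targets` is a hypothetical name.

```python
import argparse


def parse_targets(argv=None):
    """Map the -r / -t / -a flags to the data directories to update."""
    parser = argparse.ArgumentParser(description="Scrape Reddit and/or Twitter data")
    parser.add_argument("-r", action="store_true", help="write/update reddit_data")
    parser.add_argument("-t", action="store_true", help="write/update twitter_data")
    parser.add_argument("-a", action="store_true", help="write/update both")
    args = parser.parse_args(argv)

    targets = []
    if args.r or args.a:
        targets.append("reddit_data")
    if args.t or args.a:
        targets.append("twitter_data")
    if not targets:
        # Exit with a usage message when no flag is given.
        parser.error("choose one of -r, -t, or -a")
    return targets
```

In this sketch `-a` is simply shorthand for passing both `-r` and `-t`, so the three invocations above select one directory, the other, or both.
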