https://github.com/aritter/twitter_download
Download scripts for distributing twitter data.
- Host: GitHub
- URL: https://github.com/aritter/twitter_download
- Owner: aritter
- License: mit
- Created: 2014-01-22T22:23:13.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2023-03-10T03:30:19.000Z (over 1 year ago)
- Last Synced: 2024-08-01T13:37:38.374Z (3 months ago)
- Language: Python
- Size: 20.5 KB
- Stars: 60
- Watchers: 10
- Forks: 46
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Semeval Twitter data download script
====================
For downloading tweets distributed using IDs to protect privacy. Uses the format of the [Semeval Twitter sentiment analysis dataset](http://www.cs.york.ac.uk/semeval-2013/task2/index.php?id=data).
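
Because the data is distributed as IDs rather than tweet text, the first step of any download is reading those IDs from the distribution file. The sketch below shows one way to do that, assuming the file is tab-separated and the tweet ID is in the first column. That layout is an assumption, not something this README states, and `read_tweet_ids` and the sample rows are hypothetical.

```python
import csv
import io


def read_tweet_ids(tsv_text):
    """Return the first column of every non-empty row.

    Assumption: the first tab-separated column holds the tweet ID.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [row[0] for row in reader if row]


# Made-up rows for illustration only; real IDs and columns will differ.
sample = "1001\t2001\tpositive\n1002\t2002\tnegative\n"
print(read_tweet_ids(sample))
```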
Prerequisites:
--------------
[sixohsix/twitter](https://github.com/sixohsix/twitter)
[tqdm/tqdm](https://github.com/tqdm/tqdm)

```
easy_install twitter
easy_install tqdm
```

Usage:
--------------
The first time you run this, it should open a web browser, ask you to log in to Twitter, and show a PIN for you to enter at a prompt generated by the script.

1. Log in to Twitter with your user name in your *default* browser.
2. Run the script like this to download your credentials: `python download_tweets_api.py --dist=tweeti-a.dist.tsv`
3. Download tweets like so:
```
python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv
```

- Note that it takes about 18 hours to download the Semeval sentiment analysis training dataset.
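
A running time of many hours is what you would expect if each tweet costs one rate-limited API request. This README does not say that is the cause, so treat the following as a back-of-the-envelope sketch only. The function name and the default limit of 180 requests per 15-minute window are illustrative assumptions, not values taken from the script.

```python
def estimated_hours(num_tweets, requests_per_window=180, window_minutes=15):
    """Rough download time, assuming one rate-limited request per tweet.

    The default limit values are placeholders, not measured figures.
    """
    windows_needed = num_tweets / requests_per_window
    return windows_needed * window_minutes / 60


# Example with a hypothetical dataset size.
print(estimated_hours(10000))
```

Under these assumptions the time grows linearly with the number of IDs, which is why resuming a partial download is worth supporting.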
Restarting after a partial download:
--------------
In case the script hangs in the middle of the download for whatever reason, use the --partial argument to specify the file containing partially downloaded results.
This way you won't have to start from scratch again:

```
python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
```

Task A Mention Test Script
--------------
To print out the mentions and annotations from task A you can use the `testIndices.py` script like so:

```
python testIndices.py downloaded.tsv
```
This just prints out the mentions with sentiment annotations for easier inspection.

Notes:
--------------
- You may need to manually edit the authorization link that is printed out so that it uses `https://` instead of `http://`.
- The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.
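
A likely reason the clock matters is that OAuth 1.0a signed requests carry a timestamp, and a server can reject requests whose timestamp is too far from its own time. The sketch below estimates clock skew by comparing the local time with the `Date` header of any HTTP response. `clock_skew_seconds` is a hypothetical helper, and how much skew is tolerated is not stated in this README.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date_header, local_now=None):
    """Return local time minus server time, in seconds.

    http_date_header is the value of an HTTP "Date" response header,
    for example "Wed, 21 Oct 2015 07:28:00 GMT".
    """
    server_time = parsedate_to_datetime(http_date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


# Fixed values so the example does not depend on the network or the clock.
header = "Wed, 21 Oct 2015 07:28:00 GMT"
now = datetime(2015, 10, 21, 7, 28, 30, tzinfo=timezone.utc)
print(clock_skew_seconds(header, now))
```

A result far from zero suggests the system clock should be synchronized before running the download script.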