https://github.com/aritter/twitter_download

Download scripts for distributing twitter data.

Semeval Twitter data download script
====================

Downloads tweets that are distributed as IDs to protect privacy. It uses the format of the [Semeval Twitter sentiment analysis dataset](http://www.cs.york.ac.uk/semeval-2013/task2/index.php?id=data).

Prerequisites:
--------------
- [sixohsix/twitter](https://github.com/sixohsix/twitter)
- [tqdm/tqdm](https://github.com/tqdm/tqdm)

```
pip install twitter
pip install tqdm
```

Usage:
--------------

The first time you run this script, it should open a web browser, ask you to log in to Twitter, and show a PIN for you to enter at a prompt generated by the script.

1. Log in to Twitter in your *default* browser.
2. Run the script like this to obtain your credentials: `python download_tweets_api.py --dist=tweeti-a.dist.tsv`
3. Download the tweets like so:
```
python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv
```

Note that it takes about 18 hours to download the Semeval sentiment analysis training dataset.

Restarting after a partial download:
--------------
If the script hangs in the middle of the download for any reason, use the `--partial` argument to specify the file containing the partially downloaded results.
This way you won't have to start from scratch:

```
python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
```
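
The idea behind resuming can be sketched as follows. This is only an illustration, not the code in `download_tweets_api.py`: it assumes both files are tab-separated with the tweet ID in the first column, and the helper names `load_downloaded_ids` and `remaining_rows` are hypothetical.

```python
import csv


def load_downloaded_ids(partial_path):
    """Collect the tweet IDs already present in a partial results file."""
    done = set()
    with open(partial_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:
                done.add(row[0])  # assumption: tweet ID is the first column
    return done


def remaining_rows(dist_path, done_ids):
    """Yield rows of the ID file whose tweet is not in done_ids."""
    with open(dist_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row and row[0] not in done_ids:
                yield row
```

Only the rows returned by `remaining_rows` would still need a request, which is why a restart can be much shorter than a full run.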

Task A Mention Test Script
--------------
To print out the mentions and annotations from task A, you can use the `testIndices.py` script like so:

```
python testIndices.py downloaded.tsv
```
This just prints out the mentions with sentiment annotations for easier inspection.
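
A minimal sketch of how a mention might be recovered from a downloaded tweet is shown below. It is not the code in `testIndices.py`. It assumes that each task A annotation gives a start and end position counted in whitespace-separated tokens, zero-based and inclusive; check this against the dataset description before relying on it. The function name `extract_mention` is hypothetical.

```python
def extract_mention(text, start, end):
    """Return the tokens from index start to end (inclusive) as one string.

    Assumption: indices are zero-based positions of whitespace-separated tokens.
    """
    tokens = text.split()
    return " ".join(tokens[start:end + 1])


if __name__ == "__main__":
    # Made-up example tweet, for illustration only.
    print(extract_mention("I love this phone", 1, 2))
```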

Notes:
--------------
- You may need to manually edit the authorization link that is printed out so that it uses https:// instead of http://.
- The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.
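
The clock matters because OAuth 1.0a requests carry a timestamp, and a server can reject requests whose timestamp is too far from its own time. One rough way to check your clock, sketched here under assumptions, is to compare it with the `Date` header of any HTTP response. The function name `clock_skew_seconds` is hypothetical and is not part of this repository.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus server time, in seconds.

    date_header is the value of an HTTP Date header,
    e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    """
    server_time = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()
```

The header value can be read from a response fetched with, for example, `urllib.request.urlopen`. A result of more than a few minutes in either direction suggests the system clock should be corrected before running the download.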