https://github.com/Canicio/tweetstake

Application to collect tweets from Twitter on specific topics, designed especially for Big Data collection.

bigdata docker mongodb tweets twitter


Host: GitHub
URL: https://github.com/Canicio/tweetstake
Owner: Canicio
License: mit
Created: 2017-11-22T11:55:59.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2022-12-08T00:34:54.000Z (almost 2 years ago)
Last Synced: 2024-08-20T19:10:24.629Z (3 months ago)
Topics: bigdata, docker, mongodb, tweets, twitter
Language: Python
Size: 27.3 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


# TweetsTake

Application to collect tweets from Twitter on specific topics, designed especially for Big Data collection.

---

## Execution without installation (recommended)
Requirements: Docker Engine installed (https://docs.docker.com/engine/installation).

See the *Examples* section below for how to run the application with Docker.

## Installation (Linux and macOS)
*NOTE: running the application without installing it is recommended; see the section above.*

### Requirements
* Python 3.6.1+
* python3.6-dev
* python3-setuptools
* python3-pip
* MongoDB database (local or remote)

### Installation steps
```sh
# clone this project (or download and unzip it)
$ git clone https://github.com/Canicio/tweetstake
$ cd tweetstake
$ sudo python3.6 setup.py install
$ tweetstake -h # -h shows the help
```

## Before starting

### Configure the accounts CSV file
Go to [https://apps.twitter.com/](https://apps.twitter.com/), create an app with your account, and generate its tokens. If you plan to collect tweets for several search criteria at the same time, it is preferable to create several accounts, each with its own app and tokens.
Then create a **.csv file** containing the tokens in the following format. The first line must be exactly the header shown in the example. Each subsequent line represents one account, with its four tokens separated by commas.

*example_file.csv*
```csv
consumer_key,consumer_secret,token_key,token_secret
uGvo8uIN2wg5nKvWfmBuSjmTv,bx4yTUiav6dJvqkWo8VvxSORyrRHApUMPldrZrHcAmTg6AXl6X,150147078634094680-WItRgONsdhhZc6C7q8n9NWDvYG94aVB,qQ7qj6dbfhbqc69EPSVFzMvPpjy1Rl91RdiJ6WzzKUIas
```
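
Because a malformed accounts file is easy to miss until a run starts, it can help to check it beforehand. The following is a minimal sketch, not part of TweetsTake: `parse_accounts` and `EXPECTED_HEADER` are hypothetical names, and the only rules enforced are the ones described above (fixed header, four comma-separated tokens per account).

```python
import csv

# Header line required by the format described above.
EXPECTED_HEADER = ["consumer_key", "consumer_secret", "token_key", "token_secret"]


def parse_accounts(lines):
    """Return one dict per account, or raise ValueError on a malformed file.

    `lines` is any iterable of text lines, for example an open file.
    This helper is hypothetical and is not TweetsTake's own loader.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header!r}")

    accounts = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_HEADER) or not all(f.strip() for f in row):
            raise ValueError(f"line {reader.line_num}: expected 4 non-empty tokens")
        accounts.append(dict(zip(EXPECTED_HEADER, row)))
    return accounts
```

A typical call would be `with open("example_file.csv", newline="") as f: accounts = parse_accounts(f)`, after which `len(accounts)` gives the number of configured accounts.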

### Prepare the MongoDB database

You must have a **MongoDB** database available. By default, the database name is 'tweetscollector' and the host is 'localhost:27017'. Both values can be changed with the `-db_name` and `-db_host` options (see *Examples*).
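
Before a long collection run, it may be worth confirming that the database host accepts connections. The sketch below is an assumption-laden illustration using only the Python standard library: `split_host` and `is_reachable` are hypothetical helpers, and the check only confirms that a TCP port is open, not that a MongoDB server is answering on it.

```python
import socket


def split_host(db_host, default_port=27017):
    """Split a 'host:port' string; fall back to MongoDB's default port 27017."""
    host, _, port = db_host.partition(":")
    return host, int(port) if port else default_port


def is_reachable(db_host, timeout=3.0):
    """Return True if a TCP connection to db_host can be opened.

    Hypothetical helper: it does not speak the MongoDB protocol.
    """
    host, port = split_host(db_host)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable("localhost:27017")` would be expected to return `True` when a local MongoDB instance is listening on the default port.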

## Examples

**Collect tweets with '#hello' for 6 hours:**
```console
tweetstake -accounts_file ~/file2.csv -filter '#hello' -hours 6
```

**Collect tweets with '#hello' for 15 minutes, specifying the MongoDB database parameters:**
```console
tweetstake -accounts_file ~/file2.csv -filter '#hello' -minutes 15 -db_name 'mydbname' -db_host 'db1.example.net:27017'
```

**Collect tweets with '#hello' or '#bye' for 6 hours and 30 minutes:**
```console
tweetstake -accounts_file ~/file2.csv -filter '#hello' '#bye' -hours 6 -minutes 30
```
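
The examples above suggest that `-filter` accepts several terms and that `-hours` and `-minutes` add up to one total duration. The sketch below models that behavior with `argparse`; it is an illustration written for this README, not TweetsTake's actual parser, and `build_parser` and `collection_duration` are hypothetical names.

```python
import argparse
from datetime import timedelta


def build_parser():
    """Build a parser that mirrors a subset of the flags used in the examples."""
    parser = argparse.ArgumentParser(prog="tweetstake-sketch")
    parser.add_argument("-accounts_file", required=True)
    # One or more search terms; a tweet matching any of them is of interest.
    parser.add_argument("-filter", nargs="+", required=True)
    parser.add_argument("-hours", type=int, default=0)
    parser.add_argument("-minutes", type=int, default=0)
    return parser


def collection_duration(args):
    """Combine the hours and minutes flags into a single duration."""
    return timedelta(hours=args.hours, minutes=args.minutes)


# Same arguments as the third example above.
args = build_parser().parse_args(
    ["-accounts_file", "file2.csv", "-filter", "#hello", "#bye",
     "-hours", "6", "-minutes", "30"]
)
print(args.filter, collection_duration(args))
```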

**Execution with Docker:**
```sh
docker run --net=host -v /path/csv/folder:/home canicio/tweetstake tweetstake -accounts_file /home/file.csv -filter '#hello' -minutes 15
```

## License
[MIT](LICENSE) (Massachusetts Institute of Technology)