Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Nedja995/twint_server
TWINT Flask-Celery Server. Optimized tweets scraping
celery distributed elasticsearch flask flower optimized python scraper twint twitter
- Host: GitHub
- URL: https://github.com/Nedja995/twint_server
- Owner: Nedja995
- Created: 2019-04-06T17:35:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-06-02T15:10:50.000Z (over 5 years ago)
- Last Synced: 2024-08-02T05:08:24.275Z (5 months ago)
- Topics: celery, distributed, elasticsearch, flask, flower, optimized, python, scraper, twint, twitter
- Language: Python
- Size: 11.7 KB
- Stars: 13
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### [TWINT](https://github.com/twintproject/twint) Flask-Celery Server
Optimized tweets scraping

#### See also [Twint Kibana](https://github.com/Nedja995/twint_kibana)
#### Requirements
- Python3, [Twint](https://github.com/twintproject/twint), Flask, Celery
- Elasticsearch (v7)
- RabbitMQ
- (optional) Flower

#### Run server
1. Run Celery workers:
- `$ celery worker --app=worker.celery --hostname=worker.fetching@%h --queues=fetching --loglevel=info`
- (Optional) Worker for the task that reports progress, if it is implemented: `$ celery worker --app=worker.celery --hostname=worker.saving@%h --queues=saving --loglevel=info`
2. Run Flask server: `$ python3 app.py`
- (Optional) Monitor Celery with Flower: `$ celery -A app.celery flower --broker='pyamqp://guest@localhost//'`
#### Use
1. Create ES index with [index-tweets.json](elasticsearch/index-tweets.json)
2. Start fetching tweets:
- Arguments are mapped to the [twint config](https://github.com/twintproject/twint/blob/master/twint/config.py).
- I mainly use it with Elasticsearch, so other arguments are untested.
- `Since`, `Until`, and either `Search` or `User` are required. The body below uses `Search`; send `"User": ""` in its place to use `User` instead.
```
POST http://localhost:5000/fetch
{
"Since": "2019-2-1",
"Until": "2019-3-1",
"Search": "",
"Elasticsearch": "localhost:9200",
"Index_tweets": ""
}
```
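
The two steps above can also be scripted. The following is a minimal sketch, not part of this repository, using only the Python standard library. It assumes that `index-tweets.json` is a valid body for Elasticsearch's create-index request (`PUT /<index>`), that the `/fetch` endpoint accepts a JSON body, and that the index is named `tweets`. The helper names (`build_fetch_payload`, `send_json`, `run_example`), the index name, and the search term are hypothetical.

```python
import json
import urllib.request


def build_fetch_payload(since, until, search=None, user=None,
                        elasticsearch="localhost:9200", index_tweets="tweets"):
    """Build the JSON body for POST /fetch.

    Since and Until are required, plus at least one of Search or User.
    The default index name "tweets" is an assumption, not a project default.
    """
    if not since or not until:
        raise ValueError("Since and Until are required")
    if not search and not user:
        raise ValueError("provide Search or User")
    payload = {
        "Since": since,
        "Until": until,
        "Elasticsearch": elasticsearch,
        "Index_tweets": index_tweets,
    }
    if search:
        payload["Search"] = search
    if user:
        payload["User"] = user
    return payload


def send_json(url, body, method="POST"):
    """Send `body` as JSON and return (HTTP status, response text)."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


def run_example():
    """Create the index, then start a fetch. Needs both services running."""
    # Step 1: create the index (assumes the file is a create-index body).
    with open("elasticsearch/index-tweets.json") as f:
        index_body = json.load(f)
    print(send_json("http://localhost:9200/tweets", index_body, method="PUT"))

    # Step 2: start fetching tweets for a hypothetical search term.
    payload = build_fetch_payload("2019-2-1", "2019-3-1", search="python")
    print(send_json("http://localhost:5000/fetch", payload))
```

Call `run_example()` from the repository root once Elasticsearch, RabbitMQ, the Celery workers, and the Flask server are running. The response format of `/fetch` is not documented here, so the sketch only prints the status and raw body.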