Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ammirsm/data-grabber-cnn-twitter
Basic setup to get data from twitter and CNN with a keyword.
cnn crawler django scrapyd twitter
- Host: GitHub
- URL: https://github.com/ammirsm/data-grabber-cnn-twitter
- Owner: ammirsm
- License: mit
- Created: 2019-11-09T13:53:24.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-11-04T19:28:11.000Z (about 2 years ago)
- Last Synced: 2024-04-18T05:05:52.796Z (9 months ago)
- Topics: cnn, crawler, django, scrapyd, twitter
- Language: JavaScript
- Homepage:
- Size: 689 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# CNN and Twitter Crawler for getting clear data
Basic setup to get data from Twitter and CNN with a keyword.

* A crawler that crawls the latest 25 articles about Trump from CNN.com and his latest tweets
* A simple website that displays the titles of the crawled information
* A convenient way of displaying the information after clicking on one of the titles
* A word cloud of the latest news and tweets

## Setup
1 - Install requirements
````
$ pip install -r requirements.txt
````
2 - Configure the database
````
$ python manage.py migrate
````
## Start the project
In order to start this project you will need to have Django and Scrapyd running at the same time.

In order to run Django:
````
$ python manage.py runserver
````
In order to run Scrapyd:
````
$ cd scrapy_app
$ scrapyd
````

At this point you will be able to send job requests to Scrapyd. This project is set up with a demo spider from the official Scrapy tutorial. To run it, you must send an HTTP request to Scrapyd with the job info.
The project contains two spiders: 'icrawler' for crawling CNN and 'twitter' for crawling Twitter. For example, to schedule the CNN spider:
````
curl http://localhost:6800/schedule.json -d project=default -d spider=icrawler
````

The crawled data will automatically be saved in the Django models.
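The 'twitter' spider can be scheduled the same way. As a usage note (assuming the default project name from the example above), Scrapyd's `listjobs.json` endpoint can be used to check pending, running and finished jobs:

````
curl http://localhost:6800/schedule.json -d project=default -d spider=twitter
curl http://localhost:6800/listjobs.json?project=default
````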
The frontend also provides a control for running these crawlers.
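For illustration, a common way to wire Scrapy output into Django models is a small item pipeline. The sketch below is not this repository's actual code; it assumes a hypothetical `Article` model in a Django app called `main` and a placeholder settings module.

````python
# Illustrative sketch only; the model, app and settings names are hypothetical.
import os

import django

# Point Django at the project's settings before importing any models.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "django_project.settings")
django.setup()

from main.models import Article  # hypothetical model with url, title, body fields


class DjangoWriterPipeline:
    """Persist each scraped item as a Django model instance."""

    def process_item(self, item, spider):
        # update_or_create avoids duplicate rows when the same URL is crawled twice.
        Article.objects.update_or_create(
            url=item.get("url"),
            defaults={
                "title": item.get("title", ""),
                "body": item.get("body", ""),
            },
        )
        return item
````

Check the `scrapy_app` directory for the project's actual pipeline and spider definitions.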