https://github.com/yall/scrapy-twitter
- Host: GitHub
- URL: https://github.com/yall/scrapy-twitter
- Owner: yall
- Created: 2015-05-01T14:29:44.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-04-12T11:23:29.000Z (almost 9 years ago)
- Last Synced: 2024-08-05T17:42:27.317Z (5 months ago)
- Language: Python
- Size: 4.88 KB
- Stars: 45
- Watchers: 3
- Forks: 14
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-hacking-lists - yall/scrapy-twitter - (Python)
README
# scrapy-twitter
A lightweight wrapper around the python-twitter library for use in Scrapy projects.
## Usage
Install:

    sudo pip install -e git+https://git.lab.bluestone.fr/jgs/scrapy-twitter.git#egg=scrapy_twitter

Set your API credentials and add `TwitterDownloaderMiddleware` to your Scrapy project settings:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_twitter.TwitterDownloaderMiddleware': 10,
}
TWITTER_CONSUMER_KEY = 'xxxx'
TWITTER_CONSUMER_SECRET = 'xxxx'
TWITTER_ACCESS_TOKEN_KEY = 'xxxx'
TWITTER_ACCESS_TOKEN_SECRET = 'xxxx'
```

## Spider examples
### User Timeline
This spider gets all tweets from a user's timeline, paging backwards with `max_id` until no tweets remain.

    scrapy crawl user-timeline -a screen_name=zachbraff -o zb_tweets.json

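The `max_id` paging described above can be tried without Twitter credentials. The following is a minimal sketch, not part of scrapy-twitter: `fetch_page` is a hypothetical stand-in for a single timeline API call, and the fake `timeline` list assumes tweets arrive newest first with integer `id` values.

```python
def fetch_page(timeline, count, max_id=None):
    """Hypothetical stand-in for one user-timeline API call.

    Returns up to `count` tweets whose id is <= max_id, newest first.
    """
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def fetch_all(timeline, count):
    """Collect every tweet by repeatedly lowering max_id."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(timeline, count, max_id)
        if not page:
            # An empty page means the timeline is exhausted.
            break
        collected.extend(page)
        # Ask only for tweets strictly older than the last one received.
        max_id = page[-1]["id"] - 1
    return collected


# Fake timeline (assumption): ten tweets, newest first.
timeline = [{"id": i} for i in range(10, 0, -1)]
all_tweets = fetch_all(timeline, count=3)
```

Subtracting 1 from the last `id` keeps the boundary tweet from being returned twice. The spider below does the same with `tweets[-1]['id'] - 1`, issuing one Scrapy request per page instead of looping.
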
```python
import scrapy
from scrapy.exceptions import CloseSpider

from scrapy_twitter import TwitterUserTimelineRequest, to_item


class UserTimelineSpider(scrapy.Spider):
    name = "user-timeline"
    allowed_domains = ["twitter.com"]

    def __init__(self, screen_name=None, *args, **kwargs):
        if not screen_name:
            raise CloseSpider('Argument screen_name not set.')
        super(UserTimelineSpider, self).__init__(*args, **kwargs)
        self.screen_name = screen_name
        self.count = 100  # tweets requested per page

    def start_requests(self):
        return [TwitterUserTimelineRequest(
            screen_name=self.screen_name,
            count=self.count)]

    def parse(self, response):
        tweets = response.tweets

        for tweet in tweets:
            yield to_item(tweet)

        # Request the next, older page until an empty page comes back.
        if tweets:
            yield TwitterUserTimelineRequest(
                screen_name=self.screen_name,
                count=self.count,
                max_id=tweets[-1]['id'] - 1)
```

### Streaming
This spider connects to the streaming API and sends every matching tweet to the item pipeline.

    scrapy crawl stream-filter -a track=#starwars

```python
import scrapy
from scrapy.exceptions import CloseSpider

from scrapy_twitter import TwitterStreamFilterRequest, to_item


class StreamFilterSpider(scrapy.Spider):
    name = "stream-filter"
    allowed_domains = ["twitter.com"]

    def __init__(self, track=None, *args, **kwargs):
        if not track:
            raise CloseSpider('Argument track not set.')
        super(StreamFilterSpider, self).__init__(*args, **kwargs)
        # Several keywords can be passed as a comma-separated list.
        self.track = track.split(',')

    def start_requests(self):
        return [TwitterStreamFilterRequest(track=self.track)]

    def parse(self, response):
        for tweet in response.tweets:
            yield to_item(tweet)
```
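
Items yielded by the spider are handed to whatever item pipelines the project enables. As one possible sketch, not part of scrapy-twitter, the pipeline below appends each tweet to a JSON Lines file as it arrives. The class name `JsonLinesPipeline`, the default path `tweets.jl`, and the `id` and `text` fields in the demo are assumptions made for illustration. Only the `open_spider`, `close_spider`, and `process_item` method names come from Scrapy's pipeline interface.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical pipeline: write each item as one JSON line."""

    def __init__(self, path="tweets.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "a", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider stops.
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for mapping-like items.
        self.file.write(json.dumps(dict(item)) + "\n")
        return item


# Quick check outside Scrapy: push one fake tweet through the pipeline.
path = os.path.join(tempfile.mkdtemp(), "tweets.jl")
pipeline = JsonLinesPipeline(path)
pipeline.open_spider(spider=None)
pipeline.process_item({"id": 1, "text": "hello"}, spider=None)
pipeline.close_spider(spider=None)

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

To use a pipeline like this in a project, it would be registered in the `ITEM_PIPELINES` setting under its module path, for example `'myproject.pipelines.JsonLinesPipeline': 300`, where `myproject.pipelines` is a placeholder.
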