Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: about 23 hours ago
JSON representation
- Host: GitHub
- URL: https://github.com/duaraghav8/larry-crawler
- Owner: duaraghav8
- License: mit
- Created: 2017-01-18T12:31:24.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-01-20T11:00:26.000Z (about 8 years ago)
- Last Synced: 2025-01-13T09:39:22.238Z (10 days ago)
- Topics: crawler, fetch-tweets, hashtag, nodejs, pagination, tweets, twitter-api
- Language: JavaScript
- Homepage:
- Size: 52.7 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# larry-crawler
[![Build Status](https://travis-ci.org/duaraghav8/larry-crawler.svg?branch=master)](https://travis-ci.org/duaraghav8/larry-crawler)
Kayako Twitter challenge
## Installation
```bash
npm install --save larry-crawler
```

## Usage
Navigate to the ```node_modules``` directory which contains larry-crawler.

```bash
cd larry-crawler/usage
node get-tweets.js
```

## Test
```bash
npm test
```

## Output
The application fetches tweets in batches of 100. Unless forcibly killed (CTRL+C), the app keeps running until all tweets matching the defined criteria have been fetched.
See [result](https://github.com/duaraghav8/larry-crawler/blob/master/usage/result).

NOTE: A batch might produce fewer than 100 tweets in the output if you've applied a secondary filter (like ```retweetCount```).
If 100 tweets were retrieved for the specified hashtag and 30 of them haven't been retweeted, only 70 tweets are supplied in the ```response.statuses``` Array.
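For illustration, this secondary filtering amounts to dropping statuses whose retweet count fails the criterion. A minimal sketch of the idea (a hypothetical stand-in, not the module's actual ```SecondaryFilterForTweets``` implementation):

```js
// Hypothetical sketch: keep only statuses satisfying { retweetCount: { $gt: 0 } }.
// The real logic lives in larry-crawler's SecondaryFilterForTweets class.
function filterByRetweetCount (statuses, minExclusive) {
  return statuses.filter ((status) => status.retweet_count > minExclusive);
}

// A raw batch of 100 statuses may shrink, e.g. to 70, after filtering:
// const visible = filterByRetweetCount (rawBatch, 0);
```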
## Module API

To access the class larry-crawler exposes for crawling Twitter:

```js
const {TwitterCrawler} = require ('larry-crawler');
```

Get your app or user credentials from https://dev.twitter.com/, then create a new object like so:
```js
const crawler = new TwitterCrawler ({
  consumerKey: process.env.TWITTER_CONSUMER_KEY,
  consumerSecret: process.env.TWITTER_CONSUMER_SECRET,
  accessTokenKey: process.env.TWITTER_ACCESS_TOKEN_KEY,
  accessTokenSecret: process.env.TWITTER_ACCESS_TOKEN_SECRET
});
```
If you have a Twitter app, use ```bearerToken``` instead of ```accessTokenKey``` & ```accessTokenSecret```.
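A minimal sketch of application-only auth, assuming ```bearerToken``` is accepted alongside the consumer credentials exactly as the user tokens are above:

```js
const appCrawler = new TwitterCrawler ({
  consumerKey: process.env.TWITTER_CONSUMER_KEY,
  consumerSecret: process.env.TWITTER_CONSUMER_SECRET,
  // Assumption: replaces accessTokenKey/accessTokenSecret for app-only auth.
  bearerToken: process.env.TWITTER_BEARER_TOKEN
});
```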
The new object exposes a ```getTweets()``` method that fetches tweets matching the given criteria and returns a ```Promise```:

```js
const criteria = { hashtags: ['custserv'], retweetCount: {$gt: 0} };

crawler.getTweets (criteria).then ((response) => {
  console.log (JSON.stringify (response, null, 2));
}).catch ((err) => {
  console.error (err);
});
```

To set the ```max_id``` parameter for pagination,
```js
criteria.maxIdString = status.id_str
```
where ```status``` is an item in the ```response.statuses``` Array.

See [get-tweets.js](https://github.com/duaraghav8/larry-crawler/blob/master/usage/get-tweets.js) for a full example.
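Putting this together, crawling every matching tweet looks roughly like the loop below. This is a hypothetical sketch, not code from the module: it assumes the last element of ```response.statuses``` is the oldest tweet of the batch, and it uses an assumed ```decrementIdString``` helper in the spirit of ```usage/utils.js``` (Twitter's ```max_id``` is inclusive, so the id is decremented to avoid re-fetching the boundary tweet):

```js
// Hypothetical sketch: page backwards through the search results until a batch comes back empty.
// (With a secondary filter applied, an empty batch can also mean everything was filtered out,
// so a real implementation may need a smarter stop condition.)
function fetchAll (criteria, collected = []) {
  return crawler.getTweets (criteria).then ((response) => {
    const statuses = response.statuses;
    if (statuses.length === 0) {
      return collected;                                        // nothing older left
    }
    const oldest = statuses[statuses.length - 1];
    criteria.maxIdString = decrementIdString (oldest.id_str);  // continue below the oldest id
    return fetchAll (criteria, collected.concat (statuses));
  });
}

fetchAll ({ hashtags: ['custserv'], retweetCount: {$gt: 0} })
  .then ((allTweets) => console.log (`fetched ${allTweets.length} tweets`))
  .catch ((err) => console.error (err));
```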
## Technical Details
The module has a single dependency: [twitter](https://www.npmjs.com/package/twitter).
1. Searching by hashtag is straightforward since the Twitter API supports it natively. To further refine tweets by retweet count, the module contains a ```SecondaryFilterForTweets``` class.
See [Working with search API](https://dev.twitter.com/rest/reference/get/search/tweets)
2. Since a maximum of 100 tweets are sent per request, an effective pagination strategy had to be implemented using the ```max_id``` parameter so that ALL matching tweets, back to the very beginning, can be retrieved. [This strategy](https://dev.twitter.com/rest/public/timelines) was followed to achieve pagination.
3. The primary challenge was dealing with the 64-bit integer IDs provided by the Twitter API, since JavaScript numbers only offer 53 bits of integer precision. Hence, the application uses the ```id_str``` field at all times, and a special decrement function has been written in ```usage/utils.js``` to operate on the string ID (see the sketch below).
See [Working with 64-bit id in Twitter](https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake)
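A minimal sketch of such a string decrement, assuming it mirrors the helper in ```usage/utils.js``` (the actual implementation there may differ):

```js
// Hypothetical sketch: subtract 1 from a decimal id string without converting it to a
// Number, which would lose precision beyond 53 bits.
function decrementIdString (idStr) {
  const digits = idStr.split ('');
  for (let i = digits.length - 1; i >= 0; i--) {
    if (digits[i] === '0') {
      digits[i] = '9';                              // borrow from the digit to the left
    } else {
      digits[i] = String (Number (digits[i]) - 1);
      break;
    }
  }
  // Strip a leading zero produced by e.g. '10000' -> '09999'.
  return digits.join ('').replace (/^0(?=\d)/, '');
}

// decrementIdString ('817278888201211905') === '817278888201211904'
```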