
# larry-crawler

[![Build Status](https://travis-ci.org/duaraghav8/larry-crawler.svg?branch=master)](https://travis-ci.org/duaraghav8/larry-crawler)

Kayako Twitter challenge

## Installation
```bash
npm install --save larry-crawler
```

## Usage
Navigate to the `node_modules` directory that contains larry-crawler, then run the example script:

```bash
cd larry-crawler/usage
node get-tweets.js
```

## Test
```bash
npm test
```

## Output
The application fetches tweets in batches of 100. Unless forcibly killed (CTRL+C), the app keeps running until all tweets matching the defined criteria have been fetched.
See [result](https://github.com/duaraghav8/larry-crawler/blob/master/usage/result).

NOTE: A batch may produce fewer than 100 tweets in its output if you've applied a secondary filter (like `retweetCount`).
If 100 tweets were retrieved for the specified hashtag and 30 of them haven't been retweeted, only the remaining 70 tweets are supplied in the `response.statuses` array.

## Module API
To access the class larry-crawler exposes for crawling Twitter:

```js
const {TwitterCrawler} = require('larry-crawler');
```

Get your app or user credentials from https://dev.twitter.com/, then create a new object:

```js
const crawler = new TwitterCrawler({
  consumerKey: process.env.TWITTER_CONSUMER_KEY,
  consumerSecret: process.env.TWITTER_CONSUMER_SECRET,
  accessTokenKey: process.env.TWITTER_ACCESS_TOKEN_KEY,
  accessTokenSecret: process.env.TWITTER_ACCESS_TOKEN_SECRET
});
```
If you have a Twitter app (application-only authentication), use `bearerToken` instead of `accessTokenKey` & `accessTokenSecret`.
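For app-only authentication the configuration might look like the following sketch (the environment variable name is illustrative):

```js
// Sketch: app-only auth. TWITTER_BEARER_TOKEN is a hypothetical env var name.
const crawler = new TwitterCrawler({
  consumerKey: process.env.TWITTER_CONSUMER_KEY,
  consumerSecret: process.env.TWITTER_CONSUMER_SECRET,
  bearerToken: process.env.TWITTER_BEARER_TOKEN
});
```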

The new object exposes the method `getTweets()`, which fetches tweets based on the given criteria and returns a `Promise`.

```js
const criteria = { hashtags: ['custserv'], retweetCount: {$gt: 0} };

crawler.getTweets(criteria).then((response) => {
  console.log(JSON.stringify(response, null, 2));
}).catch((err) => {
  console.error(err); // surface errors instead of swallowing them
});
```

To set the `max_id` parameter for pagination:
```js
criteria.maxIdString = status.id_str;
```
where `status` is an item in the `response.statuses` array.
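
Putting this together, a paginated crawl might look like the sketch below. Twitter's `max_id` is inclusive, so the lowest ID in a batch is decremented by one before the next request; the repo does this with a string helper in `usage/utils.js`, whose name is assumed here (a sketch of such a decrement appears under Technical Details below):

```js
// Sketch of a full backwards-paginated crawl; decrementIdString is a
// hypothetical stand-in for the helper in usage/utils.js.
function crawlAll(crawler, criteria) {
  return crawler.getTweets(criteria).then((response) => {
    if (response.statuses.length === 0) return; // no older tweets left

    response.statuses.forEach((status) => console.log(status.text));

    // The last status in a batch carries the lowest (oldest) ID.
    const oldest = response.statuses[response.statuses.length - 1];
    criteria.maxIdString = decrementIdString(oldest.id_str);
    return crawlAll(crawler, criteria);
  });
}

crawlAll(crawler, { hashtags: ['custserv'], retweetCount: {$gt: 0} })
  .catch((err) => console.error(err));
```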

See [get-tweets.js](https://github.com/duaraghav8/larry-crawler/blob/master/usage/get-tweets.js) for a full example.

## Technical Details

The module has only one dependency: [twitter](https://www.npmjs.com/package/twitter).

1. Searching by hashtag is simple, since the Twitter API has built-in support for it. But to further refine tweets by number of retweets, the module contains a class `SecondaryFilterForTweets` (a rough sketch follows this list).

   See [Working with the Search API](https://dev.twitter.com/rest/reference/get/search/tweets)

2. Since a maximum of 100 tweets is returned per request, an effective pagination strategy had to be implemented using the `max_id` parameter so that ALL matching tweets can be retrieved, back to the very beginning. [This strategy](https://dev.twitter.com/rest/public/timelines) was followed to achieve pagination.

3. The primary challenge was dealing with the 64-bit integer IDs provided by the Twitter API. JavaScript numbers offer only 53 bits of integer precision, so the application uses the `id_str` field at all times, and a special decrement function in `usage/utils.js` operates on the string ID (see the sketch after this list).

   See [Working with 64-bit IDs in Twitter](https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake)
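
As mentioned in point 1, a secondary filter refines an already-fetched batch. Its behavior can be pictured roughly like this (an illustrative sketch, not the class's actual implementation; the `apply` method name and criteria shape are assumptions):

```js
// Illustrative sketch of SecondaryFilterForTweets; the real class's API
// is not documented in this README, so the method name is assumed.
class SecondaryFilterForTweets {
  constructor(criteria) {
    this.criteria = criteria; // e.g. { retweetCount: {$gt: 0} }
  }

  // Keep only statuses whose retweet_count satisfies the criteria.
  apply(statuses) {
    const { retweetCount } = this.criteria;
    if (!retweetCount) return statuses;
    return statuses.filter((status) =>
      '$gt' in retweetCount ? status.retweet_count > retweetCount.$gt : true
    );
  }
}
```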
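And as mentioned in point 3, a decimal string can be decremented digit by digit, never touching 64-bit integers. A sketch of the idea (the actual helper lives in `usage/utils.js` and may differ):

```js
// Decrement a positive decimal string by 1 without converting to Number,
// so 64-bit tweet IDs never lose precision.
function decrementIdString(idStr) {
  const digits = idStr.split('');
  for (let i = digits.length - 1; i >= 0; i--) {
    if (digits[i] === '0') {
      digits[i] = '9'; // borrow from the next digit
    } else {
      digits[i] = String(Number(digits[i]) - 1);
      break;
    }
  }
  const result = digits.join('');
  // Trim a leading zero produced by borrowing (e.g. "1000" -> "0999").
  return result.length > 1 && result[0] === '0' ? result.slice(1) : result;
}

console.log(decrementIdString('10000000000000000000')); // "9999999999999999999"
```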