{"id":16741549,"url":"https://github.com/duaraghav8/larry-crawler","last_synced_at":"2026-05-17T20:07:21.161Z","repository":{"id":98907313,"uuid":"79338118","full_name":"duaraghav8/larry-crawler","owner":"duaraghav8","description":"Kayako Twitter challenge","archived":false,"fork":false,"pushed_at":"2017-01-20T11:00:26.000Z","size":54,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-08T08:04:04.588Z","etag":null,"topics":["crawler","fetch-tweets","hashtag","nodejs","pagination","tweets","twitter-api"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duaraghav8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-18T12:31:24.000Z","updated_at":"2019-06-07T09:26:42.000Z","dependencies_parsed_at":"2023-06-08T11:15:52.661Z","dependency_job_id":null,"html_url":"https://github.com/duaraghav8/larry-crawler","commit_stats":{"total_commits":28,"total_committers":1,"mean_commits":28.0,"dds":0.0,"last_synced_commit":"e536e6fc3609a899ef1f15edb16b817650a4aa9d"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duaraghav8%2Flarry-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duaraghav8%2Flarry-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duaraghav8%2Flarry-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/duaraghav8%2Flarry-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duaraghav8","download_url":"https://codeload.github.com/duaraghav8/larry-crawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243707308,"owners_count":20334616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","fetch-tweets","hashtag","nodejs","pagination","tweets","twitter-api"],"created_at":"2024-10-13T01:03:27.657Z","updated_at":"2026-05-17T20:07:16.126Z","avatar_url":"https://github.com/duaraghav8.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# larry-crawler\n\n[![Build Status](https://travis-ci.org/duaraghav8/larry-crawler.svg?branch=master)](https://travis-ci.org/duaraghav8/larry-crawler)\n\nKayako Twitter challenge\n\n## Installation\n```bash\nnpm install --save larry-crawler\n```\n\n## Usage\nNavigate to the ```node_modules``` directory which contains larry-crawler.\n\n```bash\ncd larry-crawler/usage\nnode get-tweets.js\n```\n\n## Test\n```bash\nnpm test\n```\n\n## Output\nThe application fetches tweets in batches of 100. 
Unless forcefully killed (CTRL+C), the app will keep running until all tweets matching the defined criteria have been fetched.\nSee [result](https://github.com/duaraghav8/larry-crawler/blob/master/usage/result).\n\nNOTE: A batch might produce fewer than 100 tweets in output if you've applied a secondary filter (like ```retweetCount```).\nIf 100 tweets were retrieved for the specified hashtag and 30 of them haven't been retweeted, then only 70 tweets are supplied in the ```response.statuses``` Array.\n\n\n## Module API\nTo access the class larry-crawler exposes for crawling Twitter:\n\n```js\nconst {TwitterCrawler} = require ('./larry-crawler');\n```\n\nGet your app or user credentials from https://dev.twitter.com/, then create a new object like:\n\n```js\nconst crawler = new TwitterCrawler ({\n\n\tconsumerKey: process.env.TWITTER_CONSUMER_KEY,\n\tconsumerSecret: process.env.TWITTER_CONSUMER_SECRET,\n\taccessTokenKey: process.env.TWITTER_ACCESS_TOKEN_KEY,\n\taccessTokenSecret: process.env.TWITTER_ACCESS_TOKEN_SECRET\n\n});\n```\nIf you have a Twitter app, use ```bearerToken``` instead of ```accessTokenKey``` and ```accessTokenSecret```.\n\nThe new object exposes a ```getTweets()``` method, which fetches tweets matching the given criteria and returns a ```Promise```.\n\n```js\nconst criteria = { hashtags: ['custserv'], retweetCount: {$gt: 0} };\n\ncrawler.getTweets (criteria).then ((response) =\u003e {\n  console.log (JSON.stringify (response, null, 2));\n}).catch ((err) =\u003e {\n  // Report the failure instead of silently swallowing it\n  console.error (err);\n});\n```\n\nTo set the ```max_id``` parameter for pagination,\n```js\ncriteria.maxIdString = status.id_str\n```\nwhere ```status``` is an item in the ```response.statuses``` Array.\n\nSee [get-tweets.js](https://github.com/duaraghav8/larry-crawler/blob/master/usage/get-tweets.js) for a full example.\n\n\n\n## Technical Details\n\nThe module has only one dependency: [twitter](https://www.npmjs.com/package/twitter).\n\n1. Searching based on hashtags is simple since the Twitter API has built-in support for that. 
But to further refine tweets by their number of retweets, the module contains a class ```SecondaryFilterForTweets```.\n\nSee [Working with search API](https://dev.twitter.com/rest/reference/get/search/tweets)\n\n2. Since a maximum of 100 tweets are sent per request, an effective pagination strategy had to be implemented using the ```max_id``` parameter so we can retrieve all the tweets since the very beginning. Pagination follows [this strategy](https://dev.twitter.com/rest/public/timelines).\n\n3. The primary challenge was dealing with the 64-bit integer IDs provided by the Twitter API. JavaScript numbers can represent integers exactly only up to 53 bits. Hence, the application uses the ```id_str``` field at all times, and a special decrement function in ```usage/utils.js``` operates on the string ID.\n\nSee [Working with 64-bit id in Twitter](https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduaraghav8%2Flarry-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduaraghav8%2Flarry-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduaraghav8%2Flarry-crawler/lists"}