Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/saasify-sh/twitter-flock

Simple & robust workflow automations for Twitter.
https://github.com/saasify-sh/twitter-flock

automation followers-twitter saasify twitter twitter-api workflow

Last synced: about 2 months ago
JSON representation

Simple & robust workflow automations for Twitter.

Host: GitHub
URL: https://github.com/saasify-sh/twitter-flock
Owner: saasify-sh
Created: 2020-06-15T05:32:57.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2023-01-07T19:08:31.000Z (almost 2 years ago)
Last Synced: 2023-03-09T15:15:40.857Z (almost 2 years ago)
Topics: automation, followers-twitter, saasify, twitter, twitter-api, workflow
Language: JavaScript
Homepage:
Size: 5.44 MB
Stars: 15
Watchers: 4
Forks: 3
Open Issues: 15
Metadata Files:
- Readme: readme.md

Awesome Lists containing this project

README

        # Twitter Flock

> Simple & robust workflows to export your flock of followers from Twitter.



  

    

  



[![Build Status](https://travis-ci.com/saasify-sh/twitter-flock.svg?branch=master)](https://travis-ci.com/saasify-sh/twitter-flock) [![JavaScript Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://standardjs.com)

## How it works

All automations are built around Twitter OAuth which gives us higher rate limits and access to private user actions like tweeting and sending DMs.

### BatchJob

The core automation functionality is built around the [BatchJob](./lib/batch-job.js) class.

The goal of `BatchJob` is to ensure that potentially large batches of Twitter API calls are **serializable** and **resumable**.

A `BatchJob` stores all of the state it would need to continue resolving its async batched operation in the event of an error. `BatchJob` instances support serializing their state in order to store them in a database of JSON file on disk.

Here's an example batch job in action:

```js

// fetches the user ids for all of your followers

const job = BatchJobFactory.createBatchJobTwitterGetFollowers({

  params: {

    // assumes that you already have user oauth credentials

    accessToken: twitterAccessToken,

    accessTokenSecret: twitterAccessTokenSecret,

    // only fetch your first 10 followers for testing purposes

    maxLimit: 10,

    count: 10

  }

})

// process as much of this job as possible until it either completes or errors

await job.run()

// job.status: 'active' | 'done' | 'error'

// job.results: string[]

console.log(job.status, job.results)

// store this job to disk

fs.writeFileSync('out.json', job.serialize())

// ...

// read the job from disk and resume processing

const jobData = fs.readFileSync('out.json')

const job = BatchJobFactory.deserialize(jobData)

if (job.status === 'active') {

  await job.run()

}

```

This example also shows how to serialize and resume a job.

### Workflow

Sequences of `BatchJob` instances can be connected together to form a [Workflow](./lib/workflow.js).

Here's an example workflow:

```js

const job = new Workflow({

  params: {

    // assumes that you already have user oauth credentials

    accessToken: twitterAccessToken,

    accessTokenSecret: twitterAccessTokenSecret,

    pipeline: [

      {

        type: 'twitter:get-followers',

        label: 'followers',

        params: {

          // only fetch your first 10 followers for testing purposes

          maxLimit: 10,

          count: 10

        }

      },

      {

        type: 'twitter:lookup-users',

        label: 'users',

        connect: {

          // connect the output of the first job to the `userIds` param for this job

          userIds: 'followers'

        },

        transforms: ['sort-users-by-fuzzy-popularity']

      },

      {

        type: 'twitter:send-direct-messages',

        connect: {

          // connect the output of the second job to the `users` param for this job

          users: 'users'

        },

        params: {

          // handlebars template with access to the current twitter user object

          template: `Hey @{{user.screen_name}}, I'm testing an open source Twitter automation tool and you happen to be my one and only lucky test user.\n\nSorry for the spam. https://github.com/saasify-sh/twitter-flock`

        }

      }

    ]

  }

})

await job.run()

```

This workflow is comprised of three jobs:

- `twitter:get-followers` - Fetches the user ids of all of your followers.

  - Batches twitter API calls to [twitter followers/ids](https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-followers-ids)

- `twitter:lookup-users` - Expands these user ids into user objects.

  - Batches twitter API calls to [users/lookup](https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-users-lookup)

- `twitter:send-direct-messages` - Sends a template-based direct message to each of these users.

  - Batches twitter API calls to [direct_messages/events/new](https://developer.twitter.com/en/docs/direct-messages/sending-and-receiving/api-reference/new-event)

Note that `Workflow` derives from `BatchJob`, so workflows are also serializable and resumable. Huzzah!

## Future work

A more robust, scalable version of this project would use a solution like [Apache Kafka](https://kafka.apache.org). [Kafka.js](https://kafka.js.org) looks useful as a higher-level Node.js wrapper.

Kafka would add quite a bit of complexity, but it would also handle a lot of details and be significantly more efficient. In particular, Kafka would solve the producer / consumer model, give us more robust error handling, horizontal scalability, storing and committing state, and enable easy interop with different data sources and sinks.

This project was meant as a quick prototype, however, and our relatively simple `BatchJob` abstraction works pretty well all things considered.

### Producer / Consumer

One of the disadvantages of the current design is that a `BatchJob` needs to complete before any dependent jobs can run, whereas we'd really like to model this as a [Producer-Consumer problem](https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem).

### DAGs

Another shortcoming of the current design is that `Workflows` can only combine sequences of jobs where the output of one job feeds into the input of the next job.

A more extensible design would allow for workflows comprised of [directed acyclic graphs](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

## MVP TODO

- [x] resumable batch jobs

- [x] resumable workflows (sequences of batch jobs)

- [x] twitter:get-followers batch job

- [x] twitter:lookup-users batch job

- [x] twitter:send-direct-messages batch job

- [x] test workflow which combines these three batch jobs

- [x] test rate limits

  - twitter:get-followers 75k / 15 min

  - twitter:lookup-users 90k / 15 min

  - twitter:send-direct-messages 1k / day

  - twitter:send-tweets 300 / 3h -> 2.4k / day

- [x] large account test

- [x] gracefully handle twitter rate limits

- [x] experiment with extracting public emails

- [x] add default persistent storage

  - via [leveldb](https://github.com/Level/level)

- [x] support commiting batch job updates

- [x] user-friendly cli

- [x] add cli support for different output formats

  - via [sheetjs/xlsx](https://github.com/SheetJS/sheetjs#supported-output-formats)

  - json, csv, xls, xlsx, html, txt, etc

- [x] gracefully handle process exit

- [ ] initial set of cli commands

- [ ] cli oauth support

- [ ] unit tests for snapshotting, serializing, deserializing

- [ ] unit tests for workflows

- [ ] convert transforms to batchjob

- [ ] more dynamic rate limit handling

- [ ] support bring-your-own-api-key

- [ ] basic docs and demo video

- [ ] hosted saasify version

## License

MIT © [Saasify](https://saasify.sh)

Support my OSS work by following me on twitter