{"id":13394294,"url":"https://github.com/esbenp/pdf-bot","last_synced_at":"2025-05-14T19:07:50.299Z","repository":{"id":56562919,"uuid":"99672291","full_name":"esbenp/pdf-bot","owner":"esbenp","description":"🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs","archived":false,"fork":false,"pushed_at":"2024-03-07T17:14:23.000Z","size":94,"stargazers_count":2625,"open_issues_count":20,"forks_count":142,"subscribers_count":44,"default_branch":"master","last_synced_at":"2024-10-29T15:21:36.180Z","etag":null,"topics":["chromium","google-chrome","headless","headless-chrome","headless-chromium","html","node-js","nodejs","pdf","pdf-generation","pdf-generator"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/esbenp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-08T08:57:24.000Z","updated_at":"2024-09-20T08:26:44.000Z","dependencies_parsed_at":"2024-01-13T10:42:06.997Z","dependency_job_id":"0d7fce3f-13b8-405e-9a69-262bc2a0e6b0","html_url":"https://github.com/esbenp/pdf-bot","commit_stats":{"total_commits":60,"total_committers":6,"mean_commits":10.0,"dds":0.09999999999999998,"last_synced_commit":"516223548d044669c28b52dcefdda6f20f4113cf"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esbenp%2Fpdf-bot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/esbenp%2Fpdf-bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esbenp%2Fpdf-bot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esbenp%2Fpdf-bot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/esbenp","download_url":"https://codeload.github.com/esbenp/pdf-bot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248710408,"owners_count":21149185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromium","google-chrome","headless","headless-chrome","headless-chromium","html","node-js","nodejs","pdf","pdf-generation","pdf-generator"],"created_at":"2024-07-30T17:01:15.176Z","updated_at":"2025-04-13T11:46:35.482Z","avatar_url":"https://github.com/esbenp.png","language":"JavaScript","readme":"# 🤖 pdf-bot\n\n[![npm](https://img.shields.io/npm/v/pdf-bot.svg)](https://www.npmjs.com/package/pdf-bot) [![Build Status](https://travis-ci.org/esbenp/pdf-bot.svg?branch=master)](https://travis-ci.org/esbenp/pdf-bot) [![Coverage Status](https://coveralls.io/repos/github/esbenp/pdf-bot/badge.svg?branch=master)](https://coveralls.io/github/esbenp/pdf-bot?branch=master)\n\nEasily create a microservice for generating PDFs using headless Chrome.\n\n`pdf-bot` is installed on a server and will receive URLs to turn into PDFs through its API or CLI. `pdf-bot` will manage a queue of PDF jobs. Once a PDF job has run it will notify you using a webhook so you can fetch the API. `pdf-bot` supports storing PDFs on S3 out of the box. 
Failed PDF generations and Webhook pings will be retried after a configurable decaying schedule.\n\n![How to use the pdf-bot CLI](http://imgur.com/aRHye2l.gif)\n\n`pdf-bot` uses [`html-pdf-chrome`](https://github.com/westy92/html-pdf-chrome) under the hood and supports all the settings that it supports. Major thanks to [@westy92](https://github.com/westy92/html-pdf-chrome) for making this possible.\n\n## How does it work?\n\nImagine you have an app that creates invoices. You want to save those invoices as PDF. You install `pdf-bot` on a server as an API. Your app server sends the URL of the invoice to the `pdf-bot` server. A cronjob on the `pdf-bot` server keeps checking for new jobs, generates a PDF using headless Chrome and sends the location back to the application server using a webhook.\n\n## Prerequisites\n\n* Node.js v6 or later\n\n## Installation\n\n```bash\n$ npm install -g pdf-bot\n$ pdf-bot install\n```\n\n\u003e Make sure the node path is in your $PATH\n\n`pdf-bot install` will prompt for some basic configurations and then create a storage folder where your database and pdf files will be saved.\n\n### Configuration\n\n`pdf-bot` comes packaged with sensible defaults. At the very minimum you must have a config file in the same folder from which you are executing `pdf-bot` with a `storagePath` given. 
However, in practice you probably want to use the `pdf-bot install` command to generate a configuration file and then define a shell alias such as `alias pdf-bot=\"pdf-bot -c /home/pdf-bot.config.js\"`.\n\n`pdf-bot.config.js`\n```js\nvar htmlPdf = require('html-pdf-chrome')\n\nmodule.exports = {\n  api: {\n    token: 'crazy-secret'\n  },\n  generator: {\n    completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000) // 1 sec timeout\n  },\n  storagePath: 'storage'\n}\n```\n\n```bash\n$ pdf-bot -c ./pdf-bot.config.js push https://esbenp.github.io\n```\n\n[See a full list of the available configuration options.](#options)\n\n## Usage guide\n\n### Structure and concept\n\n`pdf-bot` is meant to be a microservice that runs a server to generate PDFs for you. That usually means your application server sends requests to the PDF server, asking it to render a URL as a PDF. `pdf-bot` will manage a queue and retry failed generations. Once a PDF has been generated successfully, its path is sent back to your application server.\n\nLet us check out the flow for an app that generates PDF invoices.\n\n```\n1. (App server): An invoice is created ----\u003e Send URL to invoice to pdf-bot server\n2. (pdf-bot server): Put the URL in the queue\n3. (pdf-bot server): PDF is generated using headless Chrome\n4. (pdf-bot server): (if failed try again using 1 min, 3 min, 10 min, 30 min, 60 min delay)\n5. (pdf-bot server): Upload PDF to storage (e.g. Amazon S3)\n6. (pdf-bot server): Send S3 location of PDF back to the app server\n7. (App server): Receive S3 location of PDF -\u003e Check signature sum matches for security\n8. (App server): Handle PDF however you see fit (move it, download it, save it etc.)\n```\n\nYou can send metadata to the `pdf-bot` server that will be sent back to the application. This can help you identify which PDF you are receiving.\n\n### Setup\n\nOn your `pdf-bot` server, start by creating a config file `pdf-bot.config.js`. 
[You can see an example file here](https://github.com/esbenp/pdf-bot/blob/master/examples/pdf-bot.config.js)\n\n`pdf-bot.config.js`\n```js\nvar createS3Config = require('pdf-bot/src/storage/s3')\n\nmodule.exports = {\n  api: {\n    port: 3000,\n    token: 'api-token'\n  },\n  storage: {\n    's3': createS3Config({\n      bucket: '',\n      accessKeyId: '',\n      region: '',\n      secretAccessKey: ''\n    })\n  },\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\nAs a minimum you should configure an access token for your API. This will be used to authenticate jobs sent to your `pdf-bot` server. You also need to add a `webhook` configuration to have PDF notifications sent back to your application server. You should add a `secret`, which is used to generate a signature so the receiver can check that the request has not been tampered with during transfer.\n\nStart your API using\n\n`pdf-bot -c ./pdf-bot.config.js api`\n\nThis will start an [express server](http://expressjs.com) that listens for new jobs on port `3000`.\n\n#### Setting up Chrome\n\n`pdf-bot` uses [html-pdf-chrome](https://github.com/westy92/html-pdf-chrome) which in turn uses [chrome-launcher](https://github.com/GoogleChrome/lighthouse/tree/master/chrome-launcher) to launch Chrome. You should check out those two resources on how to properly set up Chrome. However, with `chrome-launcher` Chrome should be started automatically. Otherwise, `html-pdf-chrome` has a small guide on how to have it running as a process using `pm2`.\n\nYou can install Chrome on Ubuntu using\n\n```\nsudo apt-get update \u0026\u0026 sudo apt-get install chromium-browser\n```\n\nIf you are testing things on OSX or similar, `chrome-launcher` should be able to find and automatically start up Chrome for you.\n\n#### Setting up the receiving API\n\nIn the [examples folder](https://github.com/esbenp/pdf-bot/blob/master/examples/receiving-api.js) there is a small example on how the application API could look. 
Basically, you just have to define an endpoint that will receive the webhook and check that the signature matches.\n\n```javascript\napi.post('/hook', function (req, res) {\n  // Signature header sent by pdf-bot (prefix set by webhook.headerNamespace)\n  var signature = req.get('X-PDF-Signature', 'sha1=')\n\n  // Recompute the HMAC-SHA1 of the body.\n  // The key must be the same value as webhook.secret in your pdf-bot config.\n  var bodyCrypted = require('crypto')\n    .createHmac('sha1', '12345')\n    .update(JSON.stringify(req.body))\n    .digest('hex')\n\n  // Reject the request if the signatures do not match\n  if (bodyCrypted !== signature) {\n    res.status(401).send()\n    return\n  }\n\n  console.log('PDF webhook received', JSON.stringify(req.body))\n\n  res.status(204).send()\n})\n```\n\n### Setup production environment\n\n[Follow the guide under `production/` to see how to setup `pdf-bot` using `pm2` and `nginx`](https://github.com/esbenp/pdf-bot/blob/master/production/README.md)\n\n### Setup crontab\n\nWe set up our crontab to continuously look for jobs that have not yet been completed.\n\n```bash\n* * * * * node $(npm bin -g)/pdf-bot -c ./pdf-bot.config.js shift:all \u003e\u003e /var/log/pdfbot.log 2\u003e\u00261\n* * * * * node $(npm bin -g)/pdf-bot -c ./pdf-bot.config.js ping:retry-failed \u003e\u003e /var/log/pdfbot.log 2\u003e\u00261\n```\n\n### Quick example using the CLI\n\nLet us assume I want to generate a PDF for `https://esbenp.github.io`. I can add the job using the `pdf-bot` CLI.\n\n```bash\n$ pdf-bot -c ./pdf-bot.config.js push https://esbenp.github.io --meta '{\"id\":1}'\n```\n\nNext, if my crontab is not set up to run it automatically, I can run it using the `shift:all` command\n\n```bash\n$ pdf-bot -c ./pdf-bot.config.js shift:all\n```\n\nThis will run all unfinished jobs in the queue. To run only the next job in the queue, use `shift` instead.\n\n### How can I generate PDFs for sites that use Javascript?\n\nThis is a common issue with PDF generation. Luckily, `html-pdf-chrome` has a really awesome API for dealing with Javascript. You can specify a timeout in milliseconds, wait for elements or custom events. To add a wait simply configure the `generator` key in your configuration. 
Below are a few examples.\n\n**Wait for 5 seconds**\n\n```javascript\nvar htmlPdf = require('html-pdf-chrome')\n\nmodule.exports = {\n  api: {\n    token: 'api-token'\n  },\n  // html-pdf-chrome options\n  generator: {\n    completionTrigger: new htmlPdf.CompletionTrigger.Timer(5000), // waits for 5 sec\n  },\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\n**Wait for event**\n\n```javascript\nvar htmlPdf = require('html-pdf-chrome')\n\nmodule.exports = {\n  api: {\n    token: 'api-token'\n  },\n  // html-pdf-chrome options\n  generator: {\n    completionTrigger: new htmlPdf.CompletionTrigger.Event(\n      'myEvent', // name of the event to listen for\n      '#myElement', // optional DOM element CSS selector to listen on, defaults to body\n      5000 // optional timeout (milliseconds)\n    )\n  },\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\nIn your Javascript trigger the event when rendering is complete\n\n```javascript\ndocument.getElementById('myElement').dispatchEvent(new CustomEvent('myEvent'));\n```\n\n**Wait for variable**\n\n```javascript\nvar htmlPdf = require('html-pdf-chrome')\n\nmodule.exports = {\n  api: {\n    token: 'api-token'\n  },\n  // html-pdf-chrome options\n  generator: {\n    completionTrigger: new htmlPdf.CompletionTrigger.Variable(\n      'myVarName', // optional, name of the variable to wait for.  
Defaults to 'htmlPdfDone'\n      5000 // optional, timeout (milliseconds)\n    )\n  },\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\nIn your Javascript set the variable when the rendering is complete\n\n```javascript\nwindow.myVarName = true;\n```\n\n[You can find more completion triggers in html-pdf-chrome's documentation](https://github.com/westy92/html-pdf-chrome#trigger-render-completion)\n\n## API\n\nBelow are the endpoints exposed by `pdf-bot`'s REST API\n\n### Push URL to queue: POST /\n\nkey | type | required | description\n--- | ---- | -------- | -----------\nurl | string | yes | The URL to generate a PDF from\nmeta | object | no | Optional metadata object that is sent back to the webhook URL\n\n#### Example\n\n```bash\ncurl -X POST -H 'Authorization: Bearer api-token' -H 'Content-Type: application/json' http://pdf-bot.com/ -d '\n  {\n    \"url\":\"https://esbenp.github.io\",\n    \"meta\":{\n      \"type\":\"invoice\",\n      \"id\":1\n    }\n  }'\n```\n\n## Database\n\n### LowDB (file-database) (default)\n\nIf you have low concurrency (you only run a job every now and then) you can use the default database driver, which uses LowDB.\n\n```javascript\nvar LowDB = require('pdf-bot/src/db/lowdb')\n\nmodule.exports = {\n  api: {\n    token: 'api-token'\n  },\n  db: LowDB({\n    lowDbOptions: {},\n    path: '' // defaults to $storagePath/db/db.json\n  }),\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\n### PostgreSQL\n\n```javascript\nvar pgsql = require('pdf-bot/src/db/pgsql')\n\nmodule.exports = {\n  api: {\n    token: 'api-token'\n  },\n  db: pgsql({\n    database: 'pdfbot',\n    username: 'pdfbot',\n    password: 'pdfbot',\n    port: 5432\n  }),\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\nOptionally, you can specify a database url by specifying a `connectionString`.\n\nTo install the necessary database 
tables, run `db:migrate`. You can also destroy the database by running `db:destroy`.\n\n## Storage\n\nCurrently `pdf-bot` comes bundled with built-in support for storing PDFs on Amazon S3.\n\n[Feel free to contribute a PR if you want to see other storage plugins in `pdf-bot`](https://github.com/esbenp/pdf-bot/compare)!\n\n### Amazon S3\n\nTo enable S3 storage, add a key to the `storage` configuration. Note that you can add as many different locations as you want by giving them different keys.\n\n```javascript\nvar createS3Config = require('pdf-bot/src/storage/s3')\n\nmodule.exports = {\n  api: {\n    token: 'api-token'\n  },\n  storage: {\n    'my_s3': createS3Config({\n      bucket: '[YOUR BUCKET NAME]',\n      accessKeyId: '[YOUR ACCESS KEY ID]',\n      region: '[YOUR REGION]',\n      secretAccessKey: '[YOUR SECRET ACCESS KEY]'\n    })\n  },\n  webhook: {\n    secret: '1234',\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n\n```\n\n## Options\n\n```javascript\nvar htmlPdf = require('html-pdf-chrome')\nvar LowDB = require('pdf-bot/src/db/lowdb')\nvar createS3Config = require('pdf-bot/src/storage/s3')\n\nvar decaySchedule = [\n  1000 * 60, // 1 minute\n  1000 * 60 * 3, // 3 minutes\n  1000 * 60 * 10, // 10 minutes\n  1000 * 60 * 30, // 30 minutes\n  1000 * 60 * 60 // 1 hour\n];\n\nmodule.exports = {\n  // The settings of the API\n  api: {\n    // The port your express.js instance listens to requests from. (default: 3000)\n    port: 3000,\n    // Spawn command when a job has been pushed to the API\n    postPushCommand: ['/home/user/.npm-global/bin/pdf-bot', ['-c', './pdf-bot.config.js', 'shift:all']],\n    // The token used to validate requests to your API. 
Not required, but 100% recommended.\n    token: 'api-token'\n  },\n  db: LowDB(), // see other drivers under Database\n  // html-pdf-chrome\n  generator: {\n    // Triggers that specify when the PDF should be generated\n    completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000), // waits for 1 sec\n    // The port to listen for Chrome (default: 9222)\n    port: 9222\n  },\n  queue: {\n    // How frequent should pdf-bot retry failed generations?\n    // (default: 1 min, 3 min, 10 min, 30 min, 60 min)\n    generationRetryStrategy: function(job, retries) {\n      return decaySchedule[retries - 1] ? decaySchedule[retries - 1] : 0\n    },\n    // How many times should pdf-bot try to generate a PDF?\n    // (default: 5)\n    generationMaxTries: 5,\n    // How many generations to run at the same time when using shift:all\n    parallelism: 4,\n    // How frequent should pdf-bot retry failed webhook pings?\n    // (default: 1 min, 3 min, 10 min, 30 min, 60 min)\n    webhookRetryStrategy: function(job, retries) {\n      return decaySchedule[retries - 1] ? decaySchedule[retries - 1] : 0\n    },\n    // How many times should pdf-bot try to ping a webhook?\n    // (default: 5)\n    webhookMaxTries: 5\n  },\n  storage: {\n    's3': createS3Config({\n      bucket: '',\n      accessKeyId: '',\n      region: '',\n      secretAccessKey: ''\n    })\n  },\n  webhook: {\n    // The prefix to add to all pdf-bot headers on the webhook response.\n    // I.e. X-PDF-Transaction and X-PDF-Signature. (default: X-PDF-)\n    headerNamespace: 'X-PDF-',\n    // Extra request options to add to the Webhook ping.\n    requestOptions: {\n\n    },\n    // The secret used to generate the hmac-sha1 signature hash.\n    // !Not required, but should definitely be included!\n    secret: '1234',\n    // The endpoint to send PDF messages to.\n    url: 'http://localhost:3000/webhooks/pdf'\n  }\n}\n```\n\n## CLI\n\n`pdf-bot` comes with a full CLI included! Use `-c` to pass a configuration to `pdf-bot`. 
You can also use `--help` to get a list of all commands. An example is given below.\n\n```bash\n$ pdf-bot.js --config ./examples/pdf-bot.config.js --help\n\n\n  Usage: pdf-bot [options] [command]\n\n\n  Options:\n\n    -V, --version        output the version number\n    -c, --config \u003cpath\u003e  Path to configuration file\n    -h, --help           output usage information\n\n\n  Commands:\n\n    api                   Start the API\n    db:migrate\n    db:destroy\n    install\n    generate [jobID]      Generate PDF for job\n    jobs [options]        List all completed jobs\n    ping [jobID]          Attempt to ping webhook for job\n    ping:retry-failed\n    pings [jobId]         List pings for a job\n    purge [options]       Will remove all completed jobs\n    push [options] [url]  Push new job to the queue\n    shift                 Run the next job in the queue\n    shift:all             Run all unfinished jobs in the queue\n```\n\n## Debug mode\n\n`pdf-bot` uses `debug` for debug messages. You can turn on debugging by setting the environment variable `DEBUG=pdf:*` like so\n\n```bash\nDEBUG=pdf:* pdf-bot jobs\n```\n\n## Tests\n\n```bash\n$ npm run test\n```\n\n## Issues\n\n[Please report issues to the issue tracker](https://github.com/esbenp/pdf-bot/issues/new)\n\n## License\n\nThe MIT License (MIT). Please see [License File](https://github.com/esbenp/pdf-bot/blob/master/LICENSE) for more information.\n","funding_links":[],"categories":["JavaScript","📦 Legacy \u0026 Inactive Projects","Tools","chromium"],"sub_categories":["Node"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fesbenp%2Fpdf-bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fesbenp%2Fpdf-bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fesbenp%2Fpdf-bot/lists"}