{"id":27019246,"url":"https://github.com/jameslavin/htmlstopdf","last_synced_at":"2025-10-24T19:39:21.595Z","repository":{"id":56876829,"uuid":"2535359","full_name":"JamesLavin/HtmlsToPdf","owner":"JamesLavin","description":"Creates single PDF file from 1+ HTML pages","archived":false,"fork":false,"pushed_at":"2012-10-12T20:16:30.000Z","size":216,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-14T19:06:50.373Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JamesLavin.png","metadata":{"files":{"readme":"README.markdown","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-10-07T21:18:14.000Z","updated_at":"2019-01-17T23:30:15.000Z","dependencies_parsed_at":"2022-08-20T11:31:05.479Z","dependency_job_id":null,"html_url":"https://github.com/JamesLavin/HtmlsToPdf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JamesLavin%2FHtmlsToPdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JamesLavin%2FHtmlsToPdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JamesLavin%2FHtmlsToPdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JamesLavin%2FHtmlsToPdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JamesLavin","download_url":"https://codeload.github.com/JamesLavin/HtmlsToPdf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":247217203,"owners_count":20903009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-04T17:20:00.303Z","updated_at":"2025-10-24T19:39:16.560Z","avatar_url":"https://github.com/JamesLavin.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HtmlsToPdf\n\n## DESCRIPTION\n\nHtmlsToPdf enables you to package one or more (ordered) HTML pages as a PDF.\n\n## WHY?\n\nI often see multi-page websites with content I would rather have in a single PDF file for searching and offline viewing. Examples include: *The Ruby on Rails Guides* and *RSpec documentation*.\n\nViewing docs offline also reduces browser \"tab-itis,\" browser crashes, and unnecessary re-downloading of server content.\n\n## REQUIREMENTS\n\nI have run this only on Linux. It likely works on OS X. 
It may not work on Windows.\n\nHtmlsToPdf uses [the PDFKit gem](https://github.com/pdfkit/PDFKit/), which itself uses [the wkhtmltopdf program](http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html), which uses qtwebkit.\n\nDependency chain summary: HtmlsToPdf -\u003e PDFKit -\u003e wkhtmltopdf -\u003e qtwebkit -\u003e webkit\n\nFor information on qtwebkit:\n\n- [Installing on Linux](http://trac.webkit.org/wiki/BuildingQtOnLinux)\n\n- [Installing on OS X](http://trac.webkit.org/wiki/BuildingQtOnOSX)\n\n- [Installing on Windows](http://trac.webkit.org/wiki/BuildingQtOnWindows)\n\nFor information on wkhtmltopdf:\n\n- [Installation guide from PDFKit author](https://github.com/pdfkit/PDFKit/wiki/Installing-WKHTMLTOPDF)\n\n- [code.google.com](http://code.google.com/p/wkhtmltopdf/)\n\nFor information on PDFKit:\n\n- [GitHub](https://github.com/pdfkit/PDFKit)\n\n- [Railscasts](http://railscasts.com/episodes/220-pdfkit)\n\n## BASIC USAGE\n\nCreate a new HtmlsToPdf object, passing in all your configuration options. Then call `create_pdf` on the new object:\n\n    require 'rubygems'\n    require 'htmls_to_pdf'\n\n    config = {}\n    config[:urls]     = ['http://.../url1.htm', 'https://.../url2.html']\n    config[:savedir]  = '~/my/savedir'\n    config[:savename] = 'Name_to_save_file_as.pdf'\n    config[:css]      = ['http://www.example.com/css_file.css',\n                         'h1 {color: red; margin: 10px 5px;} p {color: blue; border: 1px solid green; font-size: 80%;}']\n\n    HtmlsToPdf.new(config).create_pdf\n\nAlternatively, you can set configuration options by calling setters on an HtmlsToPdf instance, e.g.: `h2p = HtmlsToPdf.new({}); h2p.savedir = '~/my/savedir'`.\n\n## OPTIONS\n\n`config[:css]` takes an array of CSS file URLs and/or valid CSS strings (you can mix URLs and CSS strings within an array) to apply during PDF rendering. 
(If you have just one CSS URL/string, you can pass it without an array.)\n\n`config[:debug]` (default: false) determines whether the program outputs verbose information while running `create_pdf`.\n\n`config[:overwrite_existing_pdf]` (default: false) determines whether the program can overwrite a previously generated PDF file.\n\n`config[:options]` takes a hash of options that are passed through to PDFKit.\n\n`config[:remove_css_files]` (default: true) determines whether CSS files used to generate the PDF file are deleted or retained. You probably want to set this to false if you want to modify the CSS file(s).\n\n`config[:remove_html_files]` (default: true) determines whether HTML files downloaded from websites and used to generate the PDF file are deleted or retained. You probably want to set this to false if you think you may want to regenerate the PDF, perhaps because you're tweaking the CSS file to adjust rendering.\n\n`config[:remove_tmp_pdf_files]` (default: true) determines whether temporary PDF files (one per HTML file) created during the PDF generation process are deleted or retained. You probably want to accept the default and always regenerate the temporary PDFs.\n\n`config[:remove_temp_files]` (default: false), when set to true, sets `:remove_css_files`, `:remove_html_files`, and `:remove_tmp_pdf_files` all to true.\n\n## EXAMPLES\n\nYou will find 20 example scripts in the /examples directory. Each creates a PDF from a website:\n\n- [The 12 Factor App](http://www.12factor.net) (Adam Wiggins)\n- [Advanced Rails - Five-Day](http://tutorials.jumpstartlab.com/paths/advanced_rails_five_day.html) (Jumpstart Labs)\n- [Backbone Fundamentals](https://github.com/addyosmani/backbone-fundamentals/blob/master/book.md) (Addy Osmani)\n- [Bash Guide](http://mywiki.wooledge.org/BashGuide) (Greg Wooledge)\n- [CoffeeScript, Meet Backbone.js](http://adamjspooner.github.com/coffeescript-meet-backbonejs/) (Adam J. 
Spooner)\n- [CoffeeScript Cookbook](http://coffeescriptcookbook.com) ([Various authors](http://coffeescriptcookbook.com/authors))\n- [CoffeeScript official documentation](http://coffeescript.org/)\n- [Exploring CoffeeScript](http://elegantcode.com/2011/08/09/exploring-coffeescript-part-6-show-me-the-goodies/) (ElegantCode.com)\n- [Jasmine Wiki](https://github.com/pivotal/jasmine/wiki/) (Pivotal Labs)\n- [The Little Book on CoffeeScript](http://arcturo.github.com/library/coffeescript/) (Alex MacCaw)\n- [Natural Language Processing for the Working Programmer](http://nlpwp.org/book/) (Daniël de Kok)\n- [Learn Python the Hard Way](http://learnpythonthehardway.org) (Zed A. Shaw)\n- [Practicing Ruby Vol 2](http://community.mendicantuniversity.org/articles/practicing-ruby-volume-2-now-freely-avai) (Gregory Brown)\n- [The Python Tutorial](http://docs.python.org/tutorial/index.html)\n- Rails 3.1 release notes\n- [Ruby on Rails Guides](http://guides.rubyonrails.org)\n- [RSpec-Rails documentation](https://www.relishapp.com/rspec/rspec-rails/docs)\n- [RSpec documentation](https://www.relishapp.com/rspec/rspec-rails/docs)\n- [Learn Ruby the Hard Way](http://ruby.learncodethehardway.org) (Zed A. 
Shaw)\n- [RubyGems User Guide](http://docs.rubygems.org/read/book/1)\n\nAfter you install HtmlsToPdf and its dependencies, you can write an ordinary Ruby script to save multiple ordered HTML pages as a single PDF.\n\n### EXAMPLE 1: Single HTML page without CSS, with debugging\n\nAnnotated version of /examples/get\\_rails\\_3\\_1\\_release\\_notes.rb:\n\n    # require the gem\n    require 'rubygems'\n    require 'htmls_to_pdf'\n\n    # Get 'Rails 3.1 Release Notes' as pdf file\n    # Source: 'http://guides.rubyonrails.org/3_1_release_notes.html'\n\n    # create an empty hash to hold your configuration options\n    config = {}\n    config[:urls] = ['http://guides.rubyonrails.org/3_1_release_notes.html']\n\n    # enable verbose messages during PDF creation process\n    config[:debug] = true\n\n    # set a :savedir key with a string value indicating the directory to create\n    # your PDF file in. If the directory does not exist, it will be created\n    config[:savedir] = '~/Tech/Rails/3.1'\n\n    # set a :savename key with a string value indicating the name of the PDF file\n    config[:savename] = 'Rails_3.1_Release_Notes.pdf'\n\n    # create a new HtmlsToPdf object, passing in your hash, and then call create_pdf\n    # on the new object\n    HtmlsToPdf.new(config).create_pdf\n\n### EXAMPLE 2: Multiple HTML pages without CSS\n\nAnnotated version of /examples/get\\_rubygems\\_user\\_guide.rb:\n\n    # require the gem\n    require 'rubygems'\n    require 'htmls_to_pdf'\n\n    # Get 'RubyGems User Guide' as pdf file\n    # Source: 'http://docs.rubygems.org/read/book/1'\n\n    # create an empty hash to hold your configuration options\n    config = {}\n\n    # set a :urls key with a value of an array containing all the \n    # urls you want in your PDF (in the order you want them)\n    config[:urls] = ['http://docs.rubygems.org/read/book/1']\n    # I have no idea why these chapters are numbered as they are!\n    [1,2,3,4,16,7,5,6,21].each do |val|\n      config[:urls] 
\u003c\u003c 'http://docs.rubygems.org/read/chapter/' + val.to_s\n    end\n\n    # set a :savedir key with a string value indicating the directory to create\n    # your PDF file in. If the directory does not exist, it will be created\n    config[:savedir] = '~/Tech/Ruby/GEMS/DOCUMENTATION'\n\n    # set a :savename key with a string value indicating the name of the PDF file\n    config[:savename] = 'RubyGems_User_Guide.pdf'\n\n    # create a new HtmlsToPdf object, passing in your hash, and then call create_pdf\n    # on the new object\n    HtmlsToPdf.new(config).create_pdf\n\n### EXAMPLE 3: Multiple HTML pages with CSS \u0026 PdfKit formatting options\n\nAnnotated version of /examples/get\\_coffeescript\\_meet\\_backbone.rb:\n\n    require 'rubygems'\n    require 'htmls_to_pdf'\n\n    # Get 'CoffeeScript, Meet Backbone.js' as pdf file\n    # Source: 'http://adamjspooner.github.com/coffeescript-meet-backbonejs/'\n\n    config = {}\n    config[:urls] = ['http://adamjspooner.github.com/coffeescript-meet-backbonejs/']\n    (1..5).each do |val|\n      config[:urls] \u003c\u003c 'http://adamjspooner.github.com/coffeescript-meet-backbonejs/0' + val.to_s + '/docs/script.html'\n    end\n    config[:savedir] = '~/Tech/Javascript/COFFEESCRIPT/BACKBONE.JS'\n    config[:savename] = 'CoffeeScript_Meet_Backbone.js.pdf'\n\n    # If a :css key is given with an array value, the CSS files in the array will be used to generate\n    # the PDF document. 
This allows you to modify the CSS file(s) to, for example, hide HTML headers,\n    # sidebars and footers you do not wish to appear in your PDF.\n    config[:css] = ['http://adamjspooner.github.com/coffeescript-meet-backbonejs/05/docs/docco.css']\n\n    # If an :options key is passed with a hash value, that hash will be passed to wkhtmltopdf.\n    # Many options are available through wkhtmltopdf; see the wkhtmltopdf documentation:\n    # http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html\n    config[:options] = {:page_size =\u003e 'Letter', :orientation =\u003e 'Landscape'}\n\n    HtmlsToPdf.new(config).create_pdf\n\n### EXAMPLE 4: Multiple HTML pages with hand-modified CSS file to adjust rendering\n\nAnnotated version of /examples/get\\_ruby\\_core\\_docs.rb:\n\n    require 'rubygems'\n    require 'htmls_to_pdf'\n\n    # Get 'Ruby Core documentation' as pdf file\n    # Source: 'http://www.ruby-doc.org/core-1.9.3/'\n\n    config = {}\n\n    config[:urls] = %w(\n    ARGF.html\n    ArgumentError.html\n    Array.html\n    BasicObject.html\n    ...\n    ZeroDivisionError.html\n    fatal.html)\n\n    config[:urls] = config[:urls].map { |u| 'http://www.ruby-doc.org/core-1.9.3/' + u }\n    config[:savedir] = '~/Tech/Ruby/DOCUMENTATION'\n    config[:savename] = 'Ruby_Core_docs.pdf'\n\n    # Specify a CSS file\n    config[:css] = 'http://www.ruby-doc.org/core-1.9.3/css/obf.css'\n\n    # Tell HtmlsToPdf not to remove the CSS file\n    config[:remove_css_files] = false\n\n    # You are now free to create an \"obf.css\" file in the directory\n    # and edit it however you choose. 
It will not be overwritten.\n    # (Alternatively, you can run the program once and then modify\n    # the downloaded CSS file.)\n    #\n    # I added the following to the CSS file to suppress unwanted output:\n    #\n    # .info, noscript, #footer, #metadata, #actionbar, .dsq-brlink {\n    #   display: none;\n    #   width: 0;\n    # }\n    # .class #documentation, .file #documentation, .module #documentation {\n    #   margin: 2em 1em 5em 1em;\n    # }\n    #\n    # If you're playing around with CSS to optimize the display in your\n    # PDF, I recommend you set config[:remove_html_files] = false to\n    # avoid repeatedly downloading the HTML files from the server.\n\n    HtmlsToPdf.new(config).create_pdf\n\n### EXAMPLE 5: Using CSS string to remove unwanted cruft\n\nAbbreviated version of /examples/get\\_jasmine\\_wiki.rb:\n\n    # When I tried to create this PDF, lots of unwanted formatting (headers, footers, etc.) appeared in the PDF.\n\n    # When this happens, I tell the HtmlsToPdf instance to NOT re-download the content each time:\n    config[:remove_css_files] = false\n    config[:remove_html_files] = false\n    config[:overwrite_existing_pdf] = true\n\n    # And then I start building up a CSS string I pass into config[:css] that suppresses the unwanted output:\n    config[:css] = 'div#header{display:none;} ul.tabs{display:none;} div#logo-popup{display:none;} div#footer{display:none;} div#markdown-help{display:none;} div.pagehead{display:none;} ul.wiki-actions{display:none;} div#keyboard_shortcuts_pane{display:none;} div.js-hidden-pane{display:none;} div#ajax-error-message{display:none;}'\n\n## LEGAL DISCLAIMER\n\nPlease use at your own risk. 
I guarantee nothing about this program.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjameslavin%2Fhtmlstopdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjameslavin%2Fhtmlstopdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjameslavin%2Fhtmlstopdf/lists"}