{"id":13665949,"url":"https://github.com/ropensci/tabulapdf","last_synced_at":"2025-12-12T00:49:26.649Z","repository":{"id":38375302,"uuid":"57394840","full_name":"ropensci/tabulapdf","owner":"ropensci","description":"Bindings for Tabula PDF Table Extractor Library","archived":false,"fork":false,"pushed_at":"2025-01-03T08:31:10.000Z","size":33924,"stargazers_count":550,"open_issues_count":74,"forks_count":71,"subscribers_count":36,"default_branch":"main","last_synced_at":"2025-01-03T09:25:41.808Z","etag":null,"topics":["java","pdf","pdf-document","peer-reviewed","r","r-package","ropensci","rstats","tabula","tabular-data"],"latest_commit_sha":null,"homepage":"https://docs.ropensci.org/tabulapdf/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-04-29T15:33:50.000Z","updated_at":"2025-01-03T08:31:15.000Z","dependencies_parsed_at":"2024-06-01T07:14:38.600Z","dependency_job_id":"ee478c15-0b45-4968-8470-78df5cb7f307","html_url":"https://github.com/ropensci/tabulapdf","commit_stats":{"total_commits":144,"total_committers":10,"mean_commits":14.4,"dds":"0.47916666666666663","last_synced_commit":"bfc79cb128f284eceeecc310b9b957bbfb9dd719"},"previous_names":["ropensci/tabulizer"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Ftabulapdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rope
nsci%2Ftabulapdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Ftabulapdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Ftabulapdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ropensci","download_url":"https://codeload.github.com/ropensci/tabulapdf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250967072,"owners_count":21515526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","pdf","pdf-document","peer-reviewed","r","r-package","ropensci","rstats","tabula","tabular-data"],"created_at":"2024-08-02T06:00:54.609Z","updated_at":"2025-10-21T20:04:48.255Z","avatar_url":"https://github.com/ropensci.png","language":"R","funding_links":["https://buymeacoffee.com/pacha"],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"##\"\n)\n```\n\n# tabulapdf: Extract tables from PDF documents \u003cimg src=\"man/figures/logo.svg\" align=\"right\" height=\"139\" alt=\"\" 
/\u003e\n\n[![R-CMD-check](https://github.com/ropensci/tabulapdf/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ropensci/tabulapdf/actions/workflows/R-CMD-check.yaml)\n[![](https://badges.ropensci.org/42_status.svg)](https://github.com/ropensci/software-review/issues/42)\n[![BuyMeACoffee](https://raw.githubusercontent.com/pachadotdev/buymeacoffee-badges/main/bmc-donate-yellow.svg)](https://buymeacoffee.com/pacha)\n\n**tabulapdf** provides R bindings to the [Tabula Java library](https://github.com/tabulapdf/tabula-java/), which can be used to computationally extract tables from PDF documents.\n\nNote: tabulapdf is released under the Apache 2.0 license; Tabula itself is released under the MIT license.\n\n## Installation\n\ntabulapdf depends on [rJava](https://cran.r-project.org/package=rJava), which\nimplies a system requirement for Java. This can be frustrating, especially on\nWindows. The preferred Windows workflow is to use\n[Chocolatey](https://chocolatey.org/) to obtain, configure, and update Java.\nYou need to do this before installing rJava or attempting to use tabulapdf. 
More on\n[this](#installing-java-on-windows-with-chocolatey) and\n[troubleshooting](#troubleshooting) below.\n\ntabulapdf is available on CRAN, and it can also be installed from rOpenSci's\nR-Universe:\n```r\n# either\ninstall.packages(\"tabulapdf\")\n\n# or\ninstall.packages(\"tabulapdf\", repos = c(\"https://ropensci.r-universe.dev\", \"https://cloud.r-project.org\"))\n```\n\nTo install the latest development version:\n```r\nif (!require(remotes)) install.packages(\"remotes\")\n\n# on 64-bit Windows\nremotes::install_github(c(\"ropensci/tabulapdf\"), INSTALL_opts = \"--no-multiarch\")\n\n# elsewhere\nremotes::install_github(c(\"ropensci/tabulapdf\"))\n```\n\n## Code Examples\n\nThe main function, `extract_tables()` provides an R clone of the Tabula command line application:\n\n```r\nlibrary(tabulapdf)\nf \u003c- system.file(\"examples\", \"data.pdf\", package = \"tabulapdf\")\nout \u003c- extract_tables(f)\nout[[1]]\n\n# # A tibble: 32 × 11\n#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb\n#    \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e\n#  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4\n#  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4\n#  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1\n#  4  21.4     6  258    110  3.08  3.21  19.4     1     0     3     1\n#  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2\n#  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1\n#  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4\n#  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2\n#  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2\n# 10  19.2     6  168.   
123  3.92  3.44  18.3     1     0     4     4\n# # ℹ 22 more rows\n# # ℹ Use `print(n = ...)` to see more rows\n```\n\nThe vignette provides more examples and details on how to use the package.\n\n## Installing Java on Windows with Chocolatey\n\nIn a PowerShell prompt, install Chocolatey if you don't already have it.\n\nRun `Get-ExecutionPolicy`. If it returns `Restricted`, then run `Set-ExecutionPolicy AllSigned` or `Set-ExecutionPolicy Bypass -Scope Process`. Then, install Chocolatey by running the following command:\n\n```\nSet-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))\n```\n\nInstall Java using the following command:\n\n```\nchoco install openjdk11\n```\n\nYou should now be able to open R and use rJava and tabulapdf. From\nPowerShell, you should see something like this after running `java -version`:\n\n```\nOpenJDK Runtime Environment (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1)\nOpenJDK 64-Bit Server VM (build 11.0.22+7-post-Ubuntu-0ubuntu222.04.1, mixed mode, sharing)\n```\n\n## Troubleshooting\n\n### macOS and Linux\n\nWe tested with OpenJDK version 11. The package is configured to ask for that\nversion of Java. If you have a different version of Java installed, you may need\nto change the `JAVA_HOME` environment variable to point to the correct version.\n\nYou need to ensure that R has been installed with Java support. This can often\nbe fixed by running `R CMD javareconf` on the command line (possibly with\n`sudo`).\n\n### Windows\n\nMake sure you have permission to write to and install packages to your R\ndirectory before trying to install the package. This can be changed from\n\"Properties\" on the right-click context menu. 
Alternatively, you can ensure\nwrite permission by choosing \"Run as administrator\" when launching R (again,\nfrom the right-click context menu).\n\n## Debugging\n\nFrom a source checkout of the package, load it like this, replacing `libname`\nwith the path to your own R library:\n\n```r\ndevtools::load_all()\nlibname = \"/home/pacha/R/x86_64-pc-linux-gnu-library/4.4\"\npkgname = \"tabulapdf\"\nrJava::.jpackage(pkgname, jars = \"*\", lib.loc = libname)\nrJava::J(\"java.lang.System\")$setProperty(\"java.awt.headless\", \"true\")\n```\n\n## Meta\n\n* Please [report any issues or bugs](https://github.com/ropensci/tabulapdf/issues).\n* Get citation information for `tabulapdf` in R by running `citation(package = 'tabulapdf')`.\n* License: Apache\n\n[![rofooter](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Ftabulapdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci%2Ftabulapdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Ftabulapdf/lists"}