{"id":13441533,"url":"https://github.com/chezou/tabula-py","last_synced_at":"2025-05-12T13:20:24.665Z","repository":{"id":10973241,"uuid":"67859516","full_name":"chezou/tabula-py","owner":"chezou","description":"Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame","archived":false,"fork":false,"pushed_at":"2024-12-05T16:14:56.000Z","size":44491,"stargazers_count":2248,"open_issues_count":0,"forks_count":296,"subscribers_count":45,"default_branch":"master","last_synced_at":"2025-04-23T16:08:05.400Z","etag":null,"topics":["pandas","pdf","python","tabula","tabula-java"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chezou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"chezou","buy_me_a_coffee":"chezou"}},"created_at":"2016-09-10T08:18:37.000Z","updated_at":"2025-04-16T07:32:54.000Z","dependencies_parsed_at":"2023-02-11T22:46:35.185Z","dependency_job_id":"a4663b08-b867-4afd-928d-9f4e7ba1641f","html_url":"https://github.com/chezou/tabula-py","commit_stats":{"total_commits":364,"total_committers":21,"mean_commits":"17.333333333333332","dds":"0.11263736263736268","last_synced_commit":"f3f9550c8a7147d22508b601a9cb0f6fb7520743"},"previous_names":[],"tags_count":46,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chezou%2Ftabula-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chezou%2Ftabula-py/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chezou%2Ftabula-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chezou%2Ftabula-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chezou","download_url":"https://codeload.github.com/chezou/tabula-py/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253745197,"owners_count":21957320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas","pdf","python","tabula","tabula-java"],"created_at":"2024-07-31T03:01:35.167Z","updated_at":"2025-05-12T13:20:24.640Z","avatar_url":"https://github.com/chezou.png","language":"Python","readme":"# tabula-py\n\n[![Build Status](https://github.com/chezou/tabula-py/actions/workflows/pythontest.yml/badge.svg)](https://github.com/chezou/tabula-py/actions/workflows/pythontest.yml)\n[![PyPI version](https://badge.fury.io/py/tabula-py.svg)](https://badge.fury.io/py/tabula-py)\n[![Documentation Status](https://readthedocs.org/projects/tabula-py/badge/?version=latest)](https://tabula-py.readthedocs.io/en/latest/?badge=latest)\n![PyPI - Downloads](https://img.shields.io/pypi/dw/tabula-py)\n[![](https://img.shields.io/badge/-Sponsor-fafbfc?logo=GitHub%20Sponsors\n)](https://github.com/sponsors/chezou)\n\n`tabula-py` is a simple Python wrapper of [tabula-java](https://github.com/tabulapdf/tabula-java), which can read tables in a PDF.\nYou can read tables from a PDF and convert them into a pandas DataFrame. 
tabula-py also enables you to convert a PDF file into a CSV, a TSV or a JSON file.\n\nYou can view [the example notebook](https://nbviewer.jupyter.org/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb) and try it on Google Colab. We also highly recommend reading [our documentation](https://tabula-py.readthedocs.io/en/latest/), especially the FAQ section.\n\n![tabula-py example](https://github.com/chezou/tabula-py/raw/master/example.png)\n\n## Requirements\n\n- Java 8+\n- Python 3.9+\n\n### OS\n\nI have confirmed that it works on macOS and Ubuntu, and some users report that it also works on Windows 10. See [the documentation for detailed installation steps on Windows 10](https://tabula-py.readthedocs.io/en/latest/getting_started.html#get-tabula-py-working-windows-10).\n\n## Usage\n\n- [Documentation](https://tabula-py.readthedocs.io/en/latest/)\n  - [FAQ](https://tabula-py.readthedocs.io/en/latest/faq.html) would be helpful if you have an issue\n- [Example notebook on Google Colaboratory](https://colab.research.google.com/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb)\n\n### Install\n\nEnsure that a Java runtime is installed and available on your PATH.\n\n```bash\npip install tabula-py\n```\n\nFor faster execution with jpype, install with the `jpype` extra.\n\n```sh\npip install tabula-py[jpype]\n```\n\n### Example\n\ntabula-py enables you to extract tables from a PDF into a DataFrame or JSON. It can also extract tables from a PDF and save them as a CSV, TSV, or JSON file.  
\n\n```py\nimport tabula\n\n# Read pdf into list of DataFrame\ndfs = tabula.read_pdf(\"test.pdf\", pages='all')\n\n# Read remote pdf into list of DataFrame\ndfs2 = tabula.read_pdf(\"https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf\")\n\n# convert PDF into CSV file\ntabula.convert_into(\"test.pdf\", \"output.csv\", output_format=\"csv\", pages='all')\n\n# convert all PDFs in a directory\ntabula.convert_into_by_batch(\"input_directory\", output_format='csv', pages='all')\n```\n\nSee [an example notebook](https://nbviewer.jupyter.org/github/chezou/tabula-py/blob/master/examples/tabula_example.ipynb) for more details. I also recommend reading [the tutorial article](https://aegis4048.github.io/parse-pdf-files-while-retaining-structure-with-tabula-py) written by [@aegis4048](https://github.com/aegis4048), and [another tutorial](https://www.dunderdata.com/blog/read-trapped-tables-within-pdfs-as-pandas-dataframes) written by [@tdpetrou](https://github.com/tdpetrou).\n\n### Contributing\n\nInterested in helping out? I'd love to have your help!\n\nYou can help by:\n\n- [Reporting a bug](https://github.com/chezou/tabula-py/issues).\n- Adding or editing documentation.\n- Contributing code via a Pull Request. 
See also [the contribution guide](docs/contributing.rst).\n- Writing a blog post or spreading the word about `tabula-py` to people who might benefit from using it.\n\n#### Contributors\n\n- [@lahoffm](https://github.com/lahoffm)\n- [@jakekara](https://github.com/jakekara)\n- [@lcd1232](https://github.com/lcd1232)\n- [@kirkholloway](https://github.com/kirkholloway)\n- [@CurtLH](https://github.com/CurtLH)\n- [@nikhilgk](https://github.com/nikhilgk)\n- [@krassowski](https://github.com/krassowski)\n- [@alexandreio](https://github.com/alexandreio)\n- [@rmnevesLH](https://github.com/rmnevesLH)\n- [@red-bin](https://github.com/red-bin)\n- [@Gallaecio](https://github.com/Gallaecio)\n- [@bpben](https://github.com/bpben)\n- [@Bueddl](https://github.com/Bueddl)\n- [@cjotade](https://github.com/cjotade)\n- [@codeboy5](https://github.com/codeboy5)\n- [@manohar-voggu](https://github.com/manohar-voggu)\n- [@deveshSingh06](https://github.com/deveshSingh06)\n- [@grfeller](https://github.com/grfeller)\n- [@djbrown](https://github.com/djbrown)\n- [@swar](https://github.com/swar)\n- [@mvoggu](https://github.com/mvoggu)\n- [@tdpetrou](https://github.com/tdpetrou)\n\n#### Other ways to support\n\nYou can also support our continued work on `tabula-py` with a donation on GitHub Sponsors or [Patreon](https://www.patreon.com/chezou).\n","funding_links":["https://github.com/sponsors/chezou","https://buymeacoffee.com/chezou","https://www.patreon.com/chezou"],"categories":["HarmonyOS","Python","📄 Document \u0026 PDF Extraction"],"sub_categories":["Windows Manager","Ruby"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchezou%2Ftabula-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchezou%2Ftabula-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchezou%2Ftabula-py/lists"}