{"id":13425331,"url":"https://github.com/camelot-dev/excalibur","last_synced_at":"2025-03-15T19:33:14.410Z","repository":{"id":40625771,"uuid":"153899105","full_name":"camelot-dev/excalibur","owner":"camelot-dev","description":"A web interface to extract tabular data from PDFs","archived":false,"fork":false,"pushed_at":"2023-07-15T11:04:03.000Z","size":18656,"stargazers_count":1457,"open_issues_count":106,"forks_count":219,"subscribers_count":38,"default_branch":"master","last_synced_at":"2024-04-14T01:00:38.601Z","etag":null,"topics":["extract","for-humans","pdf","table"],"latest_commit_sha":null,"homepage":"https://excalibur-py.readthedocs.io","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/camelot-dev.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"open_collective":"excalibur"}},"created_at":"2018-10-20T11:34:49.000Z","updated_at":"2024-04-12T14:45:01.000Z","dependencies_parsed_at":"2023-09-29T08:51:35.092Z","dependency_job_id":null,"html_url":"https://github.com/camelot-dev/excalibur","commit_stats":{"total_commits":195,"total_committers":11,"mean_commits":"17.727272727272727","dds":"0.16410256410256407","last_synced_commit":"2a8e6cabfe8fa1fb2265da0756cf0130d5e52025"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fexcalibur","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fexcalibur/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fex
calibur/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fexcalibur/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/camelot-dev","download_url":"https://codeload.github.com/camelot-dev/excalibur/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242910949,"owners_count":20205401,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract","for-humans","pdf","table"],"created_at":"2024-07-31T00:01:10.190Z","updated_at":"2025-03-15T19:33:14.372Z","avatar_url":"https://github.com/camelot-dev.png","language":"HTML","readme":"\u003cp align=\"center\"\u003e\n   \u003cimg src=\"https://raw.githubusercontent.com/camelot-dev/excalibur/master/docs/_static/excalibur-logo.png\" width=\"200\"\u003e\n\u003c/p\u003e\n\n# Excalibur: A web interface to extract tabular data from PDFs\n\n[![Documentation Status](https://readthedocs.org/projects/excalibur-py/badge/?version=master)](https://excalibur-py.readthedocs.io/en/master/) [![image](https://img.shields.io/pypi/v/excalibur-py.svg)](https://pypi.org/project/excalibur-py/) [![image](https://img.shields.io/pypi/l/excalibur-py.svg)](https://pypi.org/project/excalibur-py/) [![image](https://img.shields.io/pypi/pyversions/excalibur-py.svg)](https://pypi.org/project/excalibur-py/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby) [![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) 
[![image](https://img.shields.io/badge/continous%20quality-deepsource-lightgrey)](https://deepsource.io/gh/camelot-dev/excalibur/?ref=repository-badge)\n\n**Excalibur** is a web interface to extract tabular data from PDFs, written in **Python 3**! It is powered by [Camelot](https://camelot-py.readthedocs.io/).\n\n**Note:** Excalibur only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), \"If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based\".)\n\n## Using Excalibur\n\n**Note:** You need to [install ghostscript](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) before moving forward.\n\nAfter [installing Excalibur with pip](https://excalibur-py.readthedocs.io/en/master/user/install.html), you need to initialize the metadata database using:\n\n\u003cpre\u003e\n$ excalibur initdb\n\u003c/pre\u003e\n\nAnd then start the webserver using:\n\n\u003cpre\u003e\n$ excalibur webserver\n\u003c/pre\u003e\n\nThat's it! Now you can go to http://localhost:5000 and start extracting tabular data from your PDFs.\n\n\n1. **Upload** a PDF and enter the page numbers you want to extract tables from.\n\n2. Go to each page and select the table by drawing a box around it. (You can choose to skip this step since Excalibur can automatically detect tables on its own. Click on \"**Autodetect tables**\" to see what Excalibur sees.)\n\n3. Choose a flavor (Lattice or Stream) from \"**Advanced**\".\n\n    a. **Lattice**: For tables formed with lines.\n\n    b. **Stream**: For tables formed with whitespaces.\n\n4. Click on \"**View and download data**\" to see the extracted tables.\n\n5. 
Select your favorite format (CSV/Excel/JSON/HTML) and click on \"**Download**\"!\n\n**Note:** You can also download executables for Windows and Linux from the [releases page](https://github.com/camelot-dev/excalibur/releases) and run them directly!\n\n![usage.gif](https://excalibur-py.readthedocs.io/en/master/_images/usage.gif)\n\n## Why Excalibur?\n\n- Extracting tables from PDFs is hard. A simple copy-and-paste from a PDF into an Excel sheet doesn't preserve table structure. **Excalibur makes PDF table extraction very easy** by automatically detecting tables in PDFs and letting you save them as CSV and Excel files.\n- Excalibur uses [Camelot](https://camelot-py.readthedocs.io/) under the hood, which gives you additional settings to tweak table extraction and get the best results. You can see how it performs better than other open-source tools and libraries [in this comparison](https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).\n- You can save table extraction [settings](https://excalibur-py.readthedocs.io/en/master/user/faq.html#faq) (like table areas) for a PDF once, and apply them to new PDFs to extract tables with similar structures.\n- You get complete control over your data. All file storage and processing happen on your own local or remote machine.\n- Excalibur can be configured with MySQL and Celery for parallel and distributed workloads. 
By default, sqlite and multiprocessing are used for sequential workloads.\n\n## Installation\n\n### Using pip\n\nAfter installing [ghostscript](https://www.ghostscript.com/), which is one of the requirements for Camelot (See [install instructions](https://camelot-py.readthedocs.io/en/master/user/install-deps.html)), you can simply use pip to install Excalibur:\n\n\u003cpre\u003e\n$ pip install excalibur-py\n\u003c/pre\u003e\n\n### From the source code\n\nAfter installing ghostscript, clone the repo using:\n\n\u003cpre\u003e\n$ git clone https://www.github.com/camelot-dev/excalibur\n\u003c/pre\u003e\n\nand install Excalibur using pip:\n\n\u003cpre\u003e\n$ cd excalibur\n$ pip install .\n\u003c/pre\u003e\n\n## Documentation\n\nFantastic documentation is available at [http://excalibur-py.readthedocs.io/](http://excalibur-py.readthedocs.io/).\n\n## Development\n\nThe [Contributor's Guide](https://excalibur-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.\n\n### Source code\n\nYou can check the latest sources with:\n\n\u003cpre\u003e\n$ git clone https://www.github.com/camelot-dev/excalibur\n\u003c/pre\u003e\n\n### Setting up a development environment\n\nYou can install the development dependencies easily, using pip:\n\n\u003cpre\u003e\n$ pip install excalibur-py[dev]\n\u003c/pre\u003e\n\n### Testing (soon)\n\nAfter installation, you can run tests using:\n\n\u003cpre\u003e\n$ python setup.py test\n\u003c/pre\u003e\n\n## Versioning\n\nExcalibur uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. 
For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/excalibur/blob/master/HISTORY.md).\n\n## License\n\nThis project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/excalibur/blob/master/LICENSE) file for details.\n\n## Support the development\n\nYou can support our work on Excalibur with a one-time or monthly donation [on OpenCollective](https://opencollective.com/excalibur). Organizations who use Excalibur can also sponsor the project for an acknowledgement on [our official site](https://www.tryexcalibur.com/) and this README.\n\nSpecial thanks to all the users and organizations that support Excalibur!\n\n\u003ca href=\"https://opencollective.com/excalibur/backer/0/website\" target=\"_blank\"\u003e\u003cimg src=\"https://opencollective.com/excalibur/backer/0/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/excalibur/sponsor/0/website\" target=\"_blank\"\u003e\u003cimg src=\"https://opencollective.com/excalibur/sponsor/0/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/excalibur/backer/1/website\" target=\"_blank\"\u003e\u003cimg src=\"https://opencollective.com/excalibur/backer/1/avatar.svg\"\u003e\u003c/a\u003e\n","funding_links":["https://opencollective.com/excalibur","https://opencollective.com/excalibur/backer/0/website","https://opencollective.com/excalibur/backer/1/website"],"categories":["HTML","\u003ca id=\"tag-productivity\" href=\"#tag-productivity\"\u003eProductivity\u003c/a\u003e","Python","Data Loading \u0026 Extraction"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamelot-dev%2Fexcalibur","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamelot-dev%2Fexcalibur","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamelot-dev%2Fexcalibur/lists"}