{"id":13534433,"url":"https://github.com/camelot-dev/camelot","last_synced_at":"2025-05-14T22:06:20.903Z","repository":{"id":37335948,"uuid":"194679925","full_name":"camelot-dev/camelot","owner":"camelot-dev","description":"A Python library to extract tabular data from PDFs","archived":false,"fork":false,"pushed_at":"2025-05-06T21:47:37.000Z","size":21544,"stargazers_count":3282,"open_issues_count":219,"forks_count":491,"subscribers_count":45,"default_branch":"master","last_synced_at":"2025-05-07T21:58:31.321Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://camelot-py.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/camelot-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"open_collective":"camelot"}},"created_at":"2019-07-01T13:39:33.000Z","updated_at":"2025-05-07T08:48:28.000Z","dependencies_parsed_at":"2023-12-15T10:01:50.389Z","dependency_job_id":"9d93e19e-26ea-401f-a968-a80e22ca2781","html_url":"https://github.com/camelot-dev/camelot","commit_stats":{"total_commits":1002,"total_committers":68,"mean_commits":"14.735294117647058","dds":0.501996007984032,"last_synced_commit":"b15be195356351be8b23b00d5fdaf1549488463e"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fcamelot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fcamelot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/camelot-dev%2Fcamelot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camelot-dev%2Fcamelot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/camelot-dev","download_url":"https://codeload.github.com/camelot-dev/camelot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254235694,"owners_count":22036963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T07:01:32.938Z","updated_at":"2025-05-14T22:06:15.877Z","avatar_url":"https://github.com/camelot-dev.png","language":"Python","funding_links":["https://opencollective.com/camelot"],"categories":["Python","📄 Document \u0026 PDF Extraction","📦 Additional Python Libraries","Data Loading \u0026 Extraction"],"sub_categories":["Ruby","Documentation \u0026 File Processing"],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/camelot-dev/camelot/master/docs/_static/camelot.png\" width=\"200\"\u003e\n\u003c/p\u003e\n\n# Camelot: PDF Table Extraction for Humans\n\n[![tests](https://github.com/camelot-dev/camelot/actions/workflows/tests.yml/badge.svg)](https://github.com/camelot-dev/camelot/actions/workflows/tests.yml) [![Documentation 
Status](https://readthedocs.org/projects/camelot-py/badge/?version=master)](https://camelot-py.readthedocs.io/en/master/)\n[![codecov.io](https://codecov.io/github/camelot-dev/camelot/badge.svg?branch=master\u0026service=github)](https://codecov.io/github/camelot-dev/camelot?branch=master)\n[![image](https://img.shields.io/pypi/v/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/l/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/pyversions/camelot-py.svg)](https://pypi.org/project/camelot-py/)\n\n**Camelot** is a Python library that can help you extract tables from PDFs.\n\n---\n\n**Extract tables from PDFs in just a few lines of code:**\n\nTry it yourself in our interactive quickstart notebook. [![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camelot-dev/camelot/blob/master/examples/camelot-quickstart-notebook.ipynb)\n\nOr check out a simple example using [this pdf](https://github.com/camelot-dev/camelot/blob/main/docs/_static/pdf/foo.pdf).\n\n\u003cpre\u003e\n\u003e\u003e\u003e import camelot\n\u003e\u003e\u003e tables = camelot.read_pdf('foo.pdf')\n\u003e\u003e\u003e tables\n\u0026lt;TableList n=1\u0026gt;\n\u003e\u003e\u003e tables.export('foo.csv', f='csv', compress=True) # json, excel, html, markdown, sqlite\n\u003e\u003e\u003e tables[0]\n\u0026lt;Table shape=(7, 7)\u0026gt;\n\u003e\u003e\u003e tables[0].parsing_report\n{\n    'accuracy': 99.02,\n    'whitespace': 12.24,\n    'order': 1,\n    'page': 1\n}\n\u003e\u003e\u003e tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_markdown, to_sqlite\n\u003e\u003e\u003e tables[0].df # get a pandas DataFrame!\n\u003c/pre\u003e\n\n| Cycle Name | KI (1/km) | Distance (mi) | Percent Fuel Savings |                 |                 |                |\n| ---------- | --------- | ------------- | -------------------- | --------------- | --------------- | 
-------------- |\n|            |           |               | Improved Speed       | Decreased Accel | Eliminate Stops | Decreased Idle |\n| 2012_2     | 3.30      | 1.3           | 5.9%                 | 9.5%            | 29.2%           | 17.4%          |\n| 2145_1     | 0.68      | 11.2          | 2.4%                 | 0.1%            | 9.5%            | 2.7%           |\n| 4234_1     | 0.59      | 58.7          | 8.5%                 | 1.3%            | 8.5%            | 3.3%           |\n| 2032_2     | 0.17      | 57.8          | 21.7%                | 0.3%            | 2.7%            | 1.2%           |\n| 4171_1     | 0.07      | 173.9         | 58.1%                | 1.6%            | 2.1%            | 0.5%           |\n\nCamelot also comes packaged with a [command-line interface](https://camelot-py.readthedocs.io/en/latest/user/cli.html)!\n\nRefer to the [QuickStart Guide](https://github.com/camelot-dev/camelot/blob/main/docs/user/quickstart.rst#quickstart) to quickly get started with Camelot, extract tables from PDFs and explore some basic options.\n\n**Tip:** Visit the `parser-comparison-notebook` to get an overview of all the packed parsers and their features. [![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camelot-dev/camelot/blob/master/examples/parser-comparison-notebook.ipynb)\n\n**Note:** Camelot only works with text-based PDFs and not scanned documents. 
(As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), \"If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based\".)\n\nYou can check out some frequently asked questions [here](https://camelot-py.readthedocs.io/en/latest/user/faq.html).\n\n## Why Camelot?\n\n- **Configurability**: Camelot gives you control over the table extraction process with [tweakable settings](https://camelot-py.readthedocs.io/en/latest/user/advanced.html).\n- **Metrics**: You can discard bad tables based on metrics like accuracy and whitespace, without having to manually look at each table.\n- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873). You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML, Markdown, and Sqlite.\n\nSee [comparison with similar libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).\n\n## Installation\n\n### Using conda\n\nThe easiest way to install Camelot is with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.\n\n```bash\nconda install -c conda-forge camelot-py\n```\n\n### Using pip\n\nAfter [installing the dependencies](https://camelot-py.readthedocs.io/en/latest/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can also just use pip to install Camelot:\n\n```bash\npip install \"camelot-py[base]\"\n```\n\n### From the source code\n\nAfter [installing the dependencies](https://camelot-py.readthedocs.io/en/latest/user/install.html#using-pip), clone the repo using:\n\n```bash\ngit clone https://github.com/camelot-dev/camelot.git\n```\n\nand install using 
pip:\n\n```bash\ncd camelot\npip install \".\"\n```\n\n## Documentation\n\nThe documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).\n\n## Wrappers\n\n- [camelot-php](https://github.com/randomstate/camelot-php) provides a [PHP](https://www.php.net/) wrapper for Camelot.\n\n## Related projects\n\n- [camelot-sharp](https://github.com/BobLd/camelot-sharp) provides a C# implementation of Camelot.\n\n## Contributing\n\nThe [Contributor's Guide](https://camelot-py.readthedocs.io/en/latest/dev/contributing.html) has detailed information about contributing issues, documentation, code, and tests.\n\n## Versioning\n\nCamelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out the [releases](https://github.com/camelot-dev/camelot/releases) page.\n\n## License\n\nThis project is licensed under the MIT License; see the [LICENSE](https://github.com/camelot-dev/camelot/blob/main/LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamelot-dev%2Fcamelot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamelot-dev%2Fcamelot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamelot-dev%2Fcamelot/lists"}