{"id":13542967,"url":"https://github.com/ExtractTable/ExtractTable-py","last_synced_at":"2025-04-02T12:30:57.130Z","repository":{"id":35149782,"uuid":"213262054","full_name":"ExtractTable/ExtractTable-py","owner":"ExtractTable","description":"Python library to extract tabular data from images and scanned PDFs","archived":false,"fork":false,"pushed_at":"2024-07-30T18:13:36.000Z","size":3559,"stargazers_count":273,"open_issues_count":8,"forks_count":34,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-04T21:16:16.540Z","etag":null,"topics":["extracttable","image-table-recognition","ocr","pdf-table-extract","table-extraction","tabular-data"],"latest_commit_sha":null,"homepage":"https://extracttable.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ExtractTable.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-06T23:54:29.000Z","updated_at":"2025-02-28T06:53:48.000Z","dependencies_parsed_at":"2024-11-03T09:31:52.106Z","dependency_job_id":"4ea5ca4c-2731-4a8c-bd8f-a132f1e5067d","html_url":"https://github.com/ExtractTable/ExtractTable-py","commit_stats":{"total_commits":53,"total_committers":4,"mean_commits":13.25,"dds":"0.37735849056603776","last_synced_commit":"e7ad566f2b49089ca4acb16149a985278b46a9a9"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExtractTable%2FExtractTable-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExtractTable%2FExtra
ctTable-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExtractTable%2FExtractTable-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExtractTable%2FExtractTable-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ExtractTable","download_url":"https://codeload.github.com/ExtractTable/ExtractTable-py/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246815341,"owners_count":20838426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extracttable","image-table-recognition","ocr","pdf-table-extract","table-extraction","tabular-data"],"created_at":"2024-08-01T11:00:20.555Z","updated_at":"2025-04-02T12:30:52.113Z","avatar_url":"https://github.com/ExtractTable.png","language":"Python","readme":"[![image](https://i.imgur.com/2Hihfwwg.png)](https://extracttable.com?ref=github-ET)\n\n[![image](https://img.shields.io/pypi/v/extracttable.svg?maxAge=3600)](https://pypi.org/project/extracttable/) ![image](https://img.shields.io/github/license/ExtractTable/ExtractTable-py) ![image](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue)\n  \n# Overview\n[ExtractTable](https://extracttable.com) - **API to extract tabular data from images and scanned PDFs**\n\nThe motivation is to make it easy for developers to extract tabular data from images or scanned PDF files without worrying about the table area, column coordinates, rotation et al.\n\n# Prerequisite\n\n**API Key**: All requests to ExtractTable are authorized 
by an API Key. [FREE credits here](https://extracttable.com/signup/trial.html). The same API Key can also be used for conversions on the browser at [Web Pro](https://extracttable.com/pro.html).\n\n\n# Installation\n\n`pip install -U ExtractTable`\n\n\n# Basic Usage\nOk, enough selling. Let the ease of coding do the talking and the output encourage you to buy credits; put that timer on and count the LOC.\n\n\n```python\nfrom ExtractTable import ExtractTable\net_sess = ExtractTable(api_key=YOUR_API_KEY)        # Replace with your VALID API Key here\nprint(et_sess.check_usage())        # Checks the API Key validity and shows the associated plan usage \ntable_data = et_sess.process_file(filepath=Location_of_Image_with_Tables, output_format=\"df\")\n\n# To process a PDF, pass the pages param (\"1\", \"1,3-4\", \"all\") to process_file\ntable_data = et_sess.process_file(filepath=Location_of_PDF_with_Tables, output_format=\"df\", pages=\"all\")\n```\n\n## Detailed Library Usage\nThe tutorial available at \u003ca href=\"https://colab.research.google.com/github/ExtractTable/ExtractTable-py/blob/master/example-code.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e takes you through\n\n```Markup\n1. Installation\n2. Import and check version\n3. Create Session \u0026 Validate API Key\n    3.1 Create Session with your API Key\n    3.2 Validate the Key and check the plan usage\n    3.3 Check Usage Details\n4. Trigger the extraction process\n    4.1 Accepted Input Types\n    4.2 Process an IMAGE Input\n    4.3 Process a PDF Input\n    4.4 Output options\n    4.5 Explore session objects\n5. Explore the Output\n    5.1 Output Structure\n    5.2 Output Details\n6. Make Corrections\n    6.1 Split Merged Rows\n    6.2 Split Merged Columns\n    6.3 Fix Decimal Format\n    6.4 Fix Date Format\n7. 
Helpful Code Snippets\n    7.1 Get text data\n    7.2 Table output to Excel\n```\n\n### Woahh, as simple as that ?!\n\nCertainly. Did you know that current ExtractTable users use it for\n- Bank Statement\n- Medical Records\n- Invoice Details\n- Tax forms\n- Tender Notices\n\nIt's up to you now to explore the ways.\n\n\n# Explore\nCheck the complete server response of the latest job with `et_sess.ServerResponse.json()`\n```javascript\n{\n    \"JobStatus\": \u003cstring\u003e,                              # Status of the triggered Process  @ JOB-LEVEL\n    \"Pages\": \u003cinteger\u003e,                                 # Number of pages processed in this request @ PAGE-LEVEL\n    \"Tables\": [\u003clist of key-value objects of table\u003e     # List of all tables found @ TABLE-LEVEL\n        {\n            \"Page\": \u003cinteger\u003e,                              ## Page number in which this table is found\n            \"CharacterConfidence\": \u003cfloat\u003e,                 ## Accuracy of Characters recognized from the input-page\n            \"LayoutConfidence\": \u003cfloat\u003e,                    ## Accuracy of table layout's design decision\n            \"TableJson\": \u003cdict\u003e,                            ## Table Cell Text in key-value format with index orientation - {row#: {col#: \u003cstr\u003e}}\n            \"TableCoordinates\": \u003cdict\u003e,                     ## Top-left \u0026 Bottom-right Cell Coordinates - {row#: {col#: \u003clist(x1,y1,x2,y2)\u003e}}\n            \"TableConfidence\": \u003cdict\u003e                       ## Cell level accuracy of detected characters - {row#: {col#: \u003cfloat\u003e}}\n        },\n    {...}                                               ## ... 
more \"Tables\" objects\n    ],\n    \"Lines\": [\u003clist of key-value objects\u003e               # Pagewise Line details @ PAGE-LEVEL\n        {\n            \"Page\": \u003cinteger\u003e,                          # Page number in which the lines are found\n            \"CharacterConfidence\": \u003cfloat\u003e,             # Average Accuracy of all Characters recognized from the input-page\n            \"LinesArray\": [\n                \u003clist of key-value objects of line\u003e     # Ordered list of lines in this page @ LINE-LEVEL\n                {\n                    \"Line\": \u003cstr\u003e,                          ## Detected text of the complete line\n                    \"WordsArray\": [\n                        \u003clist of key-value objects\u003e         ## Word level details in this line @ WORD-LEVEL\n                        {\n                            \"Conf\": \u003cfloat\u003e,                    ### Accuracy of recognized characters of the word\n                            \"Word\": \u003cstr\u003e,                      ### Detected text of the word\n                            \"Loc\": [x1, y1, x2, y2]             ### Top-left \u0026 Bottom-right coordinates, w.r.t the input-page width-height dimensions\n                        },\n                    {...}                                   ### More \"WordsArray\" objects\n                    ]\n                },\n            {...}                                       ## More \"LinesArray\" objects\n            ]\n        },\n    {...}                                               # More Pagewise \"Lines\" details\n    ]\n}\n```\n\n## Bug Reports\nBug reports/fixes are most welcome and greatly appreciated with API credits. 
For support, reach us at pydevs@extracttable.com \n\n\n## License  \n  \nThis project is licensed under the Apache License 2.0; see the [LICENSE](https://github.com/extracttable/ExtractTable-py/blob/master/LICENSE) file for details.\n\n\n## Social Media\nFollow us on social media for library updates and free credits.\n\n[![Image](https://cdn3.iconfinder.com/data/icons/socialnetworking/32/linkedin.png)](https://www.linkedin.com/company/extracttable)\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n[![Image](https://abs.twimg.com/favicons/twitter.ico)](https://twitter.com/extracttable)\n","funding_links":[],"categories":["Table detection"],"sub_categories":["Form Segmentation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FExtractTable%2FExtractTable-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FExtractTable%2FExtractTable-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FExtractTable%2FExtractTable-py/lists"}