# tabula-sharp
`tabula-sharp` is a library for extracting tables from PDF files. It is a port of [tabula-java](https://github.com/tabulapdf/tabula-java).

![Windows](https://github.com/BobLd/tabula-sharp/workflows/Windows/badge.svg)
![Linux](https://github.com/BobLd/tabula-sharp/workflows/Linux/badge.svg)
![Mac OS](https://github.com/BobLd/tabula-sharp/workflows/Mac%20OS/badge.svg)

- Supports netstandard2.0, net462, net471, net6.0 and net8.0
- No Java bindings

NuGet packages are available on the [releases](https://github.com/BobLd/tabula-sharp/releases) page and on www.nuget.org:
- [Tabula](https://www.nuget.org/packages/Tabula)
- [Tabula.Json](https://www.nuget.org/packages/Tabula.Json)
- [Tabula.Csv](https://www.nuget.org/packages/Tabula.Csv)

## Differences with tabula-java
- Uses [PdfPig](https://github.com/UglyToad/PdfPig) instead of PdfBox.
- The coordinate system starts from the bottom-left corner of the page (y going up), not from the top-left corner (y going down).
- `NurminenDetectionAlgorithm` is replaced by `SimpleNurminenDetectionAlgorithm`, because the original algorithm requires an image-processing library.
- Table results might differ from tabula-java because of the way PdfPig builds the bounding boxes of letters.

# Usage
## Stream mode - BasicExtractionAlgorithm
```csharp
using System.Collections.Generic;
using Tabula;
using Tabula.Detectors;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
	// extract the content of page 1 (page numbers start at 1)
	PageArea page = ObjectExtractor.Extract(document, 1);

	// detect candidate table zones
	SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
	var regions = detector.Detect(page);

	if (regions.Count > 0)
	{
		// run the stream extraction on the first candidate area only
		IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
		IReadOnlyList<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox));
		var table = tables[0];
		var rows = table.Rows;
	}
}
```
## Lattice mode - SpreadsheetExtractionAlgorithm
```csharp
using System.Collections.Generic;
using Tabula;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
	PageArea page = ObjectExtractor.Extract(document, 1);

	// lattice mode relies on the ruling lines drawn on the page to find the cells
	IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
	IReadOnlyList<Table> tables = ea.Extract(page);

	if (tables.Count > 0)
	{
		var table = tables[0];
		var rows = table.Rows;
	}
}
```

# Results
## Stream mode - BasicExtractionAlgorithm
![Stream mode example](images/stream-us-018.png)
## Lattice mode - SpreadsheetExtractionAlgorithm
![Lattice mode example](images/lattice-eu-004.png)
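
The coordinate-system difference listed under *Differences with tabula-java* matters when you reuse a page area defined for tabula-java. For example, tabula-java's `--area` option takes `top,left,bottom,right` measured from the top-left corner of the page. The helper below is a minimal sketch that converts such an area to the bottom-left origin. It assumes that all values are in PDF points and that you know the page height (for instance from PdfPig's `Page.Height`). The function name `ToBottomLeftOrigin` is hypothetical and not part of tabula-sharp.

```csharp
using System;

// Hypothetical helper, not part of tabula-sharp: converts an area given in
// tabula-java's convention (origin at the top-left of the page, y growing
// downwards, ordered top, left, bottom, right) into a bottom-left origin
// (y growing upwards). All values are assumed to be in PDF points.
static (double Left, double Bottom, double Right, double Top) ToBottomLeftOrigin(
    double top, double left, double bottom, double right, double pageHeight)
{
    if (bottom < top)
        throw new ArgumentException("In top-left coordinates, bottom must not be less than top.");

    // Flip the vertical axis: y' = pageHeight - y. The edge nearest the top of the
    // page has the smallest y in top-left space and the largest y' in bottom-left space.
    // x values are unchanged because both conventions measure x from the left edge.
    return (left, pageHeight - bottom, right, pageHeight - top);
}

// Example: an area on a US Letter page (612 x 792 points).
var area = ToBottomLeftOrigin(top: 100, left: 50, bottom: 300, right: 500, pageHeight: 792);
Console.WriteLine($"left={area.Left}, bottom={area.Bottom}, right={area.Right}, top={area.Top}");
// prints "left=50, bottom=492, right=500, top=692"
```

You could then build a PdfPig `PdfRectangle(left, bottom, right, top)` from the result and pass it to `page.GetArea(...)`, as the stream-mode example does with `regions[0].BoundingBox`. This assumes `GetArea` accepts a `PdfRectangle`, so check the signature in the version you use.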