{"id":17309872,"url":"https://github.com/bobld/camelot-sharp","last_synced_at":"2025-06-14T13:36:06.301Z","repository":{"id":56035726,"uuid":"313595533","full_name":"BobLd/camelot-sharp","owner":"BobLd","description":"A C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).","archived":false,"fork":false,"pushed_at":"2022-02-04T16:19:16.000Z","size":3680,"stargazers_count":31,"open_issues_count":1,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-14T14:22:10.956Z","etag":null,"topics":["camelot","camelot-sharp","csharp","dotnet","extract-table","extracting-tables","extraction","extraction-engine","netstandard","opencv","pdf-table-extract","pdf-table-extraction","pdfparser","pdfpig","pdfs","table","table-extraction"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BobLd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-17T11:17:11.000Z","updated_at":"2024-07-11T16:19:14.000Z","dependencies_parsed_at":"2022-08-15T11:50:54.303Z","dependency_job_id":null,"html_url":"https://github.com/BobLd/camelot-sharp","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobLd%2Fcamelot-sharp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobLd%2Fcamelot-sharp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobLd%2Fcamelot-sharp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobLd%2Fcamelot-sharp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BobLd","download_url":"https://codeload.github.com/BobLd/camelot-sharp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248894963,"owners_count":21179157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["camelot","camelot-sharp","csharp","dotnet","extract-table","extracting-tables","extraction","extraction-engine","netstandard","opencv","pdf-table-extract","pdf-table-extraction","pdfparser","pdfpig","pdfs","table","table-extraction"],"created_at":"2024-10-15T12:33:18.757Z","updated_at":"2025-04-14T14:22:18.604Z","avatar_url":"https://github.com/BobLd.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# camelot-sharp\nA C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).\n\nOriginal Python source code available here: [camelot-dev/camelot](https://github.com/camelot-dev/camelot).\n\n[![Windows](https://github.com/BobLd/camelot-sharp/actions/workflows/dotnet.yml/badge.svg)](https://github.com/BobLd/camelot-sharp/actions/workflows/dotnet.yml)\n\nNuGet packages available on the [releases](https://github.com/BobLd/camelot-sharp/releases) page and on www.nuget.org:\n- [Camelot](https://www.nuget.org/packages/Camelot)\n- [Camelot.ImageProcessing.OpenCvSharp4](https://www.nuget.org/packages/Camelot.ImageProcessing.OpenCvSharp4)\n\n# Usage\n## Stream mode \n```csharp\nusing (PdfDocument doc = PdfDocument.Open(@\"Files\\foo.pdf\", new ParsingOptions() { ClipPaths = true }))\n{\n\tStream stream = new Stream();\n\tvar tables = stream.ExtractTables(doc.GetPage(1));\n\n\tAssert.Single(tables);\n\tAssert.Equal((612, 792), stream.Dimensions);\n\tAssert.Equal(612, stream.PdfWidth);\n\tAssert.Equal(792, stream.PdfHeight);\n\t//Assert.Equal(84, stream.HorizontalText.Count);\n\n\tvar parsingReport = tables[0].ParsingReport();\n\t//   parsing_report = {\"accuracy\": 99.02, \"whitespace\": 12.24, \"order\": 1, \"page\": 1}\n\tparsingReport[\"order\"] = 1;\n\tparsingReport[\"page\"] = 1;\n}\n```\n\n## Lattice mode\n```csharp\nusing (var doc = PdfDocument.Open(@\"Files\\column_span_2.pdf\", new ParsingOptions() { ClipPaths = true }))\n{\n\tvar page = doc.GetPage(1);\n\n\tLattice lattice = new Lattice(new OpenCvImageProcesser(), new BasicSystemDrawingProcessor(), line_scale: 40);\n\tvar tables = lattice.ExtractTables(page,\n\t\tlayout_kwargs: new DlaOptions[]\n\t\t{\n\t\t\tnew DocstrumBoundingBoxes.DocstrumBoundingBoxesOptions()\n\t\t\t{\n\t\t\t\tWithinLineMultiplier = 2\n\t\t\t}\n\t\t});\n\tAssert.Single(tables);\n\tAssert.Equal(DataLatticeShiftTextLeftTop.Length, tables[0].Cells.Count);\n\tAssert.Equal(DataLatticeShiftTextLeftTop, tables[0].Data().Select(r =\u003e r.Select(c =\u003e c).ToArray()).ToArray());\n}\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobld%2Fcamelot-sharp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbobld%2Fcamelot-sharp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobld%2Fcamelot-sharp/lists"}