{"id":36235954,"url":"https://github.com/harubi/bolivar","last_synced_at":"2026-02-22T05:00:31.119Z","repository":{"id":337481184,"uuid":"1125478932","full_name":"harubi/bolivar","owner":"harubi","description":"High-performance PDF table extraction library. Bindings for Python and JVM.","archived":false,"fork":false,"pushed_at":"2026-02-12T12:23:22.000Z","size":6911,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-12T21:22:31.150Z","etag":null,"topics":["jvm","pdf","pdf-parsing","python","rust","table-extraction","text-extraction"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harubi.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-30T19:56:25.000Z","updated_at":"2026-02-12T12:23:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/harubi/bolivar","commit_stats":null,"previous_names":["harubi/bolivar"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/harubi/bolivar","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harubi%2Fbolivar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harubi%2Fbolivar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harubi%2Fbolivar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/harubi%2Fbolivar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/harubi","download_url":"https://codeload.github.com/harubi/bolivar/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harubi%2Fbolivar/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29704406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T23:35:04.139Z","status":"online","status_checked_at":"2026-02-22T02:00:08.193Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jvm","pdf","pdf-parsing","python","rust","table-extraction","text-extraction"],"created_at":"2026-01-11T05:59:32.052Z","updated_at":"2026-02-22T05:00:31.106Z","avatar_url":"https://github.com/harubi.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bolivar\n\nFast PDF text and table extraction. Written in Rust, drop-in compatible with pdfminer and pdfplumber.\n\n## Install\n\n```sh\npip install bolivar\n```\n\n```kotlin\nimplementation(\"sa.ingenious:bolivar:1.2.0\")\n```\n\n```toml\n[dependencies]\nbolivar-core = \"1.2\"\n```\n\n## Extract text\n\nPull all text from a PDF in one call. The pdfplumber interface opens the file and iterates pages; the pdfminer interface returns the full text directly. 
Kotlin and Rust follow the same pattern with their respective APIs.\n\n```python\nimport pdfplumber\n\nwith pdfplumber.open(\"doc.pdf\") as pdf:\n    for page in pdf.pages:\n        print(page.extract_text())\n```\n\n```python\nfrom pdfminer.high_level import extract_text\n\ntext = extract_text(\"doc.pdf\")\n```\n\n```kotlin\nimport sa.ingenious.DocumentOptions\nimport sa.ingenious.bolivar\n\nval doc = bolivar.open(\"doc.pdf\", DocumentOptions {\n    maxPages = 1\n    layout {\n        lineMargin = 0.5\n        wordMargin = 0.1\n    }\n})\nval text = doc.extractText()\n```\n\n```rust\nuse bolivar_core::high_level::extract_text;\n\nfn main() -\u003e bolivar_core::Result\u003c()\u003e {\n    let data = std::fs::read(\"doc.pdf\")?;\n    let text = extract_text(\u0026data, None)?;\n    println!(\"{text}\");\n    Ok(())\n}\n```\n\n## Extract tables\n\nDetect and extract tabular data from each page. Bolivar returns structured tables with row and column counts, bounding boxes, and cell text so you can inspect or export them without manual parsing.\n\n```python\nimport pdfplumber\n\nwith pdfplumber.open(\"doc.pdf\") as pdf:\n    for page in pdf.pages:\n        for table in page.extract_tables():\n            print(table)\n```\n\n```kotlin\nimport sa.ingenious.DocumentOptions\nimport sa.ingenious.bolivar\n\nval doc = bolivar.open(\"doc.pdf\", DocumentOptions {\n    pages(1, 2)\n})\nval tables = doc.extractTables()\nfor (table in tables) {\n    println(\"${table.rowCount}x${table.columnCount}\")\n}\n```\n\n```rust\nuse bolivar_core::high_level::{extract_tables_with_document, ExtractOptions};\nuse bolivar_core::pdfdocument::PDFDocument;\nuse bolivar_core::table::TableSettings;\n\nfn main() -\u003e bolivar_core::Result\u003c()\u003e {\n    let data = std::fs::read(\"doc.pdf\")?;\n    let doc = PDFDocument::new(\u0026data, \"\")?;\n    let tables = extract_tables_with_document(\n        \u0026doc,\n        ExtractOptions::default(),\n        \u0026TableSettings::default(),\n    
)?;\n    Ok(())\n}\n```\n\n## Iterate pages\n\nWalk through pages one at a time to read metadata like page number, dimensions, and a text preview. This is useful when you need to locate content across a large document before extracting specific pages.\n\n```python\nimport pdfplumber\n\nwith pdfplumber.open(\"doc.pdf\") as pdf:\n    for page in pdf.pages:\n        print(page.page_number, page.width, page.height)\n```\n\n```python\nfrom pdfminer.high_level import extract_pages\n\nfor page in extract_pages(\"doc.pdf\"):\n    print(page.pageid, page.width, page.height)\n```\n\n```kotlin\nimport sa.ingenious.DocumentOptions\nimport sa.ingenious.bolivar\n\nval doc = bolivar.open(\"doc.pdf\", DocumentOptions {\n    maxPages = 3\n})\nval pages = doc.extractPageSummaries()\nfor (page in pages) {\n    println(\"${page.pageNumber}: ${page.text.take(80)}\")\n}\n```\n\n```rust\nuse bolivar_core::high_level::extract_pages;\n\nfn main() -\u003e bolivar_core::Result\u003c()\u003e {\n    let data = std::fs::read(\"doc.pdf\")?;\n    for page in extract_pages(\u0026data, None)? {\n        let page = page?;\n        println!(\"{}\", page.pageid);\n    }\n    Ok(())\n}\n```\n\n## Async (Python)\n\nRun extraction off the main thread in Python while keeping the same `pdfplumber` API. Because `async with` is only valid inside a coroutine, the example wraps it in `main()` and runs it with `asyncio.run`.\n\n```python\nimport asyncio\n\nimport pdfplumber\n\nasync def main():\n    async with pdfplumber.open(\"doc.pdf\") as pdf:\n        for page in pdf.pages:\n            for table in page.extract_tables():\n                print(table)\n\nasyncio.run(main())\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharubi%2Fbolivar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharubi%2Fbolivar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharubi%2Fbolivar/lists"}