{"id":13439595,"url":"https://github.com/J-F-Liu/lopdf","last_synced_at":"2025-03-20T08:31:40.137Z","repository":{"id":37390975,"uuid":"76324529","full_name":"J-F-Liu/lopdf","owner":"J-F-Liu","description":"A Rust library for PDF document manipulation.","archived":false,"fork":false,"pushed_at":"2025-03-15T14:45:00.000Z","size":7558,"stargazers_count":1758,"open_issues_count":61,"forks_count":187,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-03-18T19:13:26.230Z","etag":null,"topics":["pdf-document","rust","rust-library"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/J-F-Liu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-13T05:05:53.000Z","updated_at":"2025-03-18T05:19:34.000Z","dependencies_parsed_at":"2023-12-26T10:45:02.748Z","dependency_job_id":"0c2bf53a-f7e9-454f-b998-d80f0e4b734d","html_url":"https://github.com/J-F-Liu/lopdf","commit_stats":{"total_commits":394,"total_committers":69,"mean_commits":"5.7101449275362315","dds":0.6243654822335025,"last_synced_commit":"50bc41144db20031e9d16faa69f6902fbd974ed2"},"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J-F-Liu%2Flopdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J-F-Liu%2Flopdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J-F-Liu%2Flopdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/J-F-Liu%2Flopdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/J-F-Liu","download_url":"https://codeload.github.com/J-F-Liu/lopdf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244577871,"owners_count":20475376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pdf-document","rust","rust-library"],"created_at":"2024-07-31T03:01:15.416Z","updated_at":"2025-03-20T08:31:40.131Z","avatar_url":"https://github.com/J-F-Liu.png","language":"Rust","funding_links":[],"categories":["Rust","Libraries","库 Libraries","库","HarmonyOS"],"sub_categories":["Graphics","图形 Graphics","图像","Windows Manager","Rust","图像 Graphics"],"readme":"# lopdf\n\n[![Crates.io](https://img.shields.io/crates/v/lopdf.svg)](https://crates.io/crates/lopdf)\n[![CI](https://github.com/J-F-Liu/lopdf/actions/workflows/ci.yml/badge.svg)](https://github.com/J-F-Liu/lopdf/actions/workflows/ci.yml)\n[![Docs]( https://docs.rs/lopdf/badge.svg)](https://docs.rs/lopdf)\n\nA Rust library for PDF document manipulation.\n\nA useful reference for understanding the PDF file format and the\neventual usage of this library is the\n[PDF 1.7 Reference Document](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf).\nThe PDF 2.0 specification is available [here](https://www.pdfa.org/announcing-no-cost-access-to-iso-32000-2-pdf-2-0/).\n\n## Example Code\n\n* Create PDF document\n\n```rust\nuse lopdf::dictionary;\nuse lopdf::{Document, Object, Stream};\nuse lopdf::content::{Content, Operation};\n\n// 
`with_version` specifies the PDF version this document complies with.\nlet mut doc = Document::with_version(\"1.5\");\n// Object IDs are used for cross referencing in PDF documents.\n// `lopdf` helps keep track of them for us. They are simple integers.\n// Calls to `doc.new_object_id` and `doc.add_object` return an object ID.\n\n// \"Pages\" is the root node of the page tree.\nlet pages_id = doc.new_object_id();\n\n// Fonts are dictionaries. The \"Type\", \"Subtype\" and \"BaseFont\" tags\n// are straight out of the PDF spec.\n//\n// The dictionary macro is a helper that allows complex\n// key-value relationships to be represented in a simpler\n// visual manner, similar to a match statement.\n// A dictionary is implemented as an IndexMap from Vec\u003cu8\u003e keys to Object values.\nlet font_id = doc.add_object(dictionary! {\n    // Type of dictionary\n    \"Type\" =\u003e \"Font\",\n    // Type of font; Type1 is a simple PostScript font\n    \"Subtype\" =\u003e \"Type1\",\n    // BaseFont is the PostScript name of the font for a Type1 font.\n    // See the PDF reference document for more details.\n    \"BaseFont\" =\u003e \"Courier\",\n});\n\n// Font dictionaries need to be added into resource\n// dictionaries in order to be used.\n// Resource dictionaries can contain more than just fonts,\n// but normally contain only fonts.\n// Only one resource dictionary is allowed per page tree root.\nlet resources_id = doc.add_object(dictionary! {\n    // Fonts are actually triply nested dictionaries. Fun!\n    \"Font\" =\u003e dictionary! {\n        // F1 is the font name used when writing text.\n        // It must be unique in the document. It does not\n        // have to be F1.\n        \"F1\" =\u003e font_id,\n    },\n});\n\n// `Content` is a wrapper struct around an operations struct that contains\n// a vector of operations. 
Each operation pairs\n// a particular PDF operator with its operands.\n// Refer to the PDF spec for more details on the operators and operands.\n// Note: here the operator comes first and the operands follow, the reverse\n// of the PDF file itself, where the operands precede the operator.\nlet content = Content {\n    operations: vec![\n        // BT begins a text element. It takes no operands.\n        Operation::new(\"BT\", vec![]),\n        // Tf specifies the font and font size.\n        // Font scaling is complicated in PDFs.\n        // Refer to the spec for more info.\n        // The `into()` methods convert the types into\n        // an enum that represents the basic object types in PDF documents.\n        Operation::new(\"Tf\", vec![\"F1\".into(), 48.into()]),\n        // Td adjusts the translation components of the text matrix.\n        // When used for the first time after BT, it sets the initial\n        // text position on the page.\n        // Note: PDF documents have Y=0 at the bottom, so Y=600 places the text near the top.\n        Operation::new(\"Td\", vec![100.into(), 600.into()]),\n        // Tj prints a string literal to the page. By default, this is black text that is\n        // filled in. There are other operators that can produce various textual effects and\n        // colors.\n        Operation::new(\"Tj\", vec![Object::string_literal(\"Hello World!\")]),\n        // ET ends the text element.\n        Operation::new(\"ET\", vec![]),\n    ],\n};\n\n// A stream is a dictionary followed by a (possibly encoded) sequence of bytes.\n// What that sequence of bytes represents depends on the context.\n// The stream dictionary is set internally by lopdf and normally doesn't\n// need to be manually manipulated. It contains keys such as\n// Length, Filter, DecodeParms, etc.\nlet content_id = doc.add_object(Stream::new(dictionary! 
{}, content.encode().unwrap()));\n\n// Page is a dictionary that represents one page of a PDF file.\n// Its required fields are \"Type\", \"Parent\" and \"Contents\".\nlet page_id = doc.add_object(dictionary! {\n    \"Type\" =\u003e \"Page\",\n    \"Parent\" =\u003e pages_id,\n    \"Contents\" =\u003e content_id,\n});\n\n// Again, \"Pages\" is the root of the page tree. The ID was already created\n// at the top of the page, since we needed it to assign to the parent element\n// of the page dictionary.\n//\n// These are just the basic requirements for a page tree root object.\n// There are also many additional entries that can be added to the dictionary,\n// if needed. Some of these can also be defined on the page dictionary itself,\n// and not inherited from the page tree root.\nlet pages = dictionary! {\n    // Type of dictionary\n    \"Type\" =\u003e \"Pages\",\n    // Vector of page IDs in document. Normally would contain more than one ID\n    // and be produced using a loop of some kind.\n    \"Kids\" =\u003e vec![page_id.into()],\n    // Page count\n    \"Count\" =\u003e 1,\n    // ID of resources dictionary, defined earlier\n    \"Resources\" =\u003e resources_id,\n    // A rectangle that defines the boundaries of the physical or digital media.\n    // This is the \"page size\".\n    \"MediaBox\" =\u003e vec![0.into(), 0.into(), 595.into(), 842.into()],\n};\n\n// Using `insert()` here, instead of `add_object()` since the ID is already known.\ndoc.objects.insert(pages_id, Object::Dictionary(pages));\n\n// Creating document catalog.\n// There are many more entries allowed in the catalog dictionary.\nlet catalog_id = doc.add_object(dictionary! 
{\n    \"Type\" =\u003e \"Catalog\",\n    \"Pages\" =\u003e pages_id,\n});\n\n// The \"Root\" key in trailer is set to the ID of the document catalog,\n// the remainder of the trailer is set during `doc.save()`.\ndoc.trailer.set(\"Root\", catalog_id);\ndoc.compress();\n\n// Store file in current working directory.\n// Note: Line is excluded when running tests\nif false {\n    doc.save(\"example.pdf\").unwrap();\n}\n```\n\n* Merge PDF documents\n\n```rust\nuse lopdf::dictionary;\n\nuse std::collections::BTreeMap;\n\nuse lopdf::content::{Content, Operation};\nuse lopdf::{Document, Object, ObjectId, Stream, Bookmark};\n\npub fn generate_fake_document() -\u003e Document {\n    let mut doc = Document::with_version(\"1.5\");\n    let pages_id = doc.new_object_id();\n    let font_id = doc.add_object(dictionary! {\n        \"Type\" =\u003e \"Font\",\n        \"Subtype\" =\u003e \"Type1\",\n        \"BaseFont\" =\u003e \"Courier\",\n    });\n    let resources_id = doc.add_object(dictionary! {\n        \"Font\" =\u003e dictionary! {\n            \"F1\" =\u003e font_id,\n        },\n    });\n    let content = Content {\n        operations: vec![\n            Operation::new(\"BT\", vec![]),\n            Operation::new(\"Tf\", vec![\"F1\".into(), 48.into()]),\n            Operation::new(\"Td\", vec![100.into(), 600.into()]),\n            Operation::new(\"Tj\", vec![Object::string_literal(\"Hello World!\")]),\n            Operation::new(\"ET\", vec![]),\n        ],\n    };\n    let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));\n    let page_id = doc.add_object(dictionary! {\n        \"Type\" =\u003e \"Page\",\n        \"Parent\" =\u003e pages_id,\n        \"Contents\" =\u003e content_id,\n        \"Resources\" =\u003e resources_id,\n        \"MediaBox\" =\u003e vec![0.into(), 0.into(), 595.into(), 842.into()],\n    });\n    let pages = dictionary! 
{\n        \"Type\" =\u003e \"Pages\",\n        \"Kids\" =\u003e vec![page_id.into()],\n        \"Count\" =\u003e 1,\n    };\n    doc.objects.insert(pages_id, Object::Dictionary(pages));\n    let catalog_id = doc.add_object(dictionary! {\n        \"Type\" =\u003e \"Catalog\",\n        \"Pages\" =\u003e pages_id,\n    });\n    doc.trailer.set(\"Root\", catalog_id);\n\n    doc\n}\n\nfn main() -\u003e std::io::Result\u003c()\u003e {\n    // Generate a stack of Documents to merge.\n    let documents = vec![\n        generate_fake_document(),\n        generate_fake_document(),\n        generate_fake_document(),\n        generate_fake_document(),\n    ];\n\n    // Define a starting `max_id` (will be used as start index for object_ids).\n    let mut max_id = 1;\n    let mut pagenum = 1;\n    // Collect all Documents Objects grouped by a map\n    let mut documents_pages = BTreeMap::new();\n    let mut documents_objects = BTreeMap::new();\n    let mut document = Document::with_version(\"1.5\");\n\n    for mut doc in documents {\n        let mut first = false;\n        doc.renumber_objects_with(max_id);\n\n        max_id = doc.max_id + 1;\n\n        documents_pages.extend(\n            doc\n                    .get_pages()\n                    .into_iter()\n                    .map(|(_, object_id)| {\n                        if !first {\n                            let bookmark = Bookmark::new(String::from(format!(\"Page_{}\", pagenum)), [0.0, 0.0, 1.0], 0, object_id);\n                            document.add_bookmark(bookmark, None);\n                            first = true;\n                            pagenum += 1;\n                        }\n\n                        (\n                            object_id,\n                            doc.get_object(object_id).unwrap().to_owned(),\n                        )\n                    })\n                    .collect::\u003cBTreeMap\u003cObjectId, Object\u003e\u003e(),\n        );\n        
documents_objects.extend(doc.objects);\n    }\n\n    // \"Catalog\" and \"Pages\" are mandatory.\n    let mut catalog_object: Option\u003c(ObjectId, Object)\u003e = None;\n    let mut pages_object: Option\u003c(ObjectId, Object)\u003e = None;\n\n    // Process all objects except \"Page\" type\n    for (object_id, object) in documents_objects.iter() {\n        // We have to ignore \"Page\" (as are processed later), \"Outlines\" and \"Outline\" objects.\n        // All other objects should be collected and inserted into the main Document.\n        match object.type_name().unwrap_or(b\"\") {\n            b\"Catalog\" =\u003e {\n                // Collect a first \"Catalog\" object and use it for the future \"Pages\".\n                catalog_object = Some((\n                    if let Some((id, _)) = catalog_object {\n                        id\n                    } else {\n                        *object_id\n                    },\n                    object.clone(),\n                ));\n            }\n            b\"Pages\" =\u003e {\n                // Collect and update a first \"Pages\" object and use it for the future \"Catalog\"\n                // We have also to merge all dictionaries of the old and the new \"Pages\" object\n                if let Ok(dictionary) = object.as_dict() {\n                    let mut dictionary = dictionary.clone();\n                    if let Some((_, ref object)) = pages_object {\n                        if let Ok(old_dictionary) = object.as_dict() {\n                            dictionary.extend(old_dictionary);\n                        }\n                    }\n\n                    pages_object = Some((\n                        if let Some((id, _)) = pages_object {\n                            id\n                        } else {\n                            *object_id\n                        },\n                        Object::Dictionary(dictionary),\n                    ));\n                }\n            }\n            
b\"Page\" =\u003e {}     // Ignored, processed later and separately\n            b\"Outlines\" =\u003e {} // Ignored, not supported yet\n            b\"Outline\" =\u003e {}  // Ignored, not supported yet\n            _ =\u003e {\n                document.objects.insert(*object_id, object.clone());\n            }\n        }\n    }\n\n    // If no \"Pages\" object found, abort.\n    if pages_object.is_none() {\n        println!(\"Pages root not found.\");\n\n        return Ok(());\n    }\n\n    // Iterate over all \"Page\" objects and collect into the parent \"Pages\" created before\n    for (object_id, object) in documents_pages.iter() {\n        if let Ok(dictionary) = object.as_dict() {\n            let mut dictionary = dictionary.clone();\n            dictionary.set(\"Parent\", pages_object.as_ref().unwrap().0);\n\n            document\n                    .objects\n                    .insert(*object_id, Object::Dictionary(dictionary));\n        }\n    }\n\n    // If no \"Catalog\" found, abort.\n    if catalog_object.is_none() {\n        println!(\"Catalog root not found.\");\n\n        return Ok(());\n    }\n\n    let catalog_object = catalog_object.unwrap();\n    let pages_object = pages_object.unwrap();\n\n    // Build a new \"Pages\" with updated fields\n    if let Ok(dictionary) = pages_object.1.as_dict() {\n        let mut dictionary = dictionary.clone();\n\n        // Set new pages count\n        dictionary.set(\"Count\", documents_pages.len() as u32);\n\n        // Set new \"Kids\" list (collected from documents pages) for \"Pages\"\n        dictionary.set(\n            \"Kids\",\n            documents_pages\n                    .into_iter()\n                    .map(|(object_id, _)| Object::Reference(object_id))\n                    .collect::\u003cVec\u003c_\u003e\u003e(),\n        );\n\n        document\n                .objects\n                .insert(pages_object.0, Object::Dictionary(dictionary));\n    }\n\n    // Build a new \"Catalog\" with 
updated fields\n    if let Ok(dictionary) = catalog_object.1.as_dict() {\n        let mut dictionary = dictionary.clone();\n        dictionary.set(\"Pages\", pages_object.0);\n        dictionary.remove(b\"Outlines\"); // Outlines not supported in merged PDFs\n\n        document\n                .objects\n                .insert(catalog_object.0, Object::Dictionary(dictionary));\n    }\n\n    document.trailer.set(\"Root\", catalog_object.0);\n\n    // Update the max internal ID, which was not updated earlier because objects were inserted directly\n    document.max_id = document.objects.len() as u32;\n\n    // Reorder all new Document objects\n    document.renumber_objects();\n\n    // Point any bookmark that is not set to a page at the first child\n    document.adjust_zero_pages();\n\n    // Add all bookmarks to the PDF object tree, then set \"Outlines\" to the bookmark content map.\n    if let Some(n) = document.build_outline() {\n        if let Ok(Object::Dictionary(dict)) = document.get_object_mut(catalog_object.0) {\n            dict.set(\"Outlines\", Object::Reference(n));\n        }\n    }\n\n    document.compress();\n\n    // Save the merged PDF.\n    // Store file in current working directory.\n    // Note: Line is excluded when running doc tests\n    if false {\n        document.save(\"merged.pdf\").unwrap();\n    }\n\n    Ok(())\n}\n```\n\n* Modify PDF document\n\n```rust\nuse lopdf::Document;\n\n// For this example to work, a parser feature needs to be enabled\n#[cfg(not(feature = \"async\"))]\n#[cfg(feature = \"nom_parser\")]\n{\n    
tokio::runtime::Builder::new_current_thread()\n        .build()\n        .expect(\"Failed to create runtime\")\n        .block_on(async move {\n            let mut doc = Document::load(\"assets/example.pdf\").await.unwrap();\n\n            doc.version = \"1.4\".to_string();\n            doc.replace_text(1, \"Hello World!\", \"Modified text!\");\n            // Store file in current working directory.\n            // Note: Line is excluded when running tests\n            if false {\n                doc.save(\"modified.pdf\").unwrap();\n            }\n        });\n}\n```\n\n## FAQ\n\n* Why does the library keep everything in memory as high-level objects until finally serializing the entire document?\n\n    Normally, a PDF document won't be very large, ranging from tens of KB to hundreds of MB. Memory size is not a bottleneck for today's computers.\n    By keeping the whole document in memory, the stream length can be pre-calculated, so there is no need to use a reference object for the Length entry.\n    The resulting PDF file is smaller for distribution and faster for PDF consumers to process.\n\n    A document is produced once but consumed many times.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJ-F-Liu%2Flopdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJ-F-Liu%2Flopdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJ-F-Liu%2Flopdf/lists"}