{"id":13822494,"url":"https://github.com/zitsen/ooxml-rs","last_synced_at":"2025-04-06T23:17:09.793Z","repository":{"id":42659295,"uuid":"309960885","full_name":"zitsen/ooxml-rs","owner":"zitsen","description":"Office OpenXML reader and writer in Rust","archived":false,"fork":false,"pushed_at":"2023-11-28T10:55:05.000Z","size":423,"stargazers_count":106,"open_issues_count":5,"forks_count":17,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-04-24T18:41:49.401Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zitsen.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-11-04T09:51:20.000Z","updated_at":"2024-04-14T18:33:26.000Z","dependencies_parsed_at":"2023-11-28T11:43:03.919Z","dependency_job_id":"a19fc28d-4d84-476c-b35b-d2b8520621e4","html_url":"https://github.com/zitsen/ooxml-rs","commit_stats":{"total_commits":67,"total_committers":3,"mean_commits":"22.333333333333332","dds":0.05970149253731338,"last_synced_commit":"6635f43d1bda0f3530d52fe0732d197a79be22c2"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zitsen%2Fooxml-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zitsen%2Fooxml-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zitsen%2Fooxml-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zitsen%2Fooxml-rs/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/zitsen","download_url":"https://codeload.github.com/zitsen/ooxml-rs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247563937,"owners_count":20958971,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T08:02:02.658Z","updated_at":"2025-04-06T23:17:09.770Z","avatar_url":"https://github.com/zitsen.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# OOXML - Office OpenXML parser in Rust\n\n**This crate started as a private-purpose project with limited knowledge of Office Open XML; use it with caution!**\n\n\u003e Office Open XML is an XML-based, ZIP-compressed electronic file specification developed by Microsoft that supports file formats such as documents, spreadsheets, memos, and presentations.\n\n\u003e Office Open XML (also informally known as OOXML or Microsoft Open XML (MOX)) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma (as ECMA-376), and by the ISO and IEC (as ISO/IEC 29500) in later versions.\n\nOOXML, as its name suggests, aims to be a pure Rust implementation of an Office Open XML parser - reading and writing OOXML components efficiently in Rust. 
For now, only xlsx parsing is supported.\n\n## TLDR;\n\nExample code in `examples/xlsx.rs`:\n\n```rust\nuse ooxml::document::SpreadsheetDocument;\n\nfn main() {\n    let xlsx =\n        SpreadsheetDocument::open(\"examples/simple-spreadsheet/data-image-demo.xlsx\").unwrap();\n\n    let workbook = xlsx.get_workbook();\n    //println!(\"{:?}\", xlsx);\n\n    let _sheet_names = workbook.worksheet_names();\n\n    for (sheet_idx, sheet) in workbook.worksheets().iter().enumerate() {\n        println!(\"worksheet {}\", sheet_idx);\n        println!(\"worksheet dimension: {:?}\", sheet.dimenstion());\n        println!(\"---------DATA---------\");\n        for rows in sheet.rows() {\n            // get cell values\n            let cols: Vec\u003c_\u003e = rows\n                .into_iter()\n                .map(|cell| cell.value().unwrap_or_default())\n                .collect();\n            println!(\"{}\", itertools::join(\u0026cols, \",\"));\n        }\n    }\n}\n\n```\n\nRunning `cargo run --example xlsx` prints:\n\n```\nworksheet 0\nworksheet dimension: Some((1, 1))\n---------DATA---------\n\n----------------------\nworksheet 1\nworksheet dimension: Some((4, 4))\n---------DATA---------\nname,age,birthday,last edited\nbob,17,1983/12/12,2020/10/11 19:59\ntom,18,1982/12/12,2020/10/11 19:59\ncury,20,1980-12-12,2020-10-11 19:59\n----------------------\n```\n\n## Library Design\n\nThe main idea comes from the [DotNet OpenXML SDK].\n\n1. Implement [OpenXML Package Convention] for any OOXML format (docx/xlsx/pptx...), including:\n   - package read and write\n   - content type parsing\n   - relationship common types\n2. Implement shared OpenXML parts\n   - content type\n   - core properties\n   - app properties\n   - file properties\n   - embedded package\n   - image\n   - theme\n   - style\n3. 
Implement [Excel/SpreadsheetML specifications](http://officeopenxml.com/anatomyofOOXML-xlsx.php)\n   - Calculation Chain\n   - Chartsheet\n   - Comments\n   - Connections\n   - Custom Property\n   - Custom XML Mappings\n   - Dialogsheet\n   - Drawings\n   - External Workbook References\n   - Metadata\n   - Pivot Table\n   - Pivot Table Cache Definition\n   - Pivot Table Cache Records\n   - Query Table\n   - Shared String Table\n   - Shared Workbook Revision Log\n   - Shared Workbook User Data\n   - Single Cell Table Definition\n   - Table Definition\n   - Volatile Dependencies\n   - Workbook\n   - Worksheet\n4. Other OpenXML formats (docx, pptx)\n\nThe codebase tree structure will look like this.\n\n```text\nsrc\n├── document\n│   ├── mod.rs\n│   ├── presentation\n│   │   └── mod.rs\n│   ├── spreadsheet\n│   │   ├── cell.rs\n│   │   ├── chart.rs\n│   │   ├── document_type.rs\n│   │   ├── drawing.rs\n│   │   ├── media.rs\n│   │   ├── mod.rs\n│   │   ├── shared_string.rs\n│   │   ├── style.rs\n│   │   ├── workbook.rs\n│   │   └── worksheet.rs\n│   └── wordprocessing\n│       └── mod.rs\n├── drawing\n│   └── mod.rs\n├── error.rs\n├── lib.rs\n├── math\n│   └── mod.rs\n└── packaging\n    ├── app_property.rs\n    ├── content_type.rs\n    ├── custom_property.rs\n    ├── element.rs\n    ├── mod.rs\n    ├── namespace.rs\n    ├── package.rs\n    ├── part\n    │   ├── container.rs\n    │   ├── mod.rs\n    │   └── pair.rs\n    ├── property.rs\n    ├── relationship\n    │   ├── mod.rs\n    │   └── reference.rs\n    ├── variant.rs\n    ├── xml.rs\n    └── zip.rs\n```\n\n## Definitions For the Crate\n\n**The main design principle is `typed everything`.**\n\n- **`Package`**: A `Package` is a zipped OpenXML document, which could be a wordprocessing/spreadsheet/presentation document.\n- **`Element`**: An `Element` is an OpenXML element representing data details in each XML file.\n- **`Part`**: A `Part` is a collection of `Element`s or pure data that should be serialized to a file in the 
package.\n- **`Component`**: A `Component` is the bridge between behaviors and the internal OpenXML structures, including `Package`, `Element`, and `Part`.\n- **`Property`**: A `Property` represents attributes for an element.\n- **`Document`**: A `Document` is the entry `Component` for a real document, e.g. `SpreadsheetDocument`.\n- **`Relationship`**: A `Relationship` is a link between the element and other resources from a `Part`.\n\nThe data flow when opening or creating a document looks like this.\n\n```plantuml\nDocument -\u003e Package : open/parse from\nPackage -\u003e Parts : parse to parts\nParts -\u003e Components: build components tree\nComponents -\u003e Elements: elements one-to-one map\nElements -\u003e Components: elements changes\nComponents -\u003e Parts: components write back\nParts -\u003e Package: serialize to package\nPackage \u003c- Document: flush, save or others\n\nDocument -\u003e Components: create new document. add or remove components\nComponents \u003c-\u003e Elements: operations\nComponents -\u003e Parts: component add/remove\nParts -\u003e Package: serialize to package\nDocument -\u003e Package: flush, save or others\n```\n\n## Initial Implementation Features\n\n- [x] OPC parsing, including read and write\n- [x] Shared components\n  - [x] content type\n  - [x] core properties\n  - [x] app properties\n  - [ ] file properties (not scheduled)\n  - [ ] embedded package (not scheduled)\n  - [ ] image\n  - [ ] theme\n  - [ ] style\n- [ ] SpreadsheetML\n  - [ ] Workbook\n  - [ ] Worksheet\n\nTODOs:\n- create marker traits for OpenXML elements to make them more general.\n- use `minidom` in an XML part, tracking changes and writing them back to the DOM tree.\n- lazily parse some OpenXML parts to speed up first start.\n- implement helper macros for component generation.\n  \n## Tokei - 2020-11-04-11:35:51\n\n```text\n===============================================================================\n Language            Files        Lines         Code 
    Comments       Blanks\n===============================================================================\n Markdown                1          272            0          230           42\n Plain Text              1            1            0            1            0\n TOML                    1           23           21            1            1\n XML                    52          164          164            0            0\n-------------------------------------------------------------------------------\n Rust                   34         2721         2189          194          338\n |- Markdown            14          106            7           90            9\n (Total)                           2827         2196          284          347\n===============================================================================\n Total                  89         3287         2381          516          390\n===============================================================================\n```\n\n## Concepts\n\n### Office Open XML, or OpenXML\n\nOffice Open XML (also informally known as OOXML or Microsoft Open XML (MOX)) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma (as ECMA-376), and by the ISO and IEC (as ISO/IEC 29500) in later versions.\n\nMicrosoft Office 2010 provides read support for ECMA-376, read/write support for ISO/IEC 29500 Transitional, and read support for ISO/IEC 29500 Strict. 
Microsoft Office 2013 and Microsoft Office 2016 additionally support both reading and writing of ISO/IEC 29500 Strict. While Office 2013 and onward have full read/write support for ISO/IEC 29500 Strict, Microsoft has not yet made the strict non-transitional, or original, standard the default file format due to remaining interoperability concerns.\n\n### OpenXML Package Convention\n\nThe Open Packaging Conventions (OPC) is a container-file technology initially created by Microsoft to store a combination of XML and non-XML files that together form a single entity such as an Open XML Paper Specification (OpenXPS) document. OPC-based file formats combine the advantages of leaving the independent file entities embedded in the document intact and resulting in much smaller files compared to normal use of XML.\n\n### Standard ECMA-376\n\n[Standard ECMA-376] - The Office Open XML File Formats standard.\n\n1st edition (December 2006), 2nd edition (December 2008), 3rd edition (June 2011), 4th edition (December 2012) and 5th edition (Part 3, December 2015; and Parts 1 \u0026 4, December 2016).\n\nEdition downloads:\n\n- [ECMA-376 5th edition Part 1]\n- [ECMA-376 5th edition Part 3]\n- [ECMA-376 5th edition Part 4]\n  \n- [ECMA-376 4th edition Part 1]\n- [ECMA-376 4th edition Part 2]\n- [ECMA-376 4th edition Part 3]\n- [ECMA-376 4th edition Part 4]\n  \nThe current edition is the 4th, technically aligned with ISO/IEC 29500. The 5th edition is ongoing. There is an introductory PDF, [Office Open XML Overview].\n\n### SpreadsheetML\n\nA SpreadsheetML or .xlsx file is a zip file (a package) containing a number of \"parts\" (typically UTF-8 or UTF-16 encoded) or XML files. The package may also contain other media files such as images. 
The structure is organized according to the Open Packaging Conventions as outlined in Part 2 of the OOXML standard ECMA-376.\n\nYou can look at the file structure and the files that comprise a SpreadsheetML file by simply unzipping the .xlsx file.\n\n```text\n├── [Content_Types].xml\n├── docProps\n│   ├── app.xml\n│   ├── core.xml\n│   └── custom.xml\n├── _rels\n└── xl\n    ├── charts\n    │   ├── chart1.xml\n    │   ├── colors1.xml\n    │   ├── _rels\n    │   │   └── chart1.xml.rels\n    │   └── style1.xml\n    ├── drawings\n    │   ├── drawing1.xml\n    │   ├── drawing2.xml\n    │   └── _rels\n    │       ├── drawing1.xml.rels\n    │       └── drawing2.xml.rels\n    ├── media\n    │   └── image1.png\n    ├── _rels\n    │   └── workbook.xml.rels\n    ├── sharedStrings.xml\n    ├── styles.xml\n    ├── theme\n    │   └── theme1.xml\n    ├── workbook.xml\n    └── worksheets\n        ├── _rels\n        │   ├── sheet1.xml.rels\n        │   └── sheet2.xml.rels\n        ├── sheet1.xml\n        └── sheet2.xml\n```\n\nThe number and types of parts will vary based on what is in the spreadsheet, but there will always be a `[Content_Types].xml`, one or more relationship parts, a workbook part, and at least one worksheet. The core data of the spreadsheet is contained within the worksheet part(s), discussed in more detail at [xlsx Content Overview](http://officeopenxml.com/SScontentOverview.php).\n\n## Resources\n\n1. Wikipedia Office OpenXML: [English](https://en.wikipedia.org/wiki/Office_Open_XML), [中文](https://zh.wikipedia.org/wiki/Office_Open_XML).\n2. Microsoft [DotNet OpenXML SDK] documents and [source code](https://github.com/OfficeDev/Open-XML-SDK/).\n3. Wikipedia [OpenXML Package Convention] - [开放打包约定].\n4. What is OOXML: http://officeopenxml.com/\n5. SpreadsheetML: http://officeopenxml.com/anatomyofOOXML-xlsx.php\n6. Rust [quick-xml](https://crates.io/crates/quick-xml) [documents](https://docs.rs/quick-xml/0.20.0).\n7. 
Rust [docx-rs](https://crates.io/crates/docx-rs) [documents](https://docs.rs/docx-rs) and [source code on github](https://github.com/bokuweb/docx-rs).\n8. Go Excel file parser [excelize](https://github.com/360EntSecGroup-Skylar/excelize).\n9. [Standard ECMA-376].\n\n[Office Open XML]: http://officeopenxml.com/\n[DotNet OpenXML SDK]: https://docs.microsoft.com/en-us/dotnet/api/overview/openxml/?view=openxml-2.8.1\n[OpenXML Package Convention]: https://en.wikipedia.org/wiki/Open_Packaging_Conventions\n[开放打包约定]: https://zh.wikipedia.org/wiki/%E5%BC%80%E6%94%BE%E6%89%93%E5%8C%85%E7%BA%A6%E5%AE%9A\n[Standard ECMA-376]: https://www.ecma-international.org/publications/standards/Ecma-376.htm\n[ECMA-376 5th edition Part 1]: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fifth%20Edition,%20Part%201%20-%20Fundamentals%20And%20Markup%20Language%20Reference.zip\n[ECMA-376 5th edition Part 3]: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fifth%20Edition,%20Part%203%20-%20Markup%20Compatibility%20and%20Extensibility.zip\n[ECMA-376 5th edition Part 4]: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fifth%20Edition,%20Part%204%20-%20Transitional%20Migration%20Features.zip\n\n[ECMA-376 4th edition Part 1]: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fourth%20Edition,%20Part%201%20-%20Fundamentals%20And%20Markup%20Language%20Reference.zip\n[ECMA-376 4th edition Part 2]: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fourth%20Edition,%20Part%202%20-%20Open%20Packaging%20Conventions.zip\n[ECMA-376 4th edition Part 3]: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fourth%20Edition,%20Part%203%20-%20Markup%20Compatibility%20and%20Extensibility.zip\n[ECMA-376 4th edition Part 4]: 
https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fourth%20Edition,%20Part%204%20-%20Transitional%20Migration%20Features.zip\n[Office Open XML Overview]: https://www.ecma-international.org/news/TC45_current_work/OpenXML%20White%20Paper.pdf","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzitsen%2Fooxml-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzitsen%2Fooxml-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzitsen%2Fooxml-rs/lists"}