{"id":13595102,"url":"https://github.com/tafia/calamine","last_synced_at":"2026-03-07T16:07:11.649Z","repository":{"id":37381986,"uuid":"61877861","full_name":"tafia/calamine","owner":"tafia","description":"A pure Rust Excel/OpenDocument SpreadSheets file reader: rust on metal sheets","archived":false,"fork":false,"pushed_at":"2025-04-21T06:01:48.000Z","size":5989,"stargazers_count":1892,"open_issues_count":59,"forks_count":174,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-04-21T07:26:53.859Z","etag":null,"topics":["deserializer","excel","opendocument-spreadsheet","parser","rust","serde","vba"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tafia.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"LICENSE-MIT.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-06-24T10:44:23.000Z","updated_at":"2025-04-21T06:01:51.000Z","dependencies_parsed_at":"2023-10-28T11:26:42.356Z","dependency_job_id":"2b8eb9ff-7bc1-4790-89fe-4993d5048155","html_url":"https://github.com/tafia/calamine","commit_stats":{"total_commits":578,"total_committers":82,"mean_commits":7.048780487804878,"dds":0.5121107266435987,"last_synced_commit":"6e231a8577b075b46010eaf6fbf41a079fca1658"},"previous_names":[],"tags_count":36,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tafia%2Fcalamine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tafia%2Fcalamine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tafia
%2Fcalamine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tafia%2Fcalamine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tafia","download_url":"https://codeload.github.com/tafia/calamine/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254043784,"owners_count":22005016,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deserializer","excel","opendocument-spreadsheet","parser","rust","serde","vba"],"created_at":"2024-08-01T16:01:43.982Z","updated_at":"2026-03-07T16:07:11.612Z","avatar_url":"https://github.com/tafia.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# calamine\n\nAn Excel/OpenDocument Spreadsheets file reader/deserializer, in pure Rust.\n\n[![GitHub CI Rust tests](https://github.com/tafia/calamine/workflows/Rust/badge.svg)](https://github.com/tafia/calamine/actions)\n[![Build status](https://ci.appveyor.com/api/projects/status/njpnhq54h5hxsgel/branch/master?svg=true)](https://ci.appveyor.com/project/tafia/calamine/branch/master)\n\n[Documentation](https://docs.rs/calamine/)\n\n## Description\n\n**calamine** is a pure Rust library to read and deserialize any spreadsheet file:\n\n- excel like (`xls`, `xlsx`, `xlsm`, `xlsb`, `xla`, `xlam`)\n- opendocument spreadsheets (`ods`)\n\nAs long as your files are *simple enough*, this library should just work.\nFor anything else, please file an issue with a failing test or send a pull request!\n\n## Examples\n\n### Serde deserialization\n\nIt is 
as simple as:\n\n```rust\nuse calamine::{open_workbook, Error, Xlsx, Reader, RangeDeserializerBuilder};\n\nfn example() -\u003e Result\u003c(), Error\u003e {\n    let path = format!(\"{}/tests/temperature.xlsx\", env!(\"CARGO_MANIFEST_DIR\"));\n    let mut workbook: Xlsx\u003c_\u003e = open_workbook(path)?;\n    let range = workbook.worksheet_range(\"Sheet1\")?;\n\n\n    let mut iter = RangeDeserializerBuilder::new().from_range(\u0026range)?;\n\n    if let Some(result) = iter.next() {\n        let (label, value): (String, f64) = result?;\n        assert_eq!(label, \"celsius\");\n        assert_eq!(value, 22.2222);\n        Ok(())\n    } else {\n        Err(From::from(\"expected at least one record but got none\"))\n    }\n}\n```\n\nCalamine provides helper functions to deal with invalid type values. For\ninstance, to deserialize a column which should contain floats but may also\ncontain invalid values (i.e. strings), you can use the\n[`deserialize_as_f64_or_none`](https://docs.rs/calamine/latest/calamine/fn.deserialize_as_f64_or_none.html)\nhelper function with Serde's\n[`deserialize_with`](https://serde.rs/field-attrs.html) field attribute:\n\n```rust\nuse calamine::{deserialize_as_f64_or_none, open_workbook, RangeDeserializerBuilder, Reader, Xlsx};\nuse serde::Deserialize;\n\n#[derive(Deserialize)]\nstruct Record {\n    metric: String,\n    #[serde(deserialize_with = \"deserialize_as_f64_or_none\")]\n    value: Option\u003cf64\u003e,\n}\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let path = format!(\"{}/tests/excel.xlsx\", env!(\"CARGO_MANIFEST_DIR\"));\n    let mut excel: Xlsx\u003c_\u003e = open_workbook(path)?;\n\n    let range = excel\n        .worksheet_range(\"Sheet1\")\n        .map_err(|_| calamine::Error::Msg(\"Cannot find Sheet1\"))?;\n\n    let iter_records =\n        RangeDeserializerBuilder::with_headers(\u0026[\"metric\", \"value\"]).from_range(\u0026range)?;\n\n    for result in iter_records {\n        let 
record: Record = result?;\n        println!(\"metric={:?}, value={:?}\", record.metric, record.value);\n    }\n\n    Ok(())\n}\n```\n\nThe\n[`deserialize_as_f64_or_none`](https://docs.rs/calamine/latest/calamine/fn.deserialize_as_f64_or_none.html)\nfunction discards all invalid values. If instead you would like to return them\nas `String`s, you can use the similar\n[`deserialize_as_f64_or_string`](https://docs.rs/calamine/latest/calamine/fn.deserialize_as_f64_or_string.html)\nfunction.\n\n### Reader: Simple\n\n```rust\nuse calamine::{Reader, Xlsx, open_workbook};\n\nlet mut excel: Xlsx\u003c_\u003e = open_workbook(\"file.xlsx\").unwrap();\nif let Ok(r) = excel.worksheet_range(\"Sheet1\") {\n    for row in r.rows() {\n        println!(\"row={:?}, row[0]={:?}\", row, row[0]);\n    }\n}\n```\n\n### Reader: With header row\n\n```rs\nuse calamine::{HeaderRow, Reader, Xlsx, open_workbook};\n\nlet mut excel: Xlsx\u003c_\u003e = open_workbook(\"file.xlsx\").unwrap();\n\nlet sheet1 = excel\n    .with_header_row(HeaderRow::Row(3))\n    .worksheet_range(\"Sheet1\")\n    .unwrap();\n```\n\nNote that `xlsx` and `xlsb` files support lazy loading, so specifying a\nheader row takes effect immediately when reading a sheet range.\nIn contrast, for `xls` and `ods` files, all sheets are loaded at once when\nopening the workbook with default settings.\nAs a result, setting the header row only applies afterward and does not\nprovide any performance benefits.\n\n### Reader: More complex\n\nLet's assume\n\n- the file type (xls, xlsx ...) 
cannot be known at compile time\n- we need to get all data from the workbook\n- we need to parse the vba\n- we need to see the defined names\n- and the formulas!\n\n```rust\nuse calamine::{Reader, open_workbook_auto, Data};\n\n// opens a new workbook\nlet path = ...; // we do not know the file type\nlet mut workbook = open_workbook_auto(path).expect(\"Cannot open file\");\n\n// Read whole worksheet data and provide some statistics\nif let Ok(range) = workbook.worksheet_range(\"Sheet1\") {\n    let total_cells = range.get_size().0 * range.get_size().1;\n    let non_empty_cells: usize = range.used_cells().count();\n    println!(\"Found {} cells in 'Sheet1', including {} non-empty cells\",\n             total_cells, non_empty_cells);\n    // alternatively, we can manually filter rows\n    assert_eq!(non_empty_cells, range.rows()\n        .flat_map(|r| r.iter().filter(|\u0026c| c != \u0026Data::Empty)).count());\n}\n\n// Check if the workbook has a vba project\nif let Some(Ok(mut vba)) = workbook.vba_project() {\n    let vba = vba.to_mut();\n    let module1 = vba.get_module(\"Module 1\").unwrap();\n    println!(\"Module 1 code:\");\n    println!(\"{}\", module1);\n    for r in vba.get_references() {\n        if r.is_missing() {\n            println!(\"Reference {} is broken or not accessible\", r.name);\n        }\n    }\n}\n\n// You can also get defined name definitions (string representation only)\nfor name in workbook.defined_names() {\n    println!(\"name: {}, formula: {}\", name.0, name.1);\n}\n\n// Now get all formulas!\nlet sheets = workbook.sheet_names().to_owned();\nfor s in sheets {\n    println!(\"found {} formulas in '{}'\",\n             workbook\n                .worksheet_formula(\u0026s)\n                .expect(\"error while getting formulas\")\n                .rows().flat_map(|r| r.iter().filter(|f| !f.is_empty()))\n                .count(),\n             s);\n}\n```\n\n## Features\n\n- 
`dates`: Add date-related functions to `DataType`.\n- `picture`: Extract picture data.\n\n### Others\n\nBrowse the [examples](https://github.com/tafia/calamine/tree/master/examples) directory.\n\n## Performance\n\nAs `calamine` is read-only, the comparisons only involve reading an Excel `xlsx` file and then iterating over the rows. Along with `calamine`, three other libraries were chosen, from three different languages:\n\n- [`excelize`](https://github.com/qax-os/excelize) written in `go`\n- [`ClosedXML`](https://github.com/ClosedXML/ClosedXML) written in `C#`\n- [`openpyxl`](https://foss.heptapod.net/openpyxl/openpyxl) written in `python`\n\nThe benchmarks were done using this [dataset](https://raw.githubusercontent.com/wiki/jqnatividad/qsv/files/NYC_311_SR_2010-2020-sample-1M.7z), which is a `186MB` `xlsx` file once the `csv` is converted. The plotting data was obtained from the [`sysinfo`](https://github.com/GuillaumeGomez/sysinfo) crate, at a sample interval of `200ms`. The program samples the reported values for the running process and records them.\n\nThe programs are all structured to follow the same constructs:\n\n`calamine`:\n\n```rust\nuse calamine::{open_workbook, Reader, Xlsx};\n\nfn main() {\n    // Open workbook\n    let mut excel: Xlsx\u003c_\u003e =\n        open_workbook(\"NYC_311_SR_2010-2020-sample-1M.xlsx\").expect(\"failed to find file\");\n\n    // Get worksheet\n    let sheet = excel\n        .worksheet_range(\"NYC_311_SR_2010-2020-sample-1M\")\n        .unwrap()\n        .unwrap();\n\n    // iterate over rows\n    for _row in sheet.rows() {}\n}\n```\n\n`excelize`:\n\n```go\npackage main\n\nimport (\n        \"fmt\"\n        \"github.com/xuri/excelize/v2\"\n)\n\nfunc main() {\n        // Open workbook\n        file, err := excelize.OpenFile(`NYC_311_SR_2010-2020-sample-1M.xlsx`)\n\n        if err != nil {\n                fmt.Println(err)\n                return\n        }\n\n        defer func() {\n                // Close the spreadsheet.\n                
if err := file.Close(); err != nil {\n                        fmt.Println(err)\n                }\n        }()\n\n        // Select worksheet\n        rows, err := file.Rows(\"NYC_311_SR_2010-2020-sample-1M\")\n        if err != nil {\n                fmt.Println(err)\n                return\n        }\n\n        // Iterate over rows\n        for rows.Next() {\n        }\n}\n```\n\n`ClosedXML`:\n\n```csharp\nusing ClosedXML.Excel;\n\ninternal class Program\n{\n        private static void Main(string[] args)\n        {\n                // Open workbook\n                using var workbook = new XLWorkbook(\"NYC_311_SR_2010-2020-sample-1M.xlsx\");\n\n                // Get Worksheet\n                // \"NYC_311_SR_2010-2020-sample-1M\"\n                var worksheet = workbook.Worksheet(1);\n\n                // Iterate over rows\n                foreach (var row in worksheet.Rows())\n                {\n\n                }\n        }\n}\n```\n\n`openpyxl`:\n\n```python\nfrom openpyxl import load_workbook\n\n# Open workbook\nwb = load_workbook(\n    filename=r'NYC_311_SR_2010-2020-sample-1M.xlsx', read_only=True)\n\n# Get worksheet\nws = wb['NYC_311_SR_2010-2020-sample-1M']\n\n# Iterate over rows\nfor row in ws.rows:\n    _ = row\n\n# Close the workbook after reading\nwb.close()\n```\n\n### Benchmarks\n\nThe benchmarking was done using [`hyperfine`](https://github.com/sharkdp/hyperfine) with `--warmup 3` on an `AMD RYZEN 9 5900X @ 4.0GHz` running `Windows 11`. 
Both `calamine` and `ClosedXML` were built in release mode.\n\n```text\n0.22.1 calamine.exe\n  Time (mean ± σ):     25.278 s ±  0.424 s    [User: 24.852 s, System: 0.470 s]\n  Range (min … max):   24.980 s … 26.369 s    10 runs\n\nv2.8.0 excelize.exe\n  Time (mean ± σ):     44.254 s ±  0.574 s    [User: 46.071 s, System: 7.754 s]\n  Range (min … max):   42.947 s … 44.911 s    10 runs\n\n0.102.1 closedxml.exe\n  Time (mean ± σ):     178.343 s ±  3.673 s    [User: 177.442 s, System: 2.612 s]\n  Range (min … max):   173.232 s … 185.086 s    10 runs\n\n3.0.10 openpyxl.py\n  Time (mean ± σ):     238.554 s ±  1.062 s    [User: 238.016 s, System: 0.661 s]\n  Range (min … max):   236.798 s … 240.167 s    10 runs\n```\n\nBased on the mean run times, `calamine` is 1.75x faster than `excelize`, 7.05x faster than `ClosedXML`, and 9.43x faster than `openpyxl`.\n\nThe spreadsheet has a range of 1,000,001 rows and 41 columns, for a total of 41,000,041 cells in the range. Of those, 28,056,975 cells had values.\n\nDividing that number of cells with values by each mean run time gives:\n\n- `calamine` =\u003e 1,109,936 cells per second\n- `excelize` =\u003e 633,998 cells per second\n- `ClosedXML` =\u003e 157,320 cells per second\n- `openpyxl` =\u003e 117,612 cells per second\n\n### Plots\n\n#### Disk Read\n\n![bytes_from_disk](https://github.com/RoloEdits/calamine/assets/12489689/fcca1147-d73f-4d1c-b273-e7e4c183ab29)\n\nAs stated, the file size on disk is `186MB`. The amount each library read from disk was:\n\n- `calamine` =\u003e `186MB`\n- `ClosedXML` =\u003e `208MB`\n- `openpyxl` =\u003e `192MB`\n- `excelize` =\u003e `1.5GB`\n\nWhen I asked one of the maintainers of `excelize` about this, I got this [response](https://github.com/qax-os/excelize/issues/1695#issuecomment-1772239230):\n\u003e To avoid high memory usage for reading large files, this library allows user-specific UnzipXMLSizeLimit options when opening the workbook, to set the memory limit on the unzipping worksheet and shared string table in bytes, worksheet XML will be extracted to the system temporary directory when the file size is over 
this value, so you can see that data written in reading mode, and you can change the default for that to avoid this behavior.\n\u003e\n\u003e \\- xuri\n\n#### Disk Write\n\n![bytes_to_disk](https://github.com/RoloEdits/calamine/assets/12489689/befa9893-7658-41a7-8cbd-b0ce5a7d9341)\n\nAs seen in the previous section, `excelize` writes to disk to save memory. The others don't employ that kind of mechanism.\n\n#### Memory\n\n![mem_usage](https://github.com/RoloEdits/calamine/assets/12489689/c83fdf6b-1442-4e22-8eca-84cbc1db4a26)\n\n![virt_mem_usage](https://github.com/RoloEdits/calamine/assets/12489689/840a96ed-33d7-44f7-8276-80bb7a02557f)\n\n\u003e [!NOTE]\n\u003e `ClosedXML` was reporting a constant `2.5TB` of virtual memory usage, so it was excluded from the chart.\n\nThe stepping pattern for `calamine` comes from `Vec`s growing and the old memory being freed right after, which makes the usage drop back down. The sudden jump at the end is when the sheet is being read into memory. 
The other libraries, being garbage collected, show a more linear climb all the way through.\n\n#### CPU\n\n![cpu_usage](https://github.com/RoloEdits/calamine/assets/12489689/c3aa55a8-b008-48ee-ba04-c08bd91c1f6f)\n\nThe chart is very noisy; `excelize`'s spikes may come from its garbage collector.\n\n## Unsupported\n\nMany (most) parts of the specifications are not implemented; the focus has been on reading cell **values** and **vba** code.\n\nThe main unsupported items are:\n\n- no support for writing Excel files; this is a read-only library\n- no support for reading extra content, such as formatting, Excel parameters, encrypted components, etc.\n- no support for reading VB code in OpenDocument files\n\n## Credits\n\nThanks to the [xlsx-js](https://github.com/SheetJS/js-xlsx) developers!\nThis library is by far the simplest open source implementation I could find and helped me make sense of the official documentation.\n\nThanks also to all the contributors!\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftafia%2Fcalamine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftafia%2Fcalamine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftafia%2Fcalamine/lists"}