{"id":17646653,"url":"https://github.com/davidssmith/rawarray-rust","last_synced_at":"2026-01-31T20:36:40.895Z","repository":{"id":62443674,"uuid":"231956081","full_name":"davidssmith/rawarray-rust","owner":"davidssmith","description":"Rust implementation of the RawArray file format","archived":false,"fork":false,"pushed_at":"2020-02-12T17:58:02.000Z","size":155,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-28T08:23:03.662Z","etag":null,"topics":["arrays","computational-science","file-format","hdf5","numerical-methods","rawarray","rawarray-format","rust","rust-crate","rust-library","scientific-computing"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidssmith.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-05T18:03:29.000Z","updated_at":"2024-04-15T20:23:28.000Z","dependencies_parsed_at":"2022-11-01T22:16:20.060Z","dependency_job_id":null,"html_url":"https://github.com/davidssmith/rawarray-rust","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidssmith%2Frawarray-rust","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidssmith%2Frawarray-rust/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidssmith%2Frawarray-rust/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidssmith%2Frawarray-rust/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/davidssmith","download_url":"https://codeload.github.com/davidssmith/rawarray-rust/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247589835,"owners_count":20963022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrays","computational-science","file-format","hdf5","numerical-methods","rawarray","rawarray-format","rust","rust-crate","rust-library","scientific-computing"],"created_at":"2024-10-23T11:04:56.093Z","updated_at":"2026-01-31T20:36:35.867Z","avatar_url":"https://github.com/davidssmith.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"Welcome to the RawArray crate!\n===========================\n\n[![Crates.io](https://img.shields.io/crates/v/rawarray.svg)](https://crates.io/crates/rawarray/)\n[![docs.rs](https://docs.rs/rawarray/badge.svg)](https://docs.rs/rawarray/)\n\n\u003cp align=\"center\"\u003e \u003cimg src=\"ra.svg\" width=\"120\"\u003e \u003c/p\u003e\n\nIntroduction\n------------\n\nRawArray is a simple file format for storing n-dimensional arrays. The extension\n`.ra` can be pronounced *arr-ay* or *rah* (as in \"raw\", or the Egyptian sun\ngod).\n\n\nRawArray was designed to be portable, fast, and storage\nefficient. For scientific applications in particular, it can allow the simple\nstorage of large arrays without a separate header file to store the\ndimensions and type metadata. \n\nI believe the world doesn't need another hierarchical data container. We already have one of \nthose---it's called a filesystem. 
What is needed is a simple one-to-one mapping of data structures to disk files that preserves metadata and is fast and simple to read and write.\n\nIn addition to int, uint, and float of arbitrary sizes, RawArray also supports\n\n(1) **complex floats**: which other common formats, such as HDF5, don't have.\n\n(2) **composite types**: RawArray handles reading and writing these, but the encoding and decoding of those is left to the user, since only they can know the structure of their `struct`. Decoding can be as simple as a typecast, however, for types of fixed size. In Rust they are read as a `Vec\u003cT\u003e`, so you are free to handle it as you like.\n\nAs an aside, the RawArray format is technically recursive (or fractal?!). You could store an array of RawArray files in a RawArray file if you want by defining the file as a composite type.\n\nFormat\n-----------\n\nThe file format is a simple concatenation of a header array and a data array. The header is made up of at least seven 64-bit unsigned integers. The array data is whatever you want it to be. Optionally text or binary metadata can be appended to the end of the file with no harmful effects, but this data is not saved or written back out by the library. 
It is up to you to keep track of it.\n\n### File Structure\n\n| offset (bytes) | object | type           | meaning \n|----------------|--------|----------------|---------\n|                |        |                | **HEADER**\t\n| 0              | magic  | UInt64         | magic number\n| 8              | flags  | UInt64         | endianness, future options\n| 16             | eltype | UInt64         | element type code\n| 24             | elbyte | UInt64         | element size in bytes\n| 32             | size   | UInt64         | data segment length in bytes\n| 40             | ndims  | UInt64         | number of array dimensions\n| 48             | dims   | Vector{UInt64} | array dimensions\n| 48 + 8 x ndims | data   | Vector{UInt8}  | **ARRAY DATA**\n| 48 + 8 x ndims + size | - | -             | **VOLATILE METADATA**\n\n### Elemental Type Specification\n\n| code | type\n| ---- | ----\n| 0    | user-defined\n| 1    | signed integer\n| 2    | unsigned integer\n| 3    | floating point (IEEE-754 standard)\n| 4    | complex float (pairs of IEEE floats)\n| 5    | brain floats\n\nThe width of these types is defined separately in the `elbyte` field. For example, \n\n* a 32-bit unsigned integer would be `eltype = 2`, `elbyte = 4`;\n* a single-precision complex float (pairs of 32-bit floats) would be `eltype = 4`, `elbyte = 8`;\n* a string would be `eltype = 2`, `elbyte = 1`, and `size` would contain the length of the string.\n\nThe user-defined struct\n\n```\nstruct Foo {\n   info: [u8; 12],\n   index: u32,\n   v: [f64; 8],\n}\n``` \n\ncontains a 12-byte array, a 4-byte int, and 8 8-byte floats, so the total size is 80 bytes. It would be coded as `eltype = 0`, `elbyte = 80`.\n\nThe data is written and read as the binary representation of the hardware you are on. 
Right now **little endian is assumed**, but big endian support can be added if there is interest.\n\n### Memory Order\n\nThe RawArray format is **column major**, so the first dimension will be the fastest varying one in memory. This decision was made because the majority of scientific languages are traditionally column major, and although C is technically row major it is actually agnostic in applications where multi-dimensional arrays are accessed through computed linear indices (e.g. CUDA).  Of the supplied examples, all are column major except Python. In the case of Python, instead of reading the array into Python and reordering to non-optimal stride, we simply transpose the dimensions before writing and after reading. This means the array looks transposed in Python, but the same dimensions have the same strides in all languages. In other words, the last dimension of the array in Python will be the first one in Julia and Matlab.\n\n\n\nFile Introspection\n------------------\nTo get a better handle on the format of a RawArray file, let's look inside one. If you are on a Unix system or have Cygwin installed on Windows, you can examine the contents of a RawArray file using command line tools.  For this section, we will use the `test.ra` file provided in the `julia/` subdirectory. \n\nFirst, let's pretend you don't know the dimensionality of the array. Then\n\n\n```\n\u003e od -t uL -N 48 test.ra\n0000000              8746397786917265778              0\n0000020              4                                8\n0000040              96                               2\n0000060\n\n```\n\nshows the number of dimensions (2) as the second number on the third line. The command is extracting the first 48 bytes and formatting them as UInt64s. The ridiculous number listed first is the magic number indicating that this is a RawArray file. 
A slightly different command illuminates that:\n\n```\n\u003e od -a -N 16 test.ra\n0000000    r   a   w   a   r   r   a   y nul nul nul nul nul nul nul nul\n0000020\n```\n\nArmed with the knowledge that the array is 2D, we know that the header is `48 + 2*8 = 64` bytes long. The command to skip the header and view only the data would be:\n\n```\n\u003e od -j 64 -f test.ra\n0000100     0.000000e+00            -inf    1.000000e+00   -1.000000e+00\n0000120     2.000000e+00   -5.000000e-01    3.000000e+00   -3.333333e-01\n0000140     4.000000e+00   -2.500000e-01    5.000000e+00   -2.000000e-01\n0000160     6.000000e+00   -1.666667e-01    7.000000e+00   -1.428571e-01\n0000200     8.000000e+00   -1.250000e-01    9.000000e+00   -1.111111e-01\n0000220     1.000000e+01   -1.000000e-01    1.100000e+01   -9.090909e-02\n0000240\n```\n\nHere we are using `-j` to skip the first 64 bytes and `-f` to format the byte data as single-precision floats. Note `od` doesn't understand complex numbers, but the complex data is stored as real and imaginary float pairs that are contiguous on disk. This means that each line of the output is showing two complex numbers with columns 1 and 3 the real parts and columns 2 and 4 the imaginary parts. Notice that it correctly renders the negative infinity.\n\nGetting\n-------\n\nTo install for Rust, add the package entry to your `Cargo.toml` and in your\nsource add `use rawarray::RawArray`.  
\n\nA binary utility called `ra` is included that allows you to do some basic\noperations on the command line, mostly examining header values, but also\nreshaping arrays.\n\nImplementations of RawArray exist for [Python, C, and\nMatlab](https://github.com/davidssmith/ra), and I have written a Julia package called [RawArray.jl](https://github.com/davidssmith/RawArray.jl).\n\nAnd ultimately the format is simple enough for anyone to roll their own\nimplementation on the fly, or cut corners, like skipping the header entirely and\nmemory mapping the array data part of the file.\n\n\nUsage\n-----\n\nAn example usage in your Rust source code would be:\n\n```rust\nuse rawarray::RawArray;\nuse std::io;\nfn main() -\u003e io::Result\u003c()\u003e {\n\tlet vec1: Vec\u003cf32\u003e = vec![1.0, 2.0, 3.0, 4.0];\n\tlet ra: RawArray\u003cf32\u003e = vec1.clone().into();\n\tra.write(\"myarray.ra\")?;\n\n\tlet vec2: Vec\u003cf32\u003e = RawArray::\u003cf32\u003e::read(\"myarray.ra\")?.into();\n\tassert_eq!(vec1, vec2);\n\tOk(())\n}\n```\n\nChecksums and Time Stamping\n-----------------------------\n\nA data checksum or time stamp was deliberately not included in the format because it is impossible to checksum a file with its checksum inside it.** Existing methods (e.g. tar) often zero out the checksum field and then checksum the rest of the file, but this requires special software that understands the format, so standard command line checksum tools won't work. Checksum verification is best left to external means, even if it requires a separate file. \n\nTime stamping is also not necessary, because file systems already provide that. Adding a time stamp that changes upon rewrite or access also foils checksum attempts. HDF5 files are very difficult to checksum for this reason.  It is our belief that the checksum should depend upon data properties only, not any chronology. 
Two files are identical if they contain identical data, no matter when they were created or accessed last.\n\nTo checksum a RawArray file, simply run your local checksum command. For example, on\nLinux: \n\n```\n\u003e md5sum examples/test.ra\n1dd9f98a0d57ec3c4d8ad50343bd20cd  examples/test.ra\n```\n\n ** Not technically impossible, but extremely difficult computationally.\n \nGetting Help\n------------\n\nFor help, file an issue on the [bug tracker](http://github.com/davidssmith/rawarray-rust/issues) or email one of the authors.  Third party help is welcome and can be contributed through pull requests.\n\nAuthors\n-------\nDavid S. Smith [\u003cdavid.smith@gmail.com\u003e](mailto:david.smith@gmail.com)\n\nDisclaimer\n----------\nThis code comes with no warranty. Use at your own risk. If it breaks, let us know, and we'll try to help you fix it. \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidssmith%2Frawarray-rust","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidssmith%2Frawarray-rust","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidssmith%2Frawarray-rust/lists"}