https://github.com/tpapp/largecolumns.jl
Handle large columns (vectors of equal length) with bits types in Julia using mmap.
- Host: GitHub
- URL: https://github.com/tpapp/largecolumns.jl
- Owner: tpapp
- License: other
- Created: 2017-10-24T08:40:59.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2020-12-20T09:15:59.000Z (over 5 years ago)
- Last Synced: 2025-02-28T16:20:11.785Z (over 1 year ago)
- Language: Julia
- Size: 27.3 KB
- Stars: 4
- Watchers: 3
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# LargeColumns
Manage large vectors of bits types in Julia. A thin wrapper for
mmapped binary data, with a few sanity checks and convenience
functions.
## Specification
For each dataset, the columns (vectors of equal length) and metadata
are stored in a directory like this:
```
dir/
├── layout.jld2
├── meta/
│ └ ...
├── 1.bin
├── 2.bin
├── ...
├── ...
└── ...
```
The file `layout.jld2` specifies the number and types of columns (using
[JLD2.jl](https://github.com/simonster/JLD2.jl)) and the total number of
elements. The `$i.bin` files contain the data for each column, which
can be [memory mapped](https://en.wikipedia.org/wiki/Memory-mapped_file).
Additional metadata can be saved in files in the directory `meta`.
This library ignores the contents of that directory; use the function
`meta_path` to calculate paths relative to `dir/meta`.
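To make the on-disk format concrete, here is a minimal sketch of how a `$i.bin` file could be written and memory mapped using only Julia's `Mmap` standard library. It assumes that each file holds the raw binary representation of its elements with no header, which is an assumption about the format rather than something stated above. `write_column` and `map_column` are hypothetical helpers, not part of this library's API.

```julia
using Mmap

# Hypothetical helper: write a vector of bits-type values as raw binary.
function write_column(path::AbstractString, values::Vector{T}) where {T}
    @assert isbitstype(T) "only bits types have a fixed raw binary layout"
    open(io -> write(io, values), path, "w")
    return path
end

# Hypothetical helper: memory-map `n` elements of type `T` from `path`.
# The file is opened read-only, so the result must not be written to.
function map_column(path::AbstractString, ::Type{T}, n::Integer) where {T}
    @assert filesize(path) == n * sizeof(T) "file size does not match the expected layout"
    return Mmap.mmap(path, Vector{T}, (n,))
end

dir = mktempdir()
ids = Int64[1, 2, 3]
weights = Float64[0.5, 1.5, 2.5]
write_column(joinpath(dir, "1.bin"), ids)
write_column(joinpath(dir, "2.bin"), weights)

col1 = map_column(joinpath(dir, "1.bin"), Int64, 3)
col2 = map_column(joinpath(dir, "2.bin"), Float64, 3)
```

A raw binary file records neither its element type nor its length, which is why both have to come from somewhere else, such as a layout file. The size check in `map_column` is one example of a sanity check that this information makes possible.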
## Interfaces
Two interfaces are provided. Use `SinkColumns` for an *ex ante*
unknown number of elements, written sequentially. This is useful for
ingesting data.
`MmappedColumns` is useful when the number of records is known and
fixed.
Types for the columns are specified as `Tuple`s. See the docstrings
for both interfaces and the unit tests for examples.
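The difference between the two use cases can be illustrated without the library. The sketch below is not the `SinkColumns` or `MmappedColumns` API (the docstrings remain the reference for that). It uses only the `Mmap` standard library, `ingest` and `map_columns` are hypothetical names, and it keeps the record count in a variable instead of writing a `layout.jld2` file. Records of a `Tuple` type are appended to one file per column without knowing the count in advance, and the columns are mapped once the count is fixed.

```julia
using Mmap

# Hypothetical helper: append each record's fields to one raw binary file
# per column, counting the records as they arrive.
function ingest(dir::AbstractString, ::Type{S}, records) where {S <: Tuple}
    types = fieldtypes(S)
    @assert all(isbitstype, types) "all column types must be bits types"
    ios = [open(joinpath(dir, "$i.bin"), "w") for i in 1:length(types)]
    n = 0
    try
        for record in records
            for (io, T, x) in zip(ios, types, record)
                write(io, convert(T, x))
            end
            n += 1
        end
    finally
        foreach(close, ios)
    end
    return n
end

# Hypothetical helper: map every column now that the length `n` is known.
function map_columns(dir::AbstractString, ::Type{S}, n::Integer) where {S <: Tuple}
    return Tuple(Mmap.mmap(joinpath(dir, "$i.bin"), Vector{T}, (n,))
                 for (i, T) in enumerate(fieldtypes(S)))
end

dir = mktempdir()
# A filtered generator: its length is not known before iterating.
records = ((i, i / 2) for i in 1:10 if isodd(i))
n = ingest(dir, Tuple{Int64, Float64}, records)
ids, halves = map_columns(dir, Tuple{Int64, Float64}, n)
```

Writing sequentially only requires appending to files, so the total never has to be known up front. Mapping requires the element count, so it happens after ingestion has finished.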
## Acknowledgments
Work on this library was supported by the Austrian National Bank
Jubiläumsfonds grant #17378.