https://github.com/asinghvi17/rangeextractor.jl
A performant way to extract subsections of arrays, under a tiling scheme. Meant for arrays with slow I/O.
- Host: GitHub
- URL: https://github.com/asinghvi17/rangeextractor.jl
- Owner: asinghvi17
- License: mit
- Created: 2024-11-10T02:50:12.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-17T22:36:09.000Z (11 months ago)
- Last Synced: 2025-02-17T23:28:25.707Z (11 months ago)
- Topics: big-data, io, raster
- Language: Julia
- Homepage: https://asinghvi17.github.io/RangeExtractor.jl/
- Size: 376 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.bib
README
# RangeExtractor
[Documentation (stable)](https://asinghvi17.github.io/RangeExtractor.jl/stable/)
[Documentation (dev)](https://asinghvi17.github.io/RangeExtractor.jl/dev/)
[CI status](https://github.com/asinghvi17/RangeExtractor.jl/actions/workflows/CI.yml?query=branch%3Amain)
**RangeExtractor.jl** is a package for efficiently extracting and operating on subsets of large (larger-than-memory) arrays. It is optimized for arrays that are slow to load.

## Installation
```julia
using Pkg
Pkg.add("RangeExtractor")
using RangeExtractor
```
## Quick Start
```julia
using RangeExtractor
# Create sample array
array = ones(20, 20)
# Define regions of interest, as ranges of indices.
# RangeExtractor only accepts tuples of unit ranges.
ranges = [
    (1:4, 1:4),
    (9:20, 11:20),
    (1:15, 11:20),
    (11:20, 1:10)
]
# Define tiling scheme (10x10 tiles)
tiling_strategy = FixedGridTiling{2}(10)
# Extract results by calling `extract` with one of:
# - a function that takes an array and returns some value,
# - a `do` block, which is a convenient way to provide an anonymous function,
# - a `TileOperation`, which is a more flexible way to provide an operation.
# Here, we use a `do` block to sum the values in each range.
results = extract(array, ranges; strategy = tiling_strategy) do A
    sum(A)
end
```
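As a sanity check for the Quick Start, the same sums can be computed by indexing the in-memory array directly. This is a minimal sketch in plain Julia that does not call RangeExtractor. It assumes (not confirmed by this README) that `extract` returns one result per range, in the same order as `ranges`.

```julia
# Reference computation: index each range directly and reduce it.
array = ones(20, 20)
ranges = [
    (1:4, 1:4),
    (9:20, 11:20),
    (1:15, 11:20),
    (11:20, 1:10),
]

# `view` avoids copying; each tuple of ranges is splatted into the indices.
expected = [sum(view(array, r...)) for r in ranges]
```

Because the array is all ones, each entry of `expected` is just the number of elements in the corresponding range, which makes it easy to compare against tiled results.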
## Key features
- Multi-threaded, asynchronous processing: extract data from multiple tiles in parallel, and apply the operation to each tile in parallel.
- Split computations efficiently across tiles: choose whether to materialize the whole requested range or to reduce each section to an intermediate product.
- Flexible tiling schemes: define your own tiling scheme that encodes your knowledge of the data.
- Fully user-defined operations: pass a plain function, a `do` block, or a `TileOperation`.
RangeExtractor.jl also integrates with Rasters.jl, so you can call `Rasters.zonal(f, raster, strategy; of = geoms, ...)` to use RangeExtractor to accelerate your zonal computations.
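
To make the tiling idea in the features above concrete, here is a minimal sketch in plain Julia of how a requested range can be split across a fixed grid of square tiles. The helpers `tile_span`, `tile_range`, and `split_by_tiles` are hypothetical names introduced for illustration; they are not part of RangeExtractor's API, and the package's internals may differ.

```julia
# Hypothetical helpers for illustration only (not RangeExtractor API).

# Tile numbers along one axis that a unit range touches, for tiles of size `tilesize`.
tile_span(r::UnitRange{Int}, tilesize::Int) = cld(first(r), tilesize):cld(last(r), tilesize)

# Index range covered by tile number `t` along one axis, clipped to the axis length.
tile_range(t::Int, tilesize::Int, len::Int) = ((t - 1) * tilesize + 1):min(t * tilesize, len)

# Split an N-dimensional range into (tile index, sub-range) pieces.
function split_by_tiles(rng::NTuple{N,UnitRange{Int}}, tilesize::Int, dims::NTuple{N,Int}) where {N}
    spans = map(r -> tile_span(r, tilesize), rng)
    pieces = Tuple{NTuple{N,Int},NTuple{N,UnitRange{Int}}}[]
    for tile in Iterators.product(spans...)
        # Intersect the requested range with this tile's extent on every axis.
        sub = ntuple(d -> intersect(rng[d], tile_range(tile[d], tilesize, dims[d])), N)
        push!(pieces, (tile, sub))
    end
    return pieces
end

# The range (9:20, 11:20) on a 20x20 array with 10x10 tiles touches two tiles.
pieces = split_by_tiles((9:20, 11:20), 10, (20, 20))

# A range that yields a single piece is fully contained in one tile.
is_contained = length(split_by_tiles((1:4, 1:4), 10, (20, 20))) == 1
```

In this sketch, a range that produces one piece can be finished as soon as its tile is loaded, while a range that produces several pieces needs an intermediate result per piece and a final combining step.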
## Generic to any Array
RangeExtractor.jl is designed to be generic to any array type, as long as it supports AbstractArray-like indexing.
Here's an example of using RangeExtractor.jl to calculate zonal statistics on a raster dataset, using a custom operation. This is faster single-threaded than Rasters.jl is multithreaded, since it can split computation a
```julia
using RangeExtractor
using Rasters, ArchGDAL
using RasterDataSources, NaturalEarth
import GeoInterface as GI
# Load raster dataset
ras = Raster(WorldClim{Climate}, :tmin, month=1)
# Get country polygons
countries = naturalearth("admin_0_countries", 10)
# Convert extents to index ranges
ranges = Rasters.dims2indices.((ras,), Rasters.Touches.(GI.extent.(countries.geometry)))
# Define tiling scheme
strategy = FixedGridTiling{2}(100)
# Define zonal statistics operation.
# Here, we use a `TileOperation` to define a fully custom operation.
# - `contained` is applied to each range that is fully contained within a tile,
# and returns the final result for that range.
# - `shared` is applied to each range that is partially contained or shared with another tile,
# and returns some intermediate result that is stored.
# - `combine` is applied to the results of all the `shared` operations for a range,
# and returns the final result for that range.
op = TileOperation(
    contained = (x, meta) -> zonal(sum, x; of=meta),
    shared = (x, meta) -> zonal(sum, x; of=meta),
    combine = (results, args...) -> sum(results)
)
# Calculate zonal statistics
results = RangeExtractor.extract(
    op,                  # the operation to perform
    ras,                 # the raster to extract from
    ranges,              # the ranges to extract
    countries.geometry;  # the "metadata" - in this case, the polygons to calculate zonal statistics over
    strategy = strategy  # the tiling strategy to use
)
```
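The split into `contained`, `shared`, and `combine` matters most for statistics that cannot be merged by applying the same function again. A mean is one such case: averaging per-tile means gives the wrong answer when the pieces differ in size. The sketch below shows the pattern in plain Julia, without calling RangeExtractor. The closures take a single argument for brevity, whereas the `TileOperation` above also passes metadata, and the piece ranges are written out by hand as an assumption about how a 10x10 tiling would split the range.

```julia
# Plain-Julia illustration of the contained / shared / combine pattern for a mean.
contained = x -> sum(x) / length(x)       # range lies in one tile: final answer directly
shared = x -> (sum(x), length(x))         # partial piece: keep (sum, count) instead of a mean
combine = parts -> sum(first, parts) / sum(last, parts)  # merge the partial (sum, count) pairs

array = reshape(collect(1.0:400.0), 20, 20)

# The range (9:20, 11:20) straddles two 10x10 tiles, so it is processed as two pieces.
piece_ranges = [(9:10, 11:20), (11:20, 11:20)]
partials = [shared(view(array, r...)) for r in piece_ranges]

tiled_mean = combine(partials)                          # mean assembled from the pieces
direct_mean = contained(view(array, 9:20, 11:20))       # mean over the whole range at once
```

Both routes give the same mean, but the tiled route only ever needs one piece in memory at a time plus a small tuple per piece.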
## Similar approaches elsewhere
- `exactextract` in R and Python uses a somewhat similar strategy for operating on large, out-of-memory rasters, but it is forced to keep all vector statistics materialized in memory. See https://isciences.github.io/exactextract/performance.html#the-raster-sequential-strategy. It does not support multithreading or flexible user-defined operations.
## Acknowledgements
This effort was funded by the NASA MEaSUREs program in contribution to the Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project (https://its-live.jpl.nasa.gov/).