An open API service indexing awesome lists of open source software.

https://github.com/program--/flatpoints

Flattened Points of Interest Data Format
https://github.com/program--/flatpoints

Last synced: 10 months ago
JSON representation

Flattened Points of Interest Data Format

Awesome Lists containing this project

README

          

# flatpoints

**flatpoints** is an in-development data format for representing coordinates, particularly [*points of interest*](https://en.wikipedia.org/wiki/Point_of_interest), as a compressed list of integers (i.e. an [inverted list](https://en.wikipedia.org/wiki/Inverted_index)).

> **Warning**: this library is currently in active development, so its API and the underlying data format are subject to change at any moment.

## Format

Section | Description | Bytes
------- | -------------------- | ------
Header | Magic Number | 3 Bytes
Header | Start of Data (S) | 8 Bytes
Header | Coordinate Count (N) | 8 Bytes
Header | Properties Count | 8 Bytes
Header | Property Types | N Bytes
Header | Property Names | ((8 * (N + 3) + 3) - S) Bytes
Header | Property Offsets | 8 * N Bytes
Data | Coordinate Data |
Data | Property 1 Data |
Data | Property 2 Data |
Data | ... |
Data | Property N Data |

## Braindumping

- Coordinate Encoding Codec: Sorted Hilbert Delta Indexing
- Integer Compression Codec: LEB128
- Properties Compression: Gzip

> **Note**: These codecs are defaults for now, the plan is to allow for other encodings in later implementations.

The general idea for the *flatpoints* data format is to store point geometry in a small footprint, with cloud-native capabilities. The offset approach allows users to take advantage of [HTTP Range Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests).

Coordinates are first projected to 1D by gridding the coordinates using the world extent, then filling the space with a [Hilbert Curve](https://en.wikipedia.org/wiki/Hilbert_curve). Point positions are then associated along the curve to get an unsigned 64-bit integer representation. These integers (and the associated properties) are then sorted in ascending order, and [delta encoded](https://en.wikipedia.org/wiki/Delta_encoding). The resulting differences are then compressed using a codec of choice. In this case, they are compressed into [LEB128](https://en.wikipedia.org/wiki/LEB128) form. This encoding scheme is *slightly* lossy, but can be mitigated to be practically lossless by modifying the extent of the coordinate grid, and the number of iterations (exponent) for the Hilbert Curve. The defaults used (world extent with `N = 31`) are the maximums, and result in a (at most) ±1m margin of error in accuracy (see: [Decimal Degrees](http://wiki.gis.com/wiki/index.php/Decimal_degrees)). This can be reduced to <±1cm margin of error in accuracy by reducing the extent to the coordinate's extent, but would require storing the extent to disk/in-memory.

> See [spress](https://github.com/UFOKN/spress) for the basis of this idea.

Reading the flatpoints format starts with a pre-flight request of reading the first 11 bytes, to ensure the file is a flatpoints file by the magic number, and to get the header size. The start of data (SOD) explicitly states the first (inclusive) position where the *coordinate data* starts, and implicitly gives the header size `(SOD - 1)`.

A header request can then be made to get the entirety of the header.

Coordinate data and the feature properties can be read independently of one another. So, if a user only requires the coordinates and property 1, then they can read only those byte ranges via the offsets & SOD.

## References
- Giulio Ermanno Pibiri, Rossano Venturini (2019). "Techniques for Inverted Index Compression." [arXiv:1908.10598v2](https://arxiv.org/abs/1908.10598v2).
- Vicente Romera Calero (2022). "VTEnc". [GitHub Repository](https://github.com/vteromero/VTEnc).
- Vukovic, Tibor (2016). "Hilbert-Geohash - Hashing Geographical Point Data Using the Hilbert Space-Filling Curve."

## License

The reference implementation of flatpoints is licensed under APLv2.
The format specification for the flatpoints data format is public domain (or under a CC0 license as applicable).