https://github.com/roaringbitmap/real-roaring-datasets
Data sets for benchmarking other Roaring bitmap implementations. These are just the data files from https://github.com/RoaringBitmap/RoaringBitmap/tree/master/real-roaring-dataset/src/main/resources/real-roaring-dataset
- Host: GitHub
- URL: https://github.com/roaringbitmap/real-roaring-datasets
- Owner: RoaringBitmap
- Created: 2016-12-19T22:18:24.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2021-06-25T20:30:09.000Z (over 4 years ago)
- Last Synced: 2025-04-19T19:05:07.422Z (10 months ago)
- Language: Go
- Homepage:
- Size: 43.1 MB
- Stars: 7
- Watchers: 6
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Real data sets for bitmap testing
==
See also https://github.com/RoaringBitmap/CRoaring/tree/master/benchmarks/realdata for uncompressed .txt versions.
Essentially, each file represents a set of integer values. You can create
bitmaps out of these files.
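
As a rough illustration of turning one of these files into sets of integers, here is a minimal Python sketch. It assumes that each archive is a zip file whose entries are text files holding integers separated by commas or whitespace; that layout, and the names `load_integer_sets` and `to_bitmap`, are assumptions made for this example rather than something this README specifies.

```python
import io
import zipfile


def load_integer_sets(zip_source):
    """Return {entry name: sorted list of ints} for every file in a zip archive.

    Assumption: each entry is ASCII text holding integers separated by
    commas and/or whitespace.
    """
    sets = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            text = archive.read(name).decode("ascii")
            tokens = text.replace(",", " ").split()
            sets[name] = sorted({int(tok) for tok in tokens})
    return sets


def to_bitmap(values):
    """Pack non-negative integers into a Python int used as an uncompressed bitmap."""
    bitmap = 0
    for v in values:
        bitmap |= 1 << v
    return bitmap
```

`load_integer_sets` accepts a path or a file-like object. The plain-int bitmap is only a stand-in: in a real benchmark the sorted values would be fed to the bitmap implementation under test.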
In many cases, a description of the data sets is provided in:
* Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, arXiv:1402.6407.
http://arxiv.org/abs/1402.6407
These files are meant to be used with the software published on http://roaringbitmap.org/
Files starting with the prefix "dimension" were prepared by Xavier Léauté from
a Druid dump.
---
There is one special file (bitsets_1925630_96.gz), which is a binary file. All other files are just zipped text files. To deserialize the special file, first read an int giving the number of rows that follow (e.g. 1925630 rows).
Each row is read by first reading an int giving the number of longs that follow (e.g. 96 longs), and then reading that many longs.
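The file was written with Java data streams and can be read back with DataInputStream, so each int is 4 bytes and each long is 8 bytes, both big-endian.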
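
The row layout described above can be sketched in Python as follows. This is a minimal reader under stated assumptions: values use the encoding of Java's data streams (big-endian 4-byte ints and 8-byte longs), and the `.gz` extension means the file is gzip-compressed. The function names are made up for this example, and the longs are returned as unsigned 64-bit words, which is a choice of this sketch.

```python
import gzip
import io
import struct


def _read_exact(stream, n):
    """Read exactly n bytes from stream or raise EOFError."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"expected {n} bytes, stream ended {remaining} short")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_bitsets(stream):
    """Return a list of rows, each a tuple of 64-bit words.

    Layout: int row count, then per row an int word count followed by
    that many longs. Assumes big-endian encoding.
    """
    (row_count,) = struct.unpack(">i", _read_exact(stream, 4))
    rows = []
    for _ in range(row_count):
        (word_count,) = struct.unpack(">i", _read_exact(stream, 4))
        # ">Q" reads each long as an unsigned 64-bit word.
        words = struct.unpack(f">{word_count}Q", _read_exact(stream, 8 * word_count))
        rows.append(words)
    return rows
```

Assuming the file is gzip-compressed, it could be read with `with gzip.open("bitsets_1925630_96.gz", "rb") as f: rows = read_bitsets(f)`. With the full file this materializes every row in memory, so a generator that yields one row at a time may be preferable for benchmarking.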