Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wetneb/refine-memory-benchmark
small prototype to evaluate memory usage of OpenRefine grids
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/wetneb/refine-memory-benchmark
- Owner: wetneb
- Created: 2022-12-07T08:12:01.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-06-27T07:43:16.000Z (over 1 year ago)
- Last Synced: 2024-10-13T14:15:17.487Z (3 months ago)
- Language: Jupyter Notebook
- Size: 37.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# refine-memory-benchmark
Small application to test the amount of memory taken up by an OpenRefine grid with its new architecture.
It consists of two parts:
* a Java application which loads various OpenRefine projects (placed in an OpenRefine workspace initialized in the `workspace/` subdirectory of this repository) and measures the size in bytes of each project's grid once loaded in memory. These statistics are collected in the `stats.tsv` file.
* a Jupyter notebook which analyzes the statistics and trains a model to predict the memory size of a grid from various characteristics.

## Current results
The size in bytes of a grid can be predicted as: `(85 * columnsNotReconciled + 343 * columnsReconciled + 980) * rows`.
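As an illustration, the formula above can be wrapped in a small Python helper. This is only a sketch: the function name `estimate_grid_bytes` and its parameter names are chosen here for readability and are not part of this repository.

```python
def estimate_grid_bytes(rows: int, columns_not_reconciled: int, columns_reconciled: int) -> int:
    """Estimate the in-memory size of a grid, in bytes, using the fitted formula.

    The coefficients (85, 343, 980) are the ones reported in this README;
    the function itself is a hypothetical helper, not code from the repository.
    """
    if min(rows, columns_not_reconciled, columns_reconciled) < 0:
        raise ValueError("row and column counts must be non-negative")
    # Per-row cost: a fixed overhead plus a cost per column, by column type.
    bytes_per_row = 85 * columns_not_reconciled + 343 * columns_reconciled + 980
    return bytes_per_row * rows
```

For example, a grid of 100,000 rows with 10 non-reconciled columns and 2 reconciled columns would be estimated at `estimate_grid_bytes(100_000, 10, 2)` bytes, that is 2,516 bytes per row or roughly 240 MiB in total.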
This could be refined by taking more samples, computing the sparsity of the grid (which could then be estimated on a sample of rows), or other features.
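A minimal sketch of the sparsity idea, assuming sparsity means the fraction of empty cells and that rows are available as plain Python lists (neither assumption comes from this repository or from OpenRefine's API). The function name `estimate_sparsity` and its parameters are hypothetical.

```python
import random
from typing import Optional, Sequence


def estimate_sparsity(
    rows: Sequence[Sequence[Optional[str]]],
    sample_size: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate the fraction of empty cells from a random sample of rows.

    A cell counts as empty when it is None or the empty string
    (an assumption made for this sketch).
    """
    if not rows:
        return 0.0
    rng = random.Random(seed)
    # Use every row when the grid is small, otherwise a sample without replacement.
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    total_cells = sum(len(row) for row in sample)
    if total_cells == 0:
        return 0.0
    empty_cells = sum(1 for row in sample for cell in row if cell is None or cell == "")
    return empty_cells / total_cells
```

The resulting value could then be added as an extra feature when fitting the model in the notebook.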
MIT license