{"id":18414896,"url":"https://github.com/xerial/larray","last_synced_at":"2025-05-15T20:05:22.050Z","repository":{"id":7495299,"uuid":"8844630","full_name":"xerial/larray","owner":"xerial","description":"Large off-heap arrays and mmap files for Scala and Java","archived":false,"fork":false,"pushed_at":"2022-11-30T18:43:55.000Z","size":1727,"stargazers_count":403,"open_issues_count":26,"forks_count":43,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-05-14T12:10:00.303Z","etag":null,"topics":["memory-allocation","scala"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xerial.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-03-18T01:19:50.000Z","updated_at":"2025-05-05T09:55:04.000Z","dependencies_parsed_at":"2023-01-13T14:24:44.996Z","dependency_job_id":null,"html_url":"https://github.com/xerial/larray","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Flarray","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Flarray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Flarray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Flarray/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xerial","download_url":"https://codeload.github.com/xerial/larray/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":25441449
9,"owners_count":22067272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["memory-allocation","scala"],"created_at":"2024-11-06T03:52:37.401Z","updated_at":"2025-05-15T20:05:13.643Z","avatar_url":"https://github.com/xerial.png","language":"Scala","readme":"LArray [![Maven Central](https://maven-badges.herokuapp.com/maven-central/org.xerial.larray/larray_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/org.xerial.larray/larray_2.12/) [![Build Status](https://travis-ci.org/xerial/larray.svg?branch=master)](https://travis-ci.org/xerial/larray)\n=== \nA library for managing large off-heap arrays that can hold more than 2G (2^31) entries in Java and Scala. Notably, LArray is *disposable* by calling `LArray.free`. Even if you forget to release it, GC will automatically deallocate the memory acquired by LArray. LArray also supports `mmap` (memory-mapped files) larger than 2GB. \n\n## Features \n * LArray can create arrays with more than 2G (2^31) entries.\n   * 2^31 - 1 (2G) is the size limit of the default Java/Scala arrays, because these arrays use 32-bit signed integers (int) as indexes. LArray uses 64-bit signed integer (long) indexes to remove this limitation.\n   * For example, the entire human genome data (3GB) can be stored in LArray. \n * LArray can be released immediately from memory.\n   * Call `LArray.free`.\n   * The default arrays in Java/Scala stay in JVM heaps until they are collected by GC, so it is generally difficult to avoid `OutOfMemoryError` when working with large amounts of data. 
For example, call `new Array[Int](1000)` x 10,000 times. You are lucky if you don't see `OutOfMemoryError`.\n * LArray can be collected by Garbage Collector (GC)\n   * Even if you forget to call LArray.free, the acquired memory will be released when GC sweeps LArray instances.\n   * To prevent accidental memory release, keep a reference to LArray somewhere (e.g., in List) as in any standard Java/Scala program.\n * LArray resides in off-heap memory \n   * LArray uses a memory space outside the JVM heap, so creating LArrays with more than -Xmx (maximum heap size) is possible. This is useful when you need a large amount of memory, or when it is unknown how much memory your application requires.\n * Fast memory allocation\n   * LArray internally uses a concurrent memory allocator suited to multi-threaded programs, which is faster than the default JVM memory allocator.\n   * LArray by default skips the array initialization (zero-filling), which improves the memory allocation speed significantly.\n * LArray can be used as DirectBuffer\n   * Enables zero-copy transfer to/from files, network, etc.\n   * Zero-copy compression with [snappy-java](https://github.com/xerial/snappy-java) (supported since version 1.1.0-M4. Pass LArray.address to Snappy.rawCompress etc.) \n * Rich set of operations for LArray[A]\n   * map, filter, reduce, zip, etc. Almost all collection operations in Scala are already implemented for LArray[A].\n * Supports memory-mapped files larger than 2GB \n   * Use `LArray.mmap`\n   * It can create memory regions that can be shared between processes.\n\n## Limitations\n\n  * LArray[A] of generic objects (e.g., LArray[String], LArray[AnyRef]) cannot be released immediately from the main memory, because objects other than primitive types need to be created on JVM heaps and they are under the control of GC. \n    * To release objects from main memory, you need to create *off-heap* objects. 
For example, create a large `LArray[Byte]`, then align your object data on the array. Object parameters can be retrieved with `LArray[Byte].getInt(offset)`, `getFloat(offset)`, etc.  \n\n\n## Performance\n\n### Memory allocation\nHere is a simple benchmark result that compares the concurrent memory-allocation performance of LArray (with or without zero-filling), Java arrays, `ByteBuffer.allocate` and `ByteBuffer.allocateDirect`, using Mac OS X with a 2.9GHz Intel Core i7. This test allocates 100 x 1MB of memory space concurrently using multiple threads, and repeats this process 20 times. \n\n```\n-concurrent allocation\ttotal:2.426 sec. , count:   10, avg:0.243 sec. , core avg:0.236 sec. , min:0.159 sec. , max:0.379 sec.\n  -without zero-filling\ttotal:0.126 sec. , count:   20, avg:6.279 msec., core avg:2.096 msec., min:1.405 msec., max:0.086 sec.\n  -with zero-filling\ttotal:0.476 sec. , count:   20, avg:0.024 sec. , core avg:0.023 sec. , min:0.017 sec. , max:0.037 sec.\n  -java array     \t    total:0.423 sec. , count:   20, avg:0.021 sec. , core avg:0.021 sec. , min:0.014 sec. , max:0.029 sec.\n  -byte buffer    \t    total:1.028 sec. , count:   20, avg:0.051 sec. , core avg:0.044 sec. , min:0.014 sec. , max:0.216 sec.\n  -direct byte buffer   total:0.360 sec. , count:   20, avg:0.018 sec. , core avg:0.018 sec. , min:0.015 sec. , max:0.026 sec.\n```\n\nAll the other allocators are several times slower than LArray without zero-filling, and they consume CPU time because their specifications require them to fill the allocated memory with zeros.\n\nIn single-thread execution, you can see more clearly how fast LArray can allocate memory.    \n```\n-single-thread allocation\ttotal:3.655 sec. , count:   10, avg:0.366 sec. , core avg:0.356 sec. , min:0.247 sec. , max:0.558 sec.\n  -without zero-filling\ttotal:0.030 sec. , count:   20, avg:1.496 msec., core avg:1.125 msec., min:0.950 msec., max:8.713 msec.\n  -with zero-filling\ttotal:0.961 sec. , count:   20, avg:0.048 sec. 
, core avg:0.047 sec. , min:0.044 sec. , max:0.070 sec.\n  -java array     \t    total:0.967 sec. , count:   20, avg:0.048 sec. , core avg:0.037 sec. , min:0.012 sec. , max:0.295 sec.\n  -byte buffer    \t    total:0.879 sec. , count:   20, avg:0.044 sec. , core avg:0.033 sec. , min:0.014 sec. , max:0.276 sec.\n  -direct byte buffer\ttotal:0.812 sec. , count:   20, avg:0.041 sec. , core avg:0.041 sec. , min:0.032 sec. , max:0.049 sec.\n```\n\n### Snappy Compression\n\nLArray (and LBuffer) exposes a memory address that can be used to interact seamlessly with fast native methods through JNI. Here is an example of using `rawCompress(...)` in [snappy-java](http://github.com/xerial/snappy-java), which can take raw memory addresses to compress/uncompress data using C++ code, and is generally faster than [Dain's pure-java version of Snappy](http://github.com/dain/snappy).\n\n```\n[SnappyCompressTest]\n-compress       \ttotal:0.017 sec. , count:   10, avg:1.669 msec., core avg:0.769 msec., min:0.479 msec., max:0.010 sec.\n  -LBuffer -\u003e LBuffer (raw)\ttotal:1.760 msec., count:   50, avg:0.035 msec., core avg:0.030 msec., min:0.024 msec., max:0.278 msec.\n  -Array -\u003e Array (raw) \ttotal:1.450 msec., count:   50, avg:0.029 msec., core avg:0.027 msec., min:0.023 msec., max:0.110 msec.\n  -Array -\u003e Array (dain)\ttotal:0.011 sec. 
, count:   50, avg:0.225 msec., core avg:0.141 msec., min:0.030 msec., max:4.441 msec.\n[SnappyCompressTest]\n-decompress     \ttotal:7.722 msec., count:   10, avg:0.772 msec., core avg:0.473 msec., min:0.418 msec., max:3.521 msec.\n  -LBuffer -\u003e LBuffer (raw)\ttotal:1.745 msec., count:   50, avg:0.035 msec., core avg:0.029 msec., min:0.020 msec., max:0.331 msec.\n  -Array -\u003e Array (raw) \ttotal:1.189 msec., count:   50, avg:0.024 msec., core avg:0.021 msec., min:0.018 msec., max:0.149 msec.\n  -Array -\u003e Array (dain)\ttotal:2.571 msec., count:   50, avg:0.051 msec., core avg:0.027 msec., min:0.025 msec., max:1.240 msec.\n```\n\n * [Test code](larray/src/test/scala/xerial/larray/SnappyCompressTest.scala)\n\n  \n\n## Modules\n\nLArray consists of three modules.\n\n * **larray-buffer** (Java) Off-heap memory buffer `LBuffer` and its allocator with GC support.\n * **larray-mmap**   (Java + JNI (C code)) Memory-mapped file implementation `MMapBuffer`\n * **larray** (Scala and Java API) Provides a rich set of array operations through the `LArray` interface.\n\nYou can use each module independently. For example, if you only need an off-heap memory allocator that collects memory upon GC, use `LBuffer` in **larray-buffer**. \n\nTo use all modules, simply add **larray** as a dependency in Maven or sbt; the other modules are then added to your classpath.\n\n## Supported Platforms\n\nA standard JVM (e.g., Oracle JVM (HotSpot VM) or OpenJDK) must be used, since \n**larray-buffer** depends on the `sun.misc.Unsafe` class to allocate off-heap memory.\n\n**larray-mmap** (MMapBuffer and LArray.mmap) uses JNI and is available for the following major platforms and CPU architectures:\n\n * Windows (32/64-bit)\n * Linux (i386, amd64 (Intel 64-bit), arm, armhf)\n * Mac OSX (Intel 64bit)\n\n\n## History\n * 2016-12-13: version 0.4.0 - Fix #52. Support Scala 2.12. 
Use [wvlet-log](https://github.com/wvlet/log) for internal logging.\n * 2016-05-12: version 0.3.4 - Minor performance improvement release\n * 2016-03-04: version 0.3.3 - Add Scala 2.11.7, 2.10.6 support\n * March 4th, 2016  version 0.3.0 - Scala 2.11.7 support\n * November 11, 2013  version 0.2.1 - Use organization name `org.xerial.larray`. Add LBuffer.view.  \n * November 11, 2013  version 0.2 - Extracted pure-java modules (larray-buffer.jar and larray-mmap.jar) from larray.jar (for Scala). \n * August 28, 2013  version 0.1.2 - improved memory layout\n * August 28, 2013  version 0.1.1 (for Scala 2.10.2)\n * Apr 23, 2013   Released version 0.1\n\n## Usage (Scala)\n\n### sbt settings\nAdd the following sbt dependency to your project settings:\n\n```scala\nlibraryDependencies += \"org.xerial.larray\" %% \"larray\" % \"0.4.0\"\n```\n\n * Using snapshot versions:\n\n```scala\nresolvers += \"Sonatype snapshot repo\" at \"https://oss.sonatype.org/content/repositories/snapshots/\"\n\nlibraryDependencies += \"org.xerial.larray\" %% \"larray\" % \"0.4.1-SNAPSHOT\"\n```\n### Example\n\nLArray can be used in the same manner as standard Scala Arrays: \n\n```scala\nimport xerial.larray._\n\nval l = LArray(1, 2, 3)\nval e = l(0) // 1\nprintln(l.mkString(\", \")) // 1, 2, 3\nl(1) = 5\nprintln(l.mkString(\", \")) // 1, 5, 3\n    \n// Create an LArray of Int type\nval l2 = LArray.of[Int](10000L)\n\n// Release the memory resource\nl2.free \n\nl2(0) // The result of accessing released LArray is undefined\n```\n\nFor more examples, see [xerial/larray/example/LArrayExample.scala](larray/src/main/scala/xerial/larray/example/LArrayExample.scala)\n\n## Usage (Java)\n\nAdd the following dependency to your pom.xml (Maven):\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003eorg.xerial.larray\u003c/groupId\u003e\n  \u003cartifactId\u003elarray_2.12\u003c/artifactId\u003e\n  \u003cversion\u003e0.4.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### Example \n\nIn Java we cannot 
provide a syntax as concise as in Scala. Instead, use the `apply` and `update` methods to read/write values in an array.\n\n```java\nimport xerial.larray.japi.LArrayJ;\nimport xerial.larray.*;\n\nLIntArray l = LArrayJ.newLIntArray(10000L);\nl.update(0L, 20); // Set l[0L] = 20\nint e0 = l.apply(0L);  //  Get l[0L]\n\n// release \nl.free();\n```\nFor more examples, see [xerial/larray/example/LArrayJavaExample.java](larray/src/main/scala/xerial/larray/example/LArrayJavaExample.java)\n\n## Scaladoc\n\n * [LArray Scala API](https://oss.sonatype.org/service/local/repositories/releases/archive/org/xerial/larray/larray_2.12/0.4.0/larray_2.12-0.4.0-javadoc.jar/!/index.html#xerial.larray.package)\n * [larray-buffer Java API](https://oss.sonatype.org/service/local/repositories/releases/archive/org/xerial/larray/larray-buffer/0.4.0/larray-buffer-0.4.0-javadoc.jar/!/index.html#)\n * [larray-mmap Java API](https://oss.sonatype.org/service/local/repositories/releases/archive/org/xerial/larray/larray-mmap/0.4.0/larray-mmap-0.4.0-javadoc.jar/!/index.html?xerial/larray/mmap/package-summary.html)\n \n## For developers\n\n* Building LArray: `./sbt compile`\n* Run tests: `./sbt ~test`\n* Creating IntelliJ IDEA project: `./sbt gen-idea`\n\n\n","funding_links":[],"categories":["Extensions"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxerial%2Flarray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxerial%2Flarray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxerial%2Flarray/lists"}