https://github.com/marschall/line-parser

a line parser for Java based on mmap()

Host: GitHub
URL: https://github.com/marschall/line-parser
Owner: marschall
Created: 2015-11-22T14:04:52.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2020-09-23T07:33:12.000Z (almost 5 years ago)
Last Synced: 2025-03-27T02:43:12.524Z (3 months ago)
Topics: file-input, java, lines, mmap
Language: Java
Homepage:
Size: 115 KB
Stars: 7
Watchers: 3
Forks: 2
Open Issues: 8
Metadata Files:
- Readme: README.md

line-parser [![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.marschall/line-parser/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.github.marschall/line-parser) [![Javadocs](https://www.javadoc.io/badge/com.github.marschall/line-parser.svg)](https://www.javadoc.io/doc/com.github.marschall/line-parser) [![Build Status](https://travis-ci.org/marschall/line-parser.svg?branch=master)](https://travis-ci.org/marschall/line-parser)
===========

```xml
<dependency>
    <groupId>com.github.marschall</groupId>
    <artifactId>line-parser</artifactId>
    <version>0.5.0</version>
</dependency>
```

An `mmap()` based line parser for cases when:

 * the start byte position of a line in the file is required

 * the length in bytes of a line is required

 * only a few characters of every line are required

In these cases this library can theoretically be more efficient than `BufferedReader` because:

 * the copy operations of buffered IO are avoided

 * the allocation and resizing of an intermediate `StringBuffer` is avoided

 * the allocation of the final `String` is avoided, only the required substrings are allocated

The performance may still be slower than a `BufferedReader` based approach, but it should consume much less memory bandwidth and produce only a fraction of the garbage.
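To illustrate the general idea, here is a minimal sketch of scanning a memory-mapped file for line boundaries without creating any `String`. It is not this library's implementation: the class `MmapLineScanner` and its `LineSpan` type are hypothetical names, and the sketch assumes a file smaller than 2 GiB, an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, and `\n` or `\r\n` line terminators.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of line-parser.
final class MmapLineScanner {

  /** Start offset and length in bytes of one line, excluding the terminator. */
  public static final class LineSpan {
    public final long offset;
    public final int length;

    LineSpan(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Returns the byte span of every line in the file. */
  static List<LineSpan> scan(Path path) throws IOException {
    List<LineSpan> spans = new ArrayList<>();
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
      long size = channel.size();
      if (size == 0L) {
        return spans;
      }
      // Assumption: the file fits into a single mapping (at most Integer.MAX_VALUE bytes).
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0L, size);
      int limit = buffer.limit();
      int lineStart = 0;
      for (int i = 0; i < limit; i++) {
        // Absolute reads go straight to the mapped pages, no intermediate copy.
        if (buffer.get(i) == '\n') {
          int end = i;
          if (end > lineStart && buffer.get(end - 1) == '\r') {
            end--; // treat \r\n as a single terminator
          }
          spans.add(new LineSpan(lineStart, end - lineStart));
          lineStart = i + 1;
        }
      }
      if (lineStart < limit) {
        // Last line without a trailing terminator.
        spans.add(new LineSpan(lineStart, limit - lineStart));
      }
    }
    return spans;
  }
}
```

The scan only records positions. Decoding a few bytes of a line into characters can then be done on demand, which is where the savings over reading every line into a `String` would come from.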

As this project gives you a `CharSequence` instead of a `String`, you may want to have a look at the [charsequences](https://github.com/marschall/charsequences) project, which gives you some of the `String` convenience methods while avoiding allocation.
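As a rough idea of what allocation-free work on a `CharSequence` looks like, the following sketch checks a prefix and parses a run of digits without calling `toString()`. The class `CharSequenceHelpers` and its methods are hand-written, hypothetical helpers for illustration only; they are not the API of the charsequences project.

```java
// Hypothetical helpers, not taken from the charsequences project.
final class CharSequenceHelpers {

  /** Returns true if {@code sequence} begins with {@code prefix}. */
  static boolean startsWith(CharSequence sequence, CharSequence prefix) {
    int prefixLength = prefix.length();
    if (sequence.length() < prefixLength) {
      return false;
    }
    for (int i = 0; i < prefixLength; i++) {
      if (sequence.charAt(i) != prefix.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  /** Parses the ASCII digits in [start, end) as a non-negative int. */
  static int parseDigits(CharSequence sequence, int start, int end) {
    if (start >= end) {
      throw new NumberFormatException("empty range");
    }
    int value = 0;
    for (int i = start; i < end; i++) {
      char c = sequence.charAt(i);
      if (c < '0' || c > '9') {
        throw new NumberFormatException("not a digit at index " + i);
      }
      // The exact methods throw ArithmeticException on int overflow.
      value = Math.addExact(Math.multiplyExact(value, 10), c - '0');
    }
    return value;
  }
}
```

For example, `CharSequenceHelpers.parseDigits("id=1234;", 3, 7)` reads the four digits in place instead of first creating a substring.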

Misc
----

 * the main parsing loop is likely to benefit from [on-stack replacement (OSR)](http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html#onStackReplacement)

 * if you're using UTF-8 with a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) then the BOM is returned as well

 * if you're using UTF-16 with a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) then the BOM is returned as well

 * the library runs on Java 8 but is also a Java 9 module that only requires the `jdk.unsupported` module besides the `java.base` module
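Regarding the BOM notes above: if the BOM is not wanted, it can be dropped from the first line by the caller. The sketch below assumes that the BOM shows up as the decoded character U+FEFF at the start of the first line's content; `BomUtil` is a hypothetical helper and not part of this library.

```java
// Hypothetical helper, not part of line-parser.
final class BomUtil {

  /** The byte order mark as a decoded character. */
  static final char BOM = '\uFEFF';

  /** Returns the content without a leading BOM, or the content itself if there is none. */
  static CharSequence stripBom(CharSequence content) {
    if (content.length() > 0 && content.charAt(0) == BOM) {
      return content.subSequence(1, content.length());
    }
    return content;
  }
}
```

Only the first line of a file can start with a BOM, so the check needs to be applied once per file rather than once per line.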

Usage
-----

```java
LineParser parser = new LineParser();
parser.forEach(path, cs, (line) -> {
  System.out.printf("[%d,%d]%s%n", line.getOffset(), line.getLength(), line.getContent());
});
```
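One possible use of the reported offset and length is to store them as an index and re-read a single line later without scanning the file again. The following sketch uses only the JDK; `LineReread` is a hypothetical class, and it assumes the offset and length are byte positions of a line as printed in the example above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not part of line-parser.
final class LineReread {

  /** Reads {@code length} bytes starting at {@code offset} and decodes them. */
  static String readLine(Path path, Charset cs, long offset, int length) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(length);
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
      long position = offset;
      // A positional read may return fewer bytes than requested, so loop.
      while (buffer.hasRemaining()) {
        int read = channel.read(buffer, position);
        if (read < 0) {
          throw new EOFException("line extends past the end of the file");
        }
        position += read;
      }
    }
    buffer.flip();
    return cs.decode(buffer).toString();
  }
}
```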
