https://github.com/marschall/line-parser
a line parser for Java based on mmap()
- Host: GitHub
- URL: https://github.com/marschall/line-parser
- Owner: marschall
- Created: 2015-11-22T14:04:52.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-09-23T07:33:12.000Z (over 4 years ago)
- Last Synced: 2025-03-27T02:43:12.524Z (about 2 months ago)
- Topics: file-input, java, lines, mmap
- Language: Java
- Homepage:
- Size: 115 KB
- Stars: 7
- Watchers: 3
- Forks: 2
- Open Issues: 8
line-parser [Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.marschall/line-parser) [Javadocs](https://www.javadoc.io/doc/com.github.marschall/line-parser) [Build Status](https://travis-ci.org/marschall/line-parser)
===========

```xml
<dependency>
  <groupId>com.github.marschall</groupId>
  <artifactId>line-parser</artifactId>
  <version>0.5.0</version>
</dependency>
```

An `mmap()` based line parser for cases when:
* the start byte position of a line in the file is required
* the length in bytes of a line is required
* only a few characters of every line are required

In these cases this library can theoretically be more efficient than `BufferedReader` because:
* the copy operations of buffered IO are avoided
* the allocation and resizing of an intermediate `StringBuffer` is avoided
* the allocation of the final `String` is avoided, only the required substrings
  are allocated

The performance may still be slower than a `BufferedReader` based approach but it should consume much less memory bandwidth and produce only a fraction of the garbage.

As this project gives you a `CharSequence` instead of a `String` you may want to have a look at the [charsequences](https://github.com/marschall/charsequences) project, which gives you some of the `String` convenience methods while avoiding allocation.
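
To illustrate the general idea behind the points above, here is a minimal sketch that uses only the JDK (`FileChannel.map`). It is not this library's implementation. It scans a memory-mapped file for `\n` and reports the byte offset and byte length of every line without creating a `String`. The names `MmapLineScan`, `scan` and `LineCallback` are hypothetical, and the sketch assumes `\n` line terminators and a file smaller than 2 GB.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapLineScan {

  /** Hypothetical callback, receives the byte offset and byte length of a line. */
  interface LineCallback {
    void line(long offset, int length);
  }

  /** Reports the position of every line, the '\n' itself is not part of the length. */
  static void scan(Path path, LineCallback callback) throws IOException {
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
      long size = channel.size();
      if (size == 0L) {
        return; // nothing to report for an empty file
      }
      // assumption: the file fits into a single mapping (less than 2 GB)
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0L, size);
      int limit = buffer.limit();
      int lineStart = 0;
      for (int i = 0; i < limit; i++) {
        // absolute get, reads straight from the mapping without copying
        if (buffer.get(i) == '\n') {
          callback.line(lineStart, i - lineStart);
          lineStart = i + 1;
        }
      }
      if (lineStart < limit) {
        // last line without a trailing '\n'
        callback.line(lineStart, limit - lineStart);
      }
    }
  }
}
```

A caller that only needs a few characters of a line would decode just the required byte range instead of the whole line.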
Misc
----

* the main parsing loop is likely to benefit from [on-stack replacement (OSR)](http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html#onStackReplacement)
* if you're using UTF-8 with a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) then the BOM is returned as well
* if you're using UTF-16 with a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) then the BOM is returned as well
* the library runs on Java 8 but is also a Java 9 module that only requires the `jdk.unsupported` module besides the `java.base` module

Usage
-----

```java
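// path (the input file) and cs (its character set) are assumed to be defined by the caller
LineParser parser = new LineParser();
parser.forEach(path, cs, (line) -> {
  // prints the start byte position, the length in bytes and the content of each line
  System.out.printf("[%d,%d]%s%n", line.getOffset(), line.getLength(), line.getContent());
});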
```
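
The Misc section notes that a BOM is returned as well. If that is not wanted, it can be dropped from the content of the first line. The following is a minimal sketch, assuming the BOM shows up as a leading `U+FEFF` character in the `CharSequence`. `BomStripper` and `stripBom` are hypothetical names and not part of this library.

```java
final class BomStripper {

  /** Returns the content without a leading U+FEFF, or the content itself if there is none. */
  static CharSequence stripBom(CharSequence content) {
    if (content.length() > 0 && content.charAt(0) == '\uFEFF') {
      // subSequence may allocate, so only call it when a BOM is actually present
      return content.subSequence(1, content.length());
    }
    return content;
  }
}
```

Only the first line of a file can start with a BOM, so the check can be skipped for all later lines.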