https://github.com/marschall/read-file-java
Reads a file using Java and prints some statistics
- Host: GitHub
- URL: https://github.com/marschall/read-file-java
- Owner: marschall
- Created: 2019-01-09T12:48:36.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-14T20:24:54.000Z (over 6 years ago)
- Last Synced: 2025-01-16T02:45:02.866Z (4 months ago)
- Language: Java
- Size: 12.7 KB
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Java Large File / Data Reading & Performance Testing
====================================================

A reimplementation of [paigen11/read-file-java](https://github.com/paigen11/read-file-java) using [marschall/line-parser](https://github.com/marschall/line-parser), [marschall/charsequences](https://github.com/marschall/charsequences) and [marschall/mini-csv](https://github.com/marschall/mini-csv).
We use the following approach for parsing:

* Use [marschall/mini-csv](https://github.com/marschall/mini-csv) for CSV parsing, which uses [marschall/line-parser](https://github.com/marschall/line-parser).
  * This allows us to drastically cut down on string allocations, as only a reused `CharSequence` view is created for every line instead of a full `String`.
  * Since the file is in ASCII, we can save ourselves the decoding and turn every byte directly into a `char`.
* Use an [Eclipse Collections](https://www.eclipse.org/collections/) [Bag](https://github.com/eclipse/eclipse-collections/blob/master/docs/guide.md#-bag) for counting the occurrences of months and first names.
  * This means we do not have to hold on to every first name, and it is more efficient than a `HashMap`.
  * Unfortunately this adds about 10 MB.
* Use `YearMonth` instead of a formatted `String` for representing a month.
* Use `Integer.parseInt` for parsing the `YearMonth` instead of `DateTimeFormatterBuilder` because it drastically cuts down on allocations. This causes a noticeable speed improvement.

Run with a 16 MB heap:

```
time java -Xmx16m -cp target/read-file-java-0.1.0-SNAPSHOT-shaded.jar com.github.marschall.readfilejava.ReadFile /path/to/file
```

Run with the JVMCI compiler enabled:

```
time java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler
```

Run with Epsilon GC and a 6 GB heap:

```
time java -Xmx6g -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC
```

JVM flags for recording with Java Flight Recorder:

```
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:StartFlightRecording:filename=read-file-java.jfr:settings=$HOME/git/read-file-java/read-file-java.jfc -XX:FlightRecorderOptions:stackdepth=128
```
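
To illustrate the reused `CharSequence` view over ASCII bytes mentioned in the parsing approach, here is a minimal sketch. It is not the code from marschall/line-parser or marschall/charsequences; the names `AsciiViewSketch`, `AsciiView`, `reset` and `scan` are hypothetical, and it assumes the whole input is 7-bit ASCII held in a byte array.

```java
import java.nio.charset.StandardCharsets;

final class AsciiViewSketch {

    /** Hypothetical reusable view over a slice of an ASCII byte array. */
    static final class AsciiView implements CharSequence {
        private byte[] bytes = new byte[0];
        private int offset;
        private int length;

        /** Re-points this view at another slice; no new object is allocated. */
        AsciiView reset(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
            return this;
        }

        @Override
        public int length() {
            return length;
        }

        @Override
        public char charAt(int index) {
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            // Assumption: 7-bit ASCII, so widening the byte is the entire "decoding".
            return (char) bytes[offset + index];
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            if (start < 0 || end > length || start > end) {
                throw new IndexOutOfBoundsException(start + ".." + end);
            }
            return new AsciiView().reset(bytes, offset + start, end - start);
        }

        @Override
        public String toString() {
            // Only here is a real String allocated.
            return new String(bytes, offset, length, StandardCharsets.US_ASCII);
        }
    }

    /** Returns {line count, total characters}, reusing a single view for all lines. */
    static long[] scan(byte[] data) {
        AsciiView line = new AsciiView();
        long lines = 0;
        long chars = 0;
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                line.reset(data, start, i - start);
                lines++;
                chars += line.length();
                start = i + 1;
            }
        }
        if (start < data.length) { // last line without a trailing newline
            line.reset(data, start, data.length - start);
            lines++;
            chars += line.length();
        }
        return new long[] {lines, chars};
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncde\n".getBytes(StandardCharsets.US_ASCII);
        long[] stats = scan(data);
        System.out.println(stats[0] + " lines, " + stats[1] + " chars");
    }
}
```

The point of the design is that `scan` allocates one `AsciiView` for the whole input, however many lines it contains; a `String` is only created if a caller explicitly asks for `toString()`.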
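
The `YearMonth` and `Integer.parseInt` points from the parsing approach can be sketched as follows. This is an illustration under assumptions, not the project's code: the field layout (a value starting with `yyyyMM`) and the names `YearMonthCountSketch`, `parseYearMonth` and `countByMonth` are hypothetical, and a plain `HashMap` stands in for the Eclipse Collections Bag so the example needs only the JDK (Java 9 or later).

```java
import java.time.YearMonth;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class YearMonthCountSketch {

    /**
     * Parses a field assumed to start with yyyyMM (for example "20170123").
     * Integer.parseInt(CharSequence, int, int, int) reads the digits in place,
     * so no substring and no formatter objects are allocated.
     */
    static YearMonth parseYearMonth(CharSequence field) {
        int year = Integer.parseInt(field, 0, 4, 10);
        int month = Integer.parseInt(field, 4, 6, 10);
        return YearMonth.of(year, month);
    }

    /** Counts occurrences per month; the HashMap is a stand-in for a Bag. */
    static Map<YearMonth, Integer> countByMonth(Iterable<? extends CharSequence> fields) {
        Map<YearMonth, Integer> counts = new HashMap<>();
        for (CharSequence field : fields) {
            counts.merge(parseYearMonth(field), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<YearMonth, Integer> counts =
                countByMonth(List.of("20170123", "20170105", "20180301"));
        System.out.println(counts.get(YearMonth.of(2017, 1))); // prints 2
    }
}
```

Because `parseYearMonth` accepts any `CharSequence`, it can be fed a reused line view directly instead of a `String` copy of the field.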