{"id":20474262,"url":"https://github.com/marschall/read-file-java","last_synced_at":"2026-04-24T05:32:52.884Z","repository":{"id":138923135,"uuid":"164861767","full_name":"marschall/read-file-java","owner":"marschall","description":"Reads a file using Java and prints some statistics","archived":false,"fork":false,"pushed_at":"2019-01-14T20:24:54.000Z","size":13,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-07T23:02:56.533Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marschall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-09T12:48:36.000Z","updated_at":"2019-03-31T18:46:16.000Z","dependencies_parsed_at":"2023-05-04T15:47:30.258Z","dependency_job_id":null,"html_url":"https://github.com/marschall/read-file-java","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/marschall/read-file-java","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marschall%2Fread-file-java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marschall%2Fread-file-java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marschall%2Fread-file-java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marschall%2Fread-file-java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marschall","dow
nload_url":"https://codeload.github.com/marschall/read-file-java/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marschall%2Fread-file-java/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32211024,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T03:15:14.334Z","status":"ssl_error","status_checked_at":"2026-04-24T03:15:11.608Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T14:28:49.013Z","updated_at":"2026-04-24T05:32:52.871Z","avatar_url":"https://github.com/marschall.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"Java Large File / Data Reading \u0026 Performance Testing\n====================================================\n\nA reimplementation of [paigen11/read-file-java](https://github.com/paigen11/read-file-java) using [marschall/line-parser](https://github.com/marschall/line-parser), [marschall/charsequences](https://github.com/marschall/charsequences) and [marschall/mini-csv](https://github.com/marschall/mini-csv).\n\nWe use the following approach for parsing\n\n* Use [marschall/mini-csv](https://github.com/marschall/mini-csv) for CSV parsing, which uses [marschall/line-parser](https://github.com/marschall/line-parser).\n  * This 
allows us to drastically cut down on string allocations, as only a reused CharSequence view is created for every line instead of a full String.\n  * Since the file is in ASCII, we can skip the decoding and turn every byte directly into a char.\n* Use an [Eclipse Collections](https://www.eclipse.org/collections/) [Bag](https://github.com/eclipse/eclipse-collections/blob/master/docs/guide.md#-bag) for counting the occurrences of months and first names.\n  * This allows us to avoid holding on to every first name and is more efficient than a `HashMap\u003cString, Integer\u003e`.\n  * Unfortunately, this adds about 10 MB.\n* Use YearMonth instead of a formatted String for representing a month.\n  * Use Integer.parseInt for parsing the YearMonth instead of DateTimeFormatterBuilder because it drastically cuts down on allocations. This causes a noticeable speed improvement.\n\n```\ntime java -Xmx16m -cp target/read-file-java-0.1.0-SNAPSHOT-shaded.jar com.github.marschall.readfilejava.ReadFile /path/to/file\n```\n\n```\ntime java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler\n```\n\n```\ntime java -Xmx6g -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC\n```\n\n```\n-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:StartFlightRecording:filename=read-file-java.jfr:settings=$HOME/git/read-file-java/read-file-java.jfc -XX:FlightRecorderOptions:stackdepth=128\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarschall%2Fread-file-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarschall%2Fread-file-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarschall%2Fread-file-java/lists"}