https://github.com/code402/batch-vs-index-warc
A benchmark to explore the speed of reading WARC entries in bulk vs individually.
- Host: GitHub
- URL: https://github.com/code402/batch-vs-index-warc
- Owner: code402
- License: apache-2.0
- Created: 2020-02-20T16:56:42.000Z
- Default Branch: master
- Last Pushed: 2020-03-05T01:18:21.000Z
- Last Synced: 2025-01-17T16:57:56.772Z
- Language: Java
- Size: 10.7 KB
- Stars: 0
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# batch-vs-index-warc
_See the blog post: [S3 Throughput: Scans vs Indexes](https://code402.com/blog/s3-scans-vs-index/)._
A benchmark to explore the speed of reading WARC entries in bulk vs individually.
```bash
mvn clean install assembly:single # Build the JAR
```
```bash
# Read WARC entries individually
NUM_RECORDS=100000 NUM_CORES=16 java -Xmx20g -Dhttp.maxConnections=1000 -cp target/batch-vs-index-warc-1.0-SNAPSHOT-jar-with-dependencies.jar com.code402.Single
# Read WARC entries in bulk
NUM_RECORDS=100000 NUM_CORES=16 java -Xmx20g -Dhttp.maxConnections=1000 -cp target/batch-vs-index-warc-1.0-SNAPSHOT-jar-with-dependencies.jar com.code402.Batch
```
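Reading a single WARC entry from a large file usually means asking the server for just that entry's bytes with an HTTP `Range` header, using an offset and length taken from an index. The sketch below illustrates that idea and one way the `NUM_RECORDS` and `NUM_CORES` environment variables could be parsed. It is not code from this repository: `RangeSketch`, `rangeHeader`, `intFromEnv`, and the default values are hypothetical names and numbers chosen for illustration.

```java
import java.util.Map;

// Hypothetical sketch: not taken from the batch-vs-index-warc sources.
class RangeSketch {

    // Builds an HTTP Range header value covering `length` bytes starting at `offset`.
    // Range end positions are inclusive, hence the "- 1".
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    // Reads an integer setting such as NUM_RECORDS from an environment map,
    // falling back to a default when the variable is unset or empty.
    static int intFromEnv(Map<String, String> env, String name, int fallback) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        int numRecords = intFromEnv(env, "NUM_RECORDS", 1000); // assumed default
        int numCores = intFromEnv(env, "NUM_CORES", 4);        // assumed default
        System.out.println("records=" + numRecords + " cores=" + numCores);

        // An index entry with offset 1000 and length 500 maps to this header value.
        System.out.println(rangeHeader(1000, 500)); // bytes=1000-1499
    }
}
```

A bulk read, by contrast, would typically stream the whole object in one request and iterate over the records it contains, trading per-request overhead for reading bytes that may not be needed.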