https://github.com/mayankpratap/samchika
A fast and light-weight multithreaded file processing library for Java.
concurrency file-processing java kotlin multithreading open-source parallel-processing performance scala
- Host: GitHub
- URL: https://github.com/mayankpratap/samchika
- Owner: MayankPratap
- License: mit
- Created: 2025-05-19T13:31:25.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-06-22T13:21:10.000Z (6 months ago)
- Last Synced: 2025-06-22T13:36:25.970Z (6 months ago)
- Topics: concurrency, file-processing, java, kotlin, multithreading, open-source, parallel-processing, performance, scala
- Language: Java
- Homepage:
- Size: 175 KB
- Stars: 59
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
README
# Samchika
**Samchika** (meaning _File_ in Sanskrit) is a **re-usable**, **easy-to-use**, and **extremely fast** file processing library for the Java language.
It is built with a strong focus on **multithreading** to handle CPU-intensive file processing tasks in parallel, enabling high performance even with massive files.
---
## Features
- Fully multithreaded: optimized for parallel file processing.
- Simple API: just plug in your file path and logic.
- Optional runtime stats: time taken, memory used, thread-level info.
- Ideal for processing and analyzing **large text files** (e.g. logs, datasets).
- Open-source friendly: contributions are welcome!
---
## Use Cases
Samchika excels in several scenarios where multithreaded file processing provides significant advantages:
- Log Analysis & Processing
- ETL (Extract, Transform, Load) Operations
- Large Text Corpus Processing
- Batch Report Generation
- Data Transformation Pipelines
- Real-time Data Processing
See the examples directory for detailed implementations of these use cases.
### Quick Example
```java
// Transform a large CSV file with optimal performance
SmartFileProcessor.builder()
    .inputPath("large_dataset.csv")
    .outputPath("transformed_dataset.csv")
    .batchSize(10000)
    .lineProcessor(line -> line.toUpperCase())
    .displayStats(true)
    .build()
    .execute();
```
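The `.batchSize(10000)` setting suggests that lines are grouped into batches before being handed to worker threads. The sketch below illustrates that general technique with the standard `ExecutorService`; it is not Samchika's actual implementation, and the class `BatchParallelSketch` and method `processInBatches` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative only: a generic batch-parallel line transformer, NOT Samchika's internals.
public class BatchParallelSketch {

    // Splits lines into batches, processes each batch on a pool thread,
    // and reassembles the results in the original line order.
    public static List<String> processInBatches(List<String> lines, int batchSize,
                                                Function<String, String> lineProcessor) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += batchSize) {
                List<String> batch =
                        lines.subList(start, Math.min(start + batchSize, lines.size()));
                pending.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>(batch.size());
                    for (String line : batch) {
                        out.add(lineProcessor.apply(line));
                    }
                    return out;
                }));
            }
            // Futures are collected in submission order, so output order matches input order.
            List<String> result = new ArrayList<>(lines.size());
            for (Future<List<String>> future : pending) {
                result.addAll(future.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("alpha", "beta", "gamma", "delta", "epsilon");
        // prints [ALPHA, BETA, GAMMA, DELTA, EPSILON]
        System.out.println(processInBatches(lines, 2, String::toUpperCase));
    }
}
```

Larger batches generally reduce scheduling overhead per line, while smaller batches spread work more evenly across cores; the best value depends on how expensive the per-line logic is.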
## Installation
### Maven
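```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.github.mayankpratap</groupId>
    <artifactId>samchika</artifactId>
    <version>1.0.0</version>
</dependency>
```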
### Gradle
```
repositories {
maven { url 'https://jitpack.io' }
}
dependencies {
implementation 'com.github.mayankpratap:samchika:1.0.0'
}
```
## How to Use
### Step 1: Import the Library
```
import com.samchika.SmartFileProcessor;
```
### Step 2: Client Code
```java
public class Main {
    public static void main(String[] args) {
        SmartFileProcessor processor = SmartFileProcessor.builder()
            .inputPath("input.txt")            // Path to the file to be processed
            .outputPath("output.txt")          // Path to write the output
            .lineProcessor(Main::processLine)  // Your business logic per line
            .displayStats(true)                // Optional: display runtime stats
            .build();
        processor.execute();
    }

    // Placeholder per-line logic; replace the body with your own
    private static String processLine(String line) {
        return line.trim();
    }
}
```
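As a slightly more realistic illustration of per-line business logic, the sketch below normalizes whitespace in a log line and masks anything that looks like an IPv4 address. The class `LineLogicExample` and its masking rule are hypothetical and only meant to show the shape of a `String -> String` function; based on the usage above, it would presumably be passed as `.lineProcessor(LineLogicExample::processLine)`.

```java
import java.util.regex.Pattern;

// Hypothetical example of per-line logic; not part of Samchika.
public class LineLogicExample {

    // Matches four dot-separated groups of 1 to 3 digits (a loose IPv4 check).
    private static final Pattern IPV4 =
            Pattern.compile("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    // Trims the line, collapses runs of whitespace, and masks IPv4-looking tokens.
    public static String processLine(String line) {
        String collapsed = line.trim().replaceAll("\\s+", " ");
        return IPV4.matcher(collapsed).replaceAll("x.x.x.x");
    }

    public static void main(String[] args) {
        // prints ERROR disk full on x.x.x.x
        System.out.println(processLine("ERROR   disk full on 10.0.0.7  "));
    }
}
```

Because the function keeps no shared mutable state, it is safe to call from several threads at once, which matters for any logic handed to a multithreaded processor.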
A sample **200 MB** file is available to download for testing: https://drive.google.com/file/d/1CWUgdFpXBC3N-YDanKbrCTnhJN4RGRZP/view?usp=drive_link
## Performance
Benchmarked against naïve BufferedReader-based implementations on files of various sizes:
- 200 MB
- 1 GB
- 5 GB
- 16 GB

Significant performance improvements were observed, especially on multi-core systems (more than 70% performance gain).
The time saved relative to the naïve code grows as the input file gets larger. Memory use also stays manageable alongside this speedup: about 800 MB even for a 16 GB file.
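For context, a naïve BufferedReader-based baseline of the kind mentioned above might look like the following single-threaded loop. This is an assumed reconstruction for illustration, not the benchmark code used for these measurements; the class `NaiveBaseline` and method `transform` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Function;

// Assumed shape of a single-threaded baseline; not the project's benchmark code.
public class NaiveBaseline {

    // Reads one line at a time, applies the processor, and writes the result.
    // Returns the number of lines processed.
    public static long transform(Reader source, Writer sink,
                                 Function<String, String> lineProcessor) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source);
             BufferedWriter writer = new BufferedWriter(sink)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(lineProcessor.apply(line));
                writer.write('\n');
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        long n = transform(new StringReader("a\nb\n"), out, String::toUpperCase);
        // prints 2 lines: A|B|
        System.out.println(n + " lines: " + out.toString().replace("\n", "|"));
    }
}
```

In this loop, reading, processing, and writing all happen on one thread, so CPU-heavy per-line logic cannot use additional cores. That is the bottleneck a multithreaded processor is meant to remove.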

## License
This library is licensed under the MIT License, which means you can freely use, modify, and distribute it, even in commercial applications. All we ask is that you include the original copyright notice and license text in any copy of the library or substantial portion of it.
## Inspiration
This project was inspired by:
1) Shubham Maurya (https://github.com/complex1), a dear friend, who published a JavaScript library, which sparked the motivation to do something similar in Java.
2) A LinkedIn post discussing the challenges of processing large text files, which gave me the idea to solve it with an elegant API and fast multithreaded architecture.