https://github.com/brzzznko/uniqueiptracker
Tracks unique IPv4 addresses from large input files
bitwise concurrency java junit optimization parallel-computing
- Host: GitHub
- URL: https://github.com/brzzznko/uniqueiptracker
- Owner: brzzznko
- License: mit
- Created: 2025-03-03T18:27:34.000Z
- Default Branch: master
- Last Pushed: 2025-03-06T18:34:37.000Z
- Last Synced: 2025-06-04T22:16:00.245Z
- Topics: bitwise, concurrency, java, junit, optimization, parallel-computing
- Language: Java
- Homepage:
- Size: 72.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Unique IP Tracker

## **Overview**
The Unique IP Tracker is a high-performance tool for processing large files and counting unique IP addresses. It supports multiple processing methods and is optimized for parallel execution.
---
## **Algorithm**
The Unique IP Tracker efficiently processes large files using bit operations and parallel processing techniques:
1. Bit Array Representation: Uses a large bit array (`AtomicBitArrayIpTracker.class`) to store seen IP addresses efficiently in memory.
2. IP to Long Conversion: Converts IPv4 addresses into a single long value to map them into the bit array.
3. Parallel Processing Methods:
   - Parallel Streams (default, most efficient) - Uses Java's built-in parallel streams for high-speed processing.
   - Chunks - Splits the file into chunks and processes them concurrently.
   - Single-threaded - Processes the file line by line without parallel execution, for minimal memory usage.
4. Efficient I/O Handling: Streams the file without loading it fully into memory to support very large files.
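5. Bitwise Operations for Fast Lookups: Uses bitwise shifts and masks to set and check bits in the bit array. One bit per possible address (2^32 bits, or 512 MiB, for the whole IPv4 space) takes far less memory than a `HashSet` would for inputs with many distinct addresses.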
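Steps 1, 2, and 5 can be illustrated with a minimal sketch. This is not the repository's `AtomicBitArrayIpTracker`; the class name `BitArrayIpTrackerSketch` and the methods `toLong`, `markSeen`, and `isSeen` are hypothetical names chosen for illustration, and the parser assumes well-formed dotted-quad input with no surrounding whitespace.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Illustrative sketch only; names and layout are assumptions, not the project's code. */
final class BitArrayIpTrackerSketch {
    private final AtomicLongArray words;

    /** bitCount is the number of trackable values; 1L << 32 covers all of IPv4 (512 MiB). */
    BitArrayIpTrackerSketch(long bitCount) {
        this.words = new AtomicLongArray((int) ((bitCount + 63) >>> 6));
    }

    /** Packs "a.b.c.d" into one long: a << 24 | b << 16 | c << 8 | d. */
    static long toLong(String ip) {
        long value = 0;
        int octet = 0;
        for (int i = 0; i < ip.length(); i++) {
            char c = ip.charAt(i);
            if (c == '.') {
                value = (value << 8) | octet;   // append the finished octet
                octet = 0;
            } else {
                octet = octet * 10 + (c - '0'); // accumulate decimal digits
            }
        }
        return (value << 8) | octet;            // append the last octet
    }

    /** Atomically sets the bit for this address; returns true if it was not set before. */
    boolean markSeen(long ip) {
        int index = (int) (ip >>> 6);           // which 64-bit word holds the bit
        long mask = 1L << (ip & 63);            // which bit inside that word
        long previous = words.getAndUpdate(index, w -> w | mask);
        return (previous & mask) == 0;
    }

    /** Checks the bit for this address without modifying it. */
    boolean isSeen(long ip) {
        return (words.get((int) (ip >>> 6)) & (1L << (ip & 63))) != 0;
    }
}
```

Because `markSeen` reports whether the bit was newly set, several threads can share one tracker and each distinct address is counted once. An address at or beyond `bitCount` would throw an `IndexOutOfBoundsException` in this sketch.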
## **Features**
- **Parallel Processing** - Uses parallel streams and multi-threading for fast execution.
- **Customizable Processing Modes** - Choose from `parallel-streams`, `chunks`, or `single-threaded`.
- **Memory Optimized** - Uses bit operations and bit arrays to efficiently track unique IPs while minimizing memory usage.
- **Command-Line Interface** - Easily select processing mode and input file.
- **Pre-built JAR Available** - Download from [GitHub Releases](https://github.com/brzzznko/UniqueIpTracker/releases).
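The README does not describe how the `chunks` mode divides a file. One common approach, sketched here purely as an assumption, is to cut the file into byte ranges and push each cut forward to the next newline so that no address is split between two workers. `ChunkSplitterSketch` and `split` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the project's chunking strategy may differ. */
final class ChunkSplitterSketch {

    /**
     * Returns half-open byte ranges {start, end} that together cover the file.
     * Every range starts at the beginning of a line. The number of ranges is
     * roughly `parts`, not exactly, because cuts move forward to line ends.
     */
    static List<long[]> split(Path file, int parts) throws IOException {
        List<long[]> chunks = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            long target = Math.max(1, size / parts);
            long start = 0;
            while (start < size) {
                long end = Math.min(size, start + target);
                raf.seek(end);
                // Advance the cut until it sits just past a newline (or at end of file).
                while (end < size) {
                    end++;
                    if (raf.read() == '\n') {
                        break;
                    }
                }
                chunks.add(new long[] {start, end});
                start = end;
            }
        }
        return chunks;
    }
}
```

Each range could then be handed to its own worker thread, with all workers marking addresses in one shared, thread-safe bit array.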
---
## **Installation & Setup**
### **Clone the Repository**
```sh
git clone https://github.com/brzzznko/UniqueIpTracker.git
cd UniqueIpTracker
```
### **Build the Project**
```sh
./gradlew clean shadowJar
```
This generates a **fat JAR** in `build/libs/` containing all dependencies.
Alternatively, download the latest **pre-built JAR** from [GitHub Releases](https://github.com/brzzznko/UniqueIpTracker/releases) and skip the build step.
---
## **Usage**
### **Example Input File (`example.txt`)**
```
97.71.174.4
97.71.173.241
97.71.173.235
161.71.174.27
215.10.61.107
```
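The five sample addresses are all distinct, so their unique count is 5. The sketch below shows how a parallel-streams style count could arrive at that number. It is an assumption-laden illustration, not the project's code: `ParallelCountSketch`, `toLong`, `markSeen`, and `countUnique` are hypothetical names, and instead of one large bit array it allocates small bit segments lazily so that it runs comfortably on tiny inputs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.stream.Stream;

/** Illustrative sketch only; uses lazily allocated segments instead of one 512 MiB array. */
final class ParallelCountSketch {
    private static final int SEGMENT_BITS = 1 << 16; // addresses covered by one segment

    // Segment key = upper 16 bits of the address; value = 65,536 bits stored as 1,024 longs.
    private final ConcurrentHashMap<Integer, AtomicLongArray> segments = new ConcurrentHashMap<>();

    /** Packs "a.b.c.d" into a long; assumes a well-formed IPv4 address. */
    static long toLong(String ip) {
        long value = 0;
        for (String part : ip.trim().split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    /** Atomically sets the bit for this address; returns true on first sight. */
    boolean markSeen(long ip) {
        AtomicLongArray words = segments.computeIfAbsent(
                (int) (ip >>> 16), key -> new AtomicLongArray(SEGMENT_BITS / 64));
        int offset = (int) (ip & 0xFFFF);        // position inside the segment
        long mask = 1L << (offset & 63);
        long previous = words.getAndUpdate(offset >>> 6, w -> w | mask);
        return (previous & mask) == 0;
    }

    /** Counts distinct addresses; blank lines are skipped. */
    static long countUnique(Stream<String> lines) {
        ParallelCountSketch tracker = new ParallelCountSketch();
        return lines.parallel()
                .filter(line -> !line.trim().isEmpty())
                .mapToLong(ParallelCountSketch::toLong)
                .filter(tracker::markSeen)       // keeps only first occurrences
                .count();
    }
}
```

For a real file, the stream could come from `Files.lines(path)` inside a try-with-resources block, which reads lazily rather than loading the whole file into memory.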
### **Run Locally**
```sh
java -jar build/libs/unique-ip-tracker-1.0.0.jar --processor=parallel-streams --filename=path/input.txt
```
**Available Processors:** `parallel-streams` (default), `chunks`, `single-threaded`
### **Run in Docker**
```sh
docker build -t unique-ip-tracker .
docker run --rm -v ${PWD}/ip_addresses:/app/example.txt unique-ip-tracker
```
---
To see how the tool handles a large input, try this [file](https://ecwid-vgv-storage.s3.eu-central-1.amazonaws.com/ip_addresses.zip). Note that the archive is about 20 GB and unzips to about 120 GB.