{"id":23408531,"url":"https://github.com/maysker/ip-counter","last_synced_at":"2025-04-09T01:32:49.762Z","repository":{"id":268810870,"uuid":"905525176","full_name":"Maysker/ip-counter","owner":"Maysker","description":"Efficiently processes large datasets of IP addresses, identifies unique valid IPs, and logs invalid entries. Optimized for performance and scalability, this project demonstrates a professional approach to handling massive data files.","archived":false,"fork":false,"pushed_at":"2024-12-19T03:22:25.000Z","size":23,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-14T19:51:55.722Z","etag":null,"topics":["badgerdb","data-handling","golang","multithreading","optimization","performance","scalability"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Maysker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-19T02:26:39.000Z","updated_at":"2024-12-19T03:22:28.000Z","dependencies_parsed_at":"2024-12-19T04:24:44.442Z","dependency_job_id":"65d80347-7971-4f1b-9dbc-8a39e980cf90","html_url":"https://github.com/Maysker/ip-counter","commit_stats":null,"previous_names":["maysker/ip-counter"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Maysker%2Fip-counter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Maysker%2Fip-counter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/Maysker%2Fip-counter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Maysker%2Fip-counter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Maysker","download_url":"https://codeload.github.com/Maysker/ip-counter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247957323,"owners_count":21024691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["badgerdb","data-handling","golang","multithreading","optimization","performance","scalability"],"created_at":"2024-12-22T15:15:27.411Z","updated_at":"2025-04-09T01:32:49.740Z","avatar_url":"https://github.com/Maysker.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/Maysker/ip-counter\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/Maysker/ip-counter/refs/heads/master/assets/logo.png\" alt=\"Logo\" width=\"200\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n## Overview\n\nThis application efficiently processes a large file of IP addresses, identifying unique valid IPs and logging invalid entries. Designed to handle extremely large datasets, the application demonstrates professional-grade performance optimization and scalability.\n\nThis project was developed as part of an assignment from a potential employer, but due to visa issues, the process was not continued. 
However, the task was completed, as it was much more interesting than typical tests or banal questions.\n\n## Key Features\n\n- **Multithreaded processing**: Utilizes all available CPU cores for maximum efficiency.\n- **BadgerDB integration**: High-performance key-value database for storing unique IP hashes.\n- **Error handling and logging**: Ensures robustness and reliability.\n- **Memory usage tracking**: Provides insights during runtime.\n- **Progress reporting**: Displays the current processing status in real-time.\n\n## Tools and Technologies Used\n\n- **Programming Language**: Go (Golang)\n- **Database**: BadgerDB\n- **Hashing**: `cespare/xxhash` for fast and efficient hashing.\n- **Concurrency Control**: Built-in `sync` and `sync/atomic` packages.\n\n## Installation and Setup\n\n### Clone the repository:\n\n```bash\ngit clone https://github.com/Maysker/ip-counter.git\ncd ip-counter\n```\n\n### Install dependencies:\n\n- `go mod tidy`\n\n### Run the application:\n\n- `go run main.go \u003cfile_path\u003e`\n\n## How It Works\n\nFile Reading:\n\n- Reads the IP file in chunks to optimize memory usage.\n- Splits chunks into individual IP lines.\n\nIP Validation:\n\n- Validates each IP using `net.ParseIP`.\n- Logs invalid IPs in `warnings.log`.\n\nUnique IP Tracking:\n\n- Hashes valid IPs using `xxhash`.\n- Stores unique hashes in BadgerDB.\n\nProgress Reporting:\n\n- Displays the number of processed lines and memory usage.\n\nError Handling:\n\n- Retries file reading on errors.\n- Logs critical issues for debugging.\n\n## Example Output\n\n```bash\n=== IP Address Processing Program ===\nProcessing file: ip_addresses\nUsing 24 workers...\nProgress: 1,000,000 lines processed...\nProgress: 2,000,000 lines processed...\nMemory usage: 150 MB\n...\nNumber of unique valid IP addresses: 1,234,567\nNumber of invalid IP addresses: 12\nExecution time: 15m30s\n```\n\n## Key Optimizations Implemented:\n\n- Switched to BadgerDB for better performance with large 
datasets.\n- Implemented batch writes to minimize database overhead.\n- Added real-time memory usage tracking.\n\n## Future Improvements\n\n- Add configuration options for batch size, logging verbosity, and worker count.\n- Implement a web-based UI for easier monitoring.\n- Explore further optimizations with low-level I/O operations.\n\n## Acknowledgments\n\nSpecial thanks to the creators of:\n\n- Golang\n- BadgerDB\n- cespare/xxhash\n\nFor questions or feedback, please contact [Maysker](https://github.com/Maysker).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaysker%2Fip-counter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaysker%2Fip-counter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaysker%2Fip-counter/lists"}