{"id":22474949,"url":"https://github.com/Yamil-Serrano/Bloom-Checker","last_synced_at":"2025-08-02T11:32:25.754Z","repository":{"id":263329723,"uuid":"890033015","full_name":"Yamil-Serrano/Bloom-Checker","owner":"Yamil-Serrano","description":"Bloom Checker: A smart tool using Bloom filters to verify email lists efficiently with a user-friendly GUI, handling large datasets with ease and accuracy.","archived":false,"fork":false,"pushed_at":"2025-02-12T00:42:52.000Z","size":84,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-07T09:03:47.514Z","etag":null,"topics":["algorithms","bloom-filter","csv","python","tkinter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yamil-Serrano.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-11-17T20:55:32.000Z","updated_at":"2025-02-12T00:42:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"d7dfb093-b524-4edb-b549-766b1d76340c","html_url":"https://github.com/Yamil-Serrano/Bloom-Checker","commit_stats":null,"previous_names":["yamil-serrano/bloom-checker","nekyro/bloom-checker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Yamil-Serrano/Bloom-Checker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yamil-Serrano%2FBloom-Checker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yamil-Serrano%2FBloom-Checker/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/Yamil-Serrano%2FBloom-Checker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yamil-Serrano%2FBloom-Checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yamil-Serrano","download_url":"https://codeload.github.com/Yamil-Serrano/Bloom-Checker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yamil-Serrano%2FBloom-Checker/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268378965,"owners_count":24240907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","bloom-filter","csv","python","tkinter"],"created_at":"2024-12-06T13:12:49.079Z","updated_at":"2025-08-02T11:32:25.744Z","avatar_url":"https://github.com/Yamil-Serrano.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Bloom Checker\n\n## Overview\n\nBloom Checker is a fast and efficient tool for verifying whether an email or dataset item is present in a database. 
Using the Bloom Filter algorithm, it provides quick results with low memory usage, perfect for handling large datasets.\n\n## Background \u0026 Problem Context\n\n### The Cache Penetration Problem\n\nImagine an email verification service that needs to check if millions of email addresses exist in a database. A common implementation might look like this:\n\n```python\ndef check_email(email):\n    # First, check cache\n    if cache.get(email):\n        return True\n    \n    # If not in cache, check database\n    if database.exists(email):\n        cache.set(email, True)\n        return True\n        \n    return False\n```\n\nThis approach faces two significant challenges:\n\n1. **Cache Miss**: When a valid email isn't in the cache but exists in the database:\n   ```\n   Client → Cache (Miss) → Database (Found) → Update Cache\n   ```\n   This creates one extra unnecessary lookup, but it's manageable.\n\n2. **Cache Penetration**: When checking non-existent emails:\n   ```\n   Client → Cache (Miss) → Database (Not Found) → No Cache Update\n   ```\n   This becomes problematic when:\n   - Attackers deliberately query non-existent emails\n   - Each query unnecessarily hits both cache and database\n   - System resources are wasted on known-invalid queries\n\n### The Bloom Filter Solution\n\nBloom Checker solves this by adding a Bloom Filter as a preliminary check:\n\n```\nClient → Bloom Filter → Cache → Database\n```\n\nWhen checking an email:\n- If Bloom Filter says \"No\" → Email definitely doesn't exist (stop here)\n- If Bloom Filter says \"Yes\" → Email might exist (proceed to cache/database)\n\nReal-world example:\n```python\n# Without Bloom Filter:\ncheck_email(\"attacker@fake.com\")  # Cache miss + DB query wasted\ncheck_email(\"attacker2@fake.com\") # Cache miss + DB query wasted\ncheck_email(\"attacker3@fake.com\") # Cache miss + DB query wasted\n\n# With Bloom Checker:\ncheck_email(\"attacker@fake.com\")  # Bloom Filter: No (stops 
here)\ncheck_email(\"attacker2@fake.com\") # Bloom Filter: No (stops here)\ncheck_email(\"attacker3@fake.com\") # Bloom Filter: No (stops here)\n```\n\nBenefits:\n- Protects against DoS attacks using non-existent emails\n- Reduces unnecessary database load\n- Extremely memory efficient (10 million emails ≈ 15MB of memory)\n- Quick response times (O(k) per lookup, where k is the number of hash functions)\n\n\n## Key Features\n\n- **Fast Email Verification**: Quickly checks whether an email is probably in the database or definitely not.\n- **Bloom Filter Algorithm**: Implements the space-efficient probabilistic data structure to minimize memory usage.\n- **Low False Positive Rate**: Configurable false positive rate to suit different application needs.\n- **Customizable Parameters**: Adjust the size of the Bloom Filter and the number of hash functions based on the dataset size.\n- **Graphical User Interface (GUI)**: Intuitive and easy-to-use interface built with Tkinter.\n- **File Input**: Reads email lists from CSV files and displays the verification results in the interface.\n\n## Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/Yamil-Serrano/Bloom-Checker.git\n   ```\n\n2. Navigate to the project directory:\n   ```bash\n   cd Bloom-Checker\n   ```\n\n3. Install required dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n## Usage\n\n1. Run the application:\n   ```bash\n   python main.py\n   ```\n\n2. Use the interface to:\n   - Select the **initial database** CSV file.\n   - Select the **verification** CSV file.\n   - View the verification results in the interface, with color-coded outputs:\n     - **Green**: The email is probably in the database.\n     - **Red**: The email is definitely not in the database.\n\n3. 
Adjust the false positive rate directly in the `main.py` script if needed.\n\n## Example CSV Format\n\n### Initial Database File\n| Email Address       |\n|---------------------|\n| example1@gmail.com  |\n| example2@yahoo.com  |\n| example3@hotmail.com|\n\n### Verification File\n| Email Address       |\n|---------------------|\n| example1@gmail.com  |\n| unknown@gmail.com   |\n\n## Screenshot of the Interface\n\n![image](https://github.com/user-attachments/assets/da225619-89de-47f2-977b-a6f9d5e0ec15)\n\n\n## Icon Attribution\n\n- **[Lotus flower icons](https://www.flaticon.com/free-icons/lotus-flower)** created by [Freepik](https://www.flaticon.com/authors/freepik) - Flaticon\n- **[File icons](https://www.flaticon.com/free-icons/file)** created by [Good Ware](https://www.flaticon.com/authors/good-ware) - Flaticon\n\n## License\nThis project is licensed under the MIT License – see the [LICENSE](LICENSE.md) file for details.\n\n## Contact\n\nFor questions, suggestions, or contributions, please reach out via:\n\n- GitHub: [Neowizen](https://github.com/Yamil-Serrano)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYamil-Serrano%2FBloom-Checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYamil-Serrano%2FBloom-Checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYamil-Serrano%2FBloom-Checker/lists"}