{"id":20811463,"url":"https://github.com/cyberfantics/pii_extractor","last_synced_at":"2026-04-27T02:31:11.366Z","repository":{"id":248889888,"uuid":"830094448","full_name":"cyberfantics/pii_extractor","owner":"cyberfantics","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-17T15:24:41.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-26T13:45:08.300Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cyberfantics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-17T15:19:43.000Z","updated_at":"2024-07-17T15:24:44.000Z","dependencies_parsed_at":"2024-07-17T19:12:27.887Z","dependency_job_id":null,"html_url":"https://github.com/cyberfantics/pii_extractor","commit_stats":null,"previous_names":["cyberfantics/pii_extractor"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cyberfantics/pii_extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberfantics%2Fpii_extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberfantics%2Fpii_extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberfantics%2Fpii_extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberfantics%2Fpii_extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cyberfan
tics","download_url":"https://codeload.github.com/cyberfantics/pii_extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberfantics%2Fpii_extractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32320179,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-17T20:42:51.253Z","updated_at":"2026-04-27T02:31:11.345Z","avatar_url":"https://github.com/cyberfantics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Advanced PII Extractor\n\nWelcome to the Advanced PII Extractor! 
This Python script scans various file formats for Personally Identifiable Information (PII) such as email addresses, phone numbers, Social Security Numbers (SSNs), credit card numbers, and IP addresses.\n\n## Author\n**Syed Mansoor ul Hassan Bukhari**  \n[GitHub Profile](https://github.com/cyberfantics)  \n[LinkedIn](https://www.linkedin.com/in/mansoor-bukhari-77549a264/)\n\n## Repository\n[GitHub Repository](https://github.com/cyberfantics/pii_extractor.git)\n\n## Description\nThe `advanced_pii_extractor.py` script parses and scans the following file types for PII:\n- `.docx` files\n- `.txt`, `.doc`, `.csv`, `.log`, and `.html` files\n- `.xlsx` files\n- `.pdf` files\n\n## Features\n- Identifies and extracts email addresses, phone numbers, SSNs, credit card numbers, and IP addresses.\n- Saves each type of match to its own file (`email_matches.txt`, `phone_matches.txt`, etc.).\n- Parses `.docx` files as ZIP archives and extracts text from PDFs.\n\n## Usage\n1. Ensure all dependencies are installed:\n   ```bash\n   pip install openpyxl PyPDF2\n   ```\n2. Download and run:\n   ```bash\n   git clone https://github.com/cyberfantics/pii_extractor.git\n   cd pii_extractor\n   python pii_extractor.py\n   ```\n\n## Example Output\n\n**Matches are saved in their respective files:**\n```text\n1. email_matches.txt\n2. phone_matches.txt\n3. ssn_matches.txt\n4. credit_card_matches.txt\n5. ip_matches.txt\n```\nEach file contains the file path and matched PII entries.\n\n## License\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberfantics%2Fpii_extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberfantics%2Fpii_extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberfantics%2Fpii_extractor/lists"}