{"id":20340022,"url":"https://github.com/0xn0ne/sensitive-helper","last_synced_at":"2025-04-11T23:17:15.056Z","repository":{"id":214212299,"uuid":"735774040","full_name":"0xn0ne/sensitive-helper","owner":"0xn0ne","description":"基于正则表达式的本地文件敏感信息数据挖掘助手。Regular Expression Based Data Mining Assistant for Local File Sensitive Information.","archived":false,"fork":false,"pushed_at":"2024-06-25T15:21:07.000Z","size":51,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-11T23:17:09.292Z","etag":null,"topics":["aksk","infomation","jwt","scanner","security","sensitive"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0xn0ne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-26T03:30:39.000Z","updated_at":"2025-02-25T17:33:52.000Z","dependencies_parsed_at":"2023-12-28T10:26:52.579Z","dependency_job_id":"b4d1e25e-63b8-46cd-b53d-8ba8e9df722c","html_url":"https://github.com/0xn0ne/sensitive-helper","commit_stats":null,"previous_names":["0xn0ne/sensitive-helper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xn0ne%2Fsensitive-helper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xn0ne%2Fsensitive-helper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xn0ne%2Fsensitive-helper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xn0ne%2Fsensitive-helper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0xn0ne","download_url":"https://codeload.github.com/0xn0ne/sensitive-helper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248492885,"owners_count":21113163,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aksk","infomation","jwt","scanner","security","sensitive"],"created_at":"2024-11-14T21:19:22.684Z","updated_at":"2025-04-11T23:17:15.031Z","avatar_url":"https://github.com/0xn0ne.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],
"readme":"Reference: \u003chttps://github.com/securing/DumpsterDiver\u003e\n\n# Sensitive Helper\n\n简体中文 | [English](./README_EN.md)\n\nRecent projects have involved a lot of work searching local files for sensitive data. The tools I tried online gave mediocre results, and my boss dissed me for it many times (admittedly, maybe I just didn't know how to use them). Problems included .so files that couldn't be read, multiprocessing errors, configuration that was hard to understand, and detection logic that was a mystery. I thought about filing issues, but everyone has a family to feed, so I decided to modify a tool myself.\n\nA regular-expression-based assistant for mining sensitive information from local files. To search web pages for sensitive data, export the data to local files first and then search them. This version improves multi-process handling and makes configuration easier to use.\n\n**Note**: If the default rules don't meet your matching needs, adjust the `rules` section of the configs.yaml file.\n\n# Quick Start\n\n### Dependencies\n\n+ python \u003e= 3.6\n\nIn the project directory, install the dependencies with the following command:\n\n```bash\npip3 install toml PyYAML tqdm pandas rarfile py7zr openpyxl\n```\n\nAlternatively, install them from requirements.txt with pip's `-r` (`--requirement`) option:\n\n```bash\npip3 install -r requirements.txt\n```\n\n### Basic Usage\n\nUse the `-t` parameter to search a target path directly:\n\n```bash\n$ python3 sensitive-helper.py -t \u003cpath to search\u003e\n```\n\nTo exclude certain files, use the `-e` parameter. File names are matched with regular expressions. For example, suppose the program finds the file /tmp/aaa.so and you don't want to search `.so` files. You can pass the regular expression `.*so`, and the program will match the string `aaa.so` against it and filter out `.so` files. Because this is a regular expression, `.*so` may also exclude files that are not `.so` files, such as one named `lasso`. A stricter pattern such as `.*[.]so$` limits the match to the `.so` extension.\n\n```bash\n$ python3 sensitive-helper.py -t \u003cpath to search\u003e -e \".*so\" \".*gz\"\n```\n\nIf searching is too slow, use the `-p` parameter to increase the number of search processes (default: 5). Python's multiprocessing isn't great, but it's better than nothing.\n\n**Note**: On a low-powered computer, don't set more than 20 processes. The program performs a lot of I/O and memory operations, so the computer may crash (mine did).\n\n```bash\n$ python3 sensitive-helper.py -t \u003cpath to search\u003e -p 10\n```\n\nTo save the results, use the `-o` parameter to choose the output file format, `json` or `csv` (default: `csv`):\n\n```bash\n$ python3 sensitive-helper.py -t \u003cpath to search\u003e -o json\n```\n\nBy default, the program stops searching the current file as soon as one regular expression matches. Use the `-a` parameter to force the program to run every regular expression against each file and dig out more potentially useful data:\n\n```bash\n$ python3 sensitive-helper.py -t \u003cpath to search\u003e -a\n```\n\n**Note**: The program has built-in default matching rules. Rule precedence, from lowest to highest: built-in defaults \u003c configs.yaml \u003c user input.\n\n### Usage\n\n```\n$ python3 sensitive-helper.py -h\nusage: sensitive-helper.py [-h] -t TARGET_PATH [-p PROCESS_NUMBER] [-c CONFIG_PATH] [-o OUTPUT_FORMAT] [-e EXCLUDE_FILES [EXCLUDE_FILES ...]] [-a] [-s]\n\n    ███████╗███████╗███╗   ██╗███████╗██╗████████╗██╗██╗   ██╗███████╗\n    ██╔════╝██╔════╝████╗  ██║██╔════╝██║╚══██╔══╝██║██║   ██║██╔════╝\n    ███████╗█████╗  ██╔██╗ ██║███████╗██║   ██║   ██║██║   ██║█████╗  \n    ╚════██║██╔══╝  ██║╚██╗██║╚════██║██║   ██║   ██║╚██╗ ██╔╝██╔══╝  \n    ███████║███████╗██║ ╚████║███████║██║   ██║   ██║ ╚████╔╝ ███████╗\n    ╚══════╝╚══════╝╚═╝  ╚═══╝╚══════╝╚═╝   ╚═╝   ╚═╝  ╚═══╝  ╚══════╝\n    v0.1.3\n    by 0xn0ne, https://github.com/0xn0ne/sensitive-helper\n\noptions:\n  -h, --help            show this help message and exit\n  -t TARGET_PATH, --target-path TARGET_PATH\n                        file or folder path to search for sensitive information (e.g. ~/download/folder)\n  -p PROCESS_NUMBER, --process-number PROCESS_NUMBER\n                        number of program processes (default: 5)\n  -c CONFIG_PATH, --config-path CONFIG_PATH\n                        path to the yaml configuration file (default: configs.yaml)\n  -o OUTPUT_FORMAT, --output-format OUTPUT_FORMAT\n                        output file format; available formats are json and csv (default: csv)\n  -e EXCLUDE_FILES [EXCLUDE_FILES ...], --exclude-files EXCLUDE_FILES [EXCLUDE_FILES ...]\n                        files to exclude, matched with regular expressions (e.g. \.DS_Store .*bin .*doc)\n  -a, --is-re-all       match every regular expression rule against each file instead of exiting the match loop after the first matching rule\n  -s, --is-silent       silent mode: when enabled, matches are not printed to the command line and a progress bar shows progress instead\n```\n\n### Sample Output (Default Mode)\n\n```bash\n$ python3 sensitive-helper.py -t \"cache/\" -a\n[*] file loading...\n[*] analyzing...\n\n[+] group: FUZZY MATCH, match: AppId\":\"123456\", file: cache/heapdump\n[+] group: BASE64, match: ZjY2MTQyNDEtYTIyYS00YjNlLTg1NTgtOTQ4NmUwZDFkZjM1, file: cache/heapdump\n[+] group: FUZZY MATCH, match: password\":\"123456\", file: cache/heapdump\n[+] group: FILE PATH, match: C:\Windows\system32\drivers, file: cache/heapdump-BAK\n[+] group: URL, match: http://hello.world/123456.jpg, file: cache/heapdump-BAK\ntotal file number: 5\n```\n\n### Sample Output (Silent Mode)\n\n```bash\n$ python3 sensitive-helper.py -t \"cache/\" -a -s\n[*] file loading...\n[*] analyzing...\n\n53792/53792 [██████████████████████████████████████████] 00:28\u003c00:00,1856.73it/s\ntotal file number: 53792\n```\n\n# Q\u0026A\n\n+ Q: Why not search web pages for sensitive data directly?\n+ A: Web pages change constantly. Changing a single API endpoint, CSS class, or element id could mean updating the code. It's better to export the data locally and process it uniformly as text.\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xn0ne%2Fsensitive-helper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0xn0ne%2Fsensitive-helper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xn0ne%2Fsensitive-helper/lists"}