{"id":20839287,"url":"https://github.com/ailln/proces","last_synced_at":"2025-05-08T21:43:20.929Z","repository":{"id":57454906,"uuid":"424324019","full_name":"Ailln/proces","owner":"Ailln","description":"🐨 text preprocess.","archived":false,"fork":false,"pushed_at":"2023-09-09T03:28:00.000Z","size":43,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-08T21:43:16.835Z","etag":null,"topics":["python-package","python3","text-preprocessing","text-processing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ailln.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-11-03T17:42:03.000Z","updated_at":"2024-12-24T19:27:42.000Z","dependencies_parsed_at":"2023-11-22T20:38:10.742Z","dependency_job_id":"9fa750f7-b9e0-4fb3-b2f5-10a6f8de1f52","html_url":"https://github.com/Ailln/proces","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"622d37aa378cdae2010ee40834067e4698f6ec3a"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":"Ailln/python-package-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ailln%2Fproces","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ailln%2Fproces/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ailln%2Fproces/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ailln%2Fproces/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ailln","download_url":"https://codeload.github.com/Ailln/proces/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253153728,"owners_count":21862399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python-package","python3","text-preprocessing","text-processing"],"created_at":"2024-11-18T01:13:10.847Z","updated_at":"2025-05-08T21:43:20.883Z","avatar_url":"https://github.com/Ailln.png","language":"Python","readme":"# Proces\n\n[![Pypi](https://img.shields.io/pypi/v/proces.svg)](https://pypi.org/project/proces/)\n[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/Ailln/proces/blob/master/LICENSE)\n[![stars](https://img.shields.io/github/stars/Ailln/proces.svg)](https://github.com/Ailln/proces/stargazers)\n\n🐨 Text preprocessing.\n\n## 1 Installation\n\n\u003e ⚠️ Note:\n\u003e 1. Local installation requires Python 3.6 or later;\n\u003e 2. Use the latest version of `proces` whenever possible.\n\n### Install with pip\n\n```shell\npip install proces -U\n```\n\n### Install from source\n\n```shell\ngit clone https://github.com/Ailln/proces.git\n\ncd proces \u0026\u0026 python setup.py install\n```\n\n## 2 Usage\n\n```python\nfrom proces import preprocess\n\n# By default, these steps run in order: handle blank characters, uppercase to lowercase,\n# Traditional to Simplified Chinese, and full-width to half-width\nresult = preprocess(\"Today, 你 幹 什 麼 ！\")\n# result: today,你干什么!\n\n# Configure the pipeline, e.g. only remove blank characters\nresult = preprocess(\"Today, 你 幹 什 麼 ！\", pipelines=[\"handle_blank_character\"])\n# result: Today,你幹什麼！\n\n# Use the individual sub-methods\nfrom proces import filter_unusual_characters, filter_\nfrom proces import handle_blank_character\nfrom proces import uppercase_to_lowercase\nfrom proces import traditional_to_simplified\nfrom proces import full_angle_to_half_angle\nfrom proces import handle_substitute\n\n# Remove unusual characters\nresult = filter_unusual_characters(\"【你是个恶魔😈啊�】\")\n# result: 【你是个恶魔啊】\n# The short alias filter_ does the same\nresult = filter_(\"【你是个恶魔😈啊�】\")\n# result: 【你是个恶魔啊】\n\n# Handle blank characters (removed by default, or replaced with the given string)\nresult = handle_blank_character(\"空 白 字 符\")\n# result: 空白字符\nresult = handle_blank_character(\"空 白 字 符\", \",\")\n# result: 空,白,字,符\n\n# Uppercase to lowercase\nresult = uppercase_to_lowercase(\"UP to low\")\n# result: up to low\n\n# Traditional to Simplified Chinese\nresult = traditional_to_simplified(\"我幹什麼不干你事\")\n# result: 我干什么不干你事\n\n# Full-width to half-width\nresult = full_angle_to_half_angle(\"你好！\")\n# result: 你好!\n\n# Replace occurrences of a pattern with a substitute string\nresult = handle_substitute(\"你好！/:-\", r\"/:-\", \"表情\")\n# result: 你好！表情\n```\n\n```python\n# Mask sensitive information\nfrom proces import mask_phone, mask_address\n\n# Mask phone numbers\nresult = mask_phone(\"手机号 13397238231\")\n# result: 手机号 133********\n\n# Mask addresses\nresult = mask_address(\"我在浙江杭州余杭区\")\n# result: 我在浙江杭州***\n```\n\n## 3 TODO\n\n- [x] add a method that returns all preprocessing methods\n- [ ] decorators\n\n## 4 License\n\n[![](https://award.dovolopor.com?lt=License\u0026rt=MIT\u0026rbc=green)](./LICENSE)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failln%2Fproces","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failln%2Fproces","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failln%2Fproces/lists"}