{"id":34060908,"url":"https://github.com/kusen-alpha/dupfilter","last_synced_at":"2026-03-17T16:13:53.581Z","repository":{"id":193019098,"uuid":"687940477","full_name":"kusen-alpha/dupfilter","owner":"kusen-alpha","description":"A powerful deduplication library that brings together common deduplication strategies and integrates quickly into projects.","archived":false,"fork":false,"pushed_at":"2025-06-26T08:06:11.000Z","size":69,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-11-27T20:13:22.745Z","etag":null,"topics":["dupfilter","duplicate","duplication","filter","filters"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kusen-alpha.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-06T10:19:35.000Z","updated_at":"2025-06-26T08:06:14.000Z","dependencies_parsed_at":"2024-04-02T07:27:49.695Z","dependency_job_id":"77fa4fbf-1c31-4a9d-a2de-eb3d0feecaa7","html_url":"https://github.com/kusen-alpha/dupfilter","commit_stats":null,"previous_names":["kusen-alpha/dupfilter"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/kusen-alpha/dupfilter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kusen-alpha%2Fdupfilter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kusen-alpha%2Fdupfilter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kusen-alpha%2Fdupfilter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kusen-alpha%2Fdupfilter/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kusen-alpha","download_url":"https://codeload.github.com/kusen-alpha/dupfilter/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kusen-alpha%2Fdupfilter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30626935,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T14:16:03.965Z","status":"ssl_error","status_checked_at":"2026-03-17T14:16:03.380Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dupfilter","duplicate","duplication","filter","filters"],"created_at":"2025-12-14T04:25:31.382Z","updated_at":"2026-03-17T16:13:53.570Z","avatar_url":"https://github.com/kusen-alpha.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\n\nA deduplication filter library that implements common deduplication strategies, is easy to integrate, and is built for high performance.\n\n# Deduplication Strategies\n\n\u003ctable style=\"text-align: center\"\u003e\n    \u003ctr\u003e\n        \u003cth\u003eBackend\u003c/th\u003e\n        \u003cth\u003eFilter\u003c/th\u003e\n        \u003cth\u003eDescription\u003c/th\u003e\n        \u003cth\u003eStrengths\u003c/th\u003e\n        \u003cth\u003eDrawbacks\u003c/th\u003e\n        \u003cth\u003eRemoval strategy\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr \u003e\n        \u003ctd \u003eMemory\u003c/td\u003e\n        \u003ctd\u003eMemoryFilter\u003c/td\u003e\n        \u003ctd\u003eBacked by an in-memory set\u003c/td\u003e\n        
\u003ctd\u003eHigh accuracy\u003c/td\u003e\n        \u003ctd\u003eNot persistent\u003c/td\u003e\n        \u003ctd\u003eRandom deletion\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eFile\u003c/td\u003e\n        \u003ctd\u003eFileFiler\u003c/td\u003e\n        \u003ctd\u003eBacked by a file plus an in-memory set\u003c/td\u003e\n        \u003ctd\u003eHigh accuracy\u003c/td\u003e\n        \u003ctd\u003eHigh local memory and disk usage\u003c/td\u003e\n        \u003ctd\u003eRange deletion using file pointers\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"4\"\u003eRedis\u003c/td\u003e\n        \u003ctd\u003eRedisBloomFilter\u003cbr\u003eAsyncRedisBloomFilter\u003c/td\u003e\n        \u003ctd\u003eBacked by Redis bitmaps and the Bloom filter algorithm\u003c/td\u003e\n        \u003ctd\u003eVery low memory usage\u003c/td\u003e\n        \u003ctd\u003eFalse positives are possible, and elements are hard to delete\u003c/td\u003e\n        \u003ctd\u003eRandom deletion\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eRedisStringFilter\u003cbr\u003eAsyncRedisStringFilter\u003c/td\u003e\n        \u003ctd\u003eBacked by the Redis String data type\u003c/td\u003e\n        \u003ctd\u003eNo false positives; expiration times enable a check-then-confirm deduplication mechanism\u003c/td\u003e\n        \u003ctd\u003eHigh resource usage; compress keys and set expiration times wherever possible\u003c/td\u003e\n        \u003ctd\u003eExpiration time\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eRedisSetFilter\u003cbr\u003eAsyncRedisSetFilter\u003c/td\u003e\n        \u003ctd\u003eBacked by the Redis Set data type\u003c/td\u003e\n        \u003ctd\u003eHigh accuracy\u003c/td\u003e\n        \u003ctd\u003eRelatively high resource usage\u003c/td\u003e\n        \u003ctd\u003eRandom deletion\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eRedisSortedSetFilter\u003cbr\u003eAsyncRedisSortedSetFilter\u003c/td\u003e\n        \u003ctd\u003eBacked by the Redis Sorted Set data type\u003c/td\u003e\n        \u003ctd\u003eHigh accuracy\u003c/td\u003e\n        \u003ctd\u003eRelatively high resource usage\u003c/td\u003e\n        \u003ctd\u003eDeletion by score\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr \u003e\n        \u003ctd \u003eSQL\u003c/td\u003e\n        \u003ctd\u003eSQLFilter\u003c/td\u003e\n        
\u003ctd\u003eBacked by the primary key of a relational database table\u003c/td\u003e\n        \u003ctd\u003eHigh accuracy\u003c/td\u003e\n        \u003ctd\u003ePoor performance in large-scale deduplication\u003c/td\u003e\n        \u003ctd\u003eDeletion by time\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n# Features\n\n1. Multiple strategies to fit the needs of different scenarios.\n2. Batch operations implemented with Lua scripts for speed.\n3. Async support for quick integration into asynchronous code and frameworks.\n\n# Examples\n\n## RedisBloomFilter\n\n```python\nimport redis\nfrom dupfilter import RedisBloomFilter\n\nserver = redis.Redis(host=\"127.0.0.1\", port=6379)\nrbf = RedisBloomFilter(server=server, key=\"bf\", block_num=2)\nprint(rbf.exists_many([\"1\", \"2\", \"3\"]))\nrbf.insert_many([\"1\", \"2\", \"3\"])\nprint(rbf.exists_many([\"1\", \"2\", \"3\"]))\n```\n\n## AsyncRedisBloomFilter\n\n```python\nimport asyncio\nimport aioredis\nfrom dupfilter import AsyncRedisBloomFilter\n\n\nasync def test():\n    server = aioredis.from_url('redis://127.0.0.1:6379/0')\n    arbf = AsyncRedisBloomFilter(server, key='bf')\n    stats = await arbf.exists_many([\"1\", \"2\", \"3\"])\n    print(stats)\n    await arbf.insert_many([\"1\", \"2\", \"3\"])\n    stats = await arbf.exists_many([\"1\", \"2\", \"3\"])\n    print(stats)\n\n\nasyncio.run(test())\n```\n\n## DefaultFilter\n\nA project may use an outer-level setting to decide whether deduplication runs at all. To keep the calling code identical in both cases, a default filter class is provided as a stand-in.\n\n```python\nfrom dupfilter import MemoryFilter\nfrom dupfilter import DefaultFilter\n\nis_dup = True  # Global switch: whether to deduplicate\nif is_dup:\n    flr = MemoryFilter()\nelse:\n    flr = DefaultFilter(default_stat=False)\n\nprint(flr.exists(\"1\"))\n```\n\n## FilterCounter\n\nCollects deduplication results and evaluates them in aggregate.\n\n```python\nfrom dupfilter import MemoryFilter\nfrom dupfilter import FilterCounter\n\nflt = MemoryFilter()\nflt_counter = FilterCounter()\nvalues = ['1', '2', '3']\nfor value in values:\n    flt_counter.insert_stat(flt.exists(value))\n\n# Evaluate and count the collected results\nprint(flt_counter.any(), flt_counter.all(), flt_counter.count())\n```\n\n## Others\n\nThe other filters are used in the same way as the examples above.\n\n# Related Libraries\n\n1. Redis: redis/aioredis\n2. MySQL: pymysql/aiomysql\n3. SQLite: sqlite3\n4. 
Oracle: cx_Oracle/cx_Oracle_async\n\n# Roadmap\n\n1. Complete the reset logic for some of the filters\n\n# About the Author\n\n1. Email: 1194542196@qq.com","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkusen-alpha%2Fdupfilter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkusen-alpha%2Fdupfilter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkusen-alpha%2Fdupfilter/lists"}