{"id":32159388,"url":"https://github.com/mterron/swuniq","last_synced_at":"2026-02-22T19:00:58.799Z","repository":{"id":65605251,"uuid":"153727954","full_name":"mterron/swuniq","owner":"mterron","description":"A command-line tool for deduplicating entries in a file or stream with constant memory usage","archived":false,"fork":false,"pushed_at":"2022-04-11T21:15:55.000Z","size":127,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-12-11T01:57:26.563Z","etag":null,"topics":["cli","dedupe","deduping","deduplicate","deduplication","filter","sliding-window","uniq"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mterron.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-19T04:47:52.000Z","updated_at":"2023-03-17T13:31:14.000Z","dependencies_parsed_at":"2023-01-31T12:16:00.981Z","dependency_job_id":null,"html_url":"https://github.com/mterron/swuniq","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/mterron/swuniq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mterron%2Fswuniq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mterron%2Fswuniq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mterron%2Fswuniq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mterron%2Fswuniq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mterron","download_url":"https://codeload.github.com/m
terron/swuniq/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mterron%2Fswuniq/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29723573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-22T15:10:41.462Z","status":"ssl_error","status_checked_at":"2026-02-22T15:10:04.636Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","dedupe","deduping","deduplicate","deduplication","filter","sliding-window","uniq"],"created_at":"2025-10-21T13:02:32.946Z","updated_at":"2026-02-22T19:00:58.793Z","avatar_url":"https://github.com/mterron.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# swuniq\n![Travis (.org)](https://img.shields.io/travis/mterron/swuniq.svg) ![coverity result](https://img.shields.io/coverity/scan/17035.svg) [![Language grade: C/C++](https://img.shields.io/lgtm/grade/cpp/g/mterron/swuniq.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/mterron/swuniq/context:cpp)\n\nDeduplicate matching lines (within a configurable window) from a file or standard input, writing to standard output.\n\nLike `uniq`, but it works on unsorted input, so it can be used as a pipe filter with constant memory usage.\n\n#### Why?\nSometimes you need to consume a data stream (a Certificate Transparency log, for example) that has non-consecutive duplicates and you 
don't want to deal with them. The usual solution involving `awk` has unbounded memory usage, which can be a problem; swuniq's memory usage is bounded by the window size.\n\n#### Memory Usage\nswuniq stores a hash of each line in a ring buffer of configurable size (`-w` option), used as a FIFO queue, so memory use stays constant (64 bits * the `-w` value). For example, the default window of 100 lines needs 800 bytes for the buffer.\n\n\n#### Example\n```sh\n# swuniq -h\nUsage: swuniq [-w N] [INPUT]\nFilter matching lines (within a configurable window) from INPUT \n(or standard input), writing to standard output.\n\n\t-w N Size of the sliding window to use for deduplication\n Note: By default swuniq will use a window of 100 lines.\n\n# cat input.txt \napple\napple\napple\nbanana\nbanana\nstrawberry\nblueberry\napple\nbanana\nstrawberry\nblueberry\nkiwifruit\norange\npeach\nwatermelon\norange\nwatermelon\nkiwifruit\nbanana\nbanana\nbanana\napple\nkiwifruit\n\n# swuniq \u003c input.txt\napple\nbanana\nstrawberry\nblueberry\nkiwifruit\norange\npeach\nwatermelon\n\n# swuniq -w 4 \u003c input.txt\napple\nbanana\nstrawberry\nblueberry\nkiwifruit\norange\npeach\nwatermelon\nbanana\napple\nkiwifruit\n\n# swuniq -w 2 \u003c input.txt \napple\nbanana\nstrawberry\nblueberry\napple\nbanana\nstrawberry\nblueberry\nkiwifruit\norange\npeach\nwatermelon\norange\nkiwifruit\nbanana\napple\nkiwifruit\n \n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmterron%2Fswuniq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmterron%2Fswuniq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmterron%2Fswuniq/lists"}