{"id":13531700,"url":"https://github.com/nil0x42/duplicut","last_synced_at":"2025-04-13T00:46:16.319Z","repository":{"id":20008499,"uuid":"23276037","full_name":"nil0x42/duplicut","owner":"nil0x42","description":"Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)","archived":false,"fork":false,"pushed_at":"2022-06-25T09:47:36.000Z","size":1168,"stargazers_count":918,"open_issues_count":9,"forks_count":91,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-04-13T00:46:12.296Z","etag":null,"topics":["c","cracking","dedupe","dictionary","duplicate-detection","hashcat","hashes","password","password-cracking","remove-duplicates","uniq","unique","wordlist","wordlist-generator","wordlists"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nil0x42.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"nil0x42","custom":["exdemia.com/donate-bitcoin","paypal.me/nil0x42"]}},"created_at":"2014-08-24T07:43:08.000Z","updated_at":"2025-04-10T13:03:38.000Z","dependencies_parsed_at":"2022-08-19T02:51:15.523Z","dependency_job_id":null,"html_url":"https://github.com/nil0x42/duplicut","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nil0x42%2Fduplicut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nil0x42%2Fduplicut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nil0x42%2Fduplicut/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/nil0x42%2Fduplicut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nil0x42","download_url":"https://codeload.github.com/nil0x42/duplicut/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650432,"owners_count":21139672,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","cracking","dedupe","dictionary","duplicate-detection","hashcat","hashes","password","password-cracking","remove-duplicates","uniq","unique","wordlist","wordlist-generator","wordlists"],"created_at":"2024-08-01T07:01:04.971Z","updated_at":"2025-04-13T00:46:16.299Z","avatar_url":"https://github.com/nil0x42.png","language":"C","funding_links":["https://github.com/sponsors/nil0x42","exdemia.com/donate-bitcoin","paypal.me/nil0x42"],"categories":["Hash Cracking Tools","C","C (286)","Wordlist tools","C++","Pentesting"],"sub_categories":["Zealandia","Generation/Manipulation","Forensics","Enumeration"],"readme":"\u003ch1 align=\"center\"\u003eDuplicut :scissors:\u003c/h1\u003e\n\n\u003ch3 align=\"center\"\u003e\n    Quickly dedupe massive wordlists, without changing the order\n    \u003ca href=\"https://twitter.com/intent/tweet?text=Duplicut%3A%20Remove%20duplicates%20from%20MASSIVE%20wordlist%2C%20without%20sorting%20it%20(for%20dictionnary-based%20password%20cracking)%20-%20by%20%40nil0x42\u0026url=https://github.com/nil0x42/duplicut\"\u003e\n      \u003cimg src=\"https://img.shields.io/twitter/url?label=tweet\u0026logo=twitter\u0026style=social\u0026url=http%3A%2F%2F0\" 
alt=\"tweet\"\u003e\n    \u003c/a\u003e\n\u003c/h3\u003e\n\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/nil0x42/duplicut/actions?query=branch%3Amaster\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/workflow/status/nil0x42/duplicut/Unit%20Tests/master?logo=githubactions\" alt=\"github workflows\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://app.codacy.com/gh/nil0x42/duplicut/dashboard\"\u003e\n    \u003cimg src=\"https://img.shields.io/codacy/grade/b01c0228bd9148fb9d713a479dda4b25?logo=codacy\u0026logoColor=green\" alt=\"codacy code quality\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://lgtm.com/projects/g/nil0x42/duplicut/alerts/\"\u003e\n    \u003cimg src=\"https://img.shields.io/lgtm/alerts/github/nil0x42/duplicut?logo=lgtm\u0026logoColor=yellow\" alt=\"lgtm alerts\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/nil0x42/duplicut\"\u003e\n    \u003cimg src=\"https://img.shields.io/codecov/c/github/nil0x42/duplicut?color=orange\u0026label=coverage\u0026logo=codecov\" alt=\"codecov coverage\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/enaqx/awesome-pentest#hash-cracking-tools\"\u003e\n    \u003cimg src=\"https://awesome.re/mentioned-badge.svg\" alt=\"Mentioned in awesome-pentest\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://inventory.raw.pm/tools.html#Duplicut\"\u003e\n    \u003cimg src=\"https://inventory.raw.pm/img/badges/Rawsec-inventoried-FF5050_flat.svg\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.blackarch.org/misc.html\"\u003e\n    \u003cimg src=\"https://img.shields.io/static/v1?label=BlackArch\u0026message=packaged\u0026color=red\u0026logo=archlinux\u0026logoColor=006\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://twitter.com/intent/follow?screen_name=nil0x42\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/twitter/follow/nil0x42.svg?logo=twitter\" alt=\"follow on 
twitter\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003csub\u003e\n    Created by\n    \u003ca href=\"https://twitter.com/nil0x42\"\u003enil0x42\u003c/a\u003e and\n    \u003ca href=\"https://github.com/nil0x42/duplicut/graphs/contributors\"\u003econtributors\u003c/a\u003e\n  \u003c/sub\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n\n### :book: Overview\n\nNowadays, password wordlist creation usually implies concatenating\nmultiple data sources.\n\nIdeally, the most probable passwords should stand at the start of the wordlist,\nso the most common passwords are cracked instantly.\n\nWith existing *dedupe tools* you are forced to choose\nwhether to *preserve the order **OR** handle massive wordlists*.\n\nUnfortunately, **wordlist creation requires both**:\n\n![][img-1-comparison]\n\n\u003e **So I wrote duplicut in [highly optimized C][get-next-line] to address this very specific need :nerd\\_face: :computer:**\n\n* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n\n### :bulb: Quick start\n\n```sh\ngit clone https://github.com/nil0x42/duplicut\ncd duplicut/ \u0026\u0026 make\n./duplicut wordlist.txt -o clean-wordlist.txt\n```\n\n### :wrench: Options\n\n![][img-4-help]\n\n*   **Features**:\n    *   Handle massive wordlists, even those whose size exceeds available RAM\n    *   Filter lines by max length (`-l` option)\n    *   Can remove lines containing non-printable ASCII chars (`-p` option)\n    *   Press any key to show program status at runtime.\n\n*   **Implementation**:\n    *   Written in pure C code, designed to be fast\n    *   Compressed hashmap items on 64-bit platforms\n    *   Multithreading support\n\n*   **Limitations**:\n    *   Any line longer than 255 chars is ignored\n\n### :book: Technical Details\n\n#### :small_orange_diamond: 1- Memory optimized:\n\nA `uint64` is enough to index lines in the hashmap, by packing\n`size` info within 
the pointer's [extra bits][tagged-pointer]:\n\n![][img-2-line-struct]\n\n#### :small_orange_diamond: 2- Massive file handling:\n\nIf the whole file can't fit in memory, it is split into ![][latex-n]\nvirtual chunks, in such a way that each chunk uses as much RAM as possible.\n\nEach chunk is then loaded into the hashmap, deduped, and tested against\nsubsequent chunks.\n\nThat way, execution time is bounded by the ![][latex-n]th *triangle number* of chunk passes:\n\n![][img-3-chunked-processing]\n\n## :bulb: Troubleshooting\n\nIf you find a bug, or something doesn't work as expected,\nplease compile duplicut in debug mode and post an [issue] with\nthe output attached:\n\n```\n# debug level can be from 1 to 4\nmake debug level=1\n./duplicut [OPTIONS] 2\u003e\u00261 | tee /tmp/duplicut-debug.log\n```\n\n[get-next-line]: https://github.com/nil0x42/duplicut/blob/master/src/line.c#L39\n\n[img-1-comparison]: data/img/1-comparison.png\n[img-2-line-struct]: data/img/2-line-struct.png\n[img-3-chunked-processing]: data/img/3-chunked-processing.png\n[img-4-help]: data/img/4-help.png\n\n[issue]: https://github.com/nil0x42/duplicut/issues\n[tagged-pointer]: https://en.wikipedia.org/wiki/Tagged_pointer\n\n[latex-n]: http://www.sciweavers.org/tex2img.php?fs=15\u0026eq=n\n[latex-nth-triangle]: http://www.sciweavers.org/tex2img.php?fs=32\u0026eq=%5Csum_%7Bk%3D1%7D%5Enk\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnil0x42%2Fduplicut","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnil0x42%2Fduplicut","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnil0x42%2Fduplicut/lists"}