{"id":17856803,"url":"https://github.com/f483/dejavu","last_synced_at":"2025-08-20T02:04:54.492Z","repository":{"id":57501182,"uuid":"92765818","full_name":"F483/dejavu","owner":"F483","description":"Quickly detect already witnessed data.","archived":false,"fork":false,"pushed_at":"2024-07-11T14:59:46.000Z","size":312,"stargazers_count":157,"open_issues_count":4,"forks_count":5,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-08-14T12:21:59.136Z","etag":null,"topics":["command-line","command-line-tool","deduplication","duplicate-values","duplicates","go","golang","history","memory","probabilistic"],"latest_commit_sha":null,"homepage":"https://f483.github.io","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/F483.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-29T18:36:42.000Z","updated_at":"2025-03-30T00:43:44.000Z","dependencies_parsed_at":"2024-10-31T08:45:27.877Z","dependency_job_id":null,"html_url":"https://github.com/F483/dejavu","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/F483/dejavu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/F483%2Fdejavu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/F483%2Fdejavu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/F483%2Fdejavu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/F483%2Fdejavu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/F483","download_url":"https://codeload.github.com/F483/dejavu/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/F483%2Fdejavu/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271252993,"owners_count":24726918,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line","command-line-tool","deduplication","duplicate-values","duplicates","go","golang","history","memory","probabilistic"],"created_at":"2024-10-28T03:09:20.834Z","updated_at":"2025-08-20T02:04:54.471Z","avatar_url":"https://github.com/F483.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n[![ci](https://github.com/f483/dejavu/actions/workflows/go.yml/badge.svg)](https://github.com/f483/dejavu/actions/workflows/go.yml)\n[![Issues](https://img.shields.io/github/issues/f483/dejavu.svg)](https://github.com/f483/dejavu/issues)\n[![Go Report 
Card](https://goreportcard.com/badge/github.com/f483/dejavu)](https://goreportcard.com/report/github.com/f483/dejavu)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/f483/dejavu/master/LICENSE)\n[![GoDoc](https://img.shields.io/badge/godoc-reference-blue.svg)](https://godoc.org/github.com/f483/dejavu)\n\n\n# Déjà vu\n\nQuickly detect already witnessed data, ideal for deduplication.\n\nMemory of witnessed data is limited: once the entry limit is reached, the\noldest entries are forgotten. The library is thread safe. It offers a\ndeterministic and a probabilistic implementation; the probabilistic one uses\nover an order of magnitude less memory. The probabilistic implementation uses\nBloom filters, meaning false positives are possible but not false negatives.\n\n\n# Installation\n\n## Download binary release\n\nCompiled binaries for many platforms are available and can be downloaded from\nthe [latest release](https://github.com/F483/dejavu/releases/latest).\n\nExtract the binary for your platform and add it to your system path.\n\n## Compile from source\n\nRequires a Go [environment/workspace](https://golang.org/doc/code.html).\n\n```\n# compile and install library\ngo get github.com/f483/dejavu\n\n# compile and install binary\ngo install github.com/f483/dejavu/dejavu\n```\n\n# Command line usage\n\n```\n$ dejavu -h\nUsage: dejavu [OPTION]... 
[FILE]...\n\nConcatenate FILE(s) and filter or output duplicate lines.\n\nWith no FILE, or when FILE is -, read standard input.\n\nOptions:\n  -D\tuse deterministic mode instead of probabilistic\n\tWARNING requires order of magnitude more memory\n  -d\toutput only duplicates instead of filtering\n  -f float\n    \tchance of false positive, between 0.0 and 1.0\n\tonly for probabilistic mode (default 1e-06)\n  -l uint\n    \tlimit after which entries are forgotten (default 1000000)\n  -o string\n    \toutput file, defaults to stdout\n  -v\toutput version information and exit\n\nExamples:\n  dejavu\n\tdefault probabilistic deduplication from stdin to stdout with\n\t1mil entry limit and 1/1mil chance of false positive (~8M mem usage)\n  dejavu -o s f - g\n\tdeduplicate f, then stdin, then g, to output s\n  dejavu -l 10000000 -f 0.000000001\n\tprobabilistic deduplication with 10mil entry limit\n\tand 1/1bil chance of false positive (~70M mem usage)\n  dejavu -d -D -l 65536\n\toutput duplicates and avoid false positives with deterministic mode\n\tlower entry limit to avoid excessive memory usage\n\nImplementation:\n  Efficient probabilistic and deterministic duplicate detection with O(1) \n  detection time and O(n) memory usage in relation to entry limit. 
Default\n  probabilistic implementation uses bloom filters, meaning false\n  positives are possible but not false negatives.\n\nAuthor: Fabian Barkhau \u003cf483@protonmail.com\u003e\nProject: https://github.com/f483/dejavu\nLicense: MIT https://raw.githubusercontent.com/f483/dejavu/master/LICENSE\n```\n\n# Library usage (golang)\n\n## Probabilistic example\n\n```\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/f483/dejavu\"\n)\n\nfunc main() {\n\n\t// probably remembers last 65536 with 0.000001 chance of false positive\n\tp := dejavu.NewProbabilistic(65536, 0.000001)\n\n\tfmt.Println(p.Witness([]byte(\"bar\"))) // entry added\n\tfmt.Println(p.Witness([]byte(\"bar\"))) // probably remembers entry\n}\n```\n\n## Deterministic example\n\n```\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/f483/dejavu\"\n)\n\nfunc main() {\n\n\t// always remembers last 1024 entries\n\td := dejavu.NewDeterministic(1024)\n\n\tfmt.Println(d.Witness([]byte(\"foo\"))) // entry added\n\tfmt.Println(d.Witness([]byte(\"foo\"))) // remembers entry\n}\n```\n\n# Performance\n\n## Linear memory usage: O(n)\n\n### Probabilistic\n\n0.000001 chance of false positive.\n\n![Benchmark Memory](https://github.com/f483/dejavu/raw/master/_benchmark/probabilistic-memory.png)\n\n### Deterministic\n\n![Benchmark Memory](https://github.com/f483/dejavu/raw/master/_benchmark/deterministic-memory.png)\n\n\n## Constant witness time: O(1)\n\n### Probabilistic\n\n0.000001 chance of false positive.\n\n![Benchmark Time](https://github.com/f483/dejavu/raw/master/_benchmark/probabilistic-time.png)\n\n### Deterministic\n\n![Benchmark Time](https://github.com/f483/dejavu/raw/master/_benchmark/deterministic-time.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ff483%2Fdejavu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ff483%2Fdejavu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ff483%2Fdejavu/lists"}