{"id":13584054,"url":"https://github.com/trendmicro/tlsh","last_synced_at":"2026-02-17T01:10:13.916Z","repository":{"id":11144070,"uuid":"13511038","full_name":"trendmicro/tlsh","owner":"trendmicro","description":null,"archived":false,"fork":false,"pushed_at":"2026-02-05T23:01:55.000Z","size":6823,"stargazers_count":817,"open_issues_count":25,"forks_count":144,"subscribers_count":44,"default_branch":"master","last_synced_at":"2026-02-13T11:40:54.827Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Max","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trendmicro.png","metadata":{"files":{"readme":"README.md","changelog":"Change_History.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE.txt","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2013-10-11T22:21:54.000Z","updated_at":"2026-02-09T19:36:31.000Z","dependencies_parsed_at":"2026-02-06T01:01:52.568Z","dependency_job_id":null,"html_url":"https://github.com/trendmicro/tlsh","commit_stats":{"total_commits":285,"total_committers":38,"mean_commits":7.5,"dds":0.8140350877192982,"last_synced_commit":"74efd09a47aecb12f4457b542ebc21d098f901b1"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/trendmicro/tlsh","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2Ftlsh","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2Ftlsh/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2Ftlsh/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2Ftlsh/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trendmicro","download_url":"https://codeload.github.com/trendmicro/tlsh/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trendmicro%2Ftlsh/sbom","scorecard":{"id":897989,"data":{"date":"2025-08-11","repo":{"name":"github.com/trendmicro/tlsh","commit":"188c9c87158bda183cee2199f94236e4551018bd"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.4,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":6,"reason":"Found 10/15 approved changesets -- score normalized to 6","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Binary-Artifacts","score":7,"reason":"binaries present in source code","details":["Warn: binary detected: java/gradle/wrapper/gradle-wrapper.jar:1","Warn: binary detected: tlsh_bh_tool/extended_file_properties/__init__.pyc:1","Warn: binary detected: tlsh_bh_tool/extended_file_properties/extended_file_properties.pyc:1"],"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: 
project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Vulnerabilities","score":0,"reason":"10 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-944j-8ch6-rf6x","Warn: Project is vulnerable to: PYSEC-2018-24 / GHSA-2rcm-phc9-3945","Warn: Project is vulnerable to: PYSEC-2013-31 / GHSA-6748-36qp-fx6r","Warn: Project is vulnerable to: PYSEC-2018-23 / GHSA-p28m-34f6-967q","Warn: Project is vulnerable 
to: PYSEC-2014-14 / GHSA-652x-xj99-gmcc","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-9wx4-h78v-vm56","Warn: Project is vulnerable to: PYSEC-2014-13 / GHSA-cfj3-7x9c-4p3h","Warn: Project is vulnerable to: PYSEC-2018-28 / GHSA-x84v-xcm2-53pg","Warn: Project is vulnerable to: GHSA-9772-cwx9-r4cj"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 25 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-24T14:27:57.169Z","repository_id":11144070,"created_at":"2025-08-24T14:27:57.170Z","updated_at":"2025-08-24T14:27:57.170Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29528428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T00:57:22.232Z","status":"ssl_error","status_checked_at":"2026-02-17T00:54:25.811Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:03:58.947Z","updated_at":"2026-02-17T01:10:13.908Z","avatar_url":"https://github.com/trendmicro.png","language":"Max","funding_links":[],"categories":["Max"],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/trendmicro/tlsh.svg?branch=master)](https://travis-ci.org/trendmicro/tlsh/)\n\n# TLSH - Trend Micro Locality Sensitive Hash\n\nTLSH is a fuzzy matching library.\nGiven a byte stream with a minimum length of 50 bytes\nTLSH generates a hash value which can be used for similarity comparisons.\nSimilar objects will have similar hash values which allows for\nthe detection of similar objects by comparing their hash values.  Note that\nthe byte stream should have a sufficient amount of complexity.  
For example,\na byte stream of identical bytes will not generate a hash value.\n\n## What's New in TLSH 5.0.x\n17/02/2026\nThe default behaviour is to input / output TLSH digests that start with the prefix \"T1\"\n\n19/01/2026\nReleased py-tlsh 4.12.1\n\n2020\n- adopted by [Virus Total](https://developers.virustotal.com/v3.0/reference#files-tlsh)\n- adopted by [Malware Bazaar](https://bazaar.abuse.ch/api/#tlsh)\n\nWe have added a version identifier (\"T1\") to the start of the digest.\nThe \"T1\" prefix is intended for the standard TLSH - that is\n- 128 buckets\n- the checksum is 1 byte\nPlease use versions of TLSH that have the T1 header.\nThe code is backwards compatible: it can still read and interpret 70 hex character strings as TLSH digests.\nData sets can include mixes of the old and new digests.\nIf you need old style TLSH digests to be output, then use the command line option '-old'\n\n## Dedication\n\nThanks to Chun Cheng, who was a humble and talented engineer.\n\n## Minimum byte stream length\n\nThe program in default mode requires an input byte stream with a minimum length of 50 bytes\n(and a minimum amount of randomness - see note in Python extension below).\n\nFor consistency with older versions, there is a -conservative option which enforces a 256 byte limit.\nSee notes for version 3.17.0 of TLSH\n\n## Computed hash\n\nThe computed hash is 35 bytes of data (output as 'T1' followed by 70 hexadecimal characters, for a total length of 72 characters).\nThe 'T1' has been added as a version number for the hash - so that we can adapt the algorithm and still maintain\nbackwards compatibility.\nTo get the old style 70 hex hashes, use the -old command line option.\n\nBytes 3,4,5 are used to capture the information about the file as a whole\n(length, ...), while the last 32 bytes are used to capture information about\nincremental parts of the file.  
(Note that the length of the hash can be\nincreased by changing build parameters described below in [CMakeLists.txt](CMakeLists.txt),\nwhich will increase the information stored in the hash.\nFor some applications this might increase the accuracy in predicting similarities between files.)\n\n## Executables and library\n\nBuilding TLSH (see below) will create a static library in the `lib` directory,\nand the `tlsh` executable (a symbolic link to `tlsh_unittest`).\n'tlsh' links to the static library, in the `bin` directory.\nThe library has functionality to generate the hash value from a given\nfile, and to compute the similarity between two hash values.\n\n`tlsh` is a utility for generating TLSH hash values and comparing TLSH\nhash values to determine similarity.  Run it with no parameters for detailed usage.\n\n## Ports\n\n- A JavaScript port is available in the `js_ext` directory.\n- A Java port is available in the `java` directory.\n\n## 3rd Party Ports\n\nWe list these ports just for reference.\nWe have not checked the code in these repositories, and we have not checked that the results are identical to TLSH here.\nWe also request that any ports include the files LICENSE and NOTICE.txt exactly as they appear in this repository.\n\n- Another Java port is available [here](https://github.com/idealista/tlsh).\n- Another Java port is available [here](https://github.com/kevemueller/kTLSH).\n- A Golang port is available [here](https://github.com/glaslos/tlsh).\n- A Ruby port is available [here](https://github.com/adamliesko/tlsh).\n\n# Downloading TLSH\n\nDownload TLSH as follows:\n\n```\nwget https://github.com/trendmicro/tlsh/archive/master.zip -O master.zip\nunzip master.zip\ncd tlsh-master\n```\n\n**or**\n\n```\ngit clone https://github.com/trendmicro/tlsh.git\ncd tlsh\ngit checkout master\n```\n\n# Building TLSH\n\nEdit [CMakeLists.txt](CMakeLists.txt) to build TLSH with different options.\n\n- TLSH_BUCKETS: determines using 128 or 256 buckets\n\tuse the default 128 
buckets unless you are an expert and know you need 256 buckets\n- TLSH_CHECKSUM_1B: determines checksum length, longer means fewer collisions\n\tuse the default 1 byte unless you are an expert and know you need a larger checksum\n\n## Linux\n\nExecute:\n\n```\n./make.sh\n```\n\n**Note:** *Building TLSH on Linux depends upon `cmake` to create the `Makefile` and then\n`make` the project, so the build will fail if `cmake` is not installed.*\nTo install cmake and the gcc compiler on CentOS or Amazon Linux:\n\t$ sudo yum install cmake\n\t$ sudo yum install gcc-c++\n\n## Windows (MinGW)\n\nAdded in March 2020.\nSee the instructions in README.mingw\n\n## Windows (Visual Studio)\n\nUse the version-specific tlsh solution files ([tlsh.VC2005.sln](Windows/tlsh.VC2005.sln),\n[tlsh.VC2008.sln](Windows/tlsh.VC2008.sln), ...) under the Windows directory.\n\nSee [tlsh.h](include/tlsh.h) for the tlsh library interface and [tlsh_unittest.cpp](test/tlsh_unittest.cpp) and\n[simple_unittest.cpp](test/simple_unittest.cpp) under the `test` directory for example code.\n\n## Using TLSH in Python\n\n### Python Package\n\nWe have recently created a Python package on PyPI: [https://pypi.org/project/py-tlsh/](https://pypi.org/project/py-tlsh/)  \nThe py-tlsh package replaces the python-tlsh package. 
For details see [issue 94](https://github.com/trendmicro/tlsh/issues/94)  \nTo install this package\n```\n\t$  pip install py-tlsh\n```\n\n### Python Extension\n\nIf you need to build your own Python package, then there is a README.python with notes about the python version\n\n```\n(1) compile the C++ code\n\t$./make.sh\n(2) build the python version\n\t$ cd py_ext/\n\t$ python ./setup.py build\n(3) install - possibly - sudo, run as root or administrator\n\t$ python ./setup.py install\n(4) test it\n\t$ cd ../Testing\n\t$ ./python_test.sh\n```\n\n#### Python Usage\n\n```python\nimport tlsh\n\ntlsh.hash(data)\n```\n\nNote that data needs to be bytes - not a string.\nThis is because TLSH is for binary data and binary data can contain a NULL (zero) byte.\n\nIn default mode the data must contain at least 50 bytes to generate a hash value and\nit must have a certain amount of randomness.\nTo get the hash value of a file, try\n\n```python\ntlsh.hash(open(file, 'rb').read())\n```\n\nNote: the open statement has opened the file in binary mode.\n\n#### Python Example\n```python\nimport tlsh\n\nh1 = tlsh.hash(data)\nh2 = tlsh.hash(similar_data)\nscore = tlsh.diff(h1, h2)\n\nh3 = tlsh.Tlsh()\nwith open('file', 'rb') as f:\n    for buf in iter(lambda: f.read(512), b''):\n        h3.update(buf)\n    h3.final()\n# this assertion is stating that the distance between a TLSH and itself must be zero\nassert h3.diff(h3) == 0\nscore = h3.diff(h1)\n```\n\n#### Python Extra Options\n\nThe `diffxlen` function removes the file length component of the tlsh header from the comparison.\n\n```python\ntlsh.diffxlen(h1, h2)\n```\n\nIf a file with a repeating pattern is compared to a file with only a single instance of the pattern,\nthen the difference will be increased if the file length is included.\nBut by using the `diffxlen` function, the file length will be removed from consideration.\n\n#### Python Backwards Compatibility Options\n\nIf you use the \"conservative\" option, then the data must 
contain at least 256 bytes.\nFor example,\n\n```python\nimport os\ntlsh.conservativehash(os.urandom(256))\n```\n\nshould generate a hash, but\n\n```python\ntlsh.conservativehash(os.urandom(100))\n```\n\nwill generate TNULL as it is less than 256 bytes.\n\nIf you need to generate old style hashes (without the \"T1\" prefix) then use\n\n```python\ntlsh.oldhash(os.urandom(100))\n```\n\n\nThe old and conservative options may be combined:\n\n```python\ntlsh.oldconservativehash(os.urandom(500))\n```\n\n# Design Choices\n\n- To improve comparison accuracy, TLSH tracks counting bucket height\n  distribution in quartiles. A bigger quartile difference results in a higher\n  difference score.\n- Specifically, 6 trigrams are used to give equal representation of the bytes in the 5\n  byte sliding window, which produces improved results.\n- Pearson hash is used to distribute the trigram counts to the counting buckets.\n- The global similarity score distances objects with significant size\n  difference. Global similarity can be disabled. It also distances objects with\n  different quartile distributions.\n- TLSH can be compiled to generate 70 or 134 character hash strings.\n  The longer version has been created for use if the 70 character hash strings are not working\n  for your application.\n\nTLSH similarity is expressed as a difference score:\n\n- A score of 0 means the objects are almost identical.\n- For the 72 character hash, there is a detailed table of experimental Detection rates and False Positive rates\n  based on the threshold. 
See [Table II on page 5](https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf)\n\n# Clustering\n- See the Python code and Jupyter notebooks in tlshCluster.\n- We provide Python code for the HAC-T method.\n  We also provide code so that users can use DBSCAN.\n- We show users how to create dendrograms for files, which are useful diagrams showing relationships between files and groups.\n- We provide tools for clustering the Malware Bazaar dataset, which contains a few hundred thousand samples.\n- The HAC-T method is described in [HAC-T and fast search for similarity in security](https://tlsh.org/papersDir/COINS_2020_camera_ready.pdf)\n\n# Publications\n\n- Jonathan Oliver, Chun Cheng, and Yanggui Chen,\n\t[TLSH - A Locality Sensitive Hash](https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf).\n\t4th Cybercrime and Trustworthy Computing Workshop, Sydney, November 2013\n- Jonathan Oliver, Scott Forman, and Chun Cheng,\n\t[Using Randomization to Attack Similarity Digests](https://github.com/trendmicro/tlsh/blob/master/Attacking_LSH_and_Sim_Dig.pdf).\n\tATIS 2014, November, 2014, pages 199-210\n- Jonathan Oliver, Muqeet Ali, and Josiah Hagen.\n\t[HAC-T and fast search for similarity in security](https://tlsh.org/papersDir/COINS_2020_camera_ready.pdf)\n\t2020 International Conference on Omni-layer Intelligent Systems (COINS). IEEE, 2020.\n\n# Current Version\n\n**5.0.0**\n\u003cPRE\u003e\n17/02/2026\n\tChange default behaviour to output T1 at the start of the digest\n\tA T1 digest requires that the number of buckets = 128 and the checksum is 1 byte\n\u003c/PRE\u003e\n\n# Change History\n\nSee [Change_History.md](Change_History.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrendmicro%2Ftlsh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrendmicro%2Ftlsh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrendmicro%2Ftlsh/lists"}