{"id":16498905,"url":"https://github.com/simonschoelly/informationdistances.jl","last_synced_at":"2025-06-19T22:07:59.461Z","repository":{"id":45627017,"uuid":"327453509","full_name":"simonschoelly/InformationDistances.jl","owner":"simonschoelly","description":"A small Julia library for calculating the normalized compression distance.","archived":false,"fork":false,"pushed_at":"2021-05-24T20:13:07.000Z","size":274,"stargazers_count":5,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-19T22:06:27.436Z","etag":null,"topics":["compression","hacktoberfest","information-distance","kolmogorov-complexity","normalized-compression-distance","string-distance"],"latest_commit_sha":null,"homepage":"","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonschoelly.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-06T23:32:55.000Z","updated_at":"2025-03-17T21:50:03.000Z","dependencies_parsed_at":"2022-07-17T09:46:20.327Z","dependency_job_id":null,"html_url":"https://github.com/simonschoelly/InformationDistances.jl","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/simonschoelly/InformationDistances.jl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonschoelly%2FInformationDistances.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonschoelly%2FInformationDistances.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonschoelly%2FInformationDistances.jl/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/simonschoelly%2FInformationDistances.jl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simonschoelly","download_url":"https://codeload.github.com/simonschoelly/InformationDistances.jl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonschoelly%2FInformationDistances.jl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260838646,"owners_count":23070609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","hacktoberfest","information-distance","kolmogorov-complexity","normalized-compression-distance","string-distance"],"created_at":"2024-10-11T14:50:29.250Z","updated_at":"2025-06-19T22:07:54.438Z","avatar_url":"https://github.com/simonschoelly.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InformationDistances\n\n[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://simonschoelly.github.io/InformationDistances.jl/stable)\n[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://simonschoelly.github.io/InformationDistances.jl/dev)\n[![Build Status](https://github.com/simonschoelly/InformationDistances.jl/workflows/CI/badge.svg)](https://github.com/simonschoelly/InformationDistances.jl/actions)\n[![Coverage](https://codecov.io/gh/simonschoelly/InformationDistances.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/simonschoelly/InformationDistances.jl)\n\nThis package contains methods to calculate the [Normalized Compression 
Distance (NCD)](https://en.wikipedia.org/wiki/Normalized_compression_distance) - a metric that measures how similar two strings are using a real-world compression algorithm such as [bzip2](https://en.wikipedia.org/wiki/Bzip2).\n\n## Installation\n\nInformationDistances.jl is registered in the [General registry](https://github.com/JuliaRegistries/General) and can therefore be installed from the REPL with\n```julia\n] add InformationDistances\n```\n\n## Quick example\n\n```julia\njulia\u003e using InformationDistances\n\n# Create three strings that we want to compare - we expect s1 and s2 to be more similar to each other than either is to s3\njulia\u003e s1 = repeat(\"ab\", 100)\n\"abababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababab\"\n\njulia\u003e s2 = repeat(\"ba\", 100)\n\"babababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa\"\n\njulia\u003e s3 = String(rand(('a', 'b'), 200))\n\"aabaaabaaababaabababbaaaaabaaaaaabbabbaaabbbabbbbaaaaababaabbbbaababbbbaaaaaaaaabababaaabbbbbbbabbbaabbabababbaababbbbabbbababaaaababaaababbababaaaaababbabbbbaabbaabbbaabaababbbaaaaaababbbabbbabbabbaa\"\n\n# Create a normalized compression distance with the default parameters\njulia\u003e d = NormalizedCompressionDistance();\n\njulia\u003e d(s1, s2)\n0.125\n\njulia\u003e d(s1, s3)\n0.4482758620689655\n\njulia\u003e d(s2, s3)\n0.4482758620689655\n\n# Create another distance that uses Bzip2 for compression\njulia\u003e using CodecBzip2: Bzip2Compressor\n\njulia\u003e d_bzip2 = NormalizedCompressionDistance(CodecCompressor{Bzip2Compressor}(workfactor=250));\n\njulia\u003e d_bzip2(s1, s2)\n0.1\n\njulia\u003e d_bzip2(s1, s3)\n0.5903614457831325\n\njulia\u003e d_bzip2(s2, s3)\n0.5783132530120482\n```\n\n## 
Example Notebooks\nThe examples folder contains an interactive notebook that can be run with [Pluto.jl](https://github.com/fonsp/Pluto.jl). To quickly view the notebook online, there is also a static, non-interactive version in which the options cannot currently be changed.\n\n* [mitochondrial-genome-phylogency.jl](https://github.com/simonschoelly/InformationDistances.jl/blob/master/examples/mitochondrial-genome-phylogency.jl) \u0026nbsp; \u0026nbsp; \u0026nbsp; [non-interactive version](https://simonschoelly.github.io/InformationDistances.jl/examples/mitochondrial-genome-phylogency.jl.html)\n\n## References\n[Li, Ming, Xin Chen, Xin Li, Bin Ma, and Paul MB Vitányi. \"The similarity metric.\" IEEE Transactions on Information Theory 50, no. 12 (2004): 3250-3264.](https://homepages.cwi.nl/~paulv/papers/similarity.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonschoelly%2Finformationdistances.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonschoelly%2Finformationdistances.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonschoelly%2Finformationdistances.jl/lists"}