{"id":22901062,"url":"https://github.com/mpdn/skjul","last_synced_at":"2025-06-29T06:06:43.338Z","repository":{"id":67595341,"uuid":"177454769","full_name":"mpdn/skjul","owner":"mpdn","description":"Hide data in plaintext","archived":false,"fork":false,"pushed_at":"2019-03-25T16:44:28.000Z","size":152,"stargazers_count":16,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-08T01:44:47.879Z","etag":null,"topics":["steganography","word"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpdn.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-24T18:44:02.000Z","updated_at":"2025-01-19T17:10:17.000Z","dependencies_parsed_at":"2023-02-26T19:31:20.503Z","dependency_job_id":null,"html_url":"https://github.com/mpdn/skjul","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mpdn/skjul","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpdn%2Fskjul","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpdn%2Fskjul/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpdn%2Fskjul/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpdn%2Fskjul/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpdn","download_url":"https://codeload.github.com/mpdn/skjul/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/mpdn%2Fskjul/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262545032,"owners_count":23326659,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["steganography","word"],"created_at":"2024-12-14T01:31:31.206Z","updated_at":"2025-06-29T06:06:43.311Z","avatar_url":"https://github.com/mpdn.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Skjul - Text-based steganography\n\n*Steganography* is the practice of inconspicuously hiding data (a secret) within\nsome other data (a carrier). Often this is within images, where the lower bits\ncan be used to store a secret message. While having few real uses, steganography\ncan be a fun exercise in information theory.\n\nSkjul (Danish for *hide*, as in *to hide*), is a text-based steganography\nimplementation. Given a carrier message, Skjul can encode a secret bitstring\ninto it by slightly changing words - hopefully so little as to be imperceptible\nto an uninitiated reader.\n\n## Example\n\n    $ cat example.txt\n    ‘The Babel fish,’ said The Hitchhiker’s Guide to the Galaxy quietly, ‘is\n    small, yellow, and leech-like, and probably the oddest thing in the\n    Universe. It feeds on brainwave energy received not from its own carrier but\n    from those around it. It absorbs all unconscious mental frequencies from\n    this brainwave energy to nourish itself with. 
It then excretes into the mind\n    of its carrier a telepathic matrix formed by combining the conscious thought\n    frequencies with nerve signals picked up from the speech centres of the\n    brain which has supplied them. The practical upshot of all this is that if\n    you stick a Babel fish in your ear you can instantly understand anything\n    said to you in any form of language.\n\n    $ cat example.txt | ./skjul.py encode '101010' | tee 'encoded.txt'\n    ‘The Babel fish,’ said The Hitchhiker’s Guide to the Galaxy quietly, ‘is\n    small, yellow, and leech-like, and possibly the oddest thing in the\n    Universe. It feeds on brainwave energy recieved not to its own carrier but\n    to those around it. It absorbs all unconscious mental frequencies from this\n    brainwave energy to nourish itself with. It then excretes into the mind of\n    its carrier a telepathic matrix formed by combining the conscious thought\n    frequencies with nerve signals picked up from the speech centres of the\n    brain which has supplied them. The practical upshot of all that is that if\n    you stick a Babel fish in your ear you can instantly comprehend anything\n    said from you in any form of language.\n\n    $ wdiff example.txt encoded.txt\n    ‘The Babel fish,’ said The Hitchhiker’s Guide to the Galaxy quietly, ‘is\n    small, yellow, and leech-like, and [-probably-] {+possibly+} the oddest\n    thing in the Universe. It feeds on brainwave energy [-received-]\n    {+recieved+} not [-from-] {+to+} its own carrier but [-from-] {+to+} those\n    around it. It absorbs all unconscious mental frequencies from this brainwave\n    energy to nourish itself with. It then excretes into the mind of its carrier\n    a telepathic matrix formed by combining the conscious thought frequencies\n    with nerve signals picked up from the speech centres of the brain which has\n    supplied them. 
The practical upshot of all [-this-] {+that+} is that if you\n    stick a Babel fish in your ear you can instantly [-understand-]\n    {+comprehend+} anything said [-to-] {+from+} you in any form of language.\n\n    $ cat encoded.txt | ./skjul.py decode\n    101010\n\nThe secrets can only be bitstrings.\n\n## How it works\n\n### Word pairs\n\nGiven a carrier string and a secret bitstring, the basic idea is to assign to each\nword in the carrier string a *paired word*. The secret message can then be\nencoded in our choice of word. To avoid a noticeable difference, the paired\nword should be able to \"work\" in the same context as the original word, i.e. we\nwish to select words that are likely to share the same neighboring words.\n\nWord-vector models are a common way to model these *distributional properties* of\nwords. In such a model, each word has a vector embedding of e.g. 300 dimensions.\nThese embeddings are built such that words that tend to have similar contexts\nalso tend to have similar embeddings.\n\nUsing a word vector model, we pair each embedding with a neighbor using cosine\ndistance as the metric. Note that these pairings must be exclusive, i.e.\n`[(a,b), (a,c)]` is not valid because `a` participates in both pairs. Instead,\nwe find the k-nearest neighbors for each word, then greedily pair words based on\nthe distance to the closest non-paired neighbor. 
This means that words are not\nalways paired with their closest neighbor and some words are not paired at all.\n\nThis repository includes a precomputed pair list based on\n[Facebook's fasttext vectors](https://fasttext.cc/docs/en/crawl-vectors.html).\n\nFor example, the string \"this is a test\" has 3 words in the pair list:\n\n| 1    | 0     | Distance   |\n|------|-------|------------|\n| this | that  | 0.17533547 |\n| was  | is    | 0.28453428 |\n| test | tests | 0.20037645 |\n\nTo encode a *k*-bit message, we simply pick the *k* tokens with the lowest distance\nto their paired word and swap or not depending on the corresponding bit in the\nsecret. For example, to encode a single 1-bit, we change \"this is a test\" to \"that is a\ntest\".\n\n### Variable length coding\n\nThe method as outlined above requires the person decoding to know the length of\nthe secret. This makes it somewhat impractical, and it would be better to encode\nthe length as part of the message itself. To do this, we need a prefix-free\nencoding scheme, as we do not know the number of bits for the length beforehand.\n\nFor this, we use\n[Elias gamma coding](https://en.wikipedia.org/wiki/Elias_gamma_coding). In gamma\ncoding, we first encode the bit length of an integer as a run of zero bits (unary),\nfollowed by the integer itself in binary. Here the integer is the length of the\nsecret.\n\nA downside of this is that it increases the length of the secret, especially for\nsmall secrets. This is because the bits spent on the length make up a larger\nshare of a short message than of a long one.\n\n### A pinch of noise\n\nLastly, we add XOR encryption to the secret using a pseudorandom number\ngenerator (PRNG). This breaks any predictable patterns that might be in the\nsecret. For example, with a secret of only zeros and a carrier that contains the same\npair often, the encoder would always pick the same word. 
This also makes it possible to\nspecify a key by using the key as the seed for the PRNG.\n\nWe also add a small, optional amount of noise to each word pair distance. This\nmakes the pairs chosen more varied, so that the pair with the minimum distance\nis not always the one chosen.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpdn%2Fskjul","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpdn%2Fskjul","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpdn%2Fskjul/lists"}