{"id":15169479,"url":"https://github.com/alex-snd/trecover","last_synced_at":"2025-10-08T17:14:37.889Z","repository":{"id":37844291,"uuid":"370017372","full_name":"alex-snd/TRecover","owner":"alex-snd","description":"📜 A python library for distributed training of a Transformer neural network across the Internet  to solve the Running Key Cipher, widely known in the field of Cryptography.","archived":false,"fork":false,"pushed_at":"2024-07-04T14:25:51.000Z","size":43869,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-31T08:05:15.144Z","etag":null,"topics":["celery","cryptography","deep-learning","distributed-systems","distributed-training","fastapi","hivemind","keyless-reading","llm","machine-learning","mkdocs","neural-network","nlp","python","pytorch","pytorch-lightning","streamlit","text-recovery","transformers","volunteer-computing"],"latest_commit_sha":null,"homepage":"https://alex-snd.github.io/TRecover/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alex-snd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-23T10:04:09.000Z","updated_at":"2024-07-31T21:55:33.000Z","dependencies_parsed_at":"2024-09-22T22:10:57.175Z","dependency_job_id":null,"html_url":"https://github.com/alex-snd/TRecover","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-snd%2FTRecover","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/alex-snd%2FTRecover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-snd%2FTRecover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-snd%2FTRecover/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alex-snd","download_url":"https://codeload.github.com/alex-snd/TRecover/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238174139,"owners_count":19428631,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["celery","cryptography","deep-learning","distributed-systems","distributed-training","fastapi","hivemind","keyless-reading","llm","machine-learning","mkdocs","neural-network","nlp","python","pytorch","pytorch-lightning","streamlit","text-recovery","transformers","volunteer-computing"],"created_at":"2024-09-27T07:01:50.789Z","updated_at":"2025-10-08T17:14:32.838Z","avatar_url":"https://github.com/alex-snd.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eWelcome to \u003ca href=\"https://alex-snd.github.io/TRecover\"\u003eText Recovery Project\u003c/a\u003e 👋\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  A python library for distributed training of a Transformer neural network across the Internet to solve the \u003ca href=\"https://en.wikipedia.org/wiki/Running_key_cipher\"\u003eRunning Key Cipher\u003c/a\u003e, widely known in the field of cryptography.\n\u003c/p\u003e\n\n![Preview 
Animation](https://github.com/alex-snd/TRecover/blob/assets/preview_animation.gif?raw=true)\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://huggingface.co/spaces/alex-snd/TRecover\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/demo-%F0%9F%A4%97%20Hugging%20Face-blue?color=%2348466D\" alt=\"Hugging Face demo\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://colab.research.google.com/github/alex-snd/TRecover/blob/master/notebooks/TRecover-train-alone.ipynb\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/open%20in-Colab-blue?color=%2348466D\" alt=\"Open%20In%20Colab\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://wandb.ai/snd/TRecover?workspace=user-snd\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/visualize%20in-W\u0026B-blue?color=%2348466D\" alt=\"Visualize%20in%20W\u0026B\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://alex-snd.github.io/TRecover\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/docs-MkDocs-blue.svg?color=%2348466D\" alt=\"MkDocs link\"/\u003e\n  \u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-v3.8.5-blue.svg?color=%2348466D\" alt=\"Python version\"/\u003e\n  \u003ca href=\"https://badge.fury.io/py/trecover\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/trecover?color=%2348466D\" alt=\"PyPI version\"/\u003e\n  \u003c/a\u003e\n  \u003cimg src=\"https://static.pepy.tech/personalized-badge/trecover?period=total\u0026units=international_system\u0026left_color=grey\u0026right_color=%2348466D\u0026left_text=pypi downloads\" alt=\"PyPi Downloads\"/\u003e\n  \u003ca href=\"https://github.com/alex-snd/TRecover/blob/master/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/license-Apache%202.0-blue.svg?color=%2348466D\" alt=\"License Apache 2.0\"/\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## 🚀 Objective\n\nThe main goal of the project is to study the possibility of using Transformer neural network to “read” meaningful text\nin columns that can 
be compiled for a [Running Key Cipher](https://en.wikipedia.org/wiki/Running_key_cipher). You can\nread more about the problem [here](https://alex-snd.github.io/TRecover/).\n\nIn addition, a second, rather fun 😅 goal is to train a model large enough to handle the case described\nbelow.\nLet there be an original sentence:\n\n\u003e Hello, my name is ***Zendaya*** Maree Stoermer Coleman but you can just call me ***Zendaya***.\n\nThe columns for this sentence will be compiled in such a way that the last seven contain ten to thirteen letters of\nthe English alphabet each, and all the others contain two to five. Thus, the last seven characters will be much harder to \"read\"\nthan the rest. However, we can guess from the meaning of the sentence that this is the name ***Zendaya***.\nIn other words, the goal is also to train a model that can understand and correctly “read” the last word.\n\n## ⚙ Installation\n\nTRecover requires Python 3.8 or higher and supports both Windows and Linux platforms.\n\n1. Clone the repository:\n\n```shell\ngit clone https://github.com/alex-snd/TRecover.git \u0026\u0026 cd TRecover\n```\n\n2. Create a virtual environment:\n    * Windows:\n    ```shell\n    python -m venv venv\n    ```\n    * Linux:\n    ```shell\n    python3 -m venv venv\n    ```\n3. Activate the virtual environment:\n    * Windows:\n    ```shell\n    venv\\Scripts\\activate.bat\n    ```\n    * Linux:\n    ```shell\n    source venv/bin/activate\n    ```\n\n4. Install the package inside this virtual environment:\n    * Just to run the demo:\n    ```shell\n    pip install -e \".[demo]\"\n    ```\n    * To train the Transformer:\n    ```shell\n    pip install -e \".[train]\"\n    ```\n    * For development and training:\n    ```shell\n    pip install -e \".[dev]\"\n    ```\n\n5. 
Initialize the project's environment:\n   ```shell\n   trecover init\n   ```\n   For more options, use:\n   ```shell\n   trecover init --help\n   ```\n\n## 👀 Demo\n\n* 🤗 Hugging Face \u003cbr\u003e\n  You can play with a pre-trained model hosted [here](https://huggingface.co/spaces/alex-snd/TRecover).\n\n  \u003cimg align=\"center\" src=\"https://github.com/alex-snd/TRecover/blob/assets/dashboard_demo.gif?raw=true\"/\u003e\n\n* 🐳 Docker Compose\u003cbr\u003e\n    * Pull from Docker Hub:\n      ```shell\n      docker-compose -f docker/compose/scalable-service.yml up\n      ```\n    * Build from source:\n      ```shell\n      trecover download artifacts\n      docker-compose -f docker/compose/scalable-service-build.yml up\n      ```\n* 💻 Local (requires Docker) \u003cbr\u003e\n    * Download the pre-trained model:\n      ```shell\n      trecover download artifacts\n      ```\n    * Launch the service:\n      ```shell\n      trecover up\n      ```\n\n## 🗃️ Data\n\nThe model was trained on the [WikiText](https://huggingface.co/datasets/wikitext) and [WikiQA](https://huggingface.co/datasets/wiki_qa) datasets,\nfrom which all characters except English letters were removed.\u003cbr\u003e\nYou can download the cleaned dataset:\n\n```shell\ntrecover download data\n```\n\n## 💪 Train\n\nTo quickly start training the model, open\nthe [Jupyter Notebook](https://colab.research.google.com/github/alex-snd/TRecover/blob/master/notebooks/TRecover-train-alone.ipynb).\n\n* 🕸️ Collaborative \u003cbr\u003e\n  TODO\n* 💻 Local \u003cbr\u003e\n  After the dataset is downloaded, you can start training the model:\n  ```shell\n  trecover train \\\n  --project-name {project_name} \\\n  --exp-mark {exp_mark} \\\n  --train-dataset-size {train_dataset_size} \\\n  --val-dataset-size {val_dataset_size} \\\n  --vis-dataset-size {vis_dataset_size} \\\n  --test-dataset-size {test_dataset_size} \\\n  --batch-size {batch_size} \\\n  --n-workers {n_workers} \\\n  --min-noise {min_noise} \\\n  --max-noise {max_noise} 
\\\n  --lr {lr} \\\n  --n-epochs {n_epochs} \\\n  --epoch-seek {epoch_seek} \\\n  --accumulation-step {accumulation_step} \\\n  --penalty-coefficient {penalty_coefficient} \\\n  --pe-max-len {pe_max_len} \\\n  --n-layers {n_layers} \\\n  --d-model {d_model} \\\n  --n-heads {n_heads} \\\n  --d-ff {d_ff} \\\n  --dropout {dropout}\n  ```\n  For more information, use `trecover train local --help`.\n\n## ✔️ Related work\n\nTODO: what was done, tech stack.\n\n## 🤝 Contributing\n\nContributions, issues and feature requests are welcome.\u003cbr /\u003e\nFeel free to check the [issues page](https://github.com/alex-snd/TRecover/issues) if you want to contribute.\n\n## 👏 Show your support\n\nPlease don't hesitate to ⭐️ this repository if you find it cool!\n\n## 📜 License\n\nCopyright © 2022 [Alexander Shulga](https://www.linkedin.com/in/alex-snd).\u003cbr /\u003e\nThis project is [Apache 2.0](https://github.com/alex-snd/TRecover/blob/master/LICENSE) licensed.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falex-snd%2Ftrecover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falex-snd%2Ftrecover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falex-snd%2Ftrecover/lists"}