https://github.com/alex-snd/trecover
A Python library for distributed training of a Transformer neural network across the Internet to solve the Running Key Cipher, widely known in the field of cryptography.
- Host: GitHub
- URL: https://github.com/alex-snd/trecover
- Owner: alex-snd
- License: apache-2.0
- Created: 2021-05-23T10:04:09.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-07-04T14:25:51.000Z (7 months ago)
- Last Synced: 2024-10-10T21:41:37.891Z (4 months ago)
- Topics: celery, cryptography, deep-learning, distributed-systems, distributed-training, fastapi, hivemind, keyless-reading, llm, machine-learning, mkdocs, neural-network, nlp, python, pytorch, pytorch-lightning, streamlit, text-recovery, transformers, volunteer-computing
- Language: Python
- Homepage: https://alex-snd.github.io/TRecover/
- Size: 41.8 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Welcome to the Text Recovery Project
A Python library for distributed training of a Transformer neural network across the Internet to solve the Running Key Cipher, widely known in the field of cryptography.

![Preview Animation](https://github.com/alex-snd/TRecover/blob/assets/preview_animation.gif?raw=true)
## Objective
The main goal of the project is to study whether a Transformer neural network can "read" meaningful text in the columns that can be compiled for a [Running Key Cipher](https://en.wikipedia.org/wiki/Running_key_cipher). You can read more about the problem [here](https://alex-snd.github.io/TRecover/).

In addition, the second, rather fun goal is to train a model large enough to handle the case described below.
Let there be an original sentence:

> Hello, my name is ***Zendaya*** Maree Stoermer Coleman but you can just call me ***Zendaya***.
The columns for this sentence are compiled in such a way that the last seven columns each contain from ten to thirteen letters of the English alphabet, while all the others contain from two to five. Thus, the last seven columns are much harder to "read" than the rest. However, from the meaning of the sentence we can guess that this is the name ***Zendaya***.
In other words, the goal is also to train a model that can understand the context and correctly "read" the last word.
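To make the column idea concrete, here is a minimal, purely illustrative Python sketch of how such noisy columns could be generated from a plaintext sentence. The `create_columns` helper and the exact noise ranges are assumptions for demonstration only, not the project's actual column-building code.

```python
import random
import string
from typing import List


def create_columns(plaintext: str, min_noise: int = 1, max_noise: int = 4,
                   hard_tail: int = 7, hard_min: int = 9, hard_max: int = 12) -> List[str]:
    """Illustrative sketch: build one noisy column per plaintext letter.

    Ordinary columns keep the true letter plus 1-4 random extra letters
    (length 2-5), while the last `hard_tail` columns get 9-12 extra letters
    (length 10-13) and are therefore much harder to "read".
    """
    letters = [ch.lower() for ch in plaintext if ch.isalpha()]
    columns = []
    for i, true_letter in enumerate(letters):
        if i >= len(letters) - hard_tail:  # the hard tail, e.g. the final "zendaya"
            n_noise = random.randint(hard_min, hard_max)
        else:  # easy-to-read columns
            n_noise = random.randint(min_noise, max_noise)
        noise = random.sample([c for c in string.ascii_lowercase if c != true_letter], n_noise)
        column = [true_letter] + noise
        random.shuffle(column)
        columns.append("".join(column))
    return columns


sentence = "Hello my name is Zendaya Maree Stoermer Coleman but you can just call me Zendaya"
print(create_columns(sentence)[:5])   # a few easy columns from the beginning
print(create_columns(sentence)[-7:])  # the seven hard columns at the end
```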
## Installation
TRecover requires Python 3.8 or higher and supports both Windows and Linux platforms.
1. Clone the repository:
```shell
git clone https://github.com/alex-snd/TRecover.git && cd TRecover
```
2. Create a virtual environment:
* Windows:
```shell
python -m venv venv
```
* Linux:
```shell
python3 -m venv venv
```
3. Activate the virtual environment:
* Windows:
```shell
venv\Scripts\activate.bat
```
* Linux:
```shell
source venv/bin/activate
```
4. Install the package inside this virtual environment:
* Just to run the demo:
```shell
pip install -e ".[demo]"
```
* To train the Transformer:
```shell
pip install -e ".[train]"
```
* For development and training:
```shell
pip install -e ".[dev]"
```
5. Initialize the project's environment:
```shell
trecover init
```
For more options use:
```shell
trecover init --help
```

## Demo
* Hugging Face
You can play with a pre-trained model hosted [here](https://huggingface.co/spaces/alex-snd/TRecover).
* Docker Compose
* Pull from Docker Hub:
```shell
docker-compose -f docker/compose/scalable-service.yml up
```
* Build from source:
```shell
trecover download artifacts
docker-compose -f docker/compose/scalable-service-build.yml up
```
* Local (requires Docker)
* Download pretrained model:
```shell
trecover download artifacts
```
* Launch the service:
```shell
trecover up
```

## Data
The [WikiText](https://huggingface.co/datasets/wikitext) and [WikiQA](https://huggingface.co/datasets/wiki_qa) datasets, from which all characters except English letters were removed, were used to train the model.
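As a rough illustration of that preprocessing (a hedged sketch, not the project's actual cleaning code), a filter like the following would drop every character that is not an English letter:

```python
import re


def keep_english_letters(text: str) -> str:
    """Remove all characters except English letters (illustrative only; the real
    pipeline may handle word boundaries differently)."""
    return re.sub(r"[^A-Za-z]", "", text)


print(keep_english_letters("Hello, my name is Zendaya!"))  # HellomynameisZendaya
```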
You can download the cleaned dataset:
```shell
trecover download data
```

## Train
To quickly start training the model, open the [Jupyter Notebook](https://colab.research.google.com/github/alex-snd/TRecover/blob/master/notebooks/TRecover-train-alone.ipynb).

* Collaborative
TODO
* Local
After the dataset is loaded, you can start training the model:
```shell
trecover train \
--project-name {project_name} \
--exp-mark {exp_mark} \
--train-dataset-size {train_dataset_size} \
--val-dataset-size {val_dataset_size} \
--vis-dataset-size {vis_dataset_size} \
--test-dataset-size {test_dataset_size} \
--batch-size {batch_size} \
--n-workers {n_workers} \
--min-noise {min_noise} \
--max-noise {max_noise} \
--lr {lr} \
--n-epochs {n_epochs} \
--epoch-seek {epoch_seek} \
--accumulation-step {accumulation_step} \
--penalty-coefficient {penalty_coefficient} \
--pe-max-len {pe_max_len} \
--n-layers {n_layers} \
--d-model {d_model} \
--n-heads {n_heads} \
--d-ff {d_ff} \
--dropout {dropout}
```
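The architecture flags above (`--n-layers`, `--d-model`, `--n-heads`, `--d-ff`, `--dropout`) are standard Transformer hyperparameters. As a purely illustrative sketch, and not the project's actual model definition, this is roughly what they configure in a plain PyTorch encoder (the default values below are common Transformer-base settings, chosen only for the example):

```python
import torch.nn as nn


def build_encoder(n_layers: int = 6, d_model: int = 512, n_heads: int = 8,
                  d_ff: int = 2048, dropout: float = 0.1) -> nn.TransformerEncoder:
    """Map CLI-style hyperparameters onto a vanilla PyTorch Transformer encoder."""
    layer = nn.TransformerEncoderLayer(
        d_model=d_model,        # hidden size            (--d-model)
        nhead=n_heads,          # attention heads        (--n-heads)
        dim_feedforward=d_ff,   # feed-forward width     (--d-ff)
        dropout=dropout,        # dropout probability    (--dropout)
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=n_layers)  # depth (--n-layers)


encoder = build_encoder()
```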
For more information, use `trecover train local --help`.

## Related work
TODO: what was done, tech stack.
## Contributing
Contributions, issues, and feature requests are welcome.
Feel free to check the [issues page](https://github.com/alex-snd/TRecover/issues) if you want to contribute.

## Show your support
Please don't hesitate to star this repository if you find it cool!
## License
Copyright © 2022 [Alexander Shulga](https://www.linkedin.com/in/alex-snd).
This project is [Apache 2.0](https://github.com/alex-snd/TRecover/blob/master/LICENSE) licensed.