https://github.com/misaghsoltani/deepcubeai

Learning Discrete World Models for Heuristic Search

# DeepCubeAI

This repository contains the code and materials for the paper [Learning Discrete World Models for Heuristic Search](https://rlj.cs.umass.edu/2024/papers/Paper225.html).


   
   
   
   

## Table of Contents

1. [About DeepCubeAI](#about-deepcubeai)
   - [Key Contributions](#key-contributions)
     - [Discrete World Model](#discrete-world-model)
     - [Generalizable Heuristic Function](#generalizable-heuristic-function)
     - [Optimized Search](#optimized-search)
   - [Main Results](#main-results)
2. [Quick Start](#quick-start)
   - [Installation using `pip`](#installation-using-pip)
   - [Using the Repository Directly](#using-the-repository-directly)
   - [Importing the Package in Python Code](#importing-the-package-in-python-code)
3. [Usage](#usage)
   - [Running the `pip` Package](#running-the-pip-package)
   - [Running the Code Directly from the Repository](#running-the-code-directly-from-the-repository)
   - [Using the Package in Python Code](#using-the-package-in-python-code)
   - [Reproducing the Results from the Paper](#reproducing-the-results-from-the-paper)
     - [Running the `pip` Package](#running-the-pip-package-1)
     - [Running the Code Directly from the Repository](#running-the-code-directly-from-the-repository-1)
   - [Running on a SLURM Cluster](#running-on-a-slurm-cluster)
   - [Distributed Data Parallel (DDP) Training](#distributed-data-parallel-ddp-training)
     - [Necessary Environment Variables](#necessary-environment-variables)
4. [Environment Integration](#environment-integration)
   - [Adding a New Environment](#adding-a-new-environment)
5. [Citation](#citation)
6. [Contact](#contact)

## About DeepCubeAI

DeepCubeAI is an algorithm that learns a discrete world model and employs Deep Reinforcement Learning methods to learn a heuristic function that generalizes over start and goal states. We then integrate the learned model and the learned heuristic function with heuristic search, such as Q* search, to solve sequential decision-making problems. For more details, read the [paper](https://rlj.cs.umass.edu/2024/papers/Paper225.html).


### Key Contributions

DeepCubeAI consists of three key components:

1. **Discrete World Model**
   - Learns a world model that represents states in a discrete latent space.
   - This approach tackles two challenges: model degradation and state re-identification.
   - Prediction errors smaller than 0.5 are corrected by rounding.
   - Re-identifies states by comparing two binary vectors.

2. **Generalizable Heuristic Function**
   - Uses a Deep Q-Network (DQN) and hindsight experience replay (HER) to learn a heuristic function that generalizes over start and goal states.

3. **Optimized Search**
   - Integrates the learned model and the learned heuristic function with heuristic search to solve problems. It uses [Q* search](https://prl-theworkshop.github.io/prl2024-icaps/papers/9.pdf), a variant of A* search optimized for DQNs, which enables faster and more memory-efficient planning.
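To make the rounding and re-identification ideas of the discrete world model concrete, here is a minimal sketch in plain Python. It assumes latent states are lists of 0/1 values; `round_latent` and `same_state` are hypothetical helpers for illustration, not functions from the `deepcubeai` package.

```python
from typing import List


def round_latent(pred: List[float]) -> List[int]:
    """Snap each predicted latent element to the nearest binary value.

    If every element's prediction error is below 0.5, rounding recovers
    the exact binary latent state.
    """
    return [1 if x >= 0.5 else 0 for x in pred]


def same_state(a: List[int], b: List[int]) -> bool:
    """Re-identify two latent states by comparing their binary vectors."""
    return a == b


true_next = [1, 0, 1, 1, 0]
# A noisy world-model prediction: each element is within 0.5 of the truth.
noisy_pred = [0.93, 0.12, 0.61, 0.97, 0.44]

rounded = round_latent(noisy_pred)
print(rounded)                          # [1, 0, 1, 1, 0]
print(same_state(rounded, true_next))   # True
```

Because errors are wiped out at every step instead of being fed back into the next prediction, long rollouts do not drift, and an exact vector comparison is enough to detect revisited states.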

### Main Results
* Accurate reconstruction of ground truth images after thousands of timesteps.
* Achieved 100% success on Rubik's Cube (canonical goal), Sokoban, IceSlider, and DigitJump.
* 99.9% success on Rubik's Cube with reversed start/goal states.
* Demonstrated significant improvement in solving complex planning problems and generalizing to unseen goals.

## Quick Start

You can install the package with `pip` and run it from the command line, clone the repository and run the code directly, or import the package and call its functions from your own Python code. Instructions for each method follow.

### Installation using `pip`

You can install the package using pip. First create a virtual environment and activate it. You can use `Conda` or Python's built-in `venv` module (or any other virtual environment manager) to create a virtual environment. For `Conda` check the [Conda documentation](https://docs.conda.io/projects/conda/en/latest/index.html) and for Python's `venv` module check the [Python documentation](https://docs.python.org/3/library/venv.html).

After installing a virtual environment manager, you can create and activate the environment using the following commands.

For Conda:
```bash
# Create an environment using Conda and install the `pip` package
conda create -n deepcubeai_env pip
# Activate the environment
conda activate deepcubeai_env
```

For Python's `venv` module:
```bash
# Create a virtual environment using Python's `venv` module
python -m venv deepcubeai_env
# Activate the environment
source deepcubeai_env/bin/activate
```

Once you have activated the virtual environment, you can install the package using pip.
```bash
pip install deepcubeai
```

After installing the package, you can run the code from the command line using the `deepcubeai` command. For detailed instructions on how to use the package, see the [Running the `pip` Package](#running-the-pip-package) section.

### Using the Repository Directly

You can also run the code directly from the repository by cloning the repository and running the scripts from the command line. In this case, you can use the `deepcubeai.sh` script in the repository's root directory as the entry point. Below are the instructions for preparing the repository and the virtual environment for running the code directly from the repository.

1. Clone the repository:
```bash
git clone https://github.com/misaghsoltani/DeepCubeAI.git
```

2. Change to the repository directory:
```bash
cd DeepCubeAI
```

3. Create a Conda environment:
- **For macOS:** Create an environment with dependencies specified in `environment_macos.yaml` using the following command:

```bash
conda env create -f environment_macos.yaml
```
- **For Linux and Windows:** Create an environment with dependencies specified in `environment.yaml` using the following command:

```bash
conda env create -f environment.yaml
```

4. Activate the Conda environment:
```bash
conda activate deepcubeai_env
```

> [!NOTE]
> The only difference between the macOS environment and the Linux/Windows environment is that `pytorch-cuda` is not installed on macOS, because CUDA is not supported there.

> [!NOTE]
> If CUDA is available on your system, the GPU is used by default for both model training and inference.

After activating the environment, you can run the code using the `deepcubeai.sh` script. For detailed instructions on how to use the script, see the [Running the Code Directly from the Repository](#running-the-code-directly-from-the-repository) section.

### Importing the Package in Python Code
You can also import the package in your Python code and use its functions directly. To do this, first [install the `deepcubeai` package using pip](#installation-using-pip). Then import the functions and use them in your code. For examples, see the [Using the Package in Python Code](#using-the-package-in-python-code) section.

## Usage

### Running the `pip` Package

After [installing the package using pip](#installation-using-pip), you can run the code from the command line using the `deepcubeai` command. The general command structure is:

```bash
deepcubeai --stage --env [arguments]
```

Set `--stage` to the stage you want to run. The available stages are: `gen_offline`, `gen_env_test`, `gen_search_test`, `train_model`, `test_model`, `train_model_cont`, `test_model_cont`, `encode_offline`, `train_heur`, `qstar`, `ucs`, `gbfs`, `visualize_data`, `gen_env_test_plot`, `disc_vs_cont`.

Set `--env` to one of the available environments. In the current version, these are: `cube3`, `sokoban`, `iceslider`, and `digitjump`.

**`IceSlider` and `DigitJump` are the environments from [puzzlegen](https://github.com/martius-lab/puzzlegen) and are included in the current implementation for comparison purposes.**

> [!NOTE]
>
> Each stage's Python code accepts additional arguments that are not exposed as entry-point arguments. You can find these arguments in the Python files for each stage in the `deepcubeai` package.

> [!TIP]
>
> For examples of running the code using the `deepcubeai` command, refer to the [`reproduce_results/run_package` folder](https://github.com/misaghsoltani/DeepCubeAI/tree/main/reproduce_results/run_package) in the repository.

> [!TIP]
>
> For examples of running the code on a SLURM cluster, refer to the [`job_submissions` folder](https://github.com/misaghsoltani/DeepCubeAI/tree/main/job_submissions) in the repository. See the section [Running on a SLURM Cluster](#running-on-a-slurm-cluster) for more details.

Below are the detailed instructions for each stage:

#### 1. Generate Offline Data

Generate training and validation offline data for training the world model:

```bash
deepcubeai --stage gen_offline --env --data_dir --data_file_name --num_offline_steps --num_train_eps --num_val_eps --num_cpus [--start_level --num_levels ]
```

**--data_dir**: The folder where the data will be saved. Training and validation data are stored in `deepcubeai/data//offline`. If not given, the environment name (the value of `--env`) is used.

**--data_file_name**: Specifies the name for the data files. Training data will be saved to `deepcubeai/data//offline/_train_data.pkl` and validation data to `deepcubeai/data//offline/_val_data.pkl`. If not specified, the defaults are `train_data` and `val_data`. If the given name does not contain `train_data`, `_train_data` is appended; the same applies to validation data with `val_data` and `_val_data`.

The directory structure for the offline data is as follows:

```bash
deepcubeai
└── data
    └──
        └── offline
            ├── _train_data.pkl
            └── _val_data.pkl
```
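The naming rules for `--data_dir` and `--data_file_name` can be summarized with the following sketch. `offline_data_paths` is a hypothetical helper written for illustration; it reproduces the documented defaults and suffix rules, not the package's actual code.

```python
import posixpath
from typing import Optional, Tuple


def offline_data_paths(env: str,
                       data_dir: Optional[str] = None,
                       data_file_name: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented naming rules.

    Returns (train_path, val_path) for the offline data of one environment.
    """
    # --data_dir falls back to the environment name.
    folder = posixpath.join("deepcubeai", "data", data_dir or env, "offline")
    if data_file_name is None:
        train_name, val_name = "train_data", "val_data"
    else:
        # Append the suffix only if the name does not already contain it.
        train_name = (data_file_name if "train_data" in data_file_name
                      else f"{data_file_name}_train_data")
        val_name = (data_file_name if "val_data" in data_file_name
                    else f"{data_file_name}_val_data")
    return (posixpath.join(folder, f"{train_name}.pkl"),
            posixpath.join(folder, f"{val_name}.pkl"))


print(offline_data_paths("cube3", data_file_name="run1"))
```

How the package derives both file names when the given name already contains one of the suffixes is not specified above; this sketch simply applies each rule independently.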

**--num_offline_steps**: Specifies the number of steps for generating offline data.

**--num_train_eps**: Defines the number of training episodes.
- If both `--num_train_eps` and `--num_val_eps` are not provided, defaults to `9000`.
- If only `--num_train_eps` is provided, `--num_val_eps` will be set to 10% of `--num_train_eps`.
- If not provided and `--num_val_eps` is set, `--num_train_eps` will be calculated as 90% of the total episodes.

**--num_val_eps**: Defines the number of validation episodes.
- If both `--num_val_eps` and `--num_train_eps` are not provided, defaults to `1000`.
- If only `--num_val_eps` is provided, `--num_train_eps` will be set to 90% of the total episodes.
- If not provided and `--num_train_eps` is set, `--num_val_eps` will be calculated as 10% of `--num_train_eps`.
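The default episode counts described above can be expressed as a small function. `resolve_episode_counts` is hypothetical, and the integer rounding (`// 10`) is an assumption; the package may round differently.

```python
from typing import Optional, Tuple


def resolve_episode_counts(num_train_eps: Optional[int] = None,
                           num_val_eps: Optional[int] = None) -> Tuple[int, int]:
    """Return (num_train_eps, num_val_eps) following the documented defaults."""
    if num_train_eps is None and num_val_eps is None:
        return 9000, 1000
    if num_val_eps is None:
        # Only training episodes given: validation is 10% of the training count.
        return num_train_eps, num_train_eps // 10
    if num_train_eps is None:
        # Only validation episodes given: treat them as 10% of the total,
        # so training is the remaining 90% (9x the validation count).
        return num_val_eps * 9, num_val_eps
    return num_train_eps, num_val_eps


print(resolve_episode_counts())                 # (9000, 1000)
print(resolve_episode_counts(num_val_eps=200))  # (1800, 200)
```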

**--start_level**: Specifies the starting level for data generation.
- If provided along with `--num_levels`, this value sets the starting seed for training, and the subsequent validation levels will be calculated as `start_level + num_levels`.
- If only `--start_level` is provided without `--num_levels`, the validation starting level is adjusted based on the number of training episodes.
- If neither `--start_level` nor `--num_levels` is provided, defaults to `-1`, indicating that no specific levels are set and the data is generated using random levels.

**--num_levels**: Specifies the number of levels to generate data for.
- If provided along with `--start_level`, it defines the number of seeds for both training and validation levels.
- If only `--num_levels` is provided without `--start_level`, a random starting seed will be generated for training, and the validation levels will be calculated based on this random start.
- If neither `--start_level` nor `--num_levels` is specified, defaults to `-1`, indicating that the number of levels used for training and validation is based on the number of episodes.

> [!NOTE]
>
> If `--num_levels` is provided, and if it is less than the number of episodes, the same level may be used more than once.

> [!IMPORTANT]
>
> In the current version, the `--start_level` and `--num_levels` arguments are only used with the `iceslider` and `digitjump` environments. For the `cube3` and `sokoban` environments, the levels are generated randomly, and the `--start_level` and `--num_levels` arguments are not used.
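For the `iceslider` and `digitjump` environments, the seed arithmetic for `--start_level` and `--num_levels` can be sketched as follows. This assumes both the training and the validation ranges span `num_levels` seeds and that seeds are reused by cycling when there are more episodes than levels; the package may instead pick repeated levels at random. `assign_level_seeds` is a hypothetical helper.

```python
from typing import List, Tuple


def assign_level_seeds(start_level: int, num_levels: int,
                       num_train_eps: int, num_val_eps: int) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch of the level-seed arithmetic.

    Training uses seeds start_level .. start_level + num_levels - 1; validation
    starts at start_level + num_levels. When there are more episodes than
    levels, seeds are reused by cycling (an assumption of this sketch).
    """
    val_start = start_level + num_levels
    train_seeds = [start_level + i % num_levels for i in range(num_train_eps)]
    val_seeds = [val_start + i % num_levels for i in range(num_val_eps)]
    return train_seeds, val_seeds


train, val = assign_level_seeds(start_level=100, num_levels=3, num_train_eps=5, num_val_eps=2)
print(train)  # [100, 101, 102, 100, 101]
print(val)    # [103, 104]
```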

**--num_cpus**: Specifies the number of CPU cores to use for processing. Default is 1.

---

#### 2. Generate World Model Test Data

Generate test data for the Discrete and Continuous world models to evaluate the performance of the models after training:

```bash
deepcubeai --stage gen_env_test --env --data_dir --data_file_name --num_offline_steps --num_test_eps --num_cpus [--start_level --num_levels ]
```

**--env**: Specifies the environment name for which the test data will be generated.

**--data_dir**: The directory where the environment test data will be saved. The data is stored at `deepcubeai/data//model_test`. If not specified, the environment name (the value of `--env`) is used.

**--data_file_name**: Specifies the name for the test data file. The test data will be saved in the path `deepcubeai/data//model_test/_env_test_data.pkl`. If not specified, the default is `env_test_data`. If the given name does not contain `env_test_data`, `_env_test_data` is appended.

The directory structure for the environment test data is as follows:

```bash
deepcubeai
└── data
    └──
        └── model_test
            └── _env_test_data.pkl
```

**--num_offline_steps**: Specifies the number of steps for generating the environment test data.

**--num_test_eps**: Defines the number of test episodes. Default is `1000`.

**--start_level**: Specifies the starting level for generating test data.
- If provided along with `--num_levels`, this value sets the starting seed for test data generation, and the subsequent test levels will be calculated as `start_level + num_levels`.
- If only `--start_level` is provided without `--num_levels`, the test data levels are adjusted based on the number of test episodes.
- If neither `--start_level` nor `--num_levels` is provided, defaults to `-1`, indicating that no specific levels are set, and the data is generated from random levels.

**--num_levels**: Specifies the number of levels to generate test data for.
- If provided along with `--start_level`, it defines the number of seeds for the test levels.
- If only `--num_levels` is provided without `--start_level`, a random starting seed will be generated for the test data, and subsequent levels will be calculated based on this random start.
- If neither `--start_level` nor `--num_levels` is specified, defaults to `-1`, indicating that the number of levels used for test data is based on the number of episodes.

> [!NOTE]
>
> If `--num_levels` is provided, and if it is less than the number of episodes, the same level may be used more than once.

> [!IMPORTANT]
>
> In the current version, the `--start_level` and `--num_levels` arguments are only used with the `iceslider` and `digitjump` environments. For the `cube3` and `sokoban` environments, the levels are generated randomly, and the `--start_level` and `--num_levels` arguments are not used.

**--num_cpus**: Specifies the number of CPU cores to use for processing. Default is 1.

---

#### 3. Generate Search Test Data

Generate test data for the final search to solve the problem:

```bash
deepcubeai --stage gen_search_test --env --data_dir --data_file_name --num_test_eps --num_cpus [--start_level --num_levels ]
```

**--env**: Specifies the environment for which the search test data will be generated.

**--data_dir**: Specifies the directory where the search test data will be saved. The data will be stored in `deepcubeai/data//search_test`. If not specified, it defaults to the environment name (the value of `--env`).

**--data_file_name**: Specifies the name for the search test data file. The data will be saved as `deepcubeai/data//search_test/_search_test_data.pkl`. If not specified, the default is `search_test_data`. If the given name does not contain `search_test_data`, `_search_test_data` is appended.

The directory structure for the search test data is as follows:

```bash
deepcubeai
└── data
    └──
        └── search_test
            └── _search_test_data.pkl
```

**--num_test_eps**: Defines the number of search test episodes to generate.

**--num_cpus**: Specifies the number of CPU cores to use for processing. Default is 1.

**--start_level**: Specifies the starting level for search test data generation.
- If provided along with `--num_levels`, it sets the starting seed for generating test data, and subsequent levels are calculated as `start_level + num_levels`.
- If only `--start_level` is provided without `--num_levels`, levels are adjusted based on the number of test episodes.
- If neither `--start_level` nor `--num_levels` is provided, defaults to `-1`, indicating that no specific levels are set and the data is generated randomly.

**--num_levels**: Specifies the number of levels for generating test data.
- If provided with `--start_level`, it defines the number of seeds for the test levels.
- If only `--num_levels` is provided without `--start_level`, a random starting seed will be generated, and subsequent levels will be calculated accordingly.
- If neither `--start_level` nor `--num_levels` is specified, defaults to `-1`, indicating that no specific levels are set and the data is generated randomly.

> [!IMPORTANT]
>
> In the current version, this stage only works for the `iceslider` and `digitjump` environments. For the `cube3` and `sokoban` environments, the search test data is provided in the repository. You can find the search test data in the `deepcubeai/data/cube3/search_test` and `deepcubeai/data/sokoban/search_test` directories.
> However, you can generate your own search test data using the functions provided in the `deepcubeai.enviroments.cube3.Cube3` and `deepcubeai.enviroments.sokoban.Sokoban` classes.

---

#### 4. Train Discrete World Model

Train the discrete environment model:

```bash
deepcubeai --stage train_model --env --data_dir --data_file_name --env_batch_size --env_model_name
```

**--data_dir**: The directory where the training and validation data is located. This should match the folder where the offline data was saved, e.g., `deepcubeai/data//offline`. If not given, the environment name (the value of `--env`) is used.

**--data_file_name**: The name of the training and validation data files. This should match the filename used during offline data generation. The training data used will be `deepcubeai/data//offline/_train_data.pkl` and the validation data used will be `deepcubeai/data//offline/_val_data.pkl`. If not specified, the defaults are `train_data` and `val_data`. If the given name does not contain `train_data`, `_train_data` is appended; the same applies to validation data with `val_data` and `_val_data`. This should follow the same structure mentioned in [Generate Offline Data](#1-generate-offline-data).

**--env_batch_size**: Specifies the batch size used during training of the environment model. Default is `100`.

**--env_model_name**: Defines the name of the discrete environment model to be trained. The trained environment model will be saved in the directory `deepcubeai/saved_env_models/`.

Below is the directory structure for the saved discrete world model:

```bash
deepcubeai
└── saved_env_models
    └──
        ├── args.pkl
        ├── decoder_state_dict.pt
        ├── encoder_state_dict.pt
        ├── env_state_dict.pt
        ├── output.txt
        ├── train_itr.pkl
        └── pics # sample pics of reconstructions of states during training
            ├── recon_itr0.jpg
            ├── recon_itr200.jpg
            └── ...
```

Below are examples of reconstruction images saved during training:

*(Figure: reconstructions of states at training iterations 0, 300, 1000, and 179999.)*

---

#### 5. Test Discrete World Model

Test the trained discrete environment model:

```bash
deepcubeai --stage test_model --env --data_dir --data_file_name --env_model_name --print_interval
```

Alternatively, to use a different test data file, pass its location with `--model_test_data_dir`:

```bash
deepcubeai --stage test_model --env --model_test_data_dir --env_model_name --print_interval
```

**--data_dir**: The directory where the test data is located. This should match the folder where the environment test data was saved, e.g., `deepcubeai/data//model_test`. If not given, the environment name (the value of `--env`) is used.

**--data_file_name**: The name of the test data file. The test data used will be `deepcubeai/data//model_test/_test_data.pkl`. If not specified, the default is `test_data`. If the given name does not contain `test_data`, `_test_data` is appended. This should follow the same structure mentioned in [Generate World Model Test Data](#2-generate-world-model-test-data).

**--model_test_data_dir**: The directory where the test data is located. Use this argument if you want to use a different test data file than the one specified using `--data_dir` and `--data_file_name`.

**--env_model_name**: The name of the trained discrete environment model to be tested. This should match the model saved during the training stage, located in the directory `deepcubeai/saved_env_models/`. This should follow the same structure mentioned in [Train Discrete World Model](#4-train-discrete-world-model).

**--print_interval**: Specifies the frequency at which the test results will be printed, and reconstruction images will be saved. The default value is `1`.

The directory structure for the test results (the saved reconstructions of states) is as follows:

```bash
deepcubeai
└── saved_env_models
    └──
        └── pics # sample pics of reconstructions of states during testing
            ├── model_test_disc_0.jpg
            ├── model_test_disc_1.jpg
            └── ...
```

---

#### 6. Train Continuous World Model

Train the continuous environment model:

```bash
deepcubeai --stage train_model_cont --env --data_dir --data_file_name --env_batch_size --env_model_name
```

**--data_dir**: The directory where the training and validation data for the continuous model is located. This should match the folder where the offline data was saved, such as `deepcubeai/data//offline`. If not specified, the environment name (the value of `--env`) is used.

**--data_file_name**: The name of the training and validation data files for the continuous model. The training data used will be `deepcubeai/data//offline/_train_data.pkl` and the validation data will be `deepcubeai/data//offline/_val_data.pkl`. If not specified, the default names are `train_data` and `val_data`. If the given name does not contain `train_data`, `_train_data` is appended; the same logic applies to validation data with `val_data` and `_val_data`. This should follow the same structure mentioned in [Generate Offline Data](#1-generate-offline-data).

**--env_batch_size**: The batch size used during training of the continuous environment model. The default is set to `100`.

**--env_model_name**: Specifies the name of the continuous environment model to be trained. The trained model will be saved in the directory `deepcubeai/saved_env_models/`.

The directory structure for the saved continuous world model is as follows:

```bash
deepcubeai
└── saved_env_models
    └──
        ├── args.pkl
        ├── model_state_dict.pt
        ├── output.txt
        └── train_itr.pkl
```

---

#### 7. Test Continuous World Model

Test the trained continuous model:

```bash
deepcubeai --stage test_model_cont --env --data_dir --data_file_name --env_model_name --print_interval
```

Alternatively, to use a different test data file, pass its location with `--model_test_data_dir`:

```bash
deepcubeai --stage test_model_cont --env --model_test_data_dir --env_model_name --print_interval
```

**--data_dir**: The directory where the test data for the continuous model is located. This should match the folder where the environment test data was saved, such as `deepcubeai/data//model_test`. If not specified, the environment name (the value of `--env`) is used.

**--data_file_name**: The name of the test data file for the continuous model. The test data used will be `deepcubeai/data//model_test/_test_data.pkl`. If not specified, the default is `test_data`. If the given name does not contain `test_data`, `_test_data` is appended. This should follow the same structure mentioned in [Generate World Model Test Data](#2-generate-world-model-test-data).

**--model_test_data_dir**: The directory where the test data is located. Use this argument if you want to use a different test data file than the one specified by `--data_dir` and `--data_file_name`.

**--env_model_name**: The name of the trained continuous environment model to be tested. This should match the model saved during the training stage, located in `deepcubeai/saved_env_models/`.

**--print_interval**: Specifies how frequently the test results will be printed. The default value is `1`.

The directory structure for the test results (the saved reconstructions of states) is as follows:

```bash
deepcubeai
└── saved_env_models
    └──
        └── pics # sample pics of reconstructions of states during testing
            ├── model_test_cont_0.jpg
            ├── model_test_cont_1.jpg
            └── ...
```

---

#### 8. Compare Discrete World Model vs Continuous World Model

Compare the performance of the discrete and continuous world models. This stage runs both models on the same actions as the test data, compares their predictions with the ground truth stored in the test data, and plots the MSE of each model's predictions over time steps.

```bash
deepcubeai --stage disc_vs_cont --env --data_dir --data_file_name --env_model_dir_disc deepcubeai/saved_env_models/ --env_model_dir_cont deepcubeai/saved_env_models/ --save_dir deepcubeai/ --num_steps --num_episodes --print_interval
```

**--env**: Specifies the environment for which the comparison will be made.

**--data_dir**: The directory where the data for the comparison plot is located. This should match the folder where the data was saved, such as `deepcubeai/data//model_test`. If not specified, the environment name (the value of `--env`) is used.

**--data_file_name**: The name of the data file for the comparison plot. The data used will be `deepcubeai/data//model_test/_env_test_data.pkl`. If not specified, the default is `env_test_data`. If the given name does not contain `env_test_data`, `_env_test_data` is appended. This should follow the same structure mentioned in [Generate World Model Test Data](#2-generate-world-model-test-data).

**--env_model_dir_disc**: The directory of the trained discrete environment model. It should contain the files listed in [Train Discrete World Model](#4-train-discrete-world-model).

**--env_model_dir_cont**: The directory of the trained continuous environment model. It should contain the files listed in [Train Continuous World Model](#6-train-continuous-world-model).

**--save_dir**: The directory where the comparison plot will be saved. If not given, the default path will be `deepcubeai/`.

The comparison plot is saved using the following path structure (for example, `plots/sokoban_mse_100eps_10000steps_1.pdf`):

```bash
# If --save_dir is provided

└── plots
    └── _mse_eps_steps_.pdf

# If --save_dir is not provided
deepcubeai
└── plots
    └── _mse_eps_steps_.pdf
```

**--num_steps**: The number of steps to use in the comparison. If this is less than the number of steps in the test data, the comparison will be made over the specified number of steps. Default is `-1` (all steps in the test data).

**--num_episodes**: The number of episodes to use in the comparison. If this is less than the number of episodes in the test data, the comparison will be made over the specified number of randomly selected episodes. Default is `-1` (all episodes in the test data).

**--print_interval**: Specifies the frequency at which the comparison results will be printed. The default value is `1`.
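The quantity plotted by `disc_vs_cont` can be sketched as a per-step MSE averaged over episodes, with `--num_steps` truncating the horizon and `--num_episodes` selecting a random subset of episodes. This is a simplified illustration under stated assumptions (frames as flat lists of floats, equal-length episodes); `mse_per_step` is a hypothetical name, not part of the package.

```python
import random
from typing import List, Sequence

# Each frame is a flattened image; an episode is a list of frames.
Episode = Sequence[Sequence[float]]


def frame_mse(pred: Sequence[float], truth: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def mse_per_step(preds: Sequence[Episode], truths: Sequence[Episode],
                 num_steps: int = -1, num_episodes: int = -1,
                 seed: int = 0) -> List[float]:
    """Average per-step MSE over episodes, as plotted for each model.

    -1 means "use all steps / all episodes"; otherwise a random subset of
    episodes and the first num_steps steps are used. All episodes are
    assumed to have the same length.
    """
    episode_ids = list(range(len(truths)))
    if 0 < num_episodes < len(episode_ids):
        episode_ids = random.Random(seed).sample(episode_ids, num_episodes)
    total_steps = len(truths[0])
    steps = total_steps if num_steps < 0 else min(num_steps, total_steps)
    return [sum(frame_mse(preds[e][t], truths[e][t]) for e in episode_ids) / len(episode_ids)
            for t in range(steps)]


truth = [[[0.0, 0.0], [1.0, 1.0]]]       # one episode, two steps
disc_pred = [[[0.0, 0.0], [1.0, 1.0]]]   # hypothetical discrete-model predictions
cont_pred = [[[0.0, 1.0], [1.0, 0.0]]]   # hypothetical continuous-model predictions
print(mse_per_step(disc_pred, truth))  # [0.0, 0.0]
print(mse_per_step(cont_pred, truth))  # [0.5, 0.5]
```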

Here is an example of the comparison plot for the Rubik's Cube environment:

*(Figure: Rubik's Cube MSE comparison.)*

---

#### 9. Encode Offline Data

Encode the offline data using the trained model:

```bash
deepcubeai --stage encode_offline --env --data_dir --data_file_name --env_model_name
```

**--data_dir**: The directory where the offline data to be encoded is located. This should match the folder where the data was saved, such as `deepcubeai/data//offline`. If not specified, the environment name (the value of `--env`) is used.

**--data_file_name**: The name of the offline data files to be encoded. The encoded training data will be `deepcubeai/data//offline_enc/_train_data_enc.pkl` and the encoded validation data will be `deepcubeai/data//offline_enc/_val_data_enc.pkl`. If not specified, the default names are `train_data_enc` and `val_data_enc`. If the given name does not contain `train_data_enc`, `_train_data_enc` is appended; the same logic applies to validation data with `val_data_enc` and `_val_data_enc`.

The directory structure for the encoded offline data will be as follows:

```bash
deepcubeai
└── data
    └──
        └── offline_enc
            ├── _train_data_enc.pkl
            └── _val_data_enc.pkl
```

**--env_model_name**: The name of the trained discrete environment model used for encoding the data. This should match the model saved during the training stage, located in `deepcubeai/saved_env_models/`.

---

#### 10. Train Heuristic Network

Train the heuristic neural network. It uses Deep Q-Network (DQN) and hindsight experience replay (HER) to learn a heuristic function that generalizes over start and goal states.

```bash
deepcubeai --stage train_heur --env --data_dir --data_file_name --env_model_name --heur_nnet_name --per_eq_tol --heur_batch_size --states_per_update --start_steps --goal_steps --max_solve_steps --num_test [--use_dist]
```
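As a rough illustration of how a DQN-style cost-to-go target could be formed, the sketch below assumes unit action costs and a target network that returns one Q-value per action for a (state, goal) pair. With hindsight experience replay, goals are states actually reached from the start states, so every training pair has a reachable goal. `q_learning_targets` is a hypothetical helper, and this is the textbook Q-learning target rather than a transcription of the package's loss.

```python
from typing import Callable, Hashable, List, Sequence

State = Hashable
QFn = Callable[[State, State], List[float]]  # (state, goal) -> one Q-value per action


def q_learning_targets(next_states: Sequence[State], goals: Sequence[State],
                       q_target: QFn, step_cost: float = 1.0) -> List[float]:
    """Cost-to-go targets for Q(s, a, g) after taking action a in state s.

    target = step_cost                               if s' is the goal
           = step_cost + min_a' Q_target(s', a', g)  otherwise
    """
    targets = []
    for s_next, goal in zip(next_states, goals):
        if s_next == goal:
            targets.append(step_cost)
        else:
            targets.append(step_cost + min(q_target(s_next, goal)))
    return targets


# Toy 1-D world: action 0 moves +1, action 1 moves -1.
def toy_q(s: int, g: int) -> List[float]:
    # Exact Q-values: one step plus the remaining distance to the goal.
    return [1.0 + abs(s + 1 - g), 1.0 + abs(s - 1 - g)]


print(q_learning_targets([3, 5], [3, 7], toy_q))  # [1.0, 3.0]
```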

**--env**: The environment for which the heuristic network will be trained.

**--data_dir**: The directory where the offline data used for training the heuristic network is located. This should match the folder where the data was saved, such as `deepcubeai/data//offline_enc`.

**--data_file_name**: The name of the training data file used for training the heuristic network. The data file used will be `deepcubeai/data//offline_enc/_train_data_enc.pkl`. If not specified, the default name is `train_data_enc`. If the given name does not contain `train_data_enc`, `_train_data_enc` is appended.

**--env_model_name**: The name of the trained environment model used in heuristic training. This should match the model saved during the environment model training stage, located in `deepcubeai/saved_env_models/`. Also, the model files should be present in this directory as `encoder_state_dict.pt`, `decoder_state_dict.pt`, and `env_state_dict.pt`. This should follow the same structure mentioned in [Train Discrete World Model](#4-train-discrete-world-model).

**--heur_nnet_name**: Specifies the name of the heuristic neural network to be trained. The trained model will be saved in the directory `deepcubeai/saved_heur_models/`. The structure of the heuristic model directory will be as follows:

```bash
deepcubeai
└── saved_heur_models
    └──
        ├── args.pkl
        ├── current
        │   ├── model_state_dict.pt
        │   └── status.pkl
        ├── output.txt
        └── target
            ├── model_state_dict.pt
            └── status.pkl
```

**--per_eq_tol**: The percentage of latent state elements that must be equal for two states to be declared equal. Default is `100`.
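A minimal sketch of the equality test controlled by `--per_eq_tol`, assuming latent states are equal-length binary vectors. `latent_equal` is a hypothetical helper; the package's implementation may differ.

```python
from typing import Sequence


def latent_equal(a: Sequence[int], b: Sequence[int], per_eq_tol: float = 100.0) -> bool:
    """Declare two latent states equal if at least per_eq_tol percent of elements match."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a) >= per_eq_tol


s1 = [1, 0, 1, 1]
s2 = [1, 0, 1, 0]                           # 3 of 4 elements match (75%)
print(latent_equal(s1, s2))                 # False with the default of 100
print(latent_equal(s1, s2, per_eq_tol=75))  # True
```

With the default of `100`, this reduces to the exact binary-vector comparison used for state re-identification.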

**--heur_batch_size**: The batch size used for training the heuristic neural network. Default is `10000`.

**--states_per_update**: The number of states to train on before checking whether the target network should be updated. Default is `50000000`.

**--start_steps**: Maximum number of steps to take from offline states to generate start states.

**--goal_steps**: Maximum number of steps to take from the start states to generate goal states.
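The roles of `--start_steps` and `--goal_steps` can be illustrated with random walks. This sketch assumes the number of steps is drawn uniformly from 0 to the given maximum and uses integers on a line as a stand-in for latent states; `sample_start_goal_pairs` and the toy `line_step` world are hypothetical. Because each goal is a state actually reached from its start, this mirrors the hindsight relabeling idea behind HER.

```python
import random
from typing import Callable, List, Sequence, Tuple

StepFn = Callable[[int, int], int]  # (state, action) -> next state


def random_walk(state: int, num_steps: int, step_fn: StepFn,
                num_actions: int, rng: random.Random) -> int:
    for _ in range(num_steps):
        state = step_fn(state, rng.randrange(num_actions))
    return state


def sample_start_goal_pairs(offline_states: Sequence[int], start_steps: int, goal_steps: int,
                            step_fn: StepFn, num_actions: int, num_pairs: int,
                            seed: int = 0) -> List[Tuple[int, int]]:
    """Start = random walk of up to start_steps from an offline state;
    goal = random walk of up to goal_steps from that start."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        origin = rng.choice(offline_states)
        start = random_walk(origin, rng.randint(0, start_steps), step_fn, num_actions, rng)
        goal = random_walk(start, rng.randint(0, goal_steps), step_fn, num_actions, rng)
        pairs.append((start, goal))
    return pairs


# Toy 1-D world: action 0 moves +1, action 1 moves -1.
def line_step(s: int, a: int) -> int:
    return s + 1 if a == 0 else s - 1


print(sample_start_goal_pairs([0], start_steps=3, goal_steps=2,
                              step_fn=line_step, num_actions=2, num_pairs=4))
```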

**--max_solve_steps**: Number of steps to take when trying to solve training states with greedy best-first search (GBFS). Each state encountered when solving is added to the training set. Number of steps starts at 1 and is increased every update until the maximum number is reached. Increasing this number can make the cost-to-go function more robust by exploring more of the state space.
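The schedule and data collection described for `--max_solve_steps` can be sketched as a greedy rollout that follows the action with the lowest Q-value and records every state it visits. The one-step-per-update growth, the toy Q-function, and all names here are assumptions for illustration only.

```python
from typing import Callable, List

QFn = Callable[[int, int], List[float]]
StepFn = Callable[[int, int], int]


def solve_steps_for_update(update_num: int, max_solve_steps: int) -> int:
    """Greedy-solve budget: starts at 1 and grows by one per update, up to the cap."""
    return min(update_num + 1, max_solve_steps)


def gbfs_rollout(start: int, goal: int, q_fn: QFn, step_fn: StepFn, max_steps: int) -> List[int]:
    """Greedily follow argmin_a Q(s, a, g); every visited state is returned
    so it can be added to the training set."""
    visited = [start]
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        q_values = q_fn(state, goal)
        action = min(range(len(q_values)), key=q_values.__getitem__)
        state = step_fn(state, action)
        visited.append(state)
    return visited


def toy_q(s: int, g: int) -> List[float]:
    return [1.0 + abs(s + 1 - g), 1.0 + abs(s - 1 - g)]


def line_step(s: int, a: int) -> int:
    return s + 1 if a == 0 else s - 1


for update in range(3):
    budget = solve_steps_for_update(update, max_solve_steps=2)
    print(update, gbfs_rollout(0, 3, toy_q, line_step, budget))
# 0 [0, 1]
# 1 [0, 1, 2]
# 2 [0, 1, 2]
```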

**--num_test**: Number of test states. Default is `1000`.

**--use_dist**: Use distributed training for the heuristic network. If this flag is given, training uses Distributed Data Parallel (DDP). **Note:** Read the [Distributed Data Parallel (DDP) Training](#distributed-data-parallel-ddp-training) section before using this flag.

---

#### 11. Run Q* Search

Perform a weighted and batched Q* search. This search uses the trained discrete world model and the trained Deep Q-Network to solve the problem.

```bash
deepcubeai --stage qstar --env --data_dir --data_file_name --env_model_name --heur_nnet_name --qstar_batch_size --qstar_weight --per_eq_tol --qstar_results_dir --save_imgs [--search_test_data ]
```

**--env**: Specifies the environment for which the Q* search will be run.

**--data_dir**: The directory where the search test data is located. This should match the folder where the search test data was saved, such as `deepcubeai/data//search_test`. If not specified, the environment name `` will be used as the value for ``.

**--data_file_name**: The name of the search test data file. The test data used will be `deepcubeai/data//search_test/_search_test_data.pkl`. If not specified, the default is `search_test_data`. If `` does not contain `search_test_data`, `_search_test_data` will be appended. This should follow the same structure mentioned in [Generate Search Test Data](#3-generate-search-test-data).

**--env_model_name**: The name of the trained discrete world model to be used in the Q* search. This should match the model saved during the training stage, located in `deepcubeai/saved_env_models/`, and follow the same structure mentioned in [Train Discrete World Model](#4-train-discrete-world-model).

**--heur_nnet_name**: The name of the trained heuristic neural network to be used in the Q* search. This should match the model saved during the heuristic network training stage, located in `deepcubeai/saved_heur_models/`, and follow the same structure mentioned in [Train Heuristic Network](#9-train-heuristic-network), except for the `target` directory, which is not used in the Q* search.

**--qstar_batch_size**: The batch size for the Q* search. This indicates the number of nodes to expand in each iteration of the search. Default is `1`.

**--qstar_weight**: The weight for path costs used in the Q* algorithm. Default is `1`.
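
The roles of `--qstar_batch_size` and `--qstar_weight` can be illustrated with a small priority queue. In Q* search, each open-list entry is a (state, action) pair scored with a Q-value, rather than a child state scored with a heuristic. The sketch assumes a priority of `weight * g(s) + q(s, a)`, by analogy with weighted A*. Here `g(s)` is the path cost to `s`, and `q(s, a)` is the network's estimate of the transition cost plus the cost-to-go after taking action `a`. The exact formula used by the package may differ, and all names below are hypothetical.

```python
import heapq
import itertools
from typing import Hashable, Iterator, List, Sequence, Tuple

# Open-list entry: (priority, tie-breaker, state, action)
Entry = Tuple[float, int, Hashable, int]


def push_state_actions(open_list: List[Entry], state: Hashable, g: float, q_values: Sequence[float],
                       weight: float, counter: Iterator[int]) -> None:
    """Add one entry per action, scored as weight * g + q(state, action) (assumed formula)."""
    for action, q in enumerate(q_values):
        heapq.heappush(open_list, (weight * g + q, next(counter), state, action))


def pop_batch(open_list: List[Entry], batch_size: int) -> List[Entry]:
    """Pop up to batch_size lowest-priority entries; these are expanded together in one iteration."""
    return [heapq.heappop(open_list) for _ in range(min(batch_size, len(open_list)))]


if __name__ == "__main__":
    open_list: List[Entry] = []
    counter = itertools.count()  # tie-breaker so states never need to be compared
    push_state_actions(open_list, "s0", g=2.0, q_values=[3.0, 1.0], weight=1.0, counter=counter)
    push_state_actions(open_list, "s1", g=0.0, q_values=[4.0], weight=1.0, counter=counter)
    for priority, _, state, action in pop_batch(open_list, batch_size=2):
        print(state, action, priority)
```

A larger weight penalizes long paths more strongly, while a weight of `0` orders entries by Q-value alone.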

**--per_eq_tol**: The percentage of latent state elements that need to be equal to declare two states as equal. Default is `100`.

**--qstar_results_dir**: The directory where results will be stored. If given, the path will be `deepcubeai/results//`. If not given, the default path will be `deepcubeai/results//model=__heur=_QStar_results/path_cost_weight=`.

**--save_imgs**: A flag indicating whether to save a visualization of the states on the found solution path. The images will be saved to the `qstar_soln_images` directory in the results directory. Default is `false`.

The results directory will have a structure like:

```bash
# If --qstar_results_dir is provided
deepcubeai
└── results
    └──
        └──
            ├── output.txt
            ├── results.pkl
            └── qstar_soln_images
                ├── state_0.png
                ├── state_1.png
                └── ...

# If --qstar_results_dir is not provided
deepcubeai
└── results
    └──
        └── model=__heur=_QStar_results
            ├── output.txt
            ├── results.pkl
            └── qstar_soln_images
                ├── state_0.png
                ├── state_1.png
                └── ...
```

Below is an example of the images saved for the `IceSlider` environment:

![IceSlider Q* Search Solution](https://raw.githubusercontent.com/misaghsoltani/DeepCubeAI/master/images/dcai_results_state_61.png)

**--search_test_data**: Allows specifying a custom path for the search test data file. If not provided, the default path constructed from other arguments will be used.

---

#### 12. Run Uniform Cost Search

Run the Uniform Cost Search (UCS) algorithm. This implementation uses the trained discrete world model to perform the search, repeatedly expanding the node with the lowest path cost.

```bash
deepcubeai --stage ucs --env --data_dir --data_file_name --env_model_name --ucs_batch_size --per_eq_tol --ucs_results_dir "" --save_imgs [--search_test_data ]
```

**--env**: Specifies the environment for which the Uniform Cost Search will be run.

**--data_dir**: The directory where the search test data is located. This should match the folder where the search test data was saved, such as `deepcubeai/data//search_test`. If not specified, the environment name `` will be used as the value for ``.

**--data_file_name**: The name of the search test data file. The test data used will be `deepcubeai/data//search_test/_search_test_data.pkl`. If not specified, the default is `search_test_data`. If `` does not contain `search_test_data`, `_search_test_data` will be appended. This should follow the same structure mentioned in [Generate Search Test Data](#3-generate-search-test-data).

**--env_model_name**: The name of the trained discrete world model to be used in the UCS. This should match the model saved during the training stage, located in `deepcubeai/saved_env_models/`, and follow the same structure mentioned in [Train Discrete World Model](#4-train-discrete-world-model).

**--ucs_batch_size**: The batch size for the UCS. This indicates the number of nodes to expand in each iteration of the search. Default is `1`.
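
The search described at the start of this section, together with `--ucs_batch_size`, can be sketched as a batched uniform cost search over an explicit graph. Here `successors` stands in for the learned world model and yields `(next_state, step_cost)` pairs. The function and the toy graph are hypothetical. With `batch_size=1`, this is standard UCS and returns a cheapest path. With larger batches, a path to the goal may be accepted before a cheaper one is discovered.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def uniform_cost_search(start: Hashable, is_goal: Callable[[Hashable], bool],
                        successors: Callable[[Hashable], Iterable[Tuple[Hashable, float]]],
                        batch_size: int = 1) -> Optional[Tuple[float, List[Hashable]]]:
    """Return (path_cost, path) to the first goal reached, or None if no goal is reachable."""
    counter = itertools.count()  # tie-breaker so states never need to be compared
    open_list = [(0.0, next(counter), start)]
    best_g: Dict[Hashable, float] = {start: 0.0}
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    while open_list:
        # Pop the batch_size nodes with the lowest path cost g and expand them together.
        batch = [heapq.heappop(open_list) for _ in range(min(batch_size, len(open_list)))]
        for g, _, state in batch:
            if g > best_g[state]:
                continue  # stale entry: a cheaper path to this state was found later
            if is_goal(state):
                path = [state]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return g, path[::-1]
            for nxt, cost in successors(state):
                new_g = g + cost
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    parent[nxt] = state
                    heapq.heappush(open_list, (new_g, next(counter), nxt))
    return None


if __name__ == "__main__":
    # Toy graph standing in for the learned world model: state -> [(next_state, step_cost)].
    graph = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)], "D": []}
    print(uniform_cost_search("A", lambda s: s == "D", lambda s: graph[s]))  # (3.0, ['A', 'B', 'C', 'D'])
```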

**--per_eq_tol**: The percentage of latent state elements that need to be equal to declare two states as equal. Default is `100`.

**--ucs_results_dir**: The directory where the UCS results will be saved. If given, the path will be `deepcubeai/results//`. If not given, the default path will be `deepcubeai/results//model=_UCS_results`.

**--save_imgs**: A flag indicating whether to save a visualization of the states on the found solution path. The images will be saved to the `ucs_soln_images` directory in the results directory. Default is `false`.

The results directory will have a structure similar to the [Q* Search results directory](#11-run-q-search), and the saved images will be similar to the example image in the [Run Q* Search](#11-run-q-search) section.

**--search_test_data**: Allows specifying a custom path for the search test data file. If not provided, the default path constructed from other arguments will be used.

---

#### 13. Run Greedy Best-First Search (GBFS)

Run the Greedy Best-First Search algorithm. This implementation uses the trained discrete world model and heuristic neural network. The search follows the greedy policy with respect to the heuristic values for the number of iterations given by `--search_itrs`.

```bash
deepcubeai --stage gbfs --env --data_dir --data_file_name --env_model_name --heur_nnet_name --per_eq_tol --gbfs_results_dir "" --search_itrs [--search_test_data ]
```

**--env**: Specifies the environment for which the GBFS will be run.

**--data_dir**: The directory where the search test data is located. This should match the folder where the search test data was saved, such as `deepcubeai/data//search_test`. If not specified, the environment name `` will be used as the value for ``.

**--data_file_name**: The name of the search test data file. The test data used will be `deepcubeai/data//search_test/_search_test_data.pkl`. If not specified, the default is `search_test_data`. If `` does not contain `search_test_data`, `_search_test_data` will be appended. This should follow the same structure mentioned in [Generate Search Test Data](#3-generate-search-test-data).

**--env_model_name**: The name of the trained discrete world model to be used in the GBFS. This should match the model saved during the training stage, located in `deepcubeai/saved_env_models/`, and follow the same structure mentioned in [Train Discrete World Model](#4-train-discrete-world-model).

**--heur_nnet_name**: The name of the trained heuristic neural network to be used in the GBFS. This should match the model saved during the heuristic network training stage, located in `deepcubeai/saved_heur_models/`, and follow the same structure mentioned in [Train Heuristic Network](#9-train-heuristic-network), except for the `target` directory, which is not used in the GBFS.

**--per_eq_tol**: The percentage of latent state elements that need to be equal to declare two states as equal. Default is `100`.

**--gbfs_results_dir**: The directory where the GBFS results will be saved. If given, the path will be `deepcubeai/results//`. If not given, the default path will be `deepcubeai/results//model=__heur=_GBFS_results`.

**--search_itrs**: The number of search iterations to perform. Default is `100`.
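
The greedy policy described at the start of this section, bounded by `--search_itrs`, can be sketched as a rollout that repeatedly takes the action with the lowest estimated cost. The sketch assumes a Q-network-style `q_fn` that returns one value per action. `greedy_rollout`, `toy_step`, and `toy_q` are hypothetical names used for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def greedy_rollout(start: S, is_goal: Callable[[S], bool], step_fn: Callable[[S, int], S],
                   q_fn: Callable[[S], Sequence[float]], search_itrs: int = 100) -> List[S]:
    """Follow the greedy policy for at most search_itrs iterations and return the visited states."""
    path = [start]
    state = start
    for _ in range(search_itrs):
        if is_goal(state):
            break
        q_values = q_fn(state)
        # Greedy action: the one with the lowest estimated cost.
        action = min(range(len(q_values)), key=q_values.__getitem__)
        state = step_fn(state, action)
        path.append(state)
    return path


def toy_step(x: int, action: int) -> int:
    """Toy transition on the integer line: action 0 moves left, action 1 moves right."""
    return x + 1 if action == 1 else x - 1


def toy_q(x: int) -> List[float]:
    """Toy Q-values for reaching 0: one step of cost 1 plus the distance remaining."""
    return [1.0 + abs(toy_step(x, a)) for a in range(2)]


if __name__ == "__main__":
    print(greedy_rollout(3, lambda x: x == 0, toy_step, toy_q, search_itrs=10))  # [3, 2, 1, 0]
```

The caller can check `is_goal(path[-1])` to tell whether the goal was reached within the iteration budget.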

**--search_test_data**: Allows specifying a custom path for the search test data file. If not provided, the default path constructed from other arguments will be used.

---

#### 14. Visualize Data

Save samples of the training and validation data for visualization.

```bash
deepcubeai --stage visualize_data --env --data_dir --data_file_name --num_train_trajs_viz --num_train_steps_viz --num_val_trajs_viz --num_val_steps_viz
```

**--env**: Specifies the environment for which the data will be visualized.

**--data_dir**: The directory where the data for visualization is located. This should match the folder where the data was saved, such as `deepcubeai/data//offline`. If not specified, the environment name `` will be used.

**--data_file_name**: The name of the data file to visualize. The data used will be `deepcubeai/data//offline/_train_data.pkl` and `deepcubeai/data//offline/_val_data.pkl`. If not specified, the default is `train_data` and `val_data`. If `` does not contain `train_data`, `_train_data` will be appended. The same logic applies to validation data with `val_data`.

**--num_train_trajs_viz**: The number of training trajectories to visualize. Default is `8`.

**--num_train_steps_viz**: The number of steps per training trajectory to visualize. Default is `2`.

**--num_val_trajs_viz**: The number of validation trajectories to visualize. Default is `8`.

**--num_val_steps_viz**: The number of steps per validation trajectory to visualize. Default is `2`.

---

### Running the Code Directly from the Repository

To run the code directly from the repository, first follow steps 1 to 4 in the [Using the Repository Directly](#using-the-repository-directly) section to set up the environment. Then run the code using the following command structure:

5. Run the code using the `deepcubeai.sh` script:
```bash
sh deepcubeai.sh --stage