# Latent Aggregation


Badges: CI · Docs · NN Template · Python · Code style: black

Aggregating seemingly different latent spaces.

## Quickstart

[comment]: <> (> Fill me!)

## Development installation

Set up the development environment:

```bash
git clone git@github.com:crisostomi/latent-aggregation.git
cd latent-aggregation
conda env create -f env.yaml
conda activate la
pre-commit install
```

Run the pre-commit checks on all files:

```bash
pre-commit run --all-files
```
We use HuggingFace Datasets throughout the project. Assuming you already have a HF account (create one if you don't), log in via
```bash
huggingface-cli login
```
which will prompt you to either create a new token or paste an existing one.
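To verify that the token was stored correctly, a minimal check (assuming `huggingface_hub` is installed in the environment, which the HF stack requires) is:

```python
# Sanity check: prints your HF username if the stored token is valid.
from huggingface_hub import whoami

print(whoami()["name"])
```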

### Update the dependencies

Re-install the project in editable mode so that newly declared dependencies are picked up:

```bash
pip install -e '.[dev]'
```

## Experiment flow

Each experiment `exp_name` in `{part_shared_part_novel, same_classes_disj_samples, totally_disjoint}` has three scripts:
- `prepare_data_${exp_name}.py` divides the data into tasks according to what the experiment expects;
- `run_${exp_name}.py` trains the task-specific models and uses them to embed the data for each task;
- `analyze_${exp_name}.py` computes the results for the experiment.

Each script has a corresponding conf file in `conf/` with the same name.
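Concretely, a full run of any experiment boils down to the same three commands (shown here for `same_classes_disj_samples`; the sequence is identical for the other experiments):

```bash
# End-to-end sketch for one experiment; substitute any of the three experiment names.
EXP=same_classes_disj_samples
python src/la/scripts/prepare_data_${EXP}.py   # split the data into tasks
python src/la/scripts/run_${EXP}.py            # train per-task models and embed the data
python src/la/scripts/analyze_${EXP}.py        # collect the results
```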
For example, to run `part_shared_part_novel`, first configure the experiment in `conf/prepare_data_part_shared_part_novel.yaml`; here you have to choose values for `num_shared_classes` and `num_novel_classes_per_task`.
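An illustrative excerpt of that config (the values below are placeholders, not project defaults):

```yaml
# conf/prepare_data_part_shared_part_novel.yaml (excerpt; illustrative values)
num_shared_classes: 4
num_novel_classes_per_task: 2
```

With the config in place, prepare the data via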
```bash
python src/la/scripts/prepare_data_part_shared_part_novel.py
```
This will populate the `data/${dataset_name}/part_shared_part_novel/` folder. Then embed the data by running
```bash
python src/la/scripts/run_part_shared_part_novel.py
```
After this step, the encoded data will be in `data/${dataset_name}/part_shared_part_novel/S${num_shared_classes}_N${num_novel_classes_per_task}`.
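To peek at the encoded data, a minimal sketch (assuming the embeddings are stored as a HuggingFace dataset on disk; the dataset name and `S4_N2` suffix below are hypothetical):

```python
# Illustrative only: the path components (dataset name, S/N values) are placeholders.
from datasets import load_from_disk

encoded = load_from_disk("data/cifar100/part_shared_part_novel/S4_N2")
print(encoded)  # shows the splits/columns of the stored dataset
```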
With all the latent spaces computed, you can now run the actual experiment and collect the results:
```bash
python src/la/scripts/analyze_part_shared_part_novel.py
```
The results can now be found in `results/part_shared_part_novel`.