https://github.com/rbroc/sbert-align

# sbert-align
Compute parent-child alignment using SentenceBERT

### Usage
1. Create a virtual environment (optional), for example by running:

```
python3 -m venv PATH_TO_ENV
source PATH_TO_ENV/bin/activate
```
Replace `PATH_TO_ENV` with the path to your virtual environment.

2. Install requirements
```pip install -r requirements.txt```

3. Run the `align.py` script.

`python3 align.py --lag 1 --model all-mpnet-base-v2`.

Both arguments (`--lag` and `--model`) can be changed.

Note that the script looks for a `transcripts.txt` or `surrogates.txt` file in the `data` folder and saves its outputs in an `outputs` folder.

4. Deactivate the virtual environment once you're done by running `deactivate`.
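
As an illustration of step 3, the sketch below shows one way the command-line flags and the input lookup could be wired up. It is not the actual contents of `align.py`: the function names `build_parser` and `find_input` are hypothetical, and the default values are assumptions taken from the example command above.

```python
import argparse
from pathlib import Path


def build_parser():
    """Hypothetical parser mirroring the two flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Compute alignment using SentenceBERT")
    # Defaults are assumptions copied from the example command, not confirmed script defaults.
    parser.add_argument("--lag", type=int, default=1,
                        help="how many turns back to compare against")
    parser.add_argument("--model", type=str, default="all-mpnet-base-v2",
                        help="name of the SentenceBERT checkpoint")
    return parser


def find_input(data_dir):
    """Return the first expected input file found in data_dir, or None if neither exists."""
    for name in ("transcripts.txt", "surrogates.txt"):
        candidate = Path(data_dir) / name
        if candidate.is_file():
            return candidate
    return None


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--lag", "2", "--model", "all-mpnet-base-v2"])
print(args.lag, args.model)
print(find_input("data"))
```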

### Output columns
- Turn metadata: `ChildID|ID`, `Visit`, `Turn`;
- `Lag`: 1 if alignment is computed with the previous turn, 2 if it is computed with the turn two turns back. Even lags compare a turn with an earlier turn from the same speaker;
- `ModelId`: the SentenceBERT checkpoint used, see https://www.sbert.net/docs/pretrained_models.html for available models;
- `SemanticAlignment`: cosine similarity between the sequence encodings of the two turns;
- `AlignmentType`: 'child2caregiver' or 'caregiver2child'.
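
To make `Lag` and `SemanticAlignment` concrete, here is a minimal sketch of lagged cosine similarity over a conversation. It uses toy 2-D vectors in place of real SentenceBERT encodings, which in practice would come from a model such as the one named by `--model`. The function names, the 0-based turn index, and the `Speaker`/`PreviousSpeaker` fields are assumptions for illustration, not the script's actual output schema.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def lagged_alignment(turns, lag):
    """Compare each turn's embedding with the embedding `lag` turns earlier.

    `turns` is a list of (speaker, embedding) tuples in conversation order.
    """
    rows = []
    for i in range(lag, len(turns)):
        speaker, emb = turns[i]
        prev_speaker, prev_emb = turns[i - lag]
        rows.append({
            "Turn": i,  # 0-based position, an assumption of this sketch
            "Lag": lag,
            "Speaker": speaker,
            "PreviousSpeaker": prev_speaker,
            "SemanticAlignment": cosine_similarity(emb, prev_emb),
        })
    return rows


# Toy embeddings standing in for SentenceBERT encodings.
turns = [
    ("caregiver", [1.0, 0.0]),
    ("child", [0.0, 1.0]),
    ("caregiver", [1.0, 0.0]),
]
for row in lagged_alignment(turns, lag=1) + lagged_alignment(turns, lag=2):
    print(row)
```

With strictly alternating speakers, a lag of 1 pairs each turn with the other speaker's preceding turn, while a lag of 2 pairs it with the same speaker's earlier turn.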

### Notes on study 2
- When a Turn ID is repeated, we keep the second iteration of the conversation
- Turns whose previous index is missing are coded neither for current-to-1back alignment nor for 1back-to-2back alignment
- Turns whose preceding turn does not directly follow the turn before it are not coded for 1back-to-2back alignment
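
The rules above can be sketched as follows. This is one possible reading, not the repository's code: it assumes integer Turn IDs that increase by 1, treats the later row as the "second iteration" of a repeated Turn ID, and uses hypothetical function and field names (`keep_last_iteration`, `codable`, `current_to_1back`, `1back_to_2back`).

```python
def keep_last_iteration(rows):
    """For repeated Turn IDs, keep the later row (assumed to be the second iteration)."""
    latest = {}
    for row in rows:
        latest[row["Turn"]] = row  # later rows overwrite earlier ones
    return [latest[turn] for turn in sorted(latest)]


def codable(rows):
    """Flag which alignment types can be coded for each turn."""
    present = {row["Turn"] for row in rows}
    flags = []
    for row in rows:
        turn = row["Turn"]
        # Current-to-1back needs the immediately preceding turn.
        has_1back = (turn - 1) in present
        # 1back-to-2back additionally needs the turn before that one.
        has_2back = has_1back and (turn - 2) in present
        flags.append({
            "Turn": turn,
            "current_to_1back": has_1back,
            "1back_to_2back": has_2back,
        })
    return flags


# Synthetic rows: Turn 2 is repeated and Turn 4 is missing.
rows = [
    {"Turn": 1, "Text": "a"},
    {"Turn": 2, "Text": "b (first iteration)"},
    {"Turn": 2, "Text": "b (second iteration)"},
    {"Turn": 3, "Text": "c"},
    {"Turn": 5, "Text": "e"},
    {"Turn": 6, "Text": "f"},
]
deduplicated = keep_last_iteration(rows)
for flag in codable(deduplicated):
    print(flag)
```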

### Potential expansion
- Make synthetic raw data for better reproducibility