https://github.com/rbroc/sbert-align

Compute alignment using SentenceBERT

- Host: GitHub
- URL: https://github.com/rbroc/sbert-align
- Owner: rbroc
- Created: 2022-12-09T08:31:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-18T08:55:07.000Z (over 1 year ago)
- Last Synced: 2025-02-05T16:15:25.026Z (5 months ago)
- Language: Jupyter Notebook
- Size: 11.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md


# sbert-align
Compute parent-child alignment using SentenceBERT

### Usage
1. Create a virtual environment (optional).

You can do so by typing:

```
python3 -m venv PATH_TO_ENV
source PATH_TO_ENV/bin/activate
```
Replace `PATH_TO_ENV` with the path where the virtual environment should be created.

2. Install the requirements:

```
pip install -r requirements.txt
```

3. Run the `align.py` script, for example:

```
python3 align.py --lag 1 --model all-mpnet-base-v2
```

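The arguments can be changed: `--lag` sets how many turns back each turn is compared against (see `Lag` under Output columns), and `--model` selects the SentenceBERT checkpoint (see `ModelId`).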

Note that the script looks for a `transcripts.txt` or `surrogates.txt` file in the `data` folder and saves its outputs to an `outputs` folder.

4. When you're done, deactivate the virtual environment by running `deactivate`.
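The command-line and folder conventions from the steps above can be sketched in Python as follows. This is a hypothetical outline, not the code in `align.py`: the helper names (`positive_int`, `build_parser`, `find_input`, `prepare_output_dir`), the defaults (copied from the example command), the rule that `--lag` must be at least 1, and the preference for `transcripts.txt` over `surrogates.txt` are all assumptions.

```python
import argparse
from pathlib import Path

# Input files named in the README. Preferring transcripts.txt is an assumption.
CANDIDATE_INPUTS = ("transcripts.txt", "surrogates.txt")


def positive_int(value: str) -> int:
    """argparse type (assumed rule): a lag of 0 would compare a turn with itself."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"lag must be >= 1, got {number}")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the two flags shown in the README example.

    Defaults are copied from the example command, not read from align.py.
    """
    parser = argparse.ArgumentParser(description="Compute alignment using SentenceBERT")
    # How many turns back each turn is compared against (1 = previous turn).
    parser.add_argument("--lag", type=positive_int, default=1)
    # Name of a pretrained SentenceBERT checkpoint.
    parser.add_argument("--model", default="all-mpnet-base-v2")
    return parser


def find_input(data_dir: Path) -> Path:
    """Return the first candidate input file that exists in data_dir."""
    for name in CANDIDATE_INPUTS:
        path = data_dir / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"none of {CANDIDATE_INPUTS} found in {data_dir}")


def prepare_output_dir(output_dir: Path) -> Path:
    """Create the output folder (and any parents) if missing, then return it."""
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

For example, `build_parser().parse_args(["--lag", "2"])` yields `lag=2` with the default model, while `--lag 0` makes argparse exit with a usage error.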

### Output columns
- Turn metadata: `ChildID|ID`, `Visit`, `Turn`;
- `Lag`: 1 if alignment is computed against the previous turn, 2 if against the turn two back. Note that even lags compare a turn with earlier turns from the same speaker;
- `ModelId`: the SentenceBERT checkpoint used; see https://www.sbert.net/docs/pretrained_models.html for available models;
- `SemanticAlignment`: cosine similarity between the sentence embeddings of the two turns;
- `AlignmentType`: `child2caregiver` or `caregiver2child`.
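The `Lag` and `SemanticAlignment` columns can be illustrated with a minimal sketch: each turn is paired with the turn `lag` positions earlier and scored by cosine similarity. In practice the embeddings would come from a SentenceBERT model (e.g. `SentenceTransformer("all-mpnet-base-v2").encode(texts)` from the `sentence-transformers` package); here they are plain lists so the sketch runs without downloading a model. Pairing by list position, and ignoring speakers and the missing-turn rules in "Notes on study 2", are simplifying assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def lagged_alignment(
    embeddings: Sequence[Sequence[float]], lag: int
) -> List[Tuple[int, float]]:
    """Score each turn against the turn `lag` positions earlier.

    Returns (position, similarity) pairs; the first `lag` turns have no
    earlier partner and are skipped.
    """
    return [
        (i, cosine_similarity(embeddings[i], embeddings[i - lag]))
        for i in range(lag, len(embeddings))
    ]
```

If speakers strictly alternate, `lag=1` compares each turn with the other speaker's last turn and `lag=2` with the same speaker's previous turn, which matches the note on even lags above.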

### Notes on study 2
- When a Turn ID is repeated, we keep the second iteration of the conversation.
- Turns whose preceding turn index is missing are coded neither for current-to-1back nor for 1back-to-2back alignment.
- Turns whose 1-back turn does not directly follow the 2-back turn (i.e., their indices are not consecutive) are not coded for 1back-to-2back alignment.
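One possible reading of these rules is sketched below, assuming integer turn IDs where the 1-back turn of turn `t` is `t - 1` and the 2-back turn is `t - 2`. The function names and the `(turn_id, text)` input shape are hypothetical; the sketch keeps the later occurrence of a repeated ID, which is the second iteration when an ID appears twice.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_keep_later(turns: List[Tuple[int, str]]) -> Dict[int, str]:
    """Map turn ID -> text, keeping the later occurrence when an ID repeats.

    Returns the turns sorted by ID.
    """
    kept: Dict[int, str] = {}
    for turn_id, text in turns:
        kept[turn_id] = text  # a later occurrence overwrites an earlier one
    return dict(sorted(kept.items()))


def coding_flags(turn_ids: Iterable[int]) -> Dict[int, Tuple[bool, bool]]:
    """Return (code_current_to_1back, code_1back_to_2back) for each turn ID."""
    present = set(turn_ids)
    flags: Dict[int, Tuple[bool, bool]] = {}
    for t in sorted(present):
        # Missing 1-back turn: neither alignment is coded.
        has_1back = (t - 1) in present
        # 1back-to-2back also needs the turn directly before the 1-back turn.
        has_2back = has_1back and (t - 2) in present
        flags[t] = (has_1back, has_2back)
    return flags
```

With IDs `1, 2, 2, 3, 5, 6`, turn 2 keeps its second text, turn 5 gets no coding (turn 4 is missing), and turn 6 is coded only for current-to-1back because its 1-back turn (5) does not follow a turn 4.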

### Potential expansions
- Generate synthetic raw data to improve reproducibility
