https://github.com/rbroc/sbert-align
Compute alignment using SentenceBERT
- Host: GitHub
- URL: https://github.com/rbroc/sbert-align
- Owner: rbroc
- Created: 2022-12-09T08:31:13.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-18T08:55:07.000Z (about 1 year ago)
- Last Synced: 2024-10-24T11:52:07.866Z (3 months ago)
- Language: Jupyter Notebook
- Size: 11.1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# sbert-align
Compute parent-child alignment using SentenceBERT

### Usage
1. Create a virtual environment (optional). You can do so by running:
```
python3 -m venv PATH_TO_ENV
source PATH_TO_ENV/bin/activate
```
Replace `PATH_TO_ENV` with the path for the virtual environment.
2. Install requirements:
`pip install -r requirements.txt`
3. Run the `align.py` script:
`python3 align.py --lag 1 --model all-mpnet-base-v2`.
Arguments are customizable.
Note that the script looks for a `transcripts.txt` or `surrogates.txt` file in the `data` folder and saves its outputs in an `outputs` folder.
4. Deactivate the virtual environment once you're done by running `deactivate`.
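
As a rough illustration of what the `--lag` argument controls, the sketch below compares each turn with the turn `lag` positions earlier and reports their cosine similarity. It is not the code in `align.py`: the function names, the dictionary layout, and the toy 2-d vectors are assumptions made for the example. In practice the vectors would be sentence encodings produced by the chosen SentenceBERT checkpoint.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def lagged_alignment(turns, lag=1):
    """Compare each turn with the turn `lag` positions earlier.

    `turns` maps a turn index to its encoding (hypothetical layout).
    Turns whose lagged index is missing are skipped rather than coded.
    """
    rows = []
    for idx in sorted(turns):
        prev = idx - lag
        if prev not in turns:
            continue  # no turn at the lagged index, so nothing to compare
        rows.append(
            {
                "Turn": idx,
                "Lag": lag,
                "SemanticAlignment": cosine_similarity(turns[idx], turns[prev]),
            }
        )
    return rows


# Toy vectors standing in for sentence encodings; turn 4 is missing on purpose.
toy_turns = {1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0], 5: [1.0, 0.0]}
rows = lagged_alignment(toy_turns, lag=1)
for row in rows:
    print(row)
```

With `lag=1`, turn 5 is left out because turn 4 does not exist. With `lag=2`, only turns 3 and 5 have a turn two positions back.
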
### Output columns
- Turn metadata: (`ChildID|ID`, `Visit`, `Turn`)
- `Lag`: 1 if alignment is computed with the previous turn, 2 if with the turn two back. Note that even lags compute alignment with earlier turns from the same speaker;
- `ModelId`: Which SentenceBERT checkpoint we are using, see https://www.sbert.net/docs/pretrained_models.html for available models;
- `SemanticAlignment`: cosine similarity between sequence encodings;
- `AlignmentType`: 'child2caregiver' or 'caregiver2child'

### Notes on study 2
- When a Turn ID is repeated, we keep the second iteration of the conversation
- Turns where the previous index is missing are coded for neither current-to-1back nor 1back-to-2back alignment
- Turns where the preceding turn does not follow the previous one are not coded for 1back-to-2back alignment

### Potential expansion:
- Make synthetic raw data for better reproducibility