
# SRBedding

## Project setup
Before running the project, set up the environment:
`poetry shell`
`poetry update`

## Evaluation Jupyter notebook
Inside the evaluation-pipetine folder, add the datasets and results folders.
For loading SQuAD-sr, you need to add [squad-sr-lat.json](https://www.kaggle.com/datasets/aleksacvetanovic/squad-sr) to the datasets folder.
First run make-evaluation-datasets.ipynb. This will create all the files needed.
Then run:
`cd evaluation-pipetine/`
`python evaluation-pipieline.py`
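
If you want to sanity-check the SQuAD-sr file before running the notebook, the sketch below is one way to do it. It assumes the standard SQuAD JSON layout (a top-level `data` list with `paragraphs` and `qas`); the exact schema of squad-sr-lat.json may differ, and the path is only an example.

```python
import json
from pathlib import Path

# Example path; adjust to wherever you placed the file inside the datasets folder.
squad_path = Path("evaluation-pipetine/datasets/squad-sr-lat.json")

with squad_path.open(encoding="utf-8") as f:
    squad = json.load(f)

# Assumes the standard SQuAD schema: data -> paragraphs -> qas.
articles = squad.get("data", [])
n_questions = sum(
    len(paragraph.get("qas", []))
    for article in articles
    for paragraph in article.get("paragraphs", [])
)
print(f"Loaded {len(articles)} articles with {n_questions} questions")
```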

## Training dataset creation
Run the following commands to create the training dataset:
- `cd training_dataset`
- `python .\main_training.py`
- `python .\batch_loading.py`

The .parquet files will be saved in the datasets folder.
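
Once the scripts finish, you can quickly inspect what was written. This is a minimal sketch, assuming the output lands in a local datasets folder and that pandas with a parquet engine (e.g. pyarrow) is installed; the file names themselves are whatever the scripts produce.

```python
from pathlib import Path

import pandas as pd

# Assumes the training scripts wrote their output into a local datasets/ folder.
for parquet_file in sorted(Path("datasets").glob("*.parquet")):
    df = pd.read_parquet(parquet_file)  # requires pyarrow or fastparquet
    print(parquet_file.name, df.shape)
    print(df.head())
```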

## Translating datasets
The translation_pipeline folder is used for translating [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) and [natural_questions](https://huggingface.co/datasets/google-research-datasets/natural_questions) from English to Serbian. Translated queries and contexts from these datasets will be used for evaluation.
Run the following commands:
- `cd translation_pipeline`
- `python .\sending_batch.py`
- `python .\processing_batch.py`
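
If you want to look at the source data before sending a translation batch, both datasets can be streamed from the Hugging Face Hub. This is a standalone sketch, not part of the pipeline; the config names ("v2.1" and "default") and the field access below are assumptions about the Hub datasets, not project settings.

```python
from datasets import load_dataset  # pip install datasets

# Stream a few source examples instead of downloading the full datasets.
ms_marco = load_dataset("microsoft/ms_marco", "v2.1", split="train", streaming=True)
print(next(iter(ms_marco))["query"])  # assumes a "query" field

natural_questions = load_dataset(
    "google-research-datasets/natural_questions", "default", split="train", streaming=True
)
print(next(iter(natural_questions))["question"]["text"])  # assumes a nested "question" record
```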

The translation_sts folder is used for translating one sentence of each pair from the [sts dataset](https://huggingface.co/datasets/mteb/stsbenchmark-sts) for the distillation evaluator.
Run the following commands:
- `cd translation_sts`
- `python .\sending_batch.py`
- `python .\processing_batch.py`
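
To see what the source pairs look like before translation, the benchmark can be loaded directly from the Hub. A minimal sketch, assuming the mteb/stsbenchmark-sts columns sentence1, sentence2, and score; it is independent of the translation scripts.

```python
from datasets import load_dataset  # pip install datasets

# Assumed split and column names for mteb/stsbenchmark-sts.
sts = load_dataset("mteb/stsbenchmark-sts", split="test")
for row in sts.select(range(3)):
    print(row["score"], "|", row["sentence1"], "|", row["sentence2"])
```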