https://github.com/nacnudus/psalter

Psalter web app maybe one day
https://github.com/nacnudus/psalter

Last synced: 2 months ago
JSON representation

Psalter web app maybe one day

Host: GitHub
URL: https://github.com/nacnudus/psalter
Owner: nacnudus
Created: 2024-08-08T15:02:59.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-10-27T20:40:56.000Z (6 months ago)
Last Synced: 2025-01-05T18:26:48.485Z (4 months ago)
Language: Python
Size: 797 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Psalter

One day this might be a smartphone app similar to the website
https://www.rmjs.co.uk/psalter/psalter.php.

So far it's merely a JSON representation of the psalms.

## Wordlist

https://dhanswers.ach.org/topic/creating-a-wordlist-from-text/#post-1762

```sh
tr -sc '[A-Z][a-z]' '[\012*]' < psalmtext.txt | sort | uniq > wordlist.txt
```

## Transcription and alignment

Review https://github.com/nacnudus/audio-verse-splitter

## Consolidate timestamps files

```sh
find audio/priory-1 -type f -name "timestamps.txt" | xargs -I {} bash -c 'cp "$1" temp/$(basename $(dirname "$1")).txt' -- {}
```

## Copy a directory structure without files

```sh
rsync -a --include '*/' --exclude '*' recordings/priory-1 "timestamps"
```

## Split audio files at timestamps

```sh
find recordings/st-pauls-scott -type f -name "*.flac" | xargs -I {} bash -c './split.sh $1' -- {}
```

## Calculate the actual durations of audio files

```sh
fd -I --exec sox {} -n stat 2>&1 | grep "Length" | awk '{print $3}' > durations.txt
```

## Create audio samples and metadata

Use script `manifest2metadata.R` to create a JSON manifest of each audio file
and its transcription.

Also concatenate parts of a verse into a single file, as long as the total
length is within the limits of the Whisper model, which is 30 seconds. Doing so
is more efficient, because it will halve the amount of samples to use in
training, and it mitigates the inaccuracy of splitting parts of verses that,
when sung, did not pause at the colon.

## Create a Huggingface dataset

https://me.sakana.moe/2024/09/03/a-complete-guide-to-fine-tuning-and-deploying-whisper-models/

If huggingface login and dataset upload hang, you might have to disable IPv6.

```sh
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/nacnudus/psalter

Awesome Lists containing this project

README