{"id":16857674,"url":"https://github.com/brawer/pronunbot","last_synced_at":"2025-03-18T12:14:54.669Z","repository":{"id":90606584,"uuid":"159292189","full_name":"brawer/PronunBot","owner":"brawer","description":"Tools for uploading recorded pronunciations to Wikimedia Commons and Wikidata","archived":false,"fork":false,"pushed_at":"2018-12-01T09:28:35.000Z","size":129,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-24T18:12:09.244Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brawer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-27T07:15:13.000Z","updated_at":"2022-08-15T13:08:35.000Z","dependencies_parsed_at":"2023-03-13T17:55:14.888Z","dependency_job_id":null,"html_url":"https://github.com/brawer/PronunBot","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brawer%2FPronunBot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brawer%2FPronunBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brawer%2FPronunBot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brawer%2FPronunBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brawer","download_url":"https://codeload.github.com/brawer/PronunBot/tar.gz/refs/heads/
master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244217948,"owners_count":20417677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T14:09:07.322Z","updated_at":"2025-03-18T12:14:54.647Z","avatar_url":"https://github.com/brawer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PronunBot\n\nPronunBot is a tool for uploading a batch of recorded pronunciations\nto [Wikimedia Commons](https://commons.wikimedia.org/) and\n[Wikidata](https://www.wikidata.org).\n\n## Background\n\nWe built this tool at the [Plurilinguism\nHackathon](https://forum-helveticum.ch/en/hackathon/) in November\n2018.  [Lia Rumantscha](http://www.liarumantscha.ch/?changeLang=_en)\ncontributed recorded pronunciations of about 5000 phrases in the [Sursilvan\nvariant](https://en.wikipedia.org/wiki/Sursilvan_dialects_(Romansh))\nof the [Romansh\nlanguage](https://en.wikipedia.org/wiki/Romansh_language) to the\nhackathon. Back in March 2007, the pronunciations had been recorded\nas language training material; at the 2018 hackathon, Lia Rumantscha kindly\ngave permission to upload them to Wikidata under the Creative Commons Zero\nlicense.\n\n\n## Setup\n\nWe used a Macintosh laptop with\n[Docker](https://docs.docker.com/docker-for-mac/install/) running a\nLinux container. 
For setup instructions, see the comments in `Dockerfile`.\n\n\n## Splitting multi-word phrases\n\nMany of the original recordings are multi-word phrases.\nAn example is the phrase [“jeu savess prender” 🔉](https://cdn.jsdelivr.net/gh/brawer/PronunBot/testdata/split_phrases/jeu%20savess%20prender.mp3). Because\nthe initial recording was done for language training, the words are often\nseparated by spans of silence; this is rather unusual in recorded\nspeech. Also, the original recordings often contain a few seconds of silence\nbefore and after the spoken phrase.\n\nFor using the sound snippets in Wikidata lexemes, however, we need a\nseparate sound snippet for every word without surrounding silence.\nThe tool `split_phrases.py` helps to solve this problem: it goes over the\ninput files, calls [FFmpeg](https://www.ffmpeg.org/) to detect\nsilences, and then applies a simple heuristic to split the sound file\ninto single words.  Finally, the tool will tag each snippet with\nmetadata (such as license, performer, or language) and compress the\nsound in the lossless [FLAC format](https://en.wikipedia.org/wiki/FLAC).\n\nTo run the splitting script, we’ve used the following command inside\nthe Linux container:\n\n```\npython split_phrases.py -o split  \\\n  --language=rm-sursilv --date=2007-03-09  \\\n  --performer=\"Erwin Ardüser\"  \\\n  --organization=\"Lia Rumantscha / Conradin Klaiss, 7001 Chur, Switzerland\"  \\\n  --copyright=\"2007 Lia Rumantscha\"  \\\n  --license=\"Creative Commons Zero v1.0 Universal\"  \\\n  /recordings\n```\n\nSome input files, for example the recorded phrase [“bien\ndi” 🔉](https://cdn.jsdelivr.net/gh/brawer/PronunBot/testdata/split_phrases/bien%20di.mp3),\ndo not have enough silent spans for splitting the phrase into\nwords. The tool logs the problem cases into `split-failures.txt`\nnext to the output files.\n\n\n## Quality assessment\n\nTo check the quality of recorded phrases, run `python3 assess_quality.py split`\non the Mac command line. 
For each phrase or word, the tool plays all available\nrecordings. The user then picks the best variant, or `0` if they’re all bad.\nThe quality assessment gets recorded into a file `qa.txt`.\n\n\n## Uploading sound files to Wikimedia Commons\n\nTo upload the recordings to Wikimedia Commons, run this in the Linux container:\n\n```\nPYTHONPATH=/pywikibot:$PYTHONPATH python upload_to_commons.py split\n```\n\nTODO: Find out why `pywikibot` cannot be installed during\ncontainer creation. There is a pip package for pywikibot, but it does\nnot seem to work properly on Python 3; perhaps it just needs to be\nupdated.\n\n\n## Uploading to Wikidata\n\nTODO\n\n\n## License\n\nThe code in this repository is copyright 2018 by [Sascha\nBrawer](http://www.brawer.ch), and has been released as free software\nunder the [MIT license](https://spdx.org/licenses/MIT.html).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrawer%2Fpronunbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrawer%2Fpronunbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrawer%2Fpronunbot/lists"}