https://github.com/attwad/cdf
Worker and elasticsearch for automated College de France audio transcripts
elasticsearch gcp golang kubernetes text-to-speech tls
- Host: GitHub
- URL: https://github.com/attwad/cdf
- Owner: attwad
- License: mit
- Created: 2017-07-16T07:05:26.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-10-16T13:44:00.000Z (over 8 years ago)
- Last Synced: 2024-11-15T12:35:15.695Z (over 1 year ago)
- Topics: elasticsearch, gcp, golang, kubernetes, text-to-speech, tls
- Language: Go
- Homepage: https://medium.com/@timothefaudot/searching-the-college-de-france-part-2-aec176deb91d
- Size: 79.1 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# College de France automated audio transcripts
Worker and elasticsearch for automated College de France audio transcripts
[Build Status](https://travis-ci.org/attwad/cdf)
[GoDoc](https://godoc.org/github.com/attwad/cdf)
[Go Report Card](https://goreportcard.com/report/github.com/attwad/cdf)
## Worker
The worker periodically polls Datastore for scheduled transcriptions. If there are any, it downloads the MP3 files
from the College de France website, converts them to FLAC, stores them in a Google Cloud Storage bucket,
sends a Speech-to-Text request, stores the transcription in the same bucket, and indexes the transcripts
in an Elasticsearch instance running in the same Kubernetes cluster.
A separate periodic job computes overall statistics about the transcriptions, because Datastore itself has
limited support for this kind of aggregate computation.
## Elasticsearch
Elasticsearch runs as a single combined master and data node in a Kubernetes cluster (hence the "yellow" cluster
health: with only one node, replica shards have nowhere to be allocated). It does full-text indexing of
the transcripts using the French analyzer.