{"id":43292697,"url":"https://github.com/attwad/cdf","last_synced_at":"2026-02-01T18:35:02.986Z","repository":{"id":57526062,"uuid":"97366446","full_name":"attwad/cdf","owner":"attwad","description":"Worker and elasticsearch for automated College de France audio transcripts","archived":false,"fork":false,"pushed_at":"2017-10-16T13:44:00.000Z","size":81,"stargazers_count":4,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-15T12:35:15.695Z","etag":null,"topics":["elasticsearch","gcp","golang","kubernetes","text-to-speech","tls"],"latest_commit_sha":null,"homepage":"https://medium.com/@timothefaudot/searching-the-college-de-france-part-2-aec176deb91d","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/attwad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-16T07:05:26.000Z","updated_at":"2023-02-16T12:19:18.000Z","dependencies_parsed_at":"2022-09-26T18:11:07.793Z","dependency_job_id":null,"html_url":"https://github.com/attwad/cdf","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/attwad/cdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attwad%2Fcdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attwad%2Fcdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attwad%2Fcdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attwad%2Fcdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/attwad","download_url":"https://codeload.github.co
m/attwad/cdf/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attwad%2Fcdf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28985818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T18:17:03.387Z","status":"ssl_error","status_checked_at":"2026-02-01T18:16:57.287Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","gcp","golang","kubernetes","text-to-speech","tls"],"created_at":"2026-02-01T18:35:02.307Z","updated_at":"2026-02-01T18:35:02.977Z","avatar_url":"https://github.com/attwad.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# College de France automated audio transcripts\nWorker and elasticsearch for automated College de France audio transcripts\n\n[![Build Status](https://travis-ci.org/attwad/cdf.svg?branch=master)](https://travis-ci.org/attwad/cdf)\n[![GoDoc](https://godoc.org/github.com/attwad/cdf?status.png)](https://godoc.org/github.com/attwad/cdf)\n[![Go Report Card](https://goreportcard.com/badge/github.com/attwad/cdf)](https://goreportcard.com/report/github.com/attwad/cdf)\n\n## Worker\n\nThe worker periodically polls datastore for scheduled transcriptions, if any it downloads the mp3 files\nfrom the College de France website, converts them to FLAC, stores them in a Google Storage bucket,\nsends a Speech 
to Text request, stores the transcription in the same storage bucket, and indexes the transcripts\nin an Elasticsearch instance running in the same Kubernetes cluster.\n\nA periodic job also runs to compute overall statistics about the transcriptions due to limitations of the datastore\nin this regard.\n\n## Elasticsearch\n\nElasticsearch runs as a single (thus \"yellow\") master\u0026data node in a Kubernetes cluster. It does full-text indexing of\nthe transcripts using the French analyzer.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fattwad%2Fcdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fattwad%2Fcdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fattwad%2Fcdf/lists"}