{"id":50030021,"url":"https://github.com/bubustack/openai-stt-engram","last_synced_at":"2026-05-20T19:52:11.416Z","repository":{"id":352188615,"uuid":"1095860762","full_name":"bubustack/openai-stt-engram","owner":"bubustack","description":"OpenAI speech-to-text Engram for bobrapet — real-time transcription with Whisper and GPT-4o audio models.","archived":false,"fork":false,"pushed_at":"2026-05-02T12:42:20.000Z","size":100,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-02T14:16:02.737Z","etag":null,"topics":["batch","bubustack","engram","go","kubernetes","openai","speech-to-text","streaming","transcription","whisper"],"latest_commit_sha":null,"homepage":"https://bubustack.io/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bubustack.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["bubustack"]}},"created_at":"2025-11-13T16:02:21.000Z","updated_at":"2026-05-02T12:19:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bubustack/openai-stt-engram","commit_stats":null,"previous_names":["bubustack/openai-stt-engram"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/bubustack/openai-stt-engram","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bubustack%2Fopenai-stt-engram","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bubustack%2Fopenai-stt-engram/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bubustack%2Fopenai-stt-engram/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bubustack%2Fopenai-stt-engram/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bubustack","download_url":"https://codeload.github.com/bubustack/openai-stt-engram/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bubustack%2Fopenai-stt-engram/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33273431,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-20T15:12:43.734Z","status":"ssl_error","status_checked_at":"2026-05-20T15:12:42.300Z","response_time":356,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch","bubustack","engram","go","kubernetes","openai","speech-to-text","streaming","transcription","whisper"],"created_at":"2026-05-20T19:52:10.629Z","updated_at":"2026-05-20T19:52:11.409Z","avatar_url":"https://github.com/bubustack.png","language":"Go","funding_links":["https://github.com/sponsors/bubustack"],"categories":[],"sub_categories":[],"readme":"# 🗣️ OpenAI Speech-to-Text Engram\n\nStreaming Engram that transcribes PCM audio using OpenAI Whisper/GPT-4o transcription APIs.\n\n## 🌟 Highlights\n\n- Supports batch and streaming runtime modes with the same transcription contract.\n- Emits incremental and final transcript events for low-latency downstream routing.\n- Supports identity allowlists/ignore lists so realtime stories can avoid self-transcription loops.\n- Preserves structured transcript payloads for downstream templating and transport delivery.\n\n## 🚀 Quick Start\n\n```bash\nmake lint\ngo test ./...\nmake docker-build\n```\n\nApply `Engram.yaml`, mount an `openai` secret with `API_KEY`, and reference the\ntemplate from your Story step.\n\n## ⚙️ Configuration (`Engram.spec.with`)\n\n| Field | Type | Description | Default |\n|-------|------|-------------|---------|\n| `model` | string | OpenAI transcription model ID to use for every request. | `gpt-4o-mini-transcribe` |\n| `responseFormat` | string | Default response shape (`auto`, `json`, `text`, `srt`, `verbose_json`, `vtt`). | `auto` |\n| `timestampGranularity` | string | `word`, `segment`, or `none`. | `none` |\n| `includeLogProbs` | bool | Include token-level log probabilities when the model supports them. | `false` |\n| `diarize` | bool | Enable speaker diarization for supported transcription models. | `false` |\n| `prompt` | string | Primer text used to steer transcription output. | unset |\n| `temperature` | number | Sampling temperature for transcription. | `0` |\n| `language` | string | BCP-47 language hint applied to every transcription request. | `en` |\n| `include` | []string | Values forwarded to OpenAI's `include` parameter (for example `logprobs`). | unset |\n| `chunking` | object | Server-side VAD chunking settings (`mode`, `prefixPaddingMs`, `silenceDurationMs`, `threshold`). | unset |\n| `task` | string | Default operation: `transcribe` or `translate`. | `transcribe` |\n| `stream` | bool | Enable OpenAI SSE streaming mode by default. | `false` |\n| `ignoreIdentities` | []string | Skip packets when the participant identity matches (supports `*`, `prefix*`). | unset |\n| `allowIdentities` | []string | Optional allowlist; when provided, only matching identities are transcribed. | unset |\n\nUse `ignoreIdentities` to prevent the engram from transcribing playback/agent participants (for\nexample `bubu-*` or a specific `{{ inputs.event.id }}-playback` identity). Pair it with\n`allowIdentities` when the Story should only capture speech from a curated set of users.\n\n## 🔐 Secrets\n\nSecret `openai` must provide `API_KEY`. Optional overrides include `BASE_URL`, `ORG_ID`, and `PROJECT_ID`.\n\n## 📥 Inputs\n\n```json\n{\n  \"audio\": {\n    \"encoding\": \"pcm\",\n    \"sampleRate\": 48000,\n    \"channels\": 1,\n    \"data\": \"\u003cbase64-encoded audio\u003e\"\n  },\n  \"responseFormat\": \"json\",\n  \"language\": \"en\"\n}\n```\n\n`audio` may be supplied inline via `data` or through shared-storage metadata.\n`format` is still accepted as a legacy alias, but `responseFormat` is preferred.\nRequest payloads can also override `timestampGranularity`, `includeLogProbs`,\n`diarize`, `prompt`, `temperature`, `include`, `chunking`, `task`, and `stream`.\n\n## 📤 Outputs\n\n```json\n{\n  \"text\": \"hello world\",\n  \"words\": [...],\n  \"segments\": [...]\n}\n```\n\n`words` and `segments` are populated only when the template or request enables timestamp granularity.\n\n## 🔄 Streaming Mode\n\nWhen the engram runs in real-time mode it fans out structured messages over the transport:\n\n| Type | Description |\n|------|-------------|\n| `speech.transcript.delta` | Incremental transcript text for low-latency rendering. |\n| `speech.transcript.done` | Final chunk for the current stream, including usage/logprobs metadata. |\n| `speech.transcript.v1` | Full transcription result (non-translation). |\n| `speech.translation.v1` | Full translation result. |\n\nEach payload includes provider/model metadata so downstream steps can decide whether to combine them with other vendors.\n\n## 🧪 Local Development\n\n- `make lint` – Run the shared lint and static-analysis checks.\n- `go test ./...` – Run the transcription unit/integration tests.\n- `make docker-build` – Build the engram image for local clusters.\n- Set `BUBU_DEBUG=true` to log sanitized request summaries and returned transcripts without printing raw audio bytes.\n\n## 🤝 Community \u0026 Support\n\n- [Contributing](./CONTRIBUTING.md)\n- [Support](./SUPPORT.md)\n- [Security Policy](./SECURITY.md)\n- [Code of Conduct](./CODE_OF_CONDUCT.md)\n- [Discord](https://discord.gg/dysrB7D8H6)\n\n\n## 📄 License\n\nCopyright 2025 BubuStack.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbubustack%2Fopenai-stt-engram","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbubustack%2Fopenai-stt-engram","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbubustack%2Fopenai-stt-engram/lists"}