{"id":25180385,"url":"https://github.com/shipclojure/voice-fn","last_synced_at":"2025-05-07T06:04:33.059Z","repository":{"id":269611540,"uuid":"907985917","full_name":"shipclojure/voice-fn","owner":"shipclojure","description":"A Clojure library for building real-time voice-enabled AI pipelines. voice-fn handles the orchestration of speech recognition, audio processing, and AI service integration with the elegance of functional programming.","archived":false,"fork":false,"pushed_at":"2025-05-04T08:28:28.000Z","size":2829,"stargazers_count":63,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-07T06:04:27.543Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shipclojure.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-24T19:33:06.000Z","updated_at":"2025-05-06T20:03:23.000Z","dependencies_parsed_at":"2025-01-14T06:39:30.147Z","dependency_job_id":"2606fbd1-b5b4-4ada-b255-7f38965e546a","html_url":"https://github.com/shipclojure/voice-fn","commit_stats":null,"previous_names":["ovistoica/voice-fn","shipclojure/voice-fn"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shipclojure%2Fvoice-fn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shipclojure%2Fvoice-fn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shipclojure%2Fvoice-fn/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shipclojure%2Fvoice-fn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shipclojure","download_url":"https://codeload.github.com/shipclojure/voice-fn/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252823920,"owners_count":21809713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-09T16:17:57.857Z","updated_at":"2025-05-07T06:04:33.046Z","avatar_url":"https://github.com/shipclojure.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# voice-fn - Real-time Voice AI Pipeline Framework\n\n`voice-fn` is a Clojure framework for building real-time voice AI applications using a data-driven, functional approach. Built on top of `clojure.core.async.flow`, it provides a composable pipeline architecture for processing audio, text, and AI interactions with built-in support for major AI providers.\n\nThis project's status is **_experimental_**. 
Expect breaking changes.\n\n[![Watch the video](https://img.youtube.com/vi/HwoGMhIx5w0/0.jpg)](https://youtu.be/HwoGMhIx5w0?t=345)\n\n## Core Features\n\n-   **Flow-Based Architecture:** Built on `core.async.flow` for robust concurrent processing\n-   **Data-First Design:** Define AI pipelines as data structures for easy configuration and modification\n-   **Streaming Architecture:** Efficient real-time audio and text processing\n-   **Extensible Processors:** Simple protocol-based system for adding new processing components\n-   **Flexible Frame System:** Type-safe message passing between pipeline components\n-   **Built-in Services:** Ready-to-use integrations with major AI providers\n\n\n## Quick Start: Local example\n\nFirst, create a `resources/secrets.edn`:\n\n```edn\n{:deepgram {:api-key \"\"}\n :elevenlabs {:api-key \"\"\n              :voice-id \"\"}\n :groq {:api-key \"\"}\n :openai {:new-api-sk \"\"}}\n```\n\nObtain the API keys from the respective providers and fill in the blank values.\n\nStart a REPL and evaluate the snippets in the `(comment ...)` blocks to start the flows.\nAllow Microphone access when prompted.\n\n```clojure\n(ns voice-fn-examples.local\n  (:require\n   [clojure.core.async :as a]\n   [clojure.core.async.flow :as flow]\n   [taoensso.telemere :as t]\n   [voice-fn.processors.deepgram :as asr]\n   [voice-fn.processors.elevenlabs :as tts]\n   [voice-fn.processors.llm-context-aggregator :as context]\n   [voice-fn.processors.openai :as llm]\n   [voice-fn.secrets :refer [secret]]\n   [voice-fn.transport :as transport]\n   [voice-fn.utils.core :as u]))\n\n(defn make-local-flow\n  \"This example showcases a voice AI agent for the local computer.  
Audio is\n  usually encoded as PCM at 16kHz frequency (sample rate) and it is mono (1\n  channel).\n    \"\n  ([] (make-local-flow {}))\n  ([{:keys [llm-context extra-procs extra-conns encoding debug?\n            sample-rate language sample-size-bits channels chunk-duration-ms]\n     :or {llm-context {:messages [{:role \"system\"\n                                   :content \"You are a helpful assistant \"}]}\n          encoding :pcm-signed\n          sample-rate 16000\n          sample-size-bits 16\n          channels 1\n          chunk-duration-ms 20\n          language :en\n          debug? false\n          extra-procs {}\n          extra-conns []}}]\n\n   (flow/create-flow\n     {:procs\n      (u/deep-merge\n        {;; Capture audio from microphone and send raw-audio-input frames further in the pipeline\n         :transport-in {:proc transport/microphone-transport-in\n                        :args {:audio-in/sample-rate sample-rate\n                               :audio-in/channels channels\n                               :audio-in/sample-size-bits sample-size-bits}}\n         ;; raw-audio-input -\u003e transcription frames\n         :transcriptor {:proc asr/deepgram-processor\n                        :args {:transcription/api-key (secret [:deepgram :api-key])\n                               :transcription/interim-results? true\n                               :transcription/punctuate? false\n                               :transcription/vad-events? true\n                               :transcription/smart-format? 
true\n                               :transcription/model :nova-2\n                               :transcription/utterance-end-ms 1000\n                               :transcription/language language\n                               :transcription/encoding encoding\n                               :transcription/sample-rate sample-rate}}\n\n         ;; user transcription \u0026 llm message frames -\u003e llm-context frames\n         ;; responsible for keeping the full conversation history\n         :context-aggregator  {:proc context/context-aggregator\n                               :args {:llm/context llm-context\n                                      :aggregator/debug? debug?}}\n\n         ;; Takes llm-context frames and produces new llm-text-chunk \u0026 llm-tool-call-chunk frames\n         :llm {:proc llm/openai-llm-process\n               :args {:openai/api-key (secret [:openai :new-api-sk])\n                      :llm/model \"gpt-4o-mini\"}}\n\n         ;; llm-text-chunk \u0026 llm-tool-call-chunk -\u003e llm-context-messages-append frames\n         :assistant-context-assembler {:proc context/assistant-context-assembler\n                                       :args {:debug? debug?}}\n\n         ;; llm-text-chunk -\u003e sentence speak frames (faster for text to speech)\n         :llm-sentence-assembler {:proc context/llm-sentence-assembler}\n\n         ;; speak-frames -\u003e audio-output-raw frames\n         :tts {:proc tts/elevenlabs-tts-process\n               :args {:elevenlabs/api-key (secret [:elevenlabs :api-key])\n                      :elevenlabs/model-id \"eleven_flash_v2_5\"\n                      :elevenlabs/voice-id (secret [:elevenlabs :voice-id])\n                      :voice/stability 0.5\n                      :voice/similarity-boost 0.8\n                      :voice/use-speaker-boost? 
true\n                      :flow/language language\n                      :audio.out/encoding encoding\n                      :audio.out/sample-rate sample-rate}}\n\n         ;; audio-output-raw -\u003e smaller audio-output-raw frames (used for sending audio in realtime)\n         :audio-splitter {:proc transport/audio-splitter\n                          :args {:audio.out/sample-rate sample-rate\n                                 :audio.out/sample-size-bits sample-size-bits\n                                 :audio.out/channels channels\n                                 :audio.out/duration-ms chunk-duration-ms}}\n\n         ;; speakers out\n         :transport-out {:proc transport/realtime-speakers-out-processor\n                         :args {:audio.out/sample-rate sample-rate\n                                :audio.out/sample-size-bits sample-size-bits\n                                :audio.out/channels channels\n                                :audio.out/duration-ms chunk-duration-ms}}}\n        extra-procs)\n      :conns (concat\n               [[[:transport-in :out] [:transcriptor :in]]\n\n                [[:transcriptor :out] [:context-aggregator :in]]\n                [[:context-aggregator :out] [:llm :in]]\n\n                ;; Aggregate full context\n                [[:llm :out] [:assistant-context-assembler :in]]\n                [[:assistant-context-assembler :out] [:context-aggregator :in]]\n\n                ;; Assemble sentence by sentence for fast speech\n                [[:llm :out] [:llm-sentence-assembler :in]]\n                [[:llm-sentence-assembler :out] [:tts :in]]\n\n                [[:tts :out] [:audio-splitter :in]]\n                [[:audio-splitter :out] [:transport-out :in]]]\n               extra-conns)})))\n\n(def local-ai (make-local-flow))\n\n(comment\n\n  ;; Start local ai flow - starts paused\n  (let [{:keys [report-chan error-chan]} (flow/start local-ai)]\n    (a/go-loop []\n      (when-let [[msg c] (a/alts! 
[report-chan error-chan])]\n        (when (map? msg)\n          (t/log! {:level :debug :id (if (= c error-chan) :error :report)} msg))\n        (recur))))\n\n  ;; Resume local ai -\u003e you can now speak with the AI\n  (flow/resume local-ai)\n\n  ;; Stop the conversation\n  (flow/stop local-ai)\n\n  ,)\n```\n\nWhich roughly translates to:\n\n![Flow Diagram](./resources/flow.png)\n\n\nSee [examples](./examples/src/voice_fn_examples/) for more usage examples.\n\n\n## Supported Providers\n\n\n\n### Text-to-Speech (TTS)\n\n-   **ElevenLabs**\n    -   Models: `eleven_multilingual_v2`, `eleven_turbo_v2`, `eleven_flash_v2` and more.\n    -   Features: Real-time streaming, multiple voices, multilingual support\n\n\n### Speech-to-Text (STT)\n\n-   **Deepgram**\n    -   Models: `nova-2`, `nova-2-general`, `nova-2-meeting` and more.\n    -   Features: Real-time transcription, punctuation, smart formatting\n\n\n### Text-Based Large Language Models (LLM)\n\n-   **OpenAI**\n    -   Models: `gpt-4o-mini` (fastest, cheapest), `gpt-4`, `gpt-3.5-turbo` and more\n    -   Features: Function calling, streaming responses\n\n\n## Key Concepts\n\n### Flows\n\nThe core building block of voice-fn pipelines:\n\n-   Composed of processes connected by channels\n-   Processes can be:\n    -   Input/output handlers\n    -   AI service integrations\n    -   Data transformers\n-   Managed by `core.async.flow` for lifecycle control\n\n### Transport\n\nThe modality through which audio enters and leaves the voice AI pipeline. 
Example transport modalities:\n\n- local (microphone + speakers)\n- telephony (Twilio through WebSocket)\n- webRTC (browser support) - TODO\n- async (through in \u0026 out `core.async` channels)\n\nYou will see processors like `:transport-in` \u0026 `:transport-out`\n\n### Frames\n\nThe basic unit of data flow, representing typed messages like:\n\n-   `:audio/input-raw` - Raw audio data\n-   `:transcription/result` - Transcribed text\n-   `:llm/text-chunk` - LLM response chunks\n-   `:system/start`, `:system/stop` - Control signals\n\nEach frame has a type and optionally a schema for the data contained in it.\n\nSee [frame.clj](./src/voice_fn/frame.clj) for all possible frames.\n\n\n### Processes\n\nComponents that transform frames:\n\n-   Define input/output requirements\n-   Can maintain state\n-   Use core.async for async processing\n-   Implement the `flow/process` protocol\n\n\n## Adding Custom Processes\n\n```clojure\n    (defn custom-processor []\n      (flow/process\n        {:describe (fn [] {:ins {:in \"Input channel\"}\n                           :outs {:out \"Output channel\"}})\n         :init identity\n         :transform (fn [state in msg]\n                      [state {:out [(process-message msg)]}])}))\n```\n\n\nRead the `core.async.flow` docs for more information about flow processes.\n\n\n## Built With\n\n-   [core.async](\u003chttps://github.com/clojure/core.async\u003e) - Concurrent processing\n-   [core.async.flow](\u003chttps://clojure.github.io/core.async/#clojure.core.async.flow\u003e) - Flow control\n-   [Hato](\u003chttps://github.com/gnarroway/hato\u003e) - WebSocket support\n-   [Malli](\u003chttps://github.com/metosin/malli\u003e) - Schema validation\n\n\n## Acknowledgements\n\nVoice-fn takes heavy inspiration from [pipecat](https://github.com/pipecat-ai/pipecat). Differences:\n- voice-fn uses a graph instead of a bidirectional queue for frame transport\n- voice-fn has a data-centric implementation. 
The processors in voice-fn are\n  pure functions in the `core.async.flow` transform syntax\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshipclojure%2Fvoice-fn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshipclojure%2Fvoice-fn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshipclojure%2Fvoice-fn/lists"}