# Lyra: a generative low bitrate speech codec

## What is Lyra?

[Lyra](https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html)
is a high-quality, low-bitrate speech codec that makes voice communication
available even on the slowest networks. To do this it applies traditional codec
techniques while leveraging advances in machine learning (ML) with models
trained on thousands of hours of data to create a novel method for compressing
and transmitting voice signals.

### Overview

The basic architecture of the Lyra codec is quite simple. Features are extracted
from speech every 20ms and are then compressed for transmission at a desired
bitrate between 3.2kbps and 9.2kbps. On the other end, a generative model uses
those features to recreate the speech signal.
Lyra harnesses the power of new natural-sounding generative models to maintain
the low bitrate of parametric codecs while achieving high quality, on par with
state-of-the-art waveform codecs used in most streaming and communication
platforms today.

Computational complexity is reduced by using a cheaper convolutional generative
model called SoundStream, which enables Lyra to run not only on cloud servers
but also on-device on low-end phones in real time (with a processing latency of
20ms). This whole system is then trained end-to-end on thousands of hours of
speech data with speakers in over 90 languages and optimized to accurately
recreate the input audio.

Lyra is supported on Android, Linux, Mac and Windows.

## Prerequisites

There are a few things you'll need to do to set up your computer to build Lyra.

### Common setup

Lyra is built using Google's build system, Bazel. Install it following these
[instructions](https://docs.bazel.build/versions/master/install.html). Bazel
version 5.0.0 is required, and some Linux distributions may make an older
version available in their application repositories, so make sure you are using
the required version or newer. The latest version can be downloaded via
[GitHub](https://github.com/bazelbuild/bazel/releases).

You will also need python3 and numpy installed.

Lyra can be built from Linux using Bazel for an ARM Android target or a Linux
target, and natively on Mac and Windows.

### Android requirements

Building for Android requires downloading a specific version of the Android NDK
toolchain. If you already develop with Android Studio, you might not need to do
these steps if ANDROID_HOME and ANDROID_NDK_HOME are defined and pointing at the
right version of the NDK.

1.  Download command line tools from https://developer.android.com/studio
2.  Unzip and cd to the directory
3.  Check the available packages to install in case they don't match the
    following steps.

    ```shell
    bin/sdkmanager  --sdk_root=$HOME/android/sdk --list
    ```

    Some systems will already have the Java runtime set up. But if you see an
    error here like `ERROR: JAVA_HOME is not set and no 'java' command could be
    found on your PATH.`, this means you need to install the Java runtime with
    `sudo apt install default-jdk` first. You will also need to add `export
    JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64` (type `ls /usr/lib/jvm` to see
    which path was installed) to your $HOME/.bashrc and reload it with `source
    $HOME/.bashrc`.

4.  Install the r21 NDK, Android SDK 30, and build tools:

    ```shell
    bin/sdkmanager  --sdk_root=$HOME/android/sdk --install  "platforms;android-30" "build-tools;30.0.3" "ndk;21.4.7075529"
    ```

5.  Add the following to .bashrc (or export the variables):

    ```shell
    export ANDROID_NDK_HOME=$HOME/android/sdk/ndk/21.4.7075529
    export ANDROID_HOME=$HOME/android/sdk
    ```

6.  Reload .bashrc (with `source $HOME/.bashrc`).

## Building

The building and running process differs slightly depending on the selected
platform.

### Building for Linux

You can build the cc_binaries with the default config. `encoder_main` is an
example of a file encoder.

```shell
bazel build -c opt lyra/cli_example:encoder_main
```

You can run `encoder_main` to encode a test .wav file with some speech in it,
specified by `--input_path`. The `--output_dir` specifies where to write the
encoded (compressed) representation, and the desired bitrate can be specified
using the `--bitrate` flag.

```shell
bazel-bin/lyra/cli_example/encoder_main --input_path=lyra/testdata/sample1_16kHz.wav --output_dir=$HOME/temp --bitrate=3200
```

Similarly, you can build decoder_main and use it on the output of encoder_main
to decode the encoded data back into speech.

```shell
bazel build -c opt lyra/cli_example:decoder_main
bazel-bin/lyra/cli_example/decoder_main --encoded_path=$HOME/temp/sample1_16kHz.lyra --output_dir=$HOME/temp/ --bitrate=3200
```
Note: the default Bazel toolchain is automatically configured and likely uses
gcc/libstdc++ on Linux. This should be satisfactory for most users, but will
differ from the NDK toolchain, which uses clang/libc++. To use a custom clang
toolchain on Linux, see toolchain/README.md and .bazelrc.

### Building for Android

#### Android App

There is an example APK target called `lyra_android_example` that you can build
after you have set up the NDK.

This example is an app with a minimal GUI that has buttons for two options. One
option is to record from the microphone and encode/decode with Lyra so you can
test what Lyra would sound like for your voice. The other option runs a
benchmark that encodes and decodes in the background and prints the timings to
logcat.

```shell
bazel build -c opt lyra/android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK
adb install bazel-bin/lyra/android_example/lyra_android_example.apk
```

After this you should see an app called "Lyra Example App".

You can open it, and you will see a simple TextView that says the benchmark is
running, and when it finishes.

Press "Record from microphone", say a few words, and then press "Encode and
decode to speaker".
You should hear your voice being played back after being
coded with Lyra.

If you press "Benchmark", you should see something like the following in logcat
on a Pixel 6 Pro when running the benchmark:

```shell
lyra_benchmark:  feature_extractor:  max: 1.836 ms  min: 0.132 ms  mean: 0.153 ms  stdev: 0.042 ms
lyra_benchmark: quantizer_quantize:  max: 1.042 ms  min: 0.120 ms  mean: 0.130 ms  stdev: 0.028 ms
lyra_benchmark:   quantizer_decode:  max: 0.103 ms  min: 0.026 ms  mean: 0.029 ms  stdev: 0.003 ms
lyra_benchmark:       model_decode:  max: 0.820 ms  min: 0.191 ms  mean: 0.212 ms  stdev: 0.031 ms
lyra_benchmark:              total:  max: 2.536 ms  min: 0.471 ms  mean: 0.525 ms  stdev: 0.088 ms
```

This shows that decoding a 50Hz frame (each frame is 20 milliseconds) takes
0.525 milliseconds on average, so decoding runs around 38 times (20/0.525)
faster than realtime.
To build your own Android app, you can either use the cc_library target outputs
to create a .so for your own build system, or use them with an
[`android_binary`](https://docs.bazel.build/versions/master/be/android.html)
rule within Bazel to create an .apk file as in this example.

There is a tutorial on building for Android with Bazel in the
[bazel docs](https://docs.bazel.build/versions/master/android-ndk.html).

#### Android command-line binaries

There are also binary targets that you can use to experiment with encoding
and decoding .wav files.

You can build the example cc_binary targets with:

```shell
bazel build -c opt lyra/cli_example:encoder_main --config=android_arm64
bazel build -c opt lyra/cli_example:decoder_main --config=android_arm64
```

This builds an executable binary that can be run on Android 64-bit ARM devices
(not an Android app). You can then push it to your Android device and run it as
a binary through the shell.

```shell
# Push the binary and the data it needs, including the model and .wav files:
adb push bazel-bin/lyra/cli_example/encoder_main /data/local/tmp/
adb push bazel-bin/lyra/cli_example/decoder_main /data/local/tmp/
adb push lyra/model_coeffs/ /data/local/tmp/
adb push lyra/testdata/ /data/local/tmp/

adb shell
cd /data/local/tmp
./encoder_main --model_path=/data/local/tmp/model_coeffs --output_dir=/data/local/tmp --input_path=testdata/sample1_16kHz.wav
./decoder_main --model_path=/data/local/tmp/model_coeffs --output_dir=/data/local/tmp --encoded_path=sample1_16kHz.lyra
```

The same encoder_main/decoder_main flags as in the Linux example also work here.

### Building for Mac

You will need to install the Xcode command line tools in addition to the
prerequisites common to all platforms. Xcode setup is a required step for using
Bazel on Mac. See this [guide](https://bazel.build/install/os-x) for how to
install the Xcode command line tools. Lyra has been built successfully using
Xcode 13.3.

You can follow the instructions in the [Building for Linux](#building-for-linux)
section once this is completed.

### Building for Windows

You will need to install Build Tools for Visual Studio 2019 in addition to the
prerequisites common to all platforms. Visual Studio setup is a required step
for building C++ for Bazel on Windows. See this
[guide](https://bazel.build/install/windows) for how to install MSVC.
You may also need to install Python 3 support, which is also described in the
guide.

You can follow the instructions in the [Building for Linux](#building-for-linux)
section once this is completed.

## API

For integrating Lyra into any project, only two APIs are relevant:
[LyraEncoder](lyra/lyra_encoder.h) and [LyraDecoder](lyra/lyra_decoder.h).

> DISCLAIMER: At this time Lyra's API and bit-stream are **not** guaranteed to
> be stable and might change in future versions of the code.

On the sending side, `LyraEncoder` can be used to encode an audio stream using
the following interface:

```cpp
class LyraEncoder : public LyraEncoderInterface {
 public:
  static std::unique_ptr<LyraEncoder> Create(
      int sample_rate_hz, int num_channels, int bitrate, bool enable_dtx,
      const ghc::filesystem::path& model_path);

  std::optional<std::vector<uint8_t>> Encode(
      const absl::Span<const int16_t> audio) override;

  bool set_bitrate(int bitrate) override;

  int sample_rate_hz() const override;

  int num_channels() const override;

  int bitrate() const override;

  int frame_rate() const override;
};
```

The static `Create` method instantiates a `LyraEncoder` with the desired sample
rate in Hertz, number of channels and bitrate, as long as those parameters are
supported (see `lyra_encoder.h` for supported parameters). Otherwise it returns
a nullptr. The `Create` method also needs to know if DTX should be enabled and
where the model weights are stored. It also checks that these weights exist and
are compatible with the current Lyra version.

Given a `LyraEncoder`, any audio stream can be compressed using the `Encode`
method. The provided span of int16-formatted samples is assumed to contain 20ms
of data at the sample rate chosen at `Create` time. As long as this condition is
met, the `Encode` method returns the encoded packet as a vector of bytes that is
ready to be stored or transmitted over the network.
The bitrate can be dynamically modified using the `set_bitrate` setter. It
returns true if the desired bitrate is supported and correctly set.

The rest of the `LyraEncoder` methods are just getters for the different
predetermined parameters.

On the receiving end, `LyraDecoder` can be used to decode the encoded packet
using the following interface:

```cpp
class LyraDecoder : public LyraDecoderInterface {
 public:
  static std::unique_ptr<LyraDecoder> Create(
      int sample_rate_hz, int num_channels,
      const ghc::filesystem::path& model_path);

  bool SetEncodedPacket(absl::Span<const uint8_t> encoded) override;

  std::optional<std::vector<int16_t>> DecodeSamples(int num_samples) override;

  int sample_rate_hz() const override;

  int num_channels() const override;

  int frame_rate() const override;

  bool is_comfort_noise() const override;
};
```

Once again, the static `Create` method instantiates a `LyraDecoder` with the
desired sample rate in Hertz and number of channels, as long as those parameters
are supported. Otherwise it returns a `nullptr`. These parameters don't need to
be the same as the ones in `LyraEncoder`. The `Create` method also needs to know
where the model weights are stored. It also checks that these weights exist and
are compatible with the current Lyra version.

Given a `LyraDecoder`, any packet can be decoded by first feeding it into
`SetEncodedPacket`, which returns true if the provided span of bytes is a valid
Lyra-encoded packet.

Then the int16-formatted samples can be obtained by calling `DecodeSamples`. If
there isn't a packet available, but samples still need to be generated, the
decoder might switch to a comfort noise generation mode, which can be checked
using `is_comfort_noise`.

The rest of the `LyraDecoder` methods are just getters for the different
predetermined parameters.

For an example of how to use `LyraEncoder` and `LyraDecoder` to encode and
decode a stream of audio, please refer to the
[integration test](lyra/lyra_integration_test.cc).
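Putting the two interfaces together, an encode/decode round trip for a single frame might look like the following sketch. It is not a complete program from this repository: it assumes the Lyra headers, a `chromemedia::codec` namespace (assumed from the sources), and a valid `model_path`; the parameter values (16 kHz, mono, 3200 bps, DTX off) are examples, not requirements. See the integration test for real usage.

```cpp
#include <cstdint>
#include <vector>

#include "lyra/lyra_decoder.h"
#include "lyra/lyra_encoder.h"

// Encode one 20 ms frame and decode it back. Returns false if any step
// fails (unsupported parameters, missing model weights, invalid packet).
bool RoundTrip(const std::vector<int16_t>& frame_20ms,
               const ghc::filesystem::path& model_path) {
  auto encoder = chromemedia::codec::LyraEncoder::Create(
      /*sample_rate_hz=*/16000, /*num_channels=*/1, /*bitrate=*/3200,
      /*enable_dtx=*/false, model_path);
  auto decoder = chromemedia::codec::LyraDecoder::Create(
      /*sample_rate_hz=*/16000, /*num_channels=*/1, model_path);
  if (encoder == nullptr || decoder == nullptr) return false;

  // One 20 ms frame in (320 samples at 16 kHz), one packet out.
  auto packet = encoder->Encode(absl::MakeConstSpan(frame_20ms));
  if (!packet.has_value()) return false;

  // Feed the packet, then pull the same 20 ms worth of samples back.
  if (!decoder->SetEncodedPacket(absl::MakeConstSpan(*packet))) return false;
  auto decoded = decoder->DecodeSamples(/*num_samples=*/320);
  return decoded.has_value();
}
```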
## License

Use of this source code is governed by an Apache v2.0 license that can be found
in the LICENSE file.

## Papers

1.  Kleijn, W. B., Lim, F. S., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q.,
    & Walters, T. C. (2018, April).
    [Wavenet based low rate speech coding](https://arxiv.org/pdf/1712.01120). In
    2018 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP) (pp. 676-680). IEEE.
2.  Denton, T., Luebs, A., Chinen, M., Lim, F. S., Storus, A., Yeh, H., Kleijn,
    W. B., & Skoglund, J. (2020, November).
    [Handling Background Noise in Neural Speech Generation](https://arxiv.org/pdf/2102.11906).
    In 2020 54th Asilomar Conference on Signals, Systems, and Computers (pp.
    667-671). IEEE.
3.  Kleijn, W. B., Storus, A., Chinen, M., Denton, T., Lim, F. S., Luebs, A.,
    Skoglund, J., & Yeh, H. (2021, June).
    [Generative speech coding with predictive variance regularization](https://arxiv.org/pdf/2102.09660).
    In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP) (pp. 6478-6482). IEEE.
4.  Zeghidour, N., Luebs, A., Omran, A., Skoglund, J., & Tagliasacchi, M.
    (2021).
    [SoundStream: An end-to-end neural audio codec](https://arxiv.org/pdf/2107.03312).
    IEEE/ACM Transactions on Audio, Speech, and Language Processing.