{"id":13415244,"url":"https://github.com/facebookresearch/audiocraft","last_synced_at":"2025-05-12T22:29:11.492Z","repository":{"id":173853942,"uuid":"650945129","full_name":"facebookresearch/audiocraft","owner":"facebookresearch","description":"Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.","archived":false,"fork":false,"pushed_at":"2025-03-13T16:07:04.000Z","size":25329,"stargazers_count":21929,"open_issues_count":334,"forks_count":2316,"subscribers_count":208,"default_branch":"main","last_synced_at":"2025-05-05T17:21:13.036Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-08T06:41:36.000Z","updated_at":"2025-05-05T11:08:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"aa18f94f-5e23-45ca-b2c6-33f0cfc6e1ae","html_url":"https://github.com/facebookresearch/audiocraft","commit_stats":{"total_commits":138,"total_committers":33,"mean_commits":4.181818181818182,"dds":0.5434782608695652,"last_synced_commit":"72cb16f9fb239e9cf03f7bd997198c7d7a67a01c"},"previous_names":["facebookresearch/audiocraft"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/facebookresearch%2Faudiocraft","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faudiocraft/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faudiocraft/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faudiocraft/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/audiocraft/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253832771,"owners_count":21971324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T21:00:45.817Z","updated_at":"2025-05-12T22:29:11.452Z","avatar_url":"https://github.com/facebookresearch.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","💎 Official AudioCraft Repository","HarmonyOS","Python","📚 Learning \u0026 Resources","\u003cspan id=\"music\"\u003eMusic\u003c/span\u003e","Image Segmentation","GitHub projects","语音合成","Open-Source Music Generation Landscape","Music AI \u0026 Machine Learning","Summary","Music Generation","AI **工具**","🎨 Creative AI","Repos","Voice \u0026 Multimodal (local) (16)","模型","排行榜 [2025-03-18]","Audio \u0026 Music","🎤 Speech \u0026 Audio","Key Implementation Libraries","二、开源库（按数据类型分类，附实用场景）","Music Generation \u0026 AI","2. 
Open Foundation Models"],"sub_categories":["Windows Manager","\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","Creative Uses of Generative AI Image Synthesis Tools","网络服务_其他","LoRA Adapters and Quantized Models","音频","Music and Audio","音乐声音","3. 音频（音乐、语音生成）","Text-to-Speech"],"readme":"# AudioCraft\n![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)\n![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)\n![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)\n\nAudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code\nfor two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.\n\n\n## Installation\nAudioCraft requires Python 3.9, PyTorch 2.1.0. To install AudioCraft, you can run the following:\n\n```shell\n# Best to make sure you have torch installed first, in particular before installing xformers.\n# Don't run this if you already have PyTorch installed.\npython -m pip install 'torch==2.1.0'\n# You might need the following before trying to install the packages\npython -m pip install setuptools wheel\n# Then proceed to one of the following\npython -m pip install -U audiocraft  # stable release\npython -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge\npython -m pip install -e .  
# or if you cloned the repo locally (mandatory if you want to train).\npython -m pip install -e '.[wm]'  # if you want to train a watermarking model\n```\n\nWe also recommend having `ffmpeg` installed, either through your system or Anaconda:\n```bash\nsudo apt-get install ffmpeg\n# Or if you are using Anaconda or Miniconda\nconda install \"ffmpeg\u003c5\" -c conda-forge\n```\n\n## Models\n\nAt the moment, AudioCraft contains the training code and inference code for:\n* [MusicGen](./docs/MUSICGEN.md): A state-of-the-art controllable text-to-music model.\n* [AudioGen](./docs/AUDIOGEN.md): A state-of-the-art text-to-sound model.\n* [EnCodec](./docs/ENCODEC.md): A state-of-the-art high fidelity neural audio codec.\n* [Multi Band Diffusion](./docs/MBD.md): An EnCodec compatible decoder using diffusion.\n* [MAGNeT](./docs/MAGNET.md): A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.\n* [AudioSeal](./docs/WATERMARKING.md): A state-of-the-art audio watermarking model.\n* [MusicGen Style](./docs/MUSICGEN_STYLE.md): A state-of-the-art text-and-style-to-music model.\n* [JASCO](./docs/JASCO.md): A high-quality text-to-music model conditioned on chords, melodies and drum tracks.\n\n\n## Training code\n\nAudioCraft contains PyTorch components for deep learning research in audio and training pipelines for the developed models.\nFor a general introduction to AudioCraft's design principles and instructions for developing your own training pipeline, refer to\nthe [AudioCraft training documentation](./docs/TRAINING.md).\n\nTo reproduce existing work and use the developed training pipelines, refer to the instructions for each specific model,\nwhich provide pointers to configuration, example grids, and model/task-specific information and FAQ.\n\n\n## API documentation\n\nWe provide some [API documentation](https://facebookresearch.github.io/audiocraft/api_docs/audiocraft/index.html) for AudioCraft.\n\n\n## FAQ\n\n#### Is the training code available?\n\nYes! 
We provide the training code for [EnCodec](./docs/ENCODEC.md), [MusicGen](./docs/MUSICGEN.md), [Multi Band Diffusion](./docs/MBD.md) and [JASCO](./docs/JASCO.md).\n\n#### Where are the models stored?\n\nHugging Face stores the models in a specific cache location, which can be overridden for the AudioCraft models by setting the `AUDIOCRAFT_CACHE_DIR` environment variable.\nTo change the cache location of the other Hugging Face models, please check out the [Hugging Face Transformers documentation for the cache setup](https://huggingface.co/docs/transformers/installation#cache-setup).\nFinally, if you use a model that relies on Demucs (e.g. `musicgen-melody`) and want to change the download location for Demucs, refer to the [Torch Hub documentation](https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved).\n\n\n## License\n* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).\n* The model weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).\n\n\n## Citation\n\nFor the general framework of AudioCraft, please cite the following.\n```\n@inproceedings{copet2023simple,\n    title={Simple and Controllable Music Generation},\n    author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},\n    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},\n    year={2023},\n}\n```\n\nWhen referring to a specific model, please cite as mentioned in the model-specific README, e.g.\n[./docs/MUSICGEN.md](./docs/MUSICGEN.md), [./docs/AUDIOGEN.md](./docs/AUDIOGEN.md), 
etc.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Faudiocraft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2Faudiocraft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Faudiocraft/lists"}