{"id":15929518,"url":"https://github.com/johnsutor/llama-jarvis","last_synced_at":"2025-07-25T04:05:05.312Z","repository":{"id":257812714,"uuid":"866146970","full_name":"johnsutor/llama-jarvis","owner":"johnsutor","description":"Turn any LLM into Jarvis","archived":false,"fork":false,"pushed_at":"2024-10-06T18:01:36.000Z","size":1363,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-07T23:52:19.372Z","etag":null,"topics":["llama","llm","seamlessm4t","speech-to-speech","transformer","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johnsutor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-01T18:16:30.000Z","updated_at":"2025-03-07T15:42:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"a0b51e43-dae3-4667-b615-0356c7ae5a95","html_url":"https://github.com/johnsutor/llama-jarvis","commit_stats":{"total_commits":6,"total_committers":1,"mean_commits":6.0,"dds":0.0,"last_synced_commit":"7de19cee7b5fef1bb143151f5abc7955f2db7761"},"previous_names":["johnsutor/llama-jarvis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/johnsutor/llama-jarvis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fllama-jarvis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fllama-jarvis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fllama-jarvis/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fllama-jarvis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johnsutor","download_url":"https://codeload.github.com/johnsutor/llama-jarvis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fllama-jarvis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266952478,"owners_count":24011502,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llama","llm","seamlessm4t","speech-to-speech","transformer","transformers"],"created_at":"2024-10-07T00:04:23.411Z","updated_at":"2025-07-25T04:05:05.290Z","avatar_url":"https://github.com/johnsutor.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🦙🎤 Llama-Jarvis\n![Lint Status](https://github.com/johnsutor/llama-jarvis/workflows/Lint/badge.svg)\n![Tests Status](https://github.com/johnsutor/llama-jarvis/workflows/Test/badge.svg)\n![contributions welcome](https://img.shields.io/badge/contributions-welcome-blue.svg?style=flat)\n[![Python 
Versions](https://img.shields.io/pypi/pyversions/llama-jarvis)](https://pypi.org/project/llama-jarvis/)\n[![PyPi](https://img.shields.io/pypi/v/llama-jarvis)](https://pypi.org/project/llama-jarvis/)\n\n![Llama Omni](https://raw.githubusercontent.com/johnsutor/llama-jarvis/refs/heads/main/assets/llama.webp)\nTrain a speech-to-speech model using your own language model. The library is currently based on the [Seamless Model](https://huggingface.co/collections/facebook/seamless-communication-6568d486ef451c6ba62c7724), but we plan to support more models in the future.\n\nThis model is based on speech-to-speech models such as [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni). However, it aims to take advantage of the joint speech-text embeddings of the Seamless Model.\n\nThis code is very much a work in progress. Any and all contributions are welcome!\n\n## Why this Library?\nThis library aims to make speech-to-speech models fit naturally into the HuggingFace ecosystem, so you do not have to modify your models and datasets to work with a new library. This lets you take advantage of tools such as the [HuggingFace Trainer](https://huggingface.co/docs/transformers/en/main_classes/trainer).\n\n## Getting Started\n**NOTE:** For some of the examples below, you may first need to [log in to HuggingFace](https://huggingface.co/docs/huggingface_hub/main/package_reference/authentication) to gain access to gated models (especially the Llama models).\n\n### Installation\n```shell\npip install llama-jarvis\n```\n\n### Install Locally\n```shell\ngit clone https://github.com/johnsutor/llama-jarvis\ncd llama-jarvis\npip install -e .
\n```\n\n### Phase One Loss\nThe following example returns the phase one loss (i.e., the loss used when training the first phase of Llama-Omni).\n```py\nfrom llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor\n\nBASE_LLM = \"meta-llama/Llama-3.2-1B\"\nSEAMLESS_MODEL = \"facebook/hf-seamless-m4t-medium\"\nLANGUAGE = \"eng\"\n\n# Build a Jarvis model from the base LLM and the Seamless checkpoint\njarvis_config = JarvisConfig(\n    BASE_LLM,\n    SEAMLESS_MODEL\n)\njarvis_model = JarvisModel(jarvis_config)\njarvis_processor = JarvisProcessor(\n    BASE_LLM,\n    SEAMLESS_MODEL\n)\n\n# Prepare model inputs from the instruction, the user's text, and the target label\ninputs = jarvis_processor(\n    instruction=[\"You are a language model who should respond to my speech\"],\n    text=[\"What is two plus two?\"],\n    label=[\"Two plus two is four\"],\n    src_lang=LANGUAGE,\n    return_tensors=\"pt\",\n    padding=True\n)\n\n# Run a forward pass; the outputs include the phase one training loss\noutputs = jarvis_model(\n    **inputs,\n    tgt_lang=LANGUAGE\n)\n\nprint(outputs.loss)\n```\n\n### Phase Two Loss\nThe following example returns the phase two loss (i.e., the loss used when training the second phase of Llama-Omni). It is identical to the phase one example except for the `train_phase=2` argument.\n```py\nfrom llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor\n\nBASE_LLM = \"meta-llama/Llama-3.2-1B\"\nSEAMLESS_MODEL = \"facebook/hf-seamless-m4t-medium\"\nLANGUAGE = \"eng\"\n\n# Build a Jarvis model from the base LLM and the Seamless checkpoint\njarvis_config = JarvisConfig(\n    BASE_LLM,\n    SEAMLESS_MODEL\n)\njarvis_model = JarvisModel(jarvis_config)\njarvis_processor = JarvisProcessor(\n    BASE_LLM,\n    SEAMLESS_MODEL\n)\n\n# Prepare model inputs from the instruction, the user's text, and the target label\ninputs = jarvis_processor(\n    instruction=[\"You are a language model who should respond to my speech\"],\n    text=[\"What is two plus two?\"],\n    label=[\"Two plus two is four\"],\n    src_lang=LANGUAGE,\n    return_tensors=\"pt\",\n    padding=True\n)\n\n# train_phase=2 returns the phase two loss instead of the phase one loss\noutputs = jarvis_model(\n    **inputs,\n    tgt_lang=LANGUAGE,\n    train_phase=2\n)\n\nprint(outputs.loss)\n```\n\n## Roadmap\n- [x] Release the code on PyPI\n- [ ] Train a baseline model using Llama 3.2 1B and Seamless Medium\n- [ ] Provide training example code\n- [ ] Fully document the code\n- [ ] Create an inference script for the model\n- [ ] Write 
thorough tests for the code (~85% coverage) and test with a wide range of open-source models\n\n## Other Cool Libraries\nWe take a lot of inspiration from other great open-source libraries. Shoutout to:\n- [SLAM-LLM](https://github.com/X-LANCE/SLAM-LLM?tab=readme-ov-file)\n- [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)\n- [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni?tab=readme-ov-file)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnsutor%2Fllama-jarvis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnsutor%2Fllama-jarvis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnsutor%2Fllama-jarvis/lists"}