{"id":19156160,"url":"https://github.com/kyegomez/usm","last_synced_at":"2025-04-15T07:07:02.835Z","repository":{"id":211695473,"uuid":"729604444","full_name":"kyegomez/USM","owner":"kyegomez","description":"Implementation of Google's USM speech model in Pytorch","archived":false,"fork":false,"pushed_at":"2025-04-06T12:52:34.000Z","size":2298,"stargazers_count":30,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-15T07:06:58.397Z","etag":null,"topics":["ai","artificial-intelligence","dall3","deep-learning","gpt4","gpt4all","machine-learning","neural-networks"],"latest_commit_sha":null,"homepage":"https://discord.gg/GYbXvDGevY","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kyegomez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["kyegomez"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2023-12-09T18:51:49.000Z","updated_at":"2025-03-04T14:46:31.000Z","dependencies_parsed_at":"2024-11-16T11:05:51.155Z","dependency_job_id":"9e618db5-edf1-4bd0-9700-659f204f1575","html_url":"https://github.com/kyegomez/USM","commit_stats":null,"previous_names":["kyegomez/usm"],"tags_count":0,"template":false,"template_full_name":"kyegomez/Python-Package-Template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FUSM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/kyegomez%2FUSM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FUSM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FUSM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kyegomez","download_url":"https://codeload.github.com/kyegomez/USM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249023700,"owners_count":21199958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","dall3","deep-learning","gpt4","gpt4all","machine-learning","neural-networks"],"created_at":"2024-11-09T08:33:26.759Z","updated_at":"2025-04-15T07:07:02.808Z","avatar_url":"https://github.com/kyegomez.png","language":"Python","funding_links":["https://github.com/sponsors/kyegomez"],"categories":[],"sub_categories":[],"readme":"[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n# USM\nImplementation of Google's Universal Speech Model (USM) from the paper: [Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages](https://arxiv.org/pdf/2303.01037.pdf)\nI'm implementing this mostly because Gemini, the all-new multimodal foundation model from Google, uses it! 
[Check out our Gemini implementation here:](https://github.com/kyegomez/Gemini)\n\n\n# Install\n`pip install usm-torch`\n\n\n## Usage\n```python\nimport torch\nfrom usm_torch import USMEncoder\n\n# Initialize model\nmodel = USMEncoder(\n    dim=80,  # Dimension of the input\n    heads=4,  # Number of attention heads\n    ff_dim=128,  # Dimension of the feed-forward layer\n    depth=4,  # Number of transformer layers\n    depthwise_conv_kernel_size=31,  # Kernel size for depthwise convolution\n    dropout=0.5,  # Dropout rate\n)\n\n# Example input\nbatch_size = 10  # Number of samples in a batch\nmax_length = 400  # Maximum length of the input sequence\nlengths = torch.randint(1, max_length, (batch_size,))  # Randomly generate sequence lengths\ninputs = torch.rand(batch_size, int(lengths.max()), 80)  # Randomly generate input tensor\n\n# Forward pass\noutputs, output_lengths = model(inputs, lengths)  # Perform forward pass\nprint(f\"outputs.shape: {outputs.shape}\")  # Print the shape of the output tensor\nprint(f\"output_lengths.shape: {output_lengths.shape}\")  # Print the shape of the output lengths tensor\n\n\n```\n\n# License\nMIT\n\n# Citation\n```bibtex\n@misc{zhang2023google,\n    title={Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages}, \n    author={Yu Zhang and Wei Han and James Qin and Yongqiang Wang and Ankur Bapna and Zhehuai Chen and Nanxin Chen and Bo Li and Vera Axelrod and Gary Wang and Zhong Meng and Ke Hu and Andrew Rosenberg and Rohit Prabhavalkar and Daniel S. 
Park and Parisa Haghani and Jason Riesa and Ginger Perng and Hagen Soltau and Trevor Strohman and Bhuvana Ramabhadran and Tara Sainath and Pedro Moreno and Chung-Cheng Chiu and Johan Schalkwyk and Françoise Beaufays and Yonghui Wu},\n    year={2023},\n    eprint={2303.01037},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL}\n}\n\n```\n\n\n## Todo\n- [ ] Implement the proj -\u003e cosine similarity -\u003e codebook\n- [ ] Implement chunk wise attention\n- [ ] Implement on paired input, with the text encoder: embed extractor -\u003e resampler -\u003e refiner -\u003e text embedding, RNN-T reconstruction loss\n- [ ] Text input: text input -\u003e speech encoder -\u003e text decoder -\u003e rnn-t reconstruction\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Fusm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyegomez%2Fusm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Fusm/lists"}