{"id":15601015,"url":"https://github.com/lucidrains/rvq-vae-gpt","last_synced_at":"2025-04-05T21:08:20.769Z","repository":{"id":96486596,"uuid":"595200545","full_name":"lucidrains/rvq-vae-gpt","owner":"lucidrains","description":"My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation","archived":false,"fork":false,"pushed_at":"2024-10-11T16:22:29.000Z","size":35738,"stargazers_count":85,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-29T20:09:24.280Z","etag":null,"topics":["artificial-intelligence","deep-learning","gpt","learned-tokenization","text-autoencoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-30T15:53:34.000Z","updated_at":"2025-03-26T23:31:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"3aafa522-91e7-45e3-8594-a7b7e68c50f3","html_url":"https://github.com/lucidrains/rvq-vae-gpt","commit_stats":{"total_commits":29,"total_committers":1,"mean_commits":29.0,"dds":0.0,"last_synced_commit":"1c885c7daba401e1ac98928a81852a4adf6aee10"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frvq-vae-gpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frvq-vae-gpt/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frvq-vae-gpt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frvq-vae-gpt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","download_url":"https://codeload.github.com/lucidrains/rvq-vae-gpt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399877,"owners_count":20932876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","gpt","learned-tokenization","text-autoencoder"],"created_at":"2024-10-03T02:11:42.571Z","updated_at":"2025-04-05T21:08:20.749Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## RVQ-VAE-GPT - Residual Vector Quantize VAE - GPT (wip)\n\nMy attempts at applying \u003ca href=\"https://github.com/lucidrains/audiolm-pytorch/blob/main/audiolm_pytorch/soundstream.py\"\u003eSoundstream\u003c/a\u003e design on learned tokenization of text and then applying a \u003ca href=\"https://github.com/lucidrains/RQ-Transformer/blob/main/rq_transformer/hierarchical_causal_transformer.py\"\u003ehierarchical transformer\u003c/a\u003e to text generation.\n\nThe Soundstream will be modified to use all local attention. Experiments will compare VQ, RVQ, and also multi-headed VQ\n\nWas told by a researcher friend this will likely fail 😂😂 but I will try it anyways, yolo. In the case it does not work, maybe it can still be useful for genomics. 
Come to think of it, why shouldn't it be able to at least learn bigrams (for english) and codons (for genomics)? Why don't we have \u003ca href=\"https://www.nature.com/articles/s41562-022-01516-2\"\u003ehierarchical predictive coding\u003c/a\u003e? We should\n\nUpdate: \u003ca href=\"https://api.wandb.ai/links/lucidrains/kpdfhad9\"\u003eSome live experiments\u003c/a\u003e\n\n## Todo\n\n- [ ] add a diff in the autoencoder training between input and reconstructed, so one can examine the failure cases easily\n\n## Citations\n\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2107.03312,\n  title  = {SoundStream: An End-to-End Neural Audio Codec},\n  author = {Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco},\n  publisher = {arXiv},\n  url    = {https://arxiv.org/abs/2107.03312},\n  year   = {2021}\n}\n```\n\n```bibtex\n@unknown{unknown,\n    author  = {Lee, Doyup and Kim, Chiheon and Kim, Saehoon and Cho, Minsu and Han, Wook-Shin},\n    year    = {2022},\n    month   = {03},\n    title   = {Autoregressive Image Generation using Residual Quantization}\n}\n```\n\n```bibtex\n@article{Sunkara2022NoMS,\n    title   = {No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects},\n    author  = {Raja Sunkara and Tie Luo},\n    journal = {ArXiv},\n    year    = {2022},\n    volume  = {abs/2208.03641}\n}\n```\n\n```bibtex\n@inproceedings{Fifty2024RestructuringVQ,\n    title   = {Restructuring Vector Quantization with the Rotation Trick},\n    author  = {Christopher Fifty and Ronald G. Junkins and Dennis Duan and Aniketh Iger and Jerry W. 
Liu and Ehsan Amid and Sebastian Thrun and Christopher Ré},\n    year    = {2024},\n    url     = {https://api.semanticscholar.org/CorpusID:273229218}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Frvq-vae-gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Frvq-vae-gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Frvq-vae-gpt/lists"}