# awesome-byte-llm
A curated list of papers and resources on byte-based large language models (LLMs) — models that operate directly on raw bytes.
https://github.com/zjysteven/awesome-byte-llm
## Papers
- **MambaByte: Token-free Selective State Space Model.** The paper introduces MambaByte, a token-free selective state space model (SSM) that enables efficient language modeling of byte-level sequences with a fixed-size memory state, outperforming subword and byte-level Transformers on language modeling tasks while offering improved robustness, efficiency, and speculative decoding for faster inference.
- **Bytes Are All You Need: Transformers Operating Directly On File Bytes.** The paper presents a modality-independent transformer architecture that operates directly on file bytes, eliminating the need for modality-specific processing, and demonstrates superior performance on classification tasks across images, audio, and mixed-modality data.
- **MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers.** The paper introduces MEGABYTE, a multiscale Transformer architecture that segments sequences into patches, enabling efficient modeling of million-byte sequences with sub-quadratic self-attention, enhanced feedforward computation, and improved decoding parallelism, achieving competitive performance on tasks like long-context language modeling, image generation, and audio modeling.
- **ByT5: Towards a token-free future with pre-trained byte-to-byte models.** The paper presents ByT5, a token-free byte-to-byte variant of the T5 Transformer that processes raw UTF-8 byte sequences without tokenization, achieving competitive performance across multilingual NLP tasks, offering robustness to noise, and demonstrating improved efficiency, especially in low-resource or multilingual contexts.
- **Byte Latent Transformer: Patches Scale Better Than Tokens.** The paper introduces the Byte Latent Transformer, a byte-level large language model (LLM) that dynamically groups bytes into patches, improving inference efficiency, robustness, and scalability beyond tokenization-based models.
- **Beyond Language Models: Byte Models are Digital World Simulators.** The paper introduces a byte prediction model that operates directly on binary data to simulate diverse digital world processes, achieving state-of-the-art performance in modalities like text, audio, images, symbolic music conversion, and even CPU behavior simulation with over 99.99% accuracy.
- **CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation.** The paper introduces CANINE, a tokenization-free neural encoder that directly processes character sequences using a downsampling-transformer-upsampling architecture, achieving competitive multilingual performance on tasks like TyDi QA and NER while offering efficiency and robustness advantages over traditional subword-based models.
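
The papers above share one idea: the model's input is raw UTF-8 bytes rather than subword tokens, and longer-context variants (MEGABYTE, Byte Latent Transformer) group those bytes into patches. The sketch below is purely illustrative and is not taken from any repository listed here; the `encode`/`decode`/`to_patches` helpers and the special-token offset of 3 are assumptions loosely modeled on ByT5-style conventions, and the fixed patch size mirrors MEGABYTE-style patching (the Byte Latent Transformer instead sizes patches dynamically).

```python
# Minimal sketch of byte-level "tokenization": text goes in as raw UTF-8 bytes,
# so the vocabulary is just 256 byte values plus a few reserved special ids.
# The offset of 3 (pad/eos/unk) is an assumed convention, not any model's
# reference implementation.

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
OFFSET = 3  # first three ids reserved for special tokens (assumption)

def encode(text: str) -> list[int]:
    """Map a string to byte-level token ids: UTF-8 bytes shifted past the specials."""
    return [b + OFFSET for b in text.encode("utf-8")] + [EOS_ID]

def decode(ids: list[int]) -> str:
    """Invert encode(), dropping special ids and ignoring malformed byte sequences."""
    raw = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return raw.decode("utf-8", errors="ignore")

def to_patches(ids: list[int], patch_size: int = 8) -> list[list[int]]:
    """Group byte ids into fixed-size patches, padding the tail with PAD_ID."""
    padded = ids + [PAD_ID] * (-len(ids) % patch_size)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]

if __name__ == "__main__":
    ids = encode("byte-level LLMs need no tokenizer")
    print(len(ids), "byte tokens")   # noticeably longer than a subword encoding
    print(to_patches(ids)[:2])       # first two fixed-size patches
    assert decode(ids) == "byte-level LLMs need no tokenizer"
```

The longer sequences produced by byte-level encoding are exactly why these papers pair it with efficiency mechanisms such as patching (MEGABYTE, BLT), downsampling (CANINE), or fixed-memory state space models (MambaByte).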