https://github.com/voidful/dtokenizer

discretize everything into tokens
https://github.com/voidful/dtokenizer

Last synced: 8 months ago
JSON representation

discretize everything into tokens

Host: GitHub
URL: https://github.com/voidful/dtokenizer
Owner: voidful
License: mit
Created: 2024-06-09T05:09:32.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-08-13T10:25:22.000Z (almost 2 years ago)
Last Synced: 2025-08-16T14:35:49.882Z (10 months ago)
Language: Python
Size: 24.4 KB
Stars: 6
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # dtokenizer

discretize everything into tokens

## Introduction

`dtokenizer` is a Python library designed to discretize audio files into tokens using various models. It supports models like Hubert and Encodec for tokenization.

## Installation

To use `dtokenizer`, first ensure you have Python and pip installed. Then, install the required dependencies by running:

```bash

pip install -r requirements.txt

```

## Usage

### Hubert Tokenizer

The Hubert tokenizer can be used to tokenize audio files into discrete tokens and then decode them back. Here's how you can use it:

```python

from dtokenizer.audio.model.hubert_model import HubertTokenizer

import soundfile as sf

ht = HubertTokenizer('hubert_layer6_code100')

code, decodec_stuff = ht.encode_file('./sample2_22k.wav')

wav_values = ht.decode(code)

# Write the decoded audio to a file

sf.write('output.wav', wav_values, 16000)

```

### Encodec Tokenizer

Similarly, the Encodec tokenizer allows for efficient audio file tokenization. Here's an example of its usage:

```python

import torch

from dtokenizer.audio.model.encodec_model import EncodecTokenizer

import torchaudio

et = EncodecTokenizer('encodec_24k_6bps')

code, stuff_for_decode = et.encode_file('./sample2_22k.wav')

wav_values = et.decode(stuff_for_decode)

# Save the decoded audio to a file

torchaudio.save('output.wav', torch.from_numpy(wav_values), 22050)

```

## Contributing

We welcome contributions to the `dtokenizer` project. Please feel free to submit issues or pull requests.

## License

This project is released under the MIT License. See the LICENSE file for more details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/voidful/dtokenizer

Awesome Lists containing this project

README