Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/archinetai/archisound
A collection of pre-trained audio models, in PyTorch.
artificial-intelligence audio deep-learning
- Host: GitHub
- URL: https://github.com/archinetai/archisound
- Owner: archinetai
- License: mit
- Created: 2022-11-25T09:24:59.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-27T10:35:04.000Z (almost 2 years ago)
- Last Synced: 2024-10-05T11:17:47.820Z (about 1 month ago)
- Topics: artificial-intelligence, audio, deep-learning
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 110
- Watchers: 5
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# ArchiSound
A collection of pre-trained audio models in PyTorch from [`audio-encoders-pytorch`](https://github.com/archinetai/audio-encoders-pytorch) and [`audio-diffusion-pytorch`](https://github.com/archinetai/audio-diffusion-pytorch).
## Install
```bash
pip install archisound
```

[![PyPI - Python Version](https://img.shields.io/pypi/v/archisound?style=flat&colorA=black&colorB=black)](https://pypi.org/project/archisound/)
## Autoencoders
* [`dmae1d-ATC32-v3`](https://huggingface.co/archinetai/dmae1d-ATC32-v3/tree/main)
Usage and Info

```py
import torch
from archisound import ArchiSound

autoencoder = ArchiSound.from_pretrained("dmae1d-ATC32-v3")
x = torch.randn(1, 2, 2**18)  # dummy stereo batch: [1, 2, 262144]
z = autoencoder.encode(x) # [1, 32, 512]
y = autoencoder.decode(z, num_steps=20) # [1, 2, 262144]
```

| Info | |
| ------------- | ------------- |
| Input type | Audio (stereo @ 48 kHz) |
| Number of parameters | 86M |
| Compression Factor | 32x |
| Downsampling Factor | 512x |
| Bottleneck Type | Tanh |
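
The shape comments in the usage snippet and the factors in the table can be cross-checked with plain arithmetic. The sketch below assumes that "Downsampling Factor" means input samples per latent time step and that "Compression Factor" means input values per latent value; the source does not define either term, but this reading agrees with the shapes shown above. The helper names `latent_length` and `compression_factor` are hypothetical and are not part of `archisound`.

```python
def latent_length(num_samples: int, downsampling_factor: int) -> int:
    """Latent time steps, assuming the input length divides evenly."""
    if num_samples % downsampling_factor != 0:
        raise ValueError("num_samples is not a multiple of downsampling_factor")
    return num_samples // downsampling_factor


def compression_factor(
    in_channels: int, num_samples: int, latent_channels: int, latent_len: int
) -> float:
    """Ratio of input values to latent values (assumed definition)."""
    return (in_channels * num_samples) / (latent_channels * latent_len)


# Values taken from the dmae1d-ATC32-v3 snippet and table above.
num_samples = 2**18  # 262144 samples per channel
length = latent_length(num_samples, downsampling_factor=512)
ratio = compression_factor(2, num_samples, 32, length)
print(length, ratio)  # 512 32.0
```

The same two functions reproduce the figures for the other models listed here when given their downsampling factors and latent shapes.
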
* [`dmae1d-ATC64-v2`](https://huggingface.co/archinetai/dmae1d-ATC64-v2/tree/main)
Usage and Info

```py
import torch
from archisound import ArchiSound

autoencoder = ArchiSound.from_pretrained("dmae1d-ATC64-v2")
x = torch.randn(1, 2, 2**18)  # dummy stereo batch: [1, 2, 262144]
z = autoencoder.encode(x) # [1, 32, 256]
y = autoencoder.decode(z, num_steps=20) # [1, 2, 262144]
```

| Info | |
| ------------- | ------------- |
| Input type | Audio (stereo @ 48 kHz) |
| Number of parameters | 185M |
| Compression Factor | 64x |
| Downsampling Factor | 1024x |
| Bottleneck Type | Tanh |
* [`autoencoder1d-AT-v1`](https://huggingface.co/archinetai/autoencoder1d-AT-v1/tree/main)
Usage and Info

```py
import torch
from archisound import ArchiSound

autoencoder = ArchiSound.from_pretrained("autoencoder1d-AT-v1")
x = torch.randn(1, 2, 2**18) # [1, 2, 262144]
z = autoencoder.encode(x) # [1, 32, 8192]
y = autoencoder.decode(z) # [1, 2, 262144]
```

| Info | |
| ------------- | ------------- |
| Input type | Audio (stereo @ 48 kHz) |
| Number of parameters | 20.7M |
| Compression Factor | 2x |
| Downsampling Factor | 32x |
| Bottleneck Type | Tanh |
| Known Limitations | Slight blurriness in high frequency spectrogram reconstruction |
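
To put the downsampling factors in terms of time, the following sketch converts them into clip duration and latent frame rate at the 48 kHz sample rate given in the tables. It assumes the downsampling factor is the number of input samples per latent time step, which matches the shape comments in the snippets; the function names are hypothetical and not part of `archisound`.

```python
SAMPLE_RATE = 48_000  # Hz, from the "Input type" row of the tables


def clip_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a clip with num_samples samples per channel."""
    return num_samples / sample_rate


def latent_frames_per_second(
    downsampling_factor: int, sample_rate: int = SAMPLE_RATE
) -> float:
    """Latent time steps produced per second of audio (assumed meaning)."""
    return sample_rate / downsampling_factor


# The 2**18-sample example input lasts roughly 5.46 seconds.
print(round(clip_seconds(2**18), 2))

# Downsampling factors taken from the tables in this README.
for factor in (32, 512, 1024):
    print(factor, latent_frames_per_second(factor))
```

Under this reading, a 32x model emits 1500 latent frames per second of audio, while a 1024x model emits fewer than 47.
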
* [`dmae1d-ATC64-v1`](https://huggingface.co/archinetai/dmae1d-ATC64-v1/tree/main)
Usage and Info

A diffusion-based autoencoder with a high compression ratio. Requires `audio_diffusion_pytorch==0.0.92`.

```py
import torch
from archisound import ArchiSound

autoencoder = ArchiSound.from_pretrained("dmae1d-ATC64-v1")
x = torch.randn(1, 2, 2**18)  # dummy stereo batch: [1, 2, 262144]
z = autoencoder.encode(x) # [1, 32, 256]
y = autoencoder.decode(z, num_steps=20) # [1, 2, 262144]
```

| Info | |
| ------------- | ------------- |
| Input type | Audio (stereo @ 48 kHz) |
| Number of parameters | 234.2M |
| Compression Factor | 64x |
| Downsampling Factor | 1024x |
| Bottleneck Type | Tanh |
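
All examples above use an input length of `2**18`, which is a multiple of every downsampling factor listed (32, 512, 1024). The source does not say how the models handle other lengths. As a precaution, the sketch below computes how much padding would bring an arbitrary clip up to the next multiple of a given factor. Both the divisibility requirement and the helper names are assumptions, not documented `archisound` behavior.

```python
def padded_length(num_samples: int, downsampling_factor: int) -> int:
    """Smallest multiple of downsampling_factor that is >= num_samples."""
    blocks = (num_samples + downsampling_factor - 1) // downsampling_factor
    return blocks * downsampling_factor


def padding_needed(num_samples: int, downsampling_factor: int) -> int:
    """Samples to append so the length divides evenly (assumed requirement)."""
    return padded_length(num_samples, downsampling_factor) - num_samples


# A 5-second clip at 48 kHz, checked against a 1024x model.
clip = 5 * 48_000  # 240000 samples per channel
print(padded_length(clip, 1024), padding_needed(clip, 1024))  # 240640 640
```

If padding is applied before encoding, the decoded output would then need to be cropped back to the original length.
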