https://github.com/kyegomez/ast
Implementation of AST from the paper "AST: Audio Spectrogram Transformer" in PyTorch and Zeta
ai artificial-intelligence attention machine machine-learning machine-learning-algorithms pytorch pytorch-implementation tensorflow
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/kyegomez/ast
- Owner: kyegomez
- License: mit
- Created: 2023-12-30T04:09:32.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-11T17:27:09.000Z (over 2 years ago)
- Last Synced: 2025-04-19T20:16:47.036Z (about 1 year ago)
- Topics: ai, artificial-intelligence, attention, machine, machine-learning, machine-learning-algorithms, pytorch, pytorch-implementation, tensorflow
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 2.18 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# AST
Implementation of AST from the paper "AST: Audio Spectrogram Transformer" in PyTorch and Zeta. This implementation takes a 2D input tensor representing audio, splits it into patches, applies a linear projection, adds position embeddings, and then runs attention and feedforward blocks in a loop, once per layer. Please join Agora and tag me if this could be improved in any way.
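The pipeline described above can be sketched in plain PyTorch. This is a minimal illustration, not the code shipped in `ast_torch`: the class name `TinyASTSketch`, its default sizes, and the use of `nn.TransformerEncoderLayer` for the attention and feedforward step are all assumptions made for this example. It also uses non-overlapping 1D patches over a `(batch, seqlen)` input, which is simpler than the overlapping 2D spectrogram patches used in the paper.

```python
import torch
from torch import nn


class TinyASTSketch(nn.Module):
    """Hypothetical illustration of the pipeline; not the ast_torch model."""

    def __init__(self, dim=32, seqlen=16, heads=4, depth=2, patch_size=4):
        super().__init__()
        assert seqlen % patch_size == 0, "seqlen must be divisible by patch_size"
        self.patch_size = patch_size
        num_patches = seqlen // patch_size
        # Linear projection of each flattened patch to the model width
        self.proj = nn.Linear(patch_size, dim)
        # Learned position embedding, one vector per patch
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, dim))
        # Each layer bundles self-attention and a feedforward block
        self.layers = nn.ModuleList(
            [
                nn.TransformerEncoderLayer(
                    d_model=dim,
                    nhead=heads,
                    dim_feedforward=4 * dim,
                    batch_first=True,
                )
                for _ in range(depth)
            ]
        )

    def forward(self, x):
        # Patchify: (batch, seqlen) -> (batch, num_patches, patch_size)
        patches = x.reshape(x.shape[0], -1, self.patch_size)
        # Project patches and add position information
        tokens = self.proj(patches) + self.pos_emb
        # Attention and feedforward, repeated once per layer
        for layer in self.layers:
            tokens = layer(tokens)
        return tokens


model = TinyASTSketch()
out = model(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4, 32])
```

With `seqlen=16` and `patch_size=4`, the input is cut into 4 patches, so the sketch returns one `dim`-sized vector per patch. A classifier would typically pool these vectors and apply a final linear head, which is omitted here.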
## Install
`pip3 install ast-torch`
## Usage
```python
import torch
from ast_torch.model import ASTransformer
# Create dummy data: 2 inputs of length 16, matching seqlen below
x = torch.randn(2, 16)
# Initialize model
model = ASTransformer(
dim=4, seqlen=16, dim_head=4, heads=4, depth=2, patch_size=4
)
# Run model and print output shape
print(model(x).shape)
```
## Citation
```bibtex
@misc{gong2021ast,
title={AST: Audio Spectrogram Transformer},
author={Yuan Gong and Yu-An Chung and James Glass},
year={2021},
eprint={2104.01778},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
```
## License
MIT