https://github.com/huckiyang/interspeech23-tutorial-para-efficient-cross-modal-tutorial
Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling
- Host: GitHub
- URL: https://github.com/huckiyang/interspeech23-tutorial-para-efficient-cross-modal-tutorial
- Owner: huckiyang
- Created: 2023-08-14T03:37:04.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-09T09:57:56.000Z (over 2 years ago)
- Last Synced: 2025-01-26T16:19:35.461Z (12 months ago)
- Topics: lora, nlp, speech-processing, tutorial
- Homepage:
- Size: 47.9 KB
- Stars: 15
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Interspeech 2023 Tutorial
- Interspeech 23 `Resource Efficient and Cross-Modal Learning Toward Foundation Modeling Tutorial` - [Video](https://www.youtube.com/watch?v=k_egHWj09l4)
- ICASSP 22 Tutorial `Neural Model Reprogramming and Prompting for Speech Modeling` - [Video](https://www.youtube.com/watch?v=-iirkbYkyXI) | [Slides](https://docs.google.com/presentation/d/1sXcxYiTHY_URovr2irb6QvQj7xFIQd1tzObz4SpyEfo/edit)
- ICASSP 23 Tutorial `Parameter-Efficient Learning (PEL) for Speech and NLP: Adapters, Prompts, and Reprogramming` - [Slides](https://docs.google.com/presentation/d/16ypY73W0xC0WQxkPUtjchxjhzY8XehTEgRSuMJD4QdM/edit?usp=sharing)
## Part 1. Overview of Resource Efficient Learning, Dr. Huck Yang
`9:00`
### 1.1. Parameter-Efficient Learning
- Background of Frozen Model Adaptation
- Neural Adapter, Reprogramming, Prompting, and Low-Rank Adaptation (LoRA); a minimal LoRA sketch follows the paper table below
| Title | Authors | Code | Venue |
| ----- | ------- | ---- | ----- |
|[Differentially Private Adapters for Parameter Efficient Acoustic Modeling](https://arxiv.org/abs/2305.11360)|C.-W. Ho et al.|[code](https://github.com/Chun-wei-Ho/)|Interspeech 2023|
|[Parameter-Efficient Learning for Text-to-Speech Accent Adaptation](https://arxiv.org/abs/2305.11320)|L.-J. Yang et al.|[code](https://tts-research.github.io/)|Interspeech 2023|
|[A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model](https://arxiv.org/pdf/2305.11244)|S. Radhakrishnan et al.|[code](https://github.com/Srijith-rkr/KAUST-Whisper-Adapter)|Interspeech 2023|
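The common thread in these adapter/reprogramming/LoRA papers is a frozen backbone with a small trainable update. Below is a minimal PyTorch sketch of the LoRA update itself; the class name, rank, and layer sizes are illustrative placeholders, not the tutorial's code.

```python
# Minimal LoRA sketch: a frozen nn.Linear plus a trainable low-rank update B @ A.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus the scaled low-rank update x (BA)^T.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


layer = LoRALinear(nn.Linear(768, 768), rank=8)
y = layer(torch.randn(2, 10, 768))  # only lora_A / lora_B receive gradients
```

Only `lora_A` and `lora_B` are trained, i.e., `rank * (in_features + out_features)` parameters per wrapped layer instead of the full weight matrix.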
### 1.2. Memory-Efficient Learning
- Reducing GPU/TPU Memory During Training (e.g., Activation Memory)
- Model Serialization
- Efficient On-Device Learning via Feature Reprogramming (CVPR 2022)
- Ladder-Side Tuning (NeurIPS 2022)
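One standard way to reduce activation memory (complementary to the side-tuning and reprogramming methods listed above) is activation checkpointing: keep activations only at segment boundaries and recompute the rest during the backward pass. A minimal PyTorch sketch with a generic Transformer stand-in, not the tutorial's models:

```python
# Activation (gradient) checkpointing sketch: trade compute for activation memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A stand-in 12-layer Transformer encoder (any nn.Sequential works here).
encoder = nn.Sequential(*[
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
    for _ in range(12)
])

x = torch.randn(8, 200, 256, requires_grad=True)
# Store activations only at 4 segment boundaries; recompute inside each segment
# during backward, cutting peak activation memory at the cost of extra compute.
out = checkpoint_sequential(encoder, 4, x)
out.sum().backward()
```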
### 1.3. How to Estimate Which Layer or Which Model to Tune?
- Universal Approximation Theory (IEEE TIP 1993)
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning (ICML 2021); a simplified sketch follows the table below
- Latent Space Alignment in "Reprogramming Acoustic Models for Time Series Classification" (ICML 2021)
| Title | Authors | Code | Venue |
| ----- | ------- | ---- | ----- |
|[How to Estimate Model Transferability of Pre-Trained Speech Models?](https://arxiv.org/pdf/2306.01015.pdf)|Z.-C. Chen et al.|[code](https://github.com/virginiakm1988/LogME-CTC)|Interspeech 2023|
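As a rough illustration of the LogME idea referenced above, the NumPy sketch below scores frozen features `F` from a candidate pre-trained model against a target `y` by maximizing the evidence of a Bayesian linear head. The binary-target simplification and variable names are ours, not the official LogME-CTC implementation.

```python
# Simplified LogME-style evidence maximization for one (binary or real) target.
import numpy as np


def log_evidence(F: np.ndarray, y: np.ndarray, iters: int = 10) -> float:
    """F: (n, d) frozen features, y: (n,) target. Higher score = better transfer."""
    n, d = F.shape
    # Eigendecomposition of F^T F gives the squared singular values s.
    s, V = np.linalg.eigh(F.T @ F)
    s = np.clip(s, 0.0, None)
    Fty = F.T @ y
    alpha, beta = 1.0, 1.0
    for _ in range(iters):
        m = V @ ((V.T @ Fty) * beta / (alpha + beta * s))  # posterior mean of the linear head
        gamma = np.sum(beta * s / (alpha + beta * s))      # effective number of parameters
        alpha = gamma / (m @ m + 1e-10)
        beta = (n - gamma) / (np.sum((y - F @ m) ** 2) + 1e-10)
    m = V @ ((V.T @ Fty) * beta / (alpha + beta * s))
    evidence = (d / 2 * np.log(alpha) + n / 2 * np.log(beta)
                - 0.5 * np.sum(np.log(alpha + beta * s))
                - beta / 2 * np.sum((y - F @ m) ** 2)
                - alpha / 2 * (m @ m)
                - n / 2 * np.log(2 * np.pi))
    return evidence / n


# Example: rank two hypothetical feature extractors on the same labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500).astype(float)
F_good = np.concatenate([y[:, None] + 0.1 * rng.standard_normal((500, 1)),
                         rng.standard_normal((500, 63))], axis=1)
F_bad = rng.standard_normal((500, 64))
print(log_evidence(F_good, y), ">", log_evidence(F_bad, y))
```

Higher evidence suggests the frozen representation transfers better, so candidate models (or layers) can be ranked without fine-tuning each one.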
### 1.4. Advanced Low-Rank Adaptation (LoRA) Techniques
- Cross-Modal Merging
- Low-Rank Adaptation (LoRA) for Foundation Modeling and Pre-Training
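Because a LoRA update is a plain low-rank matrix, it can be folded back into the frozen weight for zero-overhead inference, and updates trained on different tasks or modalities can be combined. A toy sketch with random placeholder weights; the 50/50 average is just one naive merging choice:

```python
# Toy sketch: folding LoRA updates into a frozen weight and averaging two adapters.
import torch

d_out, d_in, r, alpha = 768, 768, 8, 16.0
W = torch.randn(d_out, d_in)  # frozen pre-trained weight (placeholder)

# Two hypothetical adapters, e.g., one tuned on speech and one on text.
A_speech, B_speech = torch.randn(r, d_in) * 0.01, torch.randn(d_out, r) * 0.01
A_text, B_text = torch.randn(r, d_in) * 0.01, torch.randn(d_out, r) * 0.01

# 1) Merge a single adapter for inference: no extra latency once folded in.
W_speech = W + (alpha / r) * (B_speech @ A_speech)

# 2) Naive cross-adapter merge: average the low-rank updates of both tasks.
delta = 0.5 * (B_speech @ A_speech) + 0.5 * (B_text @ A_text)
W_merged = W + (alpha / r) * delta
print(W_speech.shape, W_merged.shape)
```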
### 1.5. Community Service
- Special Session in ICASSP 2024: In-Context Learning for Speech and Language Processing
- icassp24-icl-sp@googlegroups.com
### Break: Hands-on Session 1 (5 min)
- How to Train Your Whisper with Neural Adapter and LoRA
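A hedged sketch of what this hands-on covers: wrapping Whisper with LoRA adapters through Hugging Face PEFT. The checkpoint name, rank, and target modules below are illustrative defaults and may differ from the tutorial notebook.

```python
# Wrap a frozen Whisper model with LoRA adapters using Hugging Face PEFT.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically only ~1% of parameters are trainable

# From here, fine-tune with a standard Seq2SeqTrainer or training loop; only the
# LoRA parameters receive gradients while the Whisper backbone stays frozen.
```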
## Part 2: Trustworthy AI and Cross-Modal Learning in the Era of Foundation Models, Dr. Pin-Yu Chen
`11:00 to 11:45`
## Part 3: Multimodal Pre-Training for Automatic Speech Recognition and Vision Sharing, Dr. Shalini Ghosh
`11:45 to 12:20`
### Spotlight Invited Talk, "Prompting LLM for ASR," by Dr. Chunyang Wu, Meta AI
`12:20 to 12:30`
## QA and Plenary Discussion
`12:40 to 12:45`