# Curriculum Vitae - Quan Wang

| Current Occupation | Contact |
|-------------------|--------|
| Senior Staff Software Engineer & Tech Lead Manager | Email: [[email protected]](mailto:[email protected]) |
| Google DeepMind | Web: [wangquan.me](https://wangquan.me/) |
| New York, NY | LinkedIn: [www.linkedin.com/in/wangquan](https://www.linkedin.com/in/wangquan/) |

## News

* Enroll in my online courses on [Speaker Recognition](https://www.udemy.com/course/speaker-recognition/?referralCode=1914766AF241CE15D19A) and [Speaker Diarization](https://www.udemy.com/course/diarization/?referralCode=21D7CC0AEABB7FE3680F) (both in English) on Udemy.
* Enroll in my [Speaker Recognition](https://jmq.xet.tech/s/4j70ZU) course (Chinese) on JiQiZhiXin.
* My [award-winning](https://mp.weixin.qq.com/s/QiprULWzfz6ChUCHZyHhwQ) book "Voice Identity Techniques: From core algorithms to engineering practice" (Chinese) can be purchased [here](https://item.jd.com/12970526.html).

## Index

* [Current Role](#current-role)
* [Media Coverage](#media-coverage)
* [Education](#education)
* [Work Experience](#work-experience)
* [Awards](#awards)
* [Invited Talks](#invited-talks)
* [Publications](#publications)
* [Patents](#patents)
* [Academic Service](#academic-service)
* [Teaching and Mentoring](#teaching-and-mentoring)

## Current Role

Dr. Quan Wang leads the **Hotword Modeling** team and the **Speaker, Voice & Language** team at Google DeepMind. The teams deliver a diverse set of server-side and on-device speech models to Google's product ecosystem, including "Hey Google" spoken keyword spotting, voice match, language recognition, spoofed speech detection, speech enhancement, speaker diarization, and multilingual speech recognition. The server-side models power numerous speech features in Google Search, YouTube, Google Cloud, and Google Assistant, serving billions of users. The on-device models are deployed on billions of Android phones, tablets, Chromebooks, cars, and wearables across the globe.

## Media Coverage

* Book on Voice Identity Techniques: [[博文视点](https://mp.weixin.qq.com/s/ZjmzLRxxUbLwLSIH4u3X5g)] [[语音杂谈](https://mp.weixin.qq.com/s/xwjMlWeZO3azVw0TpwVfMw)] [[机器之心](https://mp.weixin.qq.com/s/iQtHFi34uKTGfvWVOl8adw)] [[载思考](https://mp.weixin.qq.com/s/ZFBFM9FtcDqTSOYGttDDxw)] [[声纹圈](https://mp.weixin.qq.com/s/lYt0Teg_Pj4ponN-Jd-AXg)]

* On-device multilingual speech recognition for Pixel 8: [[The Verge](https://www.theverge.com/2023/10/4/23895660/google-pixel-8-event-news-roundup/archives/2)] [[TechCrunch](https://techcrunch.com/2023/10/04/google-assistant-gets-a-host-of-upgrades-on-the-pixel-8-and-pixel-8-pro/)] [[Android Authority](https://www.androidauthority.com/google-pixel-8-ai-features-3371529/)] [[9to5Google](https://9to5google.com/2023/10/04/google-assistant-pixel-8/)]

* Speaker label for Recorder app:
  * Official: [[Google AI Blog](https://ai.googleblog.com/2022/12/who-said-what-recorders-on-device.html)] [[Google AI official tweet](https://twitter.com/GoogleAI/status/1603138933386199040)]
  * English: [[Android Authority](https://www.androidauthority.com/google-pixel-recorder-speaker-labels-tech-3251520/)] [[9to5Google](https://9to5google.com/2022/12/14/pixel-recorder-speaker-labels-work/)] [[Research Snipers](https://researchsnipers.com/google-explains-how-the-pixel-recorder-app-utilizes-ai-models-to-assign-speaker-labels/)] [[Real Mi Central](https://www.realmicentral.com/2022/12/15/google-details-recorder-speaker-labels-work-tensor-tpu-use-to-save-power/)] [[Chrome Unboxed](https://chromeunboxed.com/pixel-recorder-app-speaker-labels)] [[XDA](https://www.xda-developers.com/pixel-recorder-speaker-labels/)] [[engadget](https://www.engadget.com/google-recorder-app-speaker-labels-at-a-glance-rain-alerts-baggage-claim-pixel-7-155339727.html)] [[TechCrunch](https://techcrunch.com/2022/10/06/google-assistant-gets-an-upgrade-on-pixel-7-with-voice-typing-calling-and-transcription-improvements/)]
  * Italian: [[TuttoAndroid](https://www.tuttoandroid.net/news/2022/12/15/google-recorder-come-funziona-speaker-labels-973301/)]
  * Greek: [[SecNews](https://www.secnews.gr/437961/pixel-recorder-4-2-google-prosthese-ta-speaker-labels/)]
  * Chinese: [[机器之心](https://mp.weixin.qq.com/s/_ly7uW7WE925dli-aM5l8g)]

* Quick Phrases: [[9to5Google](https://9to5google.com/2022/09/06/nest-hub-max-quick-phrases/)] [[Droid Life](https://www.droid-life.com/2022/09/06/quick-phrases-arrive-on-nest-hub-max-with-no-ok-google-needed/)] [[The Verge](https://www.theverge.com/2022/9/8/23342903/nest-hub-max-quick-phrases-hey-google)] [[Voicebot.ai](https://voicebot.ai/2022/09/08/google-assistant-adds-quick-phrases-for-skipping-wake-words-to-nest-hub-max/)]

* On-device language identification for Live Caption/Translate: [[Google Pixel Blog](https://blog.google/products/pixel/feature-drop-march-2022/)]

* Google Cloud Speaker ID:
  * Official: [[Google Cloud Blog](https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-announces-speaker-id)] [[Homepage](https://cloud.google.com/speaker-id)] [[Google Cloud official tweet](https://twitter.com/googlecloud/status/1444106862933250055)] [[YouTube](https://www.youtube.com/watch?v=EvCQzIcphOc)]
  * English: [[SiliconANGLE](https://siliconangle.com/2021/10/01/speaker-id-callers-can-now-use-voice-authenticate/)] [[Techzine](https://www.techzine.eu/news/cloud/66435/google-clouds-speaker-id-adds-speech-identification/)] [[TheRegister](https://www.theregister.com/2021/10/06/google_speech_id/)] [[BiometricUpdate](https://www.biometricupdate.com/202110/google-adds-voice-biometrics-to-contact-center-ai-platform)]
  * Dutch: [[TechZine](https://www.techzine.nl/nieuws/cloud/467674/google-verbetert-contactcenter-automatisering-met-speaker-id/)]
  * Chinese: [[TensorFlow公众号](https://mp.weixin.qq.com/s/ZC3-UyrIKbhGTi88WQKyAQ)] [[HiNet](https://times.hinet.net/news/23537105)] [[iThome](https://www.ithome.com.tw/news/147070)]

* VoiceFilter-Lite:
  * Official:
    [[Google AI Blog](https://ai.googleblog.com/2020/11/improving-on-device-speech-recognition.html)]
    [[Google AI official tweet](https://twitter.com/GoogleAI/status/1326661322746941440)]
  * English:
    [[TheNextWeb](https://thenextweb.com/neural/2020/11/12/at-just-2-2mb-googles-new-speech-filtering-tech-is-perfect-for-mobile-apps/)]
    [[apk9to5](https://www.apk9to5.com/2020/11/12/just-2-2-mb-of-speech-recognition-algorithm-from-google/)]
    [[Somag News](https://www.somagnews.com/google-voice-recognition-algorithm-voicefilter/)]
    [[Silicon Canals](https://siliconcanals.com/news/google-voicefilter-lite-voice-command/)]
    [[AndroidFist](https://androidfist.com/google-improves-on-device-speech-recognition-with-voicefilter-lite/)]
    [[eyerys](https://www.eyerys.com/articles/news/how-googles-voicefilter-lite-can-significantly-improve-device-speech-recognition)]
    [[voicebot.ai](https://voicebot.ai/2020/11/16/google-demonstrates-new-speech-recognition-model-for-mobile-use/)]
    [[kouragoal.com](https://kouragoal.com/google-improves-voice-command-features-for-apps/)]
    [[Medium](https://medium.com/dev-genius/androids-voicefilter-lite-can-recognize-the-speaker-svoice-2c215fb86a15)]
  * Chinese:
    [[机器之心](https://mp.weixin.qq.com/s/hOfdyUH4AhUA1rmkZprVIQ)]
    [[搜狐](https://www.sohu.com/a/431787490_129720)]
    [[新浪](https://tech.sina.com.cn/roll/2020-11-14/doc-iiznezxs1815483.shtml)]
    [[声纹圈](https://mp.weixin.qq.com/s/9DB9pn0rboZIXVSsJ6rjTA)]
    [[Google News App](https://www.googlenewsapp.com/improving-on-device-speech-recognition-with-voicefilter-lite/)]
    [[新经网](https://www.xinhuatone.com/AI/202011/30604.html)]
    [[iThome](https://www.ithome.com.tw/news/141072)]
    [[HiNet](https://times.hinet.net/news/23117109)]
  * Spanish:
    [[Nuevo Periodico](https://nuevoperiodico.com/con-solo-22-mb-la-nueva-tecnologia-de-filtrado-de-voz-de-google-es-perfecta-para-aplicaciones-moviles/)]
    [[nobbot](https://www.nobbot.com/tecnologia/aplicaciones-moviles-tecnologia/reconocimiento-de-voz-google/)]
  * Turkish:
    [[teknotalk](https://www.teknotalk.com/yalnizca-22-mb-boyutuyla-googlein-yeni-konusma-filtreleme-teknolojisi-79498/)]
    [[webtekno](https://www.webtekno.com/google-ses-tanima-algoritmasi-voicefilter-h102015.html)]
    [[TechInside](https://www.techinside.com/google-voicefilter-icin-yeni-surum-duyurdu/)]
    [[Trendoa.com](https://www.trendoa.com/googledan-sadece-22-mblik-ses-tanima-algoritmasi.html)]
  * Japanese:
    [[webbigdata.jp](https://webbigdata.jp/ai/post-7762)]
  * Russian:
    [[Neurohive](https://neurohive.io/ru/vidy-nejrosetej/voicefilter-lite-legkovesnaya-arhitektura-dlya-raspoznavaniya-rechi/)]
  * Arabic:
    [[aitnews](https://aitnews.com/2020/11/14/%D8%AC%D9%88%D8%AC%D9%84-%D8%AA%D8%AD%D8%B3%D9%86-%D9%85%D9%8A%D8%B2%D8%A7%D8%AA-%D8%A7%D9%84%D8%A3%D9%88%D8%A7%D9%85%D8%B1-%D8%A7%D9%84%D8%B5%D9%88%D8%AA%D9%8A%D8%A9-%D9%84%D9%84%D8%AA%D8%B7%D8%A8/)]

* Diarization and UIS-RNN:
  * Official:
    [[Google AI Blog](https://ai.googleblog.com/2018/11/accurate-online-speaker-diarization.html)]
    [[Google AI official tweet](https://twitter.com/GoogleAI/status/1062095536256278528)]
  * English:
    [[VentureBeat](https://venturebeat.com/2018/11/12/google-open-sources-ai-that-can-distinguish-between-voices-with-92-percent-accuracy/)]
    [[SiliconAngle](https://siliconangle.com/2018/11/12/google-built-ai-model-can-accurately-distinguish-different-human-voices/)]
    [[InfoQ](https://www.infoq.com/news/2018/11/Google-AI-Voice)]
    [[OpenSourceForU](http://opensourceforu.com/2018/11/google-open-sources-ai-technology-of-speaker-diarization/)]
    [[Futurism](https://futurism.com/the-byte/google-ai-recognize-new-voices)]
  * Chinese:
    * Full article:
      [[量子位](https://mp.weixin.qq.com/s/YOupCjU06JhRCZNCbMvAgQ)]
      [[cnBeta](https://www.cnbeta.com/articles/tech/787295.htm)]
      [[EEPW电子产品世界](http://www.eepw.com.cn/article/201811/394377.htm)]
      [[Sina新浪科技](https://tech.sina.com.cn/it/2018-11-13/doc-ihmutuea9658316.shtml)]
      [[iThome](https://www.ithome.com.tw/news/126984)]
      [[osChina开源中国](http://oschina.net/news/101780/fully-super-vised-speaker-diarization)]
      [[机器之心快讯](https://www.jiqizhixin.com/dailies/92fb72c6-4e83-49d1-af63-e6103fa6840b)]
      [[网易科技](http://tech.163.com/18/1113/11/E0G6J5HI00097U7T.html)]
      [[ChinaEmail中国邮箱网](http://www.chinaemail.com.cn/blog/content/9652/%E8%B0%B7%E6%AD%8C%E5%BC%80%E6%BA%90AI%E8%83%BD%E5%8C%BA%E5%88%86%E5%A3%B0%E9%9F%B3-%E5%87%86%E7%A1%AE%E7%8E%87%E8%BE%BE92%25.html)]
      [[智东西快讯](http://zhidx.com/news/9005.html)]
      [[报价宝](http://www.baojiabao.com/bjbnews/zh201811131700226357.html)]
      [[HiNet](https://times.hinet.net/news/22076118)]
      [[IT经理网](https://www.ctocio.com/ccnews/28155.html)]
      [[贤集网](https://www.xianjichina.com/news/details_91094.html)]
    * Included in:
      [[机器之心AI每日精选](https://www.jiqizhixin.com/topics/2018-11-13)]
      [[人工智能半月刊](http://finance.sina.com.cn/stock/stockzmt/2018-11-17/doc-ihnyuqhh3951826.shtml)]
      [[智东西早报](http://www.sohu.com/a/275211682_115978)]
  * Russian:
    [[dev.by](https://dev.by/news/google-otkryla-ii-algoritm-kotoryi-raspoznayot-golosa-s-92-procentnoi-tochnostyu)]
  * Italian:
    [[tom's HARDWARE](https://www.tomshw.it/altro/google-lia-che-riconosce-le-voci-col-92-di-accuratezza-diventa-open-source/)]
  * Vietnamese:
    [[GENK](http://genk.vn/ai-cua-google-co-kha-nang-phan-biet-giong-noi-nhieu-nguoi-khac-nhau-voi-do-chinh-xac-toi-92-20181114162438144.chn)]
  * Japanese: [[WebBigData](https://webbigdata.jp/ai/post-2182)]

* VoiceFilter:
  * English:
    [[VentureBeat](https://venturebeat.com/2018/10/12/google-researchers-use-ai-to-pick-out-voices-in-a-crowd/)]
  * Chinese:
    [[机器之心](https://www.jiqizhixin.com/articles/2018-10-17-8)]
    [[新智元](https://mp.weixin.qq.com/s/2DaBsFnRzqkf9PgvY3tWqw)]
    [[搜狐](https://www.sohu.com/a/259595992_100085595)]
    [[简书Tech blog](https://www.jianshu.com/p/6d0c27200c01)]
  * Russian:
    [[Tproger](https://tproger.ru/news/google-pick-out-voice/)]

* Joint ASR frontend: [[声纹圈](https://mp.weixin.qq.com/s/OZJIyhkLrYhWDALMtl5Mdw)]

* Token-level SCD loss: [[声纹圈](https://mp.weixin.qq.com/s/6G329X3ctq_7T8VlL5zPYA)]

* Synth2Aug: [[声纹圈](https://mp.weixin.qq.com/s/H8RpUMMKB8XBnKRgXAhm6g)]

* SpeakerStew: [[声纹圈](https://mp.weixin.qq.com/s?__biz=MzA5OTA4NTkyMA==&mid=2454902168&idx=1&sn=ce8fea8c5b299882871f8ad24b2e3709)] [[语音杂谈](https://mp.weixin.qq.com/s/Tuo9YYdTvdaaDDyYpnO3mQ)]

* Multispeaker Text-to-Speech:
[[机器之心](https://www.jiqizhixin.com/articles/062404)] [[Two Minute Papers](https://www.youtube.com/watch?v=0sR1rU3gLzQ)]

* Translatotron 2:
  * Official: [[Google AI Blog](https://ai.googleblog.com/2021/09/high-quality-robust-and-responsible.html)]
  * English: [[VentureBeat](https://venturebeat.com/2021/07/23/googles-translatotron-2-removes-ability-to-deepfake-voices/)] [[voicebot.ai](https://voicebot.ai/2021/07/27/googles-translatotron-2-improves-linguistic-shifts-without-the-deepfake-potential/)] [[slator](https://slator.com/google-ups-ante-in-speech-to-speech-translation-with-robust-translatotron-2/)] [[Analytics India Magazine](https://analyticsindiamag.com/google-releases-new-version-of-translatotron-its-end-to-end-speech-translation-model/)] [[MarkTechPost](https://www.marktechpost.com/2021/08/07/google-ai-introduces-translatotron-2-a-neural-direct-speech-to-speech-translation-model-without-the-deepfake-potential/)]
  * Chinese: [[TensorFlow公众号](https://mp.weixin.qq.com/s/8I_IKgChGLm56pZCJlTJUw)] [[AI前线](https://mp.weixin.qq.com/s/4k9cBWSp0AxmD_BdGO1Blg)]

* Translatotron:
  * Official:
    [[Google AI Blog](https://ai.googleblog.com/2019/05/introducing-translatotron-end-to-end.html)]
    [[Google AI official tweet](https://twitter.com/GoogleAI/status/1128732845566976000)]
  * English:
    [[VentureBeat](https://venturebeat.com/2019/05/15/googles-translatotron-is-an-end-to-end-model-that-mimics-human-voices/)]
    [[TechCrunch](https://techcrunch.com/2019/05/15/googles-translatotron-converts-one-spoken-language-to-another-no-text-involved)]
    [[CNET](https://www.cnet.com/news/googles-translatotron-translates-speech-directly-to-speech/)]
    [[Android Central](https://www.androidcentral.com/googles-translatotron-will-mimic-speakers-voice-when-translating)]
    [[Engadget](https://www.engadget.com/2019/05/15/google-translatotron-direct-speech-translation/)]
    [[Gadgets](https://gadgets.ndtv.com/apps/news/google-unveils-translatotron-its-speech-to-speech-translation-system-2038623)]
    [[Android Police](https://www.androidpolice.com/2019/05/16/google-introduces-direct-speech-to-speech-translation-technology-it-calls-translatotron/)]
    [[slator](https://slator.com/slator-pro/behold-the-translatotron-googles-latest-move-in-speech-translation/)]
  * Chinese:
    [[量子位](https://posts.careerengine.us/p/5cb57549a74a261fe5c48db7)]
    [[cnBeta](https://www.cnbeta.com/articles/tech/847773.htm)]
    [[机器之心](https://www.jiqizhixin.com/articles/2019-05-16-13)]
    [[新智元](https://mp.weixin.qq.com/s/1somQ_I3LwyQUX_OC5W8Kw)]

* Bolo Android App:
  * Official: [[Official Site](https://bolo.withgoogle.com/intl/en/)] [[Google Play](https://play.google.com/store/apps/details?id=com.google.android.apps.seekh&hl=en_US)] [[Sundar's tweet](https://twitter.com/sundarpichai/status/1103985641702883328)] [[Google AI Education](https://blog.google/technology/ai/bolo-literacy/)]
  * English: [[TechCrunch](https://techcrunch.com/2019/03/06/google-introduces-educational-app-bolo-to-improve-childrens-literacy-rates-in-india)] [[VentureBeat](https://venturebeat.com/2019/03/05/google-releases-bolo-a-speech-recognition-app-that-helps-indian-kids-learn-to-read)] [[IndiaTimes](https://www.indiatimes.com/technology/news/google-bolo-used-ai-power-to-improve-children-s-reading-skills-by-64-in-200-u-p-schools-363248.html)] [[NDTV](https://gadgets.ndtv.com/apps/news/google-bolo-reading-tutor-app-india-launch-android-play-store-speech-based-2003488)] [[CNN](https://www.cnn.com/2019/03/07/tech/google-bolo-india-reading-app/index.html)] [[9to5Google](https://9to5google.com/2019/03/06/google-launches-bolo/)]
  * Chinese: [[新浪科技](https://tech.sina.com.cn/i/2019-03-11/doc-ihsxncvh1512147.shtml)] [[博客园](https://news.cnblogs.com/n/621444/)] [[智能手机网](http://www.xda.cn/keji/20190307/032505.html)] [[科技新报](http://technews.tw/2019/03/12/google-bolo-is-a-new-speech-based-reading-tutor-app-that-helps-children-learn-to-read/)]

* Multi-language on Google Home:
[[Launch blog](https://www.blog.google/products/assistant/meet-bilingual-google-assistant-new-smart-home-devices/)]
[[Google AI Blog](https://ai.googleblog.com/2018/08/Multilingual-Google-Assistant.html)]

* Multi-user voice match on Google Home:
  * Text-dependent: [[Launch blog](https://blog.google/products/assistant/tomato-tomahto-google-home-now-supports-multiple-users/)]
  * Text-independent: [[Blog on more smart speakers](https://www.blog.google/products/assistant/bringing-google-assistant-features-all-smart-devices/)] [[Enrollment UI launch blog](https://www.blog.google/products/assistant/more-ways-fine-tune-google-assistant-you/)] [[engadget](https://www.engadget.com/google-assistant-voice-match-boost-013116528.html)]

* ASVspoof: [[Google Blog](https://www.blog.google/outreach-initiatives/google-news-initiative/advancing-research-fake-audio-detection/)] [[9to5Google](https://9to5google.com/2019/01/31/google-deep-fake-audio-detection/)] [[IBC365](https://www.ibc.org/tech-advances/google-leads-fight-against-fake-audio/3560.article)] [[Digital Information World](https://www.digitalinformationworld.com/2019/02/google-is-working-to-help-ai-systems-determine-if-an-audio-recording-is-real.html)]

* ICASSP 2018 speaker recognition papers:
[[机器之心](https://www.jiqizhixin.com/articles/2017-11-08)]

* MLFont:
  * Official:
    [[GoogleFonts official tweet](https://twitter.com/googlefonts/status/900019727732531200)]
  * Chinese:
    [[GooFan](http://www.goofan.net/earlyaccess-noto-sans-sc-sliced.html)]
    [[LanDianNews](https://www.landiannews.com/archives/38885.html)]
    [[cnBeta](https://www.cnbeta.com/articles/tech/645051.htm)]

* AGSM:
[[HighBeam Research](https://www.highbeam.com/doc/1G1-313059756.html)]
[[Issues in Computer Engineering: 2013 Edition](https://books.google.com/books?id=TuamoWYw9gMC&pg=PA194)]

* Semantic Context Forests:
[[HighBeam Research](https://www.highbeam.com/doc/1P3-3332903311.html)]

* COSBOS:
[[Technology Org](https://www.technology.org/2015/12/02/team-invents-occupancy-sensing-with-distributed-photodiodes/)]
[[PRWeb](http://www.prweb.com/releases/2015/12/prweb13112205.htm)]

## Education

* 2010/08 – 2014/10, **Ph.D., Rensselaer Polytechnic Institute**, Troy, NY, USA
  * Signal Analysis and Machine Perception Laboratory (SAMPL)
  * Department of Electrical, Computer, and Systems Engineering (ECSE)
  * Advisor: **Prof. Kim L. Boyer**
  * Thesis: Exploiting Geometric and Spatial Constraints for Vision and Lighting Applications
  * GPA: 4.0/4.0

* 2006/08 – 2010/08, **B.Eng. in Automation, Tsinghua University**, Beijing, China
  * Department of Automation, Class of Fundamental Sciences
  * Advisor: **Prof. Qionghai Dai**
  * Thesis: Implementation and Study of Light-Field-Based 3D Object Retrieval System
  * Major GPA: 91.3/100

## Work Experience

* 2015/11 – Current, *Senior Staff Software Engineer* and *Tech Lead Manager*, **Google**, New York City, NY, USA
  * Manager: Dr. Ignacio Lopez Moreno
  * "OK Google" voice search & actions
  * Speaker identification and speaker diarization
  * Language recognition and diarization
  * VoiceFilter source separation
  * Learning-based font loading (MLFont)

* 2014/11 – 2015/10, *Machine Learning Scientist*, **Amazon**, Cambridge, MA, USA
  * Manager: Dr. Shiv Vitaladevuni
  * Amazon Firefly: Optical character recognition
  * Amazon Echo: Speech recognition

* 2013/05 – 2013/08, *Research Intern*, **IBM Almaden Research Center**, San Jose, CA, USA
  * Manager: Dr. Tanveer Syeda-Mahmood
  * Automated segmentation and heart disease detection from echocardiogram images
  * The Medical Sieve project (in Java)

* 2012/05 – 2012/08, *Research Intern*, **Siemens Corporate Research**, Princeton, NJ, USA
  * Managers: Dr. Dijia Wu and Dr. Shaohua Kevin Zhou
  * Learning-based automatic knee cartilage segmentation in 3D MR images (in C++)

* 2009/06 – 2009/07, *Intern Programmer*, **Northking Technology Corporation**, Beijing, China
  * Developed the company's Business Operation System (using the JSF framework)

## Awards

* SLT 2024 Best Paper Finalist (top 2% paper, 9/373 submitted, 9/170 accepted)
  * For the paper "GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting".

* [2024 AI 2000 Most Influential Scholar Honorable Mention](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/AMiner_2024_Most_Influential_Scholar.png)
  * Awarded by AMiner.org in recognition of outstanding and vibrant contributions to the field of Speech Recognition from 2014 to 2023.

* [ASRU 2023 Best Paper Finalist](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/ASRU_2023_top_papers.png) (top 3% paper, 12/435)
  * For the paper "Improved long-form speech recognition by jointly modeling the primary and non-primary speakers".

* [Annual Best Content Contribution Award, 2022](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/CircleOfVoiceprint_Best_Content_QuanWang.jpg)
  * Awarded by The Circle of Voiceprint.

* [Top 100 case studies of the year, 2021](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/Top100_Summit_Award_QuanWang.jpg)

* [**Distinguished Author** of Year 2020](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/PHEI_Author_Award_QuanWang.jpg)
  * Awarded by Publishing House of Electronics Industry (PHEI).

* [The **Allen B. DuMont Prize**, 2015](resources/Allen_DuMont_Prize_QuanWang.pdf)
  * This prize is awarded to a graduate student who has demonstrated high scholastic ability and has made a substantial contribution to their field.

## Invited Talks

* Keynote talk at [Speech and Audio in the Northeast (SANE) 2024](https://www.saneworkshop.org/sane2024/)
  * "Speaker diarization at Google: From modularized systems to LLMs" [[YouTube](https://www.youtube.com/watch?v=pO6dfo4BSyk&ab_channel=SpeechandAudiointheNortheast%28SANE%29)] [[Slides](https://drive.google.com/file/d/1IlMnZYPTnZ6R1YkkPUnoN0JfeoMv7QIu/view)]

* Invited talk at [MIT CSAIL](https://www.csail.mit.edu/), 2024
  * "Advances in Speaker Diarization at Google"

* [Summit of Top 100 Global Software Case Studies](https://www.top100summit.com/), 2021
  * "Building the Product Ecosystem for Voice and Language Recognition" (声纹与语种识别的产品生态构建)

## Publications

[[Google Scholar](https://scholar.google.com/citations?user=cB62SPcAAAAJ)]

### Books

* **Quan Wang**, "Voice Identity Techniques: From core algorithms to engineering practice" (Chinese), Publishing House of Electronics Industry (PHEI), September 2020. [[GitHub](https://github.com/wq2012/VoiceIdentityBook)] [[JD](https://item.jd.com/12970526.html)] [[TMall](https://detail.tmall.com/item.htm?id=628032618898)] [[DangDang](http://product.dangdang.com/29130997.html)]

### Journal Publications

* **Quan Wang**, Ignacio Lopez Moreno, "Version Control of Speaker Recognition Systems", Journal of Systems and Software, Volume 216, 2024.
[[link](https://authors.elsevier.com/a/1jFwPbKHpGRYB)]
[[PDF](https://arxiv.org/pdf/2007.12069.pdf)]
[[software](https://github.com/wq2012/SpeakerVerSim)]

* **Quan Wang**, Kim L. Boyer, "The active geometric shape model: A new robust deformable shape model and its applications", Computer Vision and Image Understanding, Volume 116, Issue 12, December 2012, Pages 1178-1194, ISSN 1077-3142, doi:10.1016/j.cviu.2012.08.004.
[[link](https://www.sciencedirect.com/science/article/pii/S1077314212001154?via%3Dihub)]
[[PDF](https://github.com/wq2012/AGSM/blob/master/documentation/AGSM_CVIU_2012.pdf)]
[[slides](https://github.com/wq2012/AGSM/blob/master/documentation/AGSM%20PPT.pdf)]
[[software](https://www.mathworks.com/matlabcentral/fileexchange/38358-active-geometric-shape-models)]

* **Quan Wang**, Xinchi Zhang, Kim L. Boyer, "Occupancy distribution estimation for smart light delivery with perturbation-modulated light sensing", Journal of Solid State Lighting 2014 1:17, ISSN 2196-1107, doi:10.1186/s40539-014-0017-2.
[[link](https://journalofsolidstatelighting.springeropen.com/articles/10.1186/s40539-014-0017-2)]
[[PDF](https://github.com/wq2012/COSBOS/blob/master/Papers/COSBOS_JSSL_2014.pdf)]
[[software](https://www.mathworks.com/matlabcentral/fileexchange/48428-cosbos-color-sensor-based-occupancy-sensing)]

* Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, **Quan Wang**, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling, "ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech", Computer Speech & Language, Volume 64, doi:10.1016/j.csl.2020.101114. [[link](https://www.sciencedirect.com/science/article/abs/pii/S0885230820300474)] [[PDF](https://arxiv.org/pdf/1911.01601.pdf)]

### Conference Publications

* Pai Zhu, Jacob W. Bartel, Dhruuv Agarwal, Kurt Partridge, Hyun Jin Park, **Quan Wang**, "GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting", IEEE Spoken Language Technology Workshop (SLT), 2024. [[PDF](https://arxiv.org/pdf/2410.16647)] [**Best Paper Finalist**]

* Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, **Quan Wang**, "Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting", SynData4GenAI Workshop, 2024. [[PDF](https://arxiv.org/abs/2408.10463)]

* Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, **Quan Wang**, "Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model", SynData4GenAI Workshop, 2024. [[PDF](https://arxiv.org/abs/2407.18879)]

* Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, **Quan Wang**, "Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments", SynData4GenAI Workshop, 2024. [[PDF](https://arxiv.org/abs/2407.16840)]

* **Quan Wang**, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao, "DiarizationLM: Speaker Diarization Post-Processing with Large Language Models", Interspeech, 2024. [[PDF](https://arxiv.org/pdf/2401.03506.pdf)] [[model](https://huggingface.co/google/DiarizationLM-8b-Fisher-v2)] [[demo](https://huggingface.co/spaces/diarizers-community/DiarizationLM-GGUF)]

* Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, **Quan Wang**, "On the Success and Limitations of Auxiliary Network Based Word-Level End-to-End Neural Speaker Diarization", Interspeech, 2024.

* Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, **Quan Wang**, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno, "Personalizing Keyword Spotting with Speaker Information", 2023. [[PDF](https://arxiv.org/pdf/2311.03419.pdf)]

* Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, **Quan Wang**, "USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models", 2023. [[PDF](https://arxiv.org/pdf/2309.08023.pdf)]

* Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, **Quan Wang**, "Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network", 2023. [[PDF](https://arxiv.org/pdf/2309.08489.pdf)]

* Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, **Quan Wang**, Shaan Bijwadia, "Improved long-form speech recognition by jointly modeling the primary and non-primary speakers", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023. [[PDF](https://arxiv.org/pdf/2312.11123.pdf)] [[**Best Paper Finalist**](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/ASRU_2023_top_papers.png)]

* Tom O’Malley, Shaojin Ding, Arun Narayanan, **Quan Wang**, Rajeev Rikhye, Qiao Liang, Yanzhang He, Ian McGraw, "Conditional Conformer: Improving Speaker Modulation for Single and Multi-User Speech Enhancement", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

* Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, **Quan Wang**, "Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[[PDF](https://arxiv.org/pdf/2211.06478.pdf)]

* Guanlong Zhao, **Quan Wang**, Han Lu, Yiling Huang, Ignacio Lopez Moreno, "Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[[PDF](https://arxiv.org/pdf/2211.06482.pdf)]

* **Quan Wang**, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno, "Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering", arXiv:2210.13690 [*eess.AS*], 2022.
[[PDF](https://arxiv.org/pdf/2210.13690.pdf)]

* Tom O'Malley, Arun Narayanan, **Quan Wang**, "A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation", Interspeech, 2022. [[PDF](https://arxiv.org/abs/2209.06410)]

* Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, **Quan Wang**, Arun Narayanan, Tom O'Malley, Ian McGraw, "Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition", Interspeech, 2022. [[PDF](https://arxiv.org/abs/2204.03793)]

* Jason Pelecanos, **Quan Wang**, Yiling Huang, Ignacio Lopez Moreno, "Parameter-Free Attentive Scoring for Speaker Verification", Odyssey: The Speaker and Language Recognition Workshop, 2022.
[[PDF](https://arxiv.org/pdf/2203.05642.pdf)]

* **Quan Wang**, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno, "Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech", Odyssey: The Speaker and Language Recognition Workshop, 2022.
[[PDF](https://arxiv.org/pdf/2202.12163.pdf)] [[model](https://huggingface.co/tflite-hub/conformer-lang-id)] [[demo](https://huggingface.co/spaces/tflite-hub/lang-id-demo)]

* Rajeev Rikhye, **Quan Wang**, Qiao Liang, Yanzhang He, Ian McGraw, "Closing the Gap between Single-User and Multi-User VoiceFilter-Lite", Odyssey: The Speaker and Language Recognition Workshop, 2022.
[[PDF](https://arxiv.org/pdf/2202.12169.pdf)]

* Ye Jia, Michelle Tadmor Ramanovich, **Quan Wang**, Heiga Zen, "CVSS Corpus and Massively Multilingual Speech-to-Speech Translation", Conference on Language Resources and Evaluation (LREC), 2022.
[[PDF](https://arxiv.org/pdf/2201.03713.pdf)] [[data](https://github.com/google-research-datasets/cvss)] [[Google AI Blog](https://ai.googleblog.com/2022/04/introducing-cvss-massively-multilingual.html)]

* Wei Xia, Han Lu, **Quan Wang**, Anshuman Tripathi, Ignacio Lopez Moreno, Hasim Sak, "Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[[PDF](https://arxiv.org/pdf/2109.11641.pdf)] [[code](https://github.com/wq2012/SpectralCluster)] (usage sketch after this list)

* Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, **Quan Wang**, Yanzhang He, "Cross-Attention Conformer for Context Modeling in Speech Enhancement for ASR", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [[PDF](https://arxiv.org/pdf/2111.00127.pdf)]

* Tom O'Malley, Arun Narayanan, **Quan Wang**, Alex Park, James Walker, Nathan Howard, "A Conformer-Based ASR Frontend For Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [[PDF](https://arxiv.org/pdf/2111.09935.pdf)]

* Rajeev Rikhye, **Quan Wang**, Qiao Liang, Yanzhang He, Ian McGraw, "Multi-user VoiceFilter-Lite via Attentive Speaker Embedding", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021. [[PDF](https://arxiv.org/pdf/2107.01201.pdf)]

* Rajeev Rikhye, **Quan Wang**, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng (Arden) Huang, Arun Narayanan, Ian McGraw, "Personalized Keyphrase Detection using Speaker and Environment Information", Interspeech, 2021. [[PDF](https://arxiv.org/pdf/2104.13970.pdf)]

* Roza Chojnacka, Jason Pelecanos, **Quan Wang**, Ignacio Lopez Moreno, "SpeakerStew: Scaling to Many Languages with a Triaged Multilingual Text-Dependent and Text-Independent Speaker Verification System", Interspeech, 2021. [[PDF](https://arxiv.org/pdf/2104.02125.pdf)] [[model](https://huggingface.co/tflite-hub/conformer-speaker-encoder)] [[demo](https://huggingface.co/spaces/tflite-hub/speaker-id-demo)]

* Jason Pelecanos, **Quan Wang**, Ignacio Lopez Moreno, "Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition", Interspeech, 2021. [[PDF](https://arxiv.org/pdf/2104.01989.pdf)]

* Yiling Huang, Yutian Chen, Jason Pelecanos, **Quan Wang**, "Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech", IEEE Spoken Language Technology Workshop (SLT), 2021. [[PDF](https://arxiv.org/pdf/2011.11818.pdf)]

* Shaojin Ding, Ye Jia, Ke Hu, **Quan Wang**, "Textual Echo Cancellation", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021.
[[PDF](https://arxiv.org/pdf/2008.06006.pdf)]

* **Quan Wang**, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein, "VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition", Interspeech, 2020. [[PDF](https://arxiv.org/pdf/2009.04323.pdf)]
[[website](https://google.github.io/speaker-id/publications/VoiceFilter-Lite/)] [[Google AI Blog](https://ai.googleblog.com/2020/11/improving-on-device-speech-recognition.html)]

* Shaojin Ding, **Quan Wang**, Shuo-yiin Chang, Li Wan, Ignacio Lopez Moreno, "Personal VAD: Speaker-Conditioned Voice Activity Detection", Odyssey: The Speaker and Language Recognition Workshop, 2020.
[[PDF](https://www.isca-speech.org/archive/Odyssey_2020/pdfs/2.pdf)]

* Li Wan, Prashant Sridhar, Yang Yu, **Quan Wang**, Ignacio Lopez Moreno, "Tuplemax Loss for Language Identification", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
[[PDF](https://arxiv.org/pdf/1811.12290.pdf)] [[poster](https://github.com/google/speaker-id/blob/master/publications/Tuplemax/resources/icassp2019_tuplemax_poster.pdf)]

* **Quan Wang**, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno, "VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking", Interspeech, 2019. (ORAL)
[[PDF](https://arxiv.org/pdf/1810.04826.pdf)]
[[samples](https://google.github.io/speaker-id/publications/VoiceFilter/)]

* Aonan Zhang, **Quan Wang**, Zhenyao Zhu, John Paisley, Chong Wang, "Fully Supervised Speaker Diarization", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
[[PDF](https://arxiv.org/pdf/1810.04719.pdf)]
[[code](https://github.com/google/uis-rnn)] [[poster](https://github.com/google/uis-rnn/blob/master/resources/icassp2019_supervised_diarization_poster.pdf)] [[Google AI Blog](https://ai.googleblog.com/2018/11/accurate-online-speaker-diarization.html)] (usage sketch after this list)

* Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, **Quan Wang**, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas, "Sample Efficient Adaptive Text-to-Speech", International Conference on Learning Representations (ICLR 2019).
[[PDF](https://arxiv.org/pdf/1809.10460.pdf)]
[[samples](https://sample-efficient-adaptive-tts.github.io/demo/)] [[poster](https://sample-efficient-adaptive-tts.github.io/demo/poster/poster.pdf)] [[Google AI Blog](https://ai.googleblog.com/2019/05/google-at-iclr-2019.html)]

* Ye Jia, Yu Zhang, Ron J. Weiss, **Quan Wang**, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu, "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis", Advances in neural information processing systems (NeurIPS 2018).
[[PDF](https://arxiv.org/pdf/1806.04558.pdf)]
[[samples](https://google.github.io/tacotron/publications/speaker_adaptation/)]
[[poster](https://google.github.io/tacotron/publications/speaker_adaptation/poster.pdf)]
[[Google AI Blog](https://ai.googleblog.com/2018/12/google-at-neurips-2018.html)]

* **Quan Wang**, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno, "Speaker Diarization with LSTM", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[[PDF](https://arxiv.org/pdf/1710.10468.pdf)]
[[poster](https://google.github.io/speaker-id/publications/LstmDiarization/resources/icassp2018_diarization_poster.pdf)]
[[code](https://github.com/wq2012/SpectralCluster)] (usage sketch after this list)
[[wiki](https://google.github.io/speaker-id/publications/LstmDiarization/)]

* Li Wan, **Quan Wang**, Alan Papir, Ignacio Lopez Moreno, "Generalized End-to-End Loss for Speaker Verification", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. (ORAL)
[[PDF](https://arxiv.org/pdf/1710.10467.pdf)]
[[slides](https://sigport.org/documents/generalized-end-end-loss-speaker-verification)]
[[wiki](https://google.github.io/speaker-id/publications/GE2E/)] (loss summarized after this list)

* F A Rezaur Rahman Chowdhury, **Quan Wang**, Ignacio Lopez Moreno, Li Wan, "Attention-Based Models for Text-Dependent Speaker Verification", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[[PDF](https://arxiv.org/pdf/1710.10470.pdf)]
[[poster](https://sigport.org/documents/attention-based-models-text-dependent-speaker-verification)]

* Alejandro Luebs, Bastiaan Kleijn, Felicia Lim, Florian Stimberg, Jan Skoglund, **Quan Wang**, Thomas Walters, "Wavenet Based Low-Rate Speech Coding", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[[PDF](https://arxiv.org/pdf/1712.01120.pdf)]
[[poster](https://wangquan.me/files/research/icassp2018_wavenet_poster.pdf)]

* **Quan Wang**, Xinchi Zhang, Kim L. Boyer, "3D Scene Estimation with Perturbation-Modulated Light and Distributed Sensors", 10th IEEE Workshop on Perception Beyond the Visible Spectrum (PBVS). (ORAL)
[[PDF](https://github.com/wq2012/COSBOS/blob/master/Papers/scene_PBVS_2014.pdf)]

* **Quan Wang**, Yan Ou, A. Agung Julius, Kim L. Boyer and Min Jun Kim, "Tracking Tetrahymena Pyriformis Cells using Decision Trees", 21st International Conference on Pattern Recognition (ICPR), Pages 1843-1847, 11-15 Nov. 2012.
[[PDF](https://github.com/wq2012/DecisionForest/blob/master/documentation/cell_tracking_ICPR_2012.pdf)]
[[shotgun](https://github.com/wq2012/DecisionForest/blob/master/documentation/cell_tracking_shotgun.pdf)]
[[poster](https://github.com/wq2012/DecisionForest/blob/master/documentation/cell_tracking_poster.pdf)]
[[software](https://www.mathworks.com/matlabcentral/fileexchange/39110-decision-tree-and-decision-forest)]

* **Quan Wang**, Dijia Wu, Le Lu, Meizhu Liu, Kim L. Boyer, and Shaohua Kevin Zhou, "Semantic Context Forests for Learning-Based Knee Cartilage Segmentation in 3D MR Images", MICCAI 2013: Workshop on Medical Computer Vision. (ORAL)
[[PDF](https://github.com/wq2012/DecisionForest/blob/master/documentation/miccai_mcv_2013.pdf)]
[[poster](https://github.com/wq2012/DecisionForest/blob/master/documentation/miccai_mcv_poster.pdf)]
[[slides](https://github.com/wq2012/DecisionForest/blob/master/documentation/miccai_mcv_slides.pdf)]
[[software](https://www.mathworks.com/matlabcentral/fileexchange/39110-decision-tree-and-decision-forest)]

* **Quan Wang**, Xin Shen, Meng Wang, Kim L. Boyer, "Label Consistent Fisher Vectors for Supervised Feature Aggregation", 22nd International Conference on Pattern Recognition (ICPR), 2014.
[[PDF](https://github.com/wq2012/LCFV/blob/master/documentation/LCFV_ICPR_2014.pdf)]
[[poster](https://github.com/wq2012/LCFV/blob/master/documentation/LCFV_ICPR_2014.pdf)]
[[software](https://www.mathworks.com/matlabcentral/fileexchange/47730-label-consistent-fisher-vectors-lcfv)]
[[demo](https://www.youtube.com/watch?v=GTSMONLaRAg)]

* **Quan Wang**, Xinchi Zhang, Meng Wang, Kim L. Boyer, "Learning Room Occupancy Patterns from Sparsely Recovered Light Transport Models", 22nd International Conference on Pattern Recognition (ICPR), 2014. (ORAL)
[[PDF](https://github.com/wq2012/COSBOS/blob/master/Papers/ERC_ICPR_2014.pdf)]

* **Quan Wang**, Kim L. Boyer, "Feature Learning by Multidimensional Scaling and its Applications in Object Recognition", 26th SIBGRAPI Conference on Graphics, Patterns and Images (Sibgrapi). IEEE, 2013. (ORAL)
[[PDF](https://github.com/wq2012/SimpleMatrix/blob/master/documentation/MDS_SIBGRAPI_2013.pdf)]
[[slides](https://wangquan.me/files/research/MDS_feature_learning_slides.pdf)]
[[software](https://www.mathworks.com/matlabcentral/fileexchange/42261-efficient-multidimensional-scaling-mds)]

* Tanveer Syeda-Mahmood, **Quan Wang**, Patrick McNeillie, David Beymer, Colin Compas, "Discriminating Normal and Abnormal Left Ventricular Shapes in Four-Chamber View 2D Echocardiography", International Symposium on Biomedical Imaging (ISBI), 2014.

* **Quan Wang**, Yu Wang, Zuoguan Wang, "Online Smart Face Morphing Engine with Prior Constraints and Local Geometry Preservation", International Workshop on Multimodal pattern recognition of social signals in human computer interaction (MPRSS 2014). (ORAL)
[[PDF](https://wangquan.me/files/research/facewarping_MPRSS_2014.pdf)]

* Xinchi Zhang, **Quan Wang**, Kim L. Boyer, "Illumination Adaptation with Rapid-Response Color Sensors", SPIE Optical Engineering + Applications, 2014. (ORAL)
[[PDF](https://github.com/wq2012/COSBOS/blob/master/Papers/sensor_SPIE_2014.pdf)]
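
Several of the diarization papers above ("Speaker Diarization with LSTM" and "Turn-to-Diarize") link to the open-source [SpectralCluster](https://github.com/wq2012/SpectralCluster) library. Below is a minimal usage sketch, not code from the papers: the random array stands in for real per-segment speaker embeddings (e.g., d-vectors from a speaker encoder), and the cluster-count bounds are illustrative rather than the configurations used in the papers.

```python
import numpy as np
from spectralcluster import SpectralClusterer  # pip install spectralcluster

# Stand-in for real per-segment speaker embeddings: one row per speech
# segment (here, 40 segments with 256-dim embeddings).
embeddings = np.random.rand(40, 256)

# Bound the number of speakers the clusterer may find; illustrative values.
clusterer = SpectralClusterer(min_clusters=2, max_clusters=7)

# predict() returns one integer speaker label per segment.
labels = clusterer.predict(embeddings)
print(labels)
```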
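Similarly, the "Fully Supervised Speaker Diarization" paper above links to the open-source [uis-rnn](https://github.com/google/uis-rnn) library. The sketch below follows the API documented in that repository's README; the training sequence and labels are random placeholders, and the small dimension and shortened iteration count only keep the toy run fast.

```python
import numpy as np
import uisrnn  # pip install uisrnn

# Default model/training/inference arguments from the library.
model_args, training_args, inference_args = uisrnn.parse_arguments()
model_args.observation_dim = 16      # small toy embedding dimension
training_args.train_iteration = 100  # shortened for this toy example

model = uisrnn.UISRNN(model_args)

# Toy training data: a concatenated observation sequence (one row per
# segment) with a ground-truth speaker label per row. Real usage feeds
# speaker embeddings and labels from annotated conversations.
train_sequence = np.random.rand(100, model_args.observation_dim)
train_cluster_id = np.array(['A'] * 50 + ['B'] * 50)
model.fit(train_sequence, train_cluster_id, training_args)

# Decode speaker labels for an unseen sequence.
test_sequence = np.random.rand(30, model_args.observation_dim)
predicted_cluster_ids = model.predict(test_sequence, inference_args)
print(predicted_cluster_ids)
```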
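For quick reference, the training objective of the "Generalized End-to-End Loss for Speaker Verification" paper above (softmax variant, notation as in the paper) scores the embedding $\mathbf{e}_{ji}$ of utterance $i$ from speaker $j$ against each speaker centroid $\mathbf{c}_k$, then pulls each embedding toward its own centroid and pushes it away from the others:

```math
S_{ji,k} = w \cdot \cos(\mathbf{e}_{ji}, \mathbf{c}_k) + b, \qquad w > 0
```

```math
L(\mathbf{e}_{ji}) = -S_{ji,j} + \log \sum_{k=1}^{N} \exp\left(S_{ji,k}\right)
```

Here $N$ is the number of speakers in the batch; when $k = j$, the paper computes the centroid excluding $\mathbf{e}_{ji}$ itself for training stability.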

### Technical Reports and Theses

* Gemini Team, "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context", 2024. [[PDF](https://arxiv.org/abs/2403.05530)]

* Jin Shi, **Quan Wang**, Yeming Fang, Gang Feng, Zhengying Chen, Jason Pelecanos, Ignacio Lopez Moreno,
Andrea Chu, Pedro Moreno Mengibar, "Utterance Augmentation for Speaker Recognition", Technical Disclosure Commons, Defensive Publications Series, 2020.
[[link](https://www.tdcommons.org/dpubs_series/3238/)]
[[PDF](https://www.tdcommons.org/cgi/viewcontent.cgi?article=4311&context=dpubs_series)]

* **Quan Wang**, Yiran Mao, "Learning Better Font Slicing Strategies from Data", Technical Disclosure Commons, Defensive Publications Series, 2017.
[[link](https://www.tdcommons.org/dpubs_series/906/)]
[[PDF](https://google.github.io/speaker-id/publications/MLFont/resources/Learning%20Better%20Font%20Slicing%20Strategies%20from%20Data.pdf)]
[[wiki](https://google.github.io/speaker-id/publications/MLFont/)]

* Philip Andrew Mansfield, **Quan Wang**, Carlton Downey, Li Wan, Ignacio Lopez Moreno, "Links: A High-Dimensional Online Clustering Method", arXiv:1801.10123 [*stat.ML*], 2018.
[[PDF](https://arxiv.org/pdf/1801.10123.pdf)]

* **Quan Wang**, "GMM-Based Hidden Markov Random Field for Color Image and 3D Volume Segmentation", arXiv:1212.4527 [*cs.CV*], 2012.
[[PDF](https://arxiv.org/pdf/1212.4527.pdf)]

* **Quan Wang**, "HMRF-EM-image: Implementation of the Hidden Markov Random Field Model and its Expectation-Maximization Algorithm", arXiv:1207.3510 [*cs.CV*], 2012.
[[PDF](https://arxiv.org/pdf/1207.3510.pdf)]

* **Quan Wang**, "Kernel Principal Component Analysis and its Applications in Face Recognition and Active Shape Models", arXiv:1207.3538 [*cs.CV*], 2012.
[[PDF](https://arxiv.org/pdf/1207.3538.pdf)]

* **Quan Wang**, "Exploiting Geometric and Spatial Constraints for Vision and Lighting Applications", Rensselaer Polytechnic Institute Ph.D. dissertation, 2014.

* **Quan Wang**, "Implementation and Study of Light-Field-Based 3D Object Retrieval System", Tsinghua University Undergraduate Thesis, 2010.
[[PDF](https://wangquan.me/files/research/quan_thesis_thu.pdf)]
[[poster](https://wangquan.me/images/research/LF3DR.jpg)]
[[demo](https://www.youtube.com/watch?v=cqpDEDbjTL8)]

### Acknowledged by

* Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani, "Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations", arXiv:2303.01664 [cs.SD]. [[demo](https://google.github.io/df-conformer/miipher/index.html)] [[PDF](https://arxiv.org/abs/2303.01664)]

* Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz, "Translatotron 2: Robust direct speech-to-speech translation", arXiv:2107.08661 [cs.CL]. [[PDF](https://arxiv.org/pdf/2107.08661.pdf)]

* Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey, "End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings", arXiv:2105.02096 [cs.SD]. [[PDF](https://arxiv.org/pdf/2105.02096.pdf)]

* ASVspoof 2019: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan, 2019. [[PDF](http://www.asvspoof.org/asvspoof2019/asvspoof2019_evaluation_plan.pdf)]

* Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu, "Direct speech-to-speech translation with a sequence-to-sequence model", arXiv:1904.06037 [cs.CL].
[[PDF](https://arxiv.org/pdf/1904.06037.pdf)]

* Aonan Zhang, "Composing Deep Learning and Bayesian Nonparametric Methods", Ph.D. Dissertation, 2019. [[PDF](https://academiccommons.columbia.edu/doi/10.7916/d8-wz1q-6892)]

* Yu Wang, "A broadly applicable three-dimensional neuron reconstruction framework based on deformable models and software system with parallel GPU implementation", Ph.D. Dissertation, 2011.

## Patents

* **Quan Wang**, Dijia Wu, Meizhu Liu, Le Lu, Kevin Shaohua Zhou, [Automatic spatial context based multi-object segmentation in 3D images](https://patents.google.com/patent/US20140161334A1)
[[PDF](https://patentimages.storage.googleapis.com/f9/87/1f/ceafa7923c90a4/US20140161334A1.pdf)]

* **Quan Wang**, David Beymer, Patrick McNeillie, Tanveer Syeda-Mahmood, [Discriminating between normal and abnormal left ventricles in echocardiography](https://patents.google.com/patent/US9436995B2)
[[PDF](https://patentimages.storage.googleapis.com/47/a2/36/277b3561f12096/US9436995.pdf)]

* **Quan Wang**, Xinchi Zhang, Kim L. Boyer, [Occupancy sensing smart lighting systems](https://patents.google.com/patent/US9907138B2/)
[[PDF](https://patentimages.storage.googleapis.com/aa/1d/cc/92a2676ee09129/US9907138.pdf)]

* **Quan Wang**, Thibaud Senechal, Daniel Makoto Willenson, Shuang Wu, Yue Liu, Shiv Naga Prasad Vitaladevuni, David Paul Ramos, Qingfeng Yu, [Text detection using features associated with neighboring glyph pairs](https://patents.google.com/patent/US9367736B1)
[[PDF](https://patentimages.storage.googleapis.com/66/73/b8/25575a9c978818/US9367736.pdf)]

* **Quan Wang**, Ignacio Lopez Moreno, Li Wan, [Improving speaker verification across locations, languages, and/or dialects](https://patents.google.com/patent/US10403291B2/)
[[PDF](https://patentimages.storage.googleapis.com/3d/bb/f8/d13df0c9ae5e5a/US10403291.pdf)]

* **Quan Wang**, Hasim Sak, Ignacio Lopez Moreno, Alan Sean Papir, Li Wan, [Neural Networks for Speaker Verification](https://patents.google.com/patent/WO2019027531A1/)
[[PDF](https://patentimages.storage.googleapis.com/8a/34/97/b362cbcd9cff22/WO2019027531A1.pdf)]

* **Quan Wang**, Yash Sheth, Ignacio Lopez Moreno, Li Wan, [Speaker diarization using an end-to-end model](https://patents.google.com/patent/WO2019209569A1)
[[PDF](https://patentimages.storage.googleapis.com/b0/86/55/4aa9f9bb5ee517/WO2019209569A1.pdf)]

* **Quan Wang**, Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Patrick An Phu Nguyen, [Synthesis of speech from text in a voice of a target speaker using neural networks](https://patents.google.com/patent/WO2019222591A1) [[PDF](https://patentimages.storage.googleapis.com/64/70/a2/267ac141ed1688/WO2019222591A1.pdf)]

* **Quan Wang**, Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno, [Text independent speaker recognition](https://patents.google.com/patent/WO2020117639A2/) [[PDF](https://patentimages.storage.googleapis.com/d4/c5/c6/3e29f6698d5841/WO2020117639A2.pdf)]

* **Quan Wang**, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn, [Targeted voice separation by speaker conditioned on spectrogram masking](https://www.freepatentsonline.com/y2020/0202869.html) [[PDF](https://www.freepatentsonline.com/20200202869.pdf)]

* **Quan Wang**, Chong Wang, Aonan Zhang, Zhenyao Zhu, [Fully Supervised Speaker Diarization](https://patents.google.com/patent/US20200219517A1/) [[PDF](https://patentimages.storage.googleapis.com/3c/a6/b2/d498594ee0b168/US20200219517A1.pdf)]

## Academic Service

### Reviewing

Journals:
* [Neural Networks](https://www.sciencedirect.com/journal/neural-networks)
* [IEEE Signal Processing Letters](https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=97)
* [IEEE/ACM Transactions on Audio, Speech, and Language Processing](https://signalprocessingsociety.org/publications-resources/ieeeacm-transactions-audio-speech-and-language-processing)
* [Computer Speech & Language](https://www.sciencedirect.com/journal/computer-speech-and-language)
* [EURASIP Journal on Image and Video Processing](https://jivp-eurasipjournals.springeropen.com/)
* [Artificial Intelligence Review](https://link.springer.com/journal/10462)
* [Journal of Signal Processing Systems](https://link.springer.com/journal/11265)
* [Computer Communication & Collaboration](http://www.bapress.ca/ccc.php)

Conferences:

* IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
* IEEE Spoken Language Technology Workshop (SLT), 2021, 2022
* International Joint Conference on Artificial Intelligence (IJCAI), 2019
* VISAPP International Conference on Computer Vision Theory and Applications, 2014
* SIBGRAPI Conference on Graphics, Patterns, and Images, 2013, 2014

### Other

* [**Senior member** - IEEE](https://raw.githubusercontent.com/wq2012/CurriculumVitae/master/resources/IEEE_Senior_Member_Plaque_QuanWang.jpg)
* **Session chair** - [Interspeech 2021](https://www.interspeech2021.org/): Language and Accent Recognition
* **Interviewee** - [Interspeech 2020 Tutorial](http://www.interspeech2020.org/Program/Tutorials/): Neural Models for Speaker Diarization in the Context of Speech Recognition

## Teaching and Mentoring

### Online Courses

* Udemy (English): [Speaker Recognition | By Award Winning Textbook Author](https://www.udemy.com/course/speaker-recognition/?referralCode=1914766AF241CE15D19A)
* Udemy (English): [A Tutorial on Speaker Diarization](https://www.udemy.com/course/diarization/?referralCode=21D7CC0AEABB7FE3680F)
* 机器之心 (Chinese): [声纹识别:从理论到编程实战](https://jmq.xet.tech/s/4j70ZU) ("Speaker Recognition: From Theory to Programming Practice")

### Students

* Wei Xia, 2021 Google summer intern & 2021 Google Student Researcher, Ph.D.
* Shaojin Ding, 2019 & 2020 Google summer intern, Ph.D.
* Aonan Zhang, 2018 Google summer intern & 2018 Google Student Researcher, Ph.D.
* Hannah Muckenhirn, 2018 Google summer intern, Ph.D.
* F A Rezaur Rahman Chowdhury, 2017 Google summer intern, Ph.D. (co-host)
* Carlton Downey, 2017 Google summer intern, Ph.D.
* Xinchi Zhang, 2013-2014 undergraduate student at Smart Lighting Engineering Research Center

### Teaching Assistant

* 2011/01 – 2012/12, Teaching Assistant, **Rensselaer Polytechnic Institute**, Troy, NY, USA
  * Spring 2011, Embedded Control [ENGR 2350], by Prof. Russell P. Kraft
  * Spring 2011, Real-Time Applications in Control & Communications [ECSE 4760], by Prof. Russell P. Kraft
  * Fall 2011, Introduction to Engineering Analysis [ENGR 1100], by Prof. Mark W. Olles
  * Spring 2012, Electric Circuits [ECSE 2010], by Prof. Jeffrey Braunstein
  * Spring 2012, Biological Image Analysis [ECSE 4960], by Dr. Jens Rittscher and Dr. Dirk Padfield
  * Fall 2012, Modeling and Analysis of Uncertainty [ENGR 2600], by Prof. Charles J. Malmborg