All times are displayed in Eastern Daylight Time (UTC -4)
Monday, June 7
10:00 – 13:30 | Tutorial
Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization
Presenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, Shinji Watanabe
18:00 – 19:00 | Young Professionals Panel Discussion
Moderator: Subhro Das
Panelists: Sabrina Rashid, Vanessa Testoni, Hamid Palangi
Tuesday, June 8
13:00 – 13:45 | Speech Synthesis 1: Architecture
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu
13:00 – 13:45 | Speech Synthesis 1: Architecture
A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time
Feng-Long Xie, Xin-Hui Li, Wen-Chao Su, Li Lu, Frank K. Soong
13:00 – 13:45 | Language Modeling 1: Fusion and Training for End-to-End ASR
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
13:00 – 13:45 | Audio and Speech Source Separation 1: Speech Separation
Session Chair: Zhuo Chen
Rethinking The Separation Layers In Speech Separation Networks
Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani
13:00 – 13:45 | Deep Learning Training Methods 3
Session Chair: Jinyu Li
13:00 – 13:45 | Brain-Computer Interfaces
Wenkang An, Barbara Shinn-Cunningham, Hannes Gamper, Dimitra Emmanouilidou, David Johnston, Mihai Jalobeanu, Edward Cutrell, Andrew Wilson, Kuan-Jung Chiang, Ivan Tashev
14:00 – 14:45 | Speech Enhancement 1: Speech Separation
Session Chair: Takuya Yoshioka
Dual-Path Modeling for Long Recording Speech Separation in Meetings
Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian
14:00 – 14:45 | Speech Enhancement 1: Speech Separation
Continuous Speech Separation with Conformer
Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou
14:00 – 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation
Session Chair: Takuya Yoshioka
14:00 – 14:45 | Speaker Recognition 1: Benchmark Evaluation
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020
Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong
14:00 – 14:45 | Dialogue Systems 2: Response Generation
Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention
Shijie Zhou, Wenge Rong, Jianfei Zhang, Yanmeng Wang, Libin Shi, Zhang Xiong
16:30 – 17:15 | Speech Recognition 4: Transformer Models 2
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
16:30 – 17:15 | Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation
Session Chair: Hannes Gamper
Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Markus Loide, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan
16:30 – 17:15 | Learning
Session Chair: Zhong Meng
Sequence-Level Self-Teaching Regularization
Eric Sun, Liang Lu, Zhong Meng, Yifan Gong
Wednesday, June 9
13:00 – 13:45 | Language Understanding 1: End-to-end Speech Understanding 1
Speech-Language Pre-Training for End-to-End Spoken Language Understanding
Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng
13:00 – 13:45 | Audio and Speech Source Separation 4: Multi-Channel Source Separation
Ali Aroudi, Sebastian Braun
14:00 – 14:45 | Speech Enhancement 4: Multi-channel Processing
Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu
14:00 – 14:45 | Matrix Factorization and Applications
Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization
Oren Barkan, Roy Hirsch, Ori Katz, Avi Caciularu, Yoni Weill, Noam Koenigstein
14:00 – 14:45 | Biological Image Analysis
CMIM: Cross-Modal Information Maximization For Medical Imaging
Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di Jorio, Margaux Luck, Devon Hjelm, Yoshua Bengio
15:30 – 16:15 | Speech Recognition 8: Multilingual Speech Recognition
Amit Das, Kshitiz Kumar, Jian Wu
15:30 – 16:15 | Quality and Intelligibility Measures
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng, Xu Tan, Sheng Zhao, Frank K. Soong, Xiang-Yang Li, Tao Qin
15:30 – 16:15 | Quality and Intelligibility Measures
Crowdsourcing Approach for Subjective Evaluation of Echo Impairment
Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas
16:30 – 17:15 | Speech Recognition 9: Confidence Measures
Session Chair: Yifan Gong
16:30 – 17:15 | Speech Recognition 10: Robustness to Human Speech Variability
Session Chair: Yifan Gong
16:30 – 17:15 | Speech Processing 2: General Topics
Chandan K A Reddy, Vishak Gopal, Ross Cutler
16:30 – 17:15 | Style and Text Normalization
Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng
16:30 – 17:15 | Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis
Ziqi Fan, Vibhav Vineet, Chenshen Lu, T.W. Wu, Kyla McMullen
Thursday, June 10
13:00 – 13:45 | Speech Recognition 11: Novel Approaches
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka
13:00 – 13:45 | Speech Synthesis 5: Prosody & Style
Speech Bert Embedding for Improving Prosody in Neural TTS
Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He
13:00 – 13:45 | Speech Synthesis 6: Data Augmentation & Adaptation
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data
Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu
14:00 – 14:45 | Speech Enhancement 5: DNS Challenge Task
Session Chair: Chandan K A Reddy
ICASSP 2021 Deep Noise Suppression Challenge
Chandan K A Reddy, Harishchandra Dubey, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
14:00 – 14:45 | Speech Enhancement 6: Multi-modal Processing
Session Chair: Chandan K A Reddy
14:00 – 14:45 | Graph Signal Processing
Fast Hierarchy Preserving Graph Embedding via Subspace Constraints
Xu Chen, Lun Du, Mengyuan Chen, Yun Wang, QingQing Long, Kunqing Xie
15:30 – 16:15 | Speech Recognition 13: Acoustic Modeling 1
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings
Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka
15:30 – 16:15 | Speech Recognition 14: Acoustic Modeling 2
Ensemble Combination between Different Time Segmentations
Jeremy Heng Meng Wong, Dimitrios Dimitriadis, Kenichi Kumatani, Yashesh Gaur, George Polovets, Partha Parthasarathy, Eric Sun, Jinyu Li, Yifan Gong
15:30 – 16:15 | Privacy and Information Security
Detection Of Malicious DNS and Web Servers using Graph-Based Approaches
Jinyuan Jia, Zheng Dong, Jie Li, Jack W. Stokes
16:30 – 17:15 | Language Assessment
Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
16:30 – 17:15 | Signal Enhancement and Restoration 1: Deep Learning
Towards Efficient Models for Real-Time Deep Noise Suppression
Sebastian Braun, Hannes Gamper, Chandan K A Reddy, Ivan Tashev
16:30 – 17:15 | Signal Enhancement and Restoration 3: Signal Enhancement
Phoneme-Based Distribution Regularization for Speech Enhancement
Yajing Liu, Xiulian Peng, Zhiwei Xiong, Yan Lu
16:30 – 17:15 | Audio & Images
Session Chair: Ivan Tashev
Friday, June 11
11:30 – 12:15 | Speech Recognition 18: Low Resource ASR
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition
Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu
11:30 – 12:15 | Speech Synthesis 7: General Topics
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu
13:00 – 13:45 | Speech Enhancement 8: Echo Cancellation and Other Tasks
Arun Asokan Nair, Kazuhito Koishida
13:00 – 13:45 | Speaker Diarization
Hidden Markov Model Diarisation with Speaker Location Information
Jeremy Heng Meng Wong, Xiong Xiao, Yifan Gong
13:00 – 13:45 | Detection and Classification of Acoustic Scenes and Events 5: Scenes
Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification
Yang Liu, Alexandros Neophytou, Sunando Sengupta, Eric Sommerlade