Prof. SOONG, Frank K. 宋 謌 平 教授



Adjunct Professor
BS (National Taiwan University)
MS (The University of Rhode Island)
PhD (Stanford University)

Research Interests:
 * Speech and Language Processing


Frank Soong is a Principal Researcher and Research Manager of the Speech Group. He received his B.S., M.S., and Ph.D. degrees, all in electrical engineering, from National Taiwan University, the University of Rhode Island, and Stanford University, respectively. He joined Bell Labs Research, Murray Hill, NJ, USA in 1982, worked there for 20 years, and retired as a Distinguished Member of Technical Staff in 2001. At Bell Labs, he worked on various aspects of acoustics and speech processing, including speech coding, speech and speaker recognition, stochastic modeling of speech signals, efficient search algorithms, discriminative training, dereverberation of audio and speech signals, microphone array processing, acoustic echo cancellation, and hands-free noisy speech recognition. He was also responsible for transferring recognition technology from research to AT&T voice-activated cell phones, which were rated by Mobile Office magazine as the best among the competing products evaluated. He was a co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package. He visited Japan twice as a visiting researcher: first from 1987 to 1988, at the NTT Electro-Communication Labs, Musashino, Tokyo; then from 2002 to 2004, at the Spoken Language Translation Labs, ATR, Kyoto. In 2004, he joined Microsoft Research Asia (MSRA), Beijing, China, to lead the Speech Research Group. He is a visiting professor at the Chinese University of Hong Kong (CUHK) and the co-director of the CUHK-MSRA Joint Research Lab, recently promoted to a National Key Lab of the Ministry of Education, China. He was the co-chair of the 1991 IEEE International Arden House Speech Recognition Workshop. He has served on the IEEE Speech and Language Processing Technical Committee of the Signal Processing Society as a committee member, and as an associate editor of the IEEE Transactions on Speech and Audio Processing.
He has published extensively, coauthoring more than 200 technical papers in the speech and signal processing fields. He is an IEEE Fellow.


Selected Publications

Wenping Hu, Yao Qian, Frank K. Soong and Yong Wang, “Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers,” Speech Communication 67, pp. 154-166, 2015

Yao Qian, Frank K. Soong and Zhi-jie Yan, "A Unified Trajectory Tiling Approach to High Quality Speech Rendering", IEEE Trans on Audio, Speech and Language Processing, Vol 21, 2, pp. 280-290, 2013

Lijuan Wang, Yao Qian, Matthew Scott, Gang Chen and Frank K. Soong, “Computerized Audio-Visual Language Learning”, Computer, Vol 45, 6, pp. 38-47, 2012

Yao Qian, Zhi-Zheng Wu, Bo-Yang Gao and Frank K. Soong, “Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units”, IEEE Transactions on Audio, Speech and Language Processing, Vol 19, 6, pp. 1702-1710, 2011

Bo Fan, Lijuan Wang, Frank K. Soong, and Lei Xie, “Photo-real Talking Head with Deep Bidirectional LSTM,” ICASSP-2015

Yao Qian, Yuchen Fan, Wenping Hu and Frank K. Soong, "On the Training Aspects of Deep Neural Network (DNN) for Parametric TTS Synthesis", ICASSP-2014

Wenping Hu, Yao Qian and Frank K. Soong, "A DNN-based Acoustic Modeling of Tonal Language and Its Application to Mandarin Pronunciation Training", ICASSP-2014

Yuchen Fan, Yao Qian, Fenglong Xie and Frank K. Soong, "TTS Synthesis with Bidirectional LSTM based Recurrent Neural Networks", Interspeech-2014

Fenglong Xie, Yao Qian, Yuchen Fan, Frank K. Soong and Haifeng Li, "Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion", Interspeech-2014

JeeSok Lee, Frank K. Soong, Hong-Goo Kang, “Source-Filter based Full-band Adaptive Harmonic Model and Its Application to Prosody Modification,” Interspeech 2013

Chaoyang Wang, Lijuan Wang, Yasuyuki Matsushita, Frank K. Soong, “Binocular Photometric Stereo Acquisition and Reconstruction for 3D Talking Head Applications,” Interspeech 2013

Xinjian Zhang, Lijuan Wang, Gang Li, Frank Seide, Frank K. Soong, “A New Language Independent, Photo-realistic Talking Head Driven by Voice Only,” Interspeech 2013