Symposium for Celebrating 40 Years of Bayesian Learning in Speech and Language Processing and Beyond

Taipei, IEEE ASRU 2023 Workshop Satellite Event, December 20th, 2023



 

Introduction

Bayes' rule, published in 1763, states a simple theorem: P(A|B) = P(B|A)P(A)/P(B). It has had a long-lasting influence on many statistical inference applications. Since the first Bayesian learning speech paper appeared at ICASSP 1983 [1], we have witnessed, over the following 20 years, quite a few studies extending Bayesian learning to maximum a posteriori (MAP) estimation of hidden Markov models (HMMs) [2-3]. Online adaptation of HMMs and correlated HMMs [4-5] followed. Next, the popular maximum likelihood linear regression (MLLR) adaptation approach was formulated as MAPLR and extended to joint estimation [6]. To handle unseen units, structural MAP (SMAP) was developed [7] and extended to SMAPLR [8]. Online adaptation is often referred to as temporal prior evolution, while tree-based SMAP is known as spatial prior evolution. In contrast to point MAP estimation, variational Bayesian [9] and Bayesian predictive classification [10] approaches have also been developed as alternatives. A review of Bayesian learning for speech and language processing can be found in [11], and a book on variational Bayesian learning theory has also been published [12]. More recently, Bayesian learning has been extended to handle DNN parameters [13-15]. We expect this direction to be studied extensively in the future, especially in the modern era of generative AI and large pre-trained models, in which transfer learning becomes a viable tool to adapt general-purpose models to specific domains and applications.
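The posterior combination at the heart of these adaptation methods can be illustrated with a minimal sketch: MAP estimation of a Gaussian mean under a conjugate Gaussian prior, the simplest instance of the setting studied in [2-3]. All numbers and the function name below are illustrative, not taken from the cited papers.

```python
# MAP estimation of a Gaussian mean with a conjugate Gaussian prior.
# With few adaptation samples the estimate stays close to the prior mean;
# with many samples it approaches the sample mean (illustrative sketch).

def map_gaussian_mean(data, prior_mean, prior_var, obs_var):
    """MAP estimate of mu for data ~ N(mu, obs_var), mu ~ N(prior_mean, prior_var)."""
    n = len(data)
    sample_mean = sum(data) / n
    # Precision-weighted interpolation between data and prior:
    w = (n / obs_var) / (n / obs_var + 1.0 / prior_var)
    return w * sample_mean + (1.0 - w) * prior_mean

# Two adaptation samples: the prior (mean 0) still pulls the estimate down.
few = map_gaussian_mean([1.0, 1.2], prior_mean=0.0, prior_var=1.0, obs_var=1.0)
# One hundred samples: the estimate is dominated by the data.
many = map_gaussian_mean([1.1] * 100, prior_mean=0.0, prior_var=1.0, obs_var=1.0)
```

The interpolation weight w plays the same role as the prior-density weighting in MAP adaptation of HMM parameters: it controls how quickly the adapted model moves away from the speaker-independent prior as adaptation data accumulates.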

Tentative Schedule

A total of four and a half hours following ASRU 2023, 14:00-18:30 on December 20

  • Plenary Speaker: Chin-Hui Lee (30 minutes)
  • Six Invited Speakers (Key contributions to Bayesian Learning in speech and language processing in the last 40 years): Qiang Huo, Torbjorn Svendsen (or Olivier Siohan), Shinji Watanabe, Koichi Shinoda, Jen-Tzung Chien, Marco Siniscalchi (15 minutes each for a total of 90 minutes)
  • Panel Discussion: all seven invited speakers as panelists (30 minutes)
  • Break and Social Discussions (30 minutes)
  • Poster Session: 12-15 posters (90 minutes)
  • Workshop Dinner: hosted by the Organizers right after the Symposium

 

14:00 Historical Perspective & Beyond C.-H. Lee
14:30 Online and Correlated HMMs Q. Huo
14:45 Joint MAP of LR and HMMs T. K. Svendsen
15:00 Variational Bayesian Learning S. Watanabe
15:15 Structural MAP for LR & HMMs K. Shinoda
15:30 MAP for N-grams and Beyond J.-T. Chien
15:45 MAP for DNN Parameters S.M. Siniscalchi
16:00 Panel Discussion All 7 speakers
16:30 Break  
17:00 Poster Contributions All participants
18:30 Closing  

Honorary Committee Chair

Chin-Hui Lee
Georgia Institute of Technology

Invited Speakers

Qiang Huo
Microsoft Research
Torbjorn Svendsen
Norwegian University of Science & Technology
Shinji Watanabe
Carnegie Mellon University
Koichi Shinoda
Tokyo Institute of Technology
Jen-Tzung Chien
National Yang Ming Chiao Tung University
Sabato Marco Siniscalchi
Kore University of Enna

Organizers

Jinyu Li
Microsoft Research
Chao Zhang
Tsinghua University
Hsin-Min Wang
Academia Sinica
Yu Tsao
Academia Sinica

Submissions

All contributions to this Bayesian Celebration Workshop can be summarized in an abstract (limited to 200 words) to be published in the ASRU 2023 Workshop Proceedings. From the submissions, 12-15 poster contributions on topics relevant to Bayesian learning will be selected and reviewed by the Organizers. Presentation materials for each contribution, including an extended abstract or short paper of 1 to 3 pages and a poster with references, will be published on a symposium page hyperlinked to the ASRU website. A call for contributions will be sent to all potential participants and published on the ASRU website soon, with a submission deadline of December 1st, 2023.

We welcome all submissions related to Bayesian learning, large models and generative models, including but not limited to:

  • Bayesian methods for machine learning and deep learning
  • In-context learning and generative models
  • Adaptation and few-shot learning for speech and language processing
  • Theory and parameter efficient tuning for large speech and language models
  • Multimodal intelligence across audio, text, and vision
Dual-submission policy - We welcome ongoing and unpublished work. We also accept papers that are under review at the time of submission or that have been recently accepted.
Submissions and accepted papers - Workshop submissions and reviews will be private. The camera-ready versions of accepted papers will be shared on the workshop webpage. However, the hosting is not an official proceedings, so papers can be subsequently or concurrently submitted to other venues.
In-person presentation - Accepted extended abstracts and papers are expected to be presented in person. Online presentation will be permitted only in exceptional circumstances, such as visa difficulties.
Submission deadline: December 1st, 2023; acceptance will be notified within one week of submission.


Registration

Participants need to register separately from the main ASRU Workshop. A fee of USD $120 (covering the Workshop, Proceedings, and Break) is required to register for the Bayesian Symposium. ASRU participants are welcome to join this Celebration Workshop for an extra USD $100 (Satellite Workshop registration is handled separately from ASRU Workshop registration). Students receive a 50% discount: USD $60 for non-ASRU-participant students and USD $50 for ASRU-participant students.


Accepted Presentations

  1. Multiple output samples per input in a single-output Gaussian process
  2. Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori
  3. Bayesian Example Selection for Speech-based In-Context Learning
  4. Speaker Adaptation for Quantised End-to-End ASR Models
  5. TS-HuBERT: Weakly-Supervised and Self-Supervised Speech Pre-Training for Target-Speaker Speech Processing
  6. Variational Inference-Based Dropout in Recurrent Neural Networks for Slot Filling in Spoken Language Understanding
  7. A Preliminary Study on Associated Learning for ASR
  8. Fast Posterior Sampling for Conditional Diffusion Model
  9. Deep-Learning-Based Speech Enhancement with Maximum a Posteriori Spectral Amplitude Estimation
  10. Interpretable Unified Language Checking
  11. Maximum a Posteriori Adaptation of Network Parameters in Deep Models
  12. COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
  13. OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking

Selected References

  1. P. Brown, C.-H. Lee, and J. Spohrer, "Bayesian Adaptation in Speech Recognition," in Proc. ICASSP, Boston, 1983.
  2. C.-H. Lee, C.-H. Lin, and B.-H. Juang, "A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models," IEEE Transactions on Signal Processing, vol. 39, no. 4, pp. 806-814, 1991.
  3. J.-L. Gauvain and C.-H. Lee, "Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994.
  4. Q. Huo and C.-H. Lee, "On-line Adaptive Learning of the Continuous Density Hidden Markov Model based on Approximate Recursive Bayes Estimate," IEEE Transactions on Speech and Audio Processing, vol. 5, no. 2, pp. 161-172, 1997.
  5. Q. Huo and C.-H. Lee, "On-line Adaptive Learning of the Correlated Continuous Density Hidden Markov Model for Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 4, pp. 386-397, 1998.
  6. O. Siohan, C. Chesta, and C.-H. Lee, "Joint Maximum A Posteriori Adaptation of Transformation and HMM Parameters," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 4, pp. 417-428, 2001.
  7. K. Shinoda and C.-H. Lee, "A Structural Bayes Approach to Speaker Adaptation," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 276-287, 2001.
  8. O. Siohan, T. A. Myrvoll, and C.-H. Lee, "Structural Maximum A Posteriori Linear Regression for HMM Adaptation," Computer Speech and Language, vol. 16, no. 1, pp. 5-24, 2002.
  9. S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, "Application of Variational Bayesian Approach to Speech Recognition," in Proc. NIPS, Vancouver, 2002.
  10. Q. Huo and C.-H. Lee, "A Bayesian Predictive Classification Approach to Robust Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 2, pp. 200-204, 2000.
  11. C.-H. Lee and Q. Huo, "On Adaptive Decision Rules and Decision Parameter Adaptation for Automatic Speech Recognition," Proceedings of the IEEE, vol. 88, no. 8, pp. 1241-1269, 2000.
  12. S. Nakajima, K. Watanabe, and M. Sugiyama, Variational Bayesian Learning Theory, Cambridge University Press, 2019.
  13. Z. Huang, S. M. Siniscalchi, and C.-H. Lee, “A Unified Approach to Transfer Learning of Deep Neural Networks with Applications to Speaker Adaptation in Automatic Speech Recognition,” Neurocomputing, vol. 218, pp. 448-459, 2016.
  14. Z. Huang, S. M. Siniscalchi, and C.-H. Lee, “Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 1, pp. 64-75, 2017.
  15. Z. Huang, S. M. Siniscalchi, and C.-H. Lee, “Hierarchical Bayesian Combinations of Plug-in Maximum A Posteriori Decoders in Deep Neural Networks-based Speech Recognition and Speaker Adaptation,” Pattern Recognition Letters, vol. 98, pp. 1-7, 2017.
The website theme is modified from and inspired by the VIGIL Workshop Series by S. Florian et al.