Keynote Speakers

Professor Steve Renals


University of Edinburgh
United Kingdom

Wednesday 23 November, 11:00 to 12:00
Auditorium, IST Congress Center

Multi-Genre Broadcast Speech Recognition
Abstract: Rich transcription and diarization of broadcast speech has been well-studied since the mid 1990s.  However, much of the work has been limited in domain, often focussing on broadcast news.  To support the automatic transcription of broadcast content across the full range of genres we developed the MGB (Multi-Genre Broadcast) Challenge, a controlled set of evaluations of speech recognition, speaker diarization, and lightly supervised alignment in English (using BBC TV recordings) and multi-dialect Arabic (using recordings from Al-Jazeera).
In this talk I’ll discuss the MGB Challenge, including issues raised by training speech recognition systems from lightly supervised data, the evaluation conditions, and an overview of the features of the best-performing systems on these tasks.  I’ll also outline some applications in subtitling and media monitoring, and discuss some current research challenges in rich transcription.
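As a rough illustration of the lightly supervised training issue mentioned above, the Python sketch below selects training segments by comparing a decoded hypothesis with the original broadcast subtitle and keeping only segments where the two largely agree. The segment records and the 0.8 agreement threshold are illustrative assumptions, not details of the MGB systems.

```python
# A minimal sketch of lightly supervised data selection: broadcast subtitles are
# imperfect transcripts, so one common recipe decodes the audio and keeps only
# segments where the hypothesis and the subtitle largely agree.
from difflib import SequenceMatcher


def word_agreement(hypothesis: str, subtitle: str) -> float:
    """Word-level similarity between an ASR hypothesis and the subtitle text."""
    hyp, sub = hypothesis.lower().split(), subtitle.lower().split()
    return SequenceMatcher(None, hyp, sub).ratio()


def select_segments(segments, threshold=0.8):
    """Keep segments whose decoded hypothesis matches the subtitle closely enough."""
    return [seg for seg in segments
            if word_agreement(seg["hypothesis"], seg["subtitle"]) >= threshold]


if __name__ == "__main__":
    # Toy example: the first segment's subtitle matches the audio, the second does not.
    segments = [
        {"id": "seg_0001", "hypothesis": "welcome back to the programme",
         "subtitle": "welcome back to the programme"},
        {"id": "seg_0002", "hypothesis": "the match ended in a draw",
         "subtitle": "coming up later tonight"},
    ]
    for seg in select_segments(segments):
        print("kept", seg["id"])
```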

Speaker Bio: Steve Renals is Professor of Speech Technology in the Centre for Speech Technology Research at the University of Edinburgh.  He received a BSc in Chemistry from the University of Sheffield in 1986, an MSc in Artificial Intelligence from the University of Edinburgh in 1987, and a PhD in Speech Recognition and Neural Networks, also from Edinburgh, in 1990.  From 1991 to 1992 he was a postdoctoral fellow at the International Computer Science Institute, Berkeley, and was then an EPSRC fellow in Information Engineering at the University of Cambridge (1992-94).  From 1994 to 2003 he was a lecturer, then reader, in Computer Science at the University of Sheffield, moving to Edinburgh in 2003.
His main research interests are in speech recognition and spoken language processing, and he has about 250 publications in these areas, with a long-standing interest in neural network acoustic modelling.  Current interests include multi-genre broadcast speech recognition and distant speech recognition.  He coordinates the EU SUMMA project which is concerned with multilingual media monitoring, and was coordinator of the UK EPSRC Natural Speech Technology programme.  He is a senior area editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing and a fellow of the IEEE.

Professor Elmar Nöth


Pattern Recognition Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Erlangen, Germany

Thursday 24 November, 11:00 to 12:00
Auditorium, IST Congress Center

Remote Monitoring of Neurodegeneration through Speech

Abstract: In this talk we will report on the results of the workshop on “Remote Monitoring of Neurodegeneration through Speech”, which was part of the “Third Frederick Jelinek Memorial Summer Workshop” (http://www.clsp.jhu.edu/workshops/16-workshop/remote-monitoring-of-neurodegeneration-through-speech/) and took place at Johns Hopkins University in Baltimore, USA, from June 13th to August 5th, 2016. We will concentrate on multi-modal data from Colombian Spanish speakers with Parkinson’s disease, comprising speech, gait, and handwriting recordings.

Speaker Bio: Elmar Nöth is a professor of Applied Computer Science at the University of Erlangen-Nuremberg. He studied in Erlangen and at M.I.T. and received the Dipl.-Inf. and Dr.-Ing. degrees from the University of Erlangen-Nuremberg in 1985 and 1990, respectively. From 1990 he was an assistant professor at the Institute for Pattern Recognition in Erlangen, and since 2008 he has been a full professor at the same institute and head of the speech group. He is one of the founders of the Sympalog company, which markets conversational dialogue systems. He is author or co-author of more than 350 articles. His current interests are prosody, analysis of pathological speech, computer-aided language learning, and emotion analysis.

Professor Bhuvana Ramabhadran


IBM Research
New York, US

Friday 25 November, 11:00 to 12:00
Auditorium, IST Congress Center

Deep Learning for Processing Low-Resource Languages

Abstract: Keyword search, localizing an orthographic query in a speech corpus, is typically performed through analysis of automatic speech recognition (ASR) output. The IARPA-funded Babel program focuses on the rapid development of speech recognition capability for keyword search in a previously unstudied language, working with speech recorded in a variety of conditions with limited amounts of transcription. In this talk, I will focus on the impact of several ideas in deep learning on the Babel task from a speech recognition and keyword search perspective.
First, I will address the derivation of multilingual (ML) representations from over 24 languages, speeding up this derivation using subsets of languages, different deep network architectures such as DNNs, CNNs, LSTMs, and VGG-net-inspired convolutional networks, and offer insights on the impact of these diverse ML representations on speech recognition performance. Next, “end-to-end” speech recognition systems have recently emerged as viable alternatives to traditional ASR frameworks. I will present two end-to-end ASR systems for keyword search: Connectionist Temporal Classification (CTC) networks and recurrent encoder-decoders with attention (attention models). While these end-to-end systems can generate high-quality one-best transcripts on low-resource languages, their utility is limited for lattice-based keyword search. However, we can use these models as feature extractors in a DNN-based ASR system and realize gains in keyword search performance. Lastly, given the non-convex nature of the loss function used in training neural networks, their performance depends very much on the starting point, as well as other factors such as the type of input features, batch randomization, and the type of non-linearity. I will address these issues and their impact on speech recognition and keyword search performance.
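The end-to-end systems mentioned above can be illustrated with a minimal CTC acoustic model. The PyTorch sketch below (a framework chosen here for illustration, not necessarily the one used in the Babel systems) computes the CTC loss for a bidirectional LSTM on a toy batch; the feature dimensionality, label inventory, and network sizes are assumptions.

```python
# A minimal sketch of a CTC-based end-to-end acoustic model.
import torch
import torch.nn as nn


class CTCAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, num_labels=50):
        super().__init__()
        # num_labels includes the CTC blank symbol at index 0.
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.output = nn.Linear(2 * hidden, num_labels)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) -> log-probabilities (time, batch, labels)
        enc, _ = self.encoder(feats)
        logits = self.output(enc)
        return logits.log_softmax(dim=-1).transpose(0, 1)


model = CTCAcousticModel()
ctc_loss = nn.CTCLoss(blank=0)

# Toy batch: 2 utterances, 100 frames of 40-dimensional features each.
feats = torch.randn(2, 100, 40)
targets = torch.randint(1, 50, (2, 20))           # label sequences (no blanks)
input_lengths = torch.full((2,), 100, dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)

log_probs = model(feats)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(float(loss))
```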

Speaker Bio: Bhuvana Ramabhadran is a Research Staff Member and Manager at IBM’s TJ Watson Research Center, where she has been working since 1995. Currently, she manages a team of researchers in the Speech Recognition and Synthesis Research Group and co-ordinates research activities across IBM’s world-wide research labs in China, Tokyo, Prague, and Haifa. She is also an adjunct professor at Columbia University.
She is currently serving on the Speech and Language Technical Committee (SLTC) of the IEEE. She is also a senior member of the IEEE, has served on the editorial board of Computer Speech and Language, and is a member of the ACL. She has published over 150 papers and has been granted over 20 U.S. patents. Her research interests include speech recognition and synthesis algorithms, statistical modeling, signal processing, pattern recognition, and machine learning.