SWS 2016 Speech Signal Processing Workshop

Speech Signal Processing Workshop is an annual event held by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). The workshop features distinguished experts and scholars from around the world, and this year’s invited speakers include:

Dr. Tomas Mikolov, Facebook AI Research lab

Prof. Björn W. Schuller, University of Passau, Germany

Prof. Chen-Yu Chiang, National Taipei University

Prof. Jia-Ching Wang, National Central University

Prof. Chi-Chun (Jeremy) Lee, National Tsing Hua University

Dr. Ying-Hui Lai, Academia Sinica

The workshop will cover a wide variety of research topics in speech signal processing. Whether you come from industry or academia, if you are interested in speech processing, natural language processing, or music processing, this workshop is an event you should not miss.

Important Dates

02/01 Registration and payment open.

02/28 Registration deadline.

03/05 Payment deadline.

03/18 SWS 2016!

Time and Location

March 18, 2016, Friday
Room 101, Barry Lam Hall, National Taiwan University
No.1, Sec. 4, Roosevelt Rd., Da’an Dist., Taipei City 10617, Taiwan

Organizers

Department of Electrical Engineering, National Taiwan University

Association for Computational Linguistics and Chinese Language Processing

Co-organizers

Research Center for Information Technology Innovation, Academia Sinica

Department of Communication Engineering, National Taipei University

Department of Electrical Engineering, National Tsing Hua University

Department of Engineering and Technologies, Ministry of Science and Technology

Keynote Speakers
Learn from these distinguished researchers

Tomas Mikolov
Research Scientist, Facebook AI Research (FAIR)

Tomas Mikolov is a research scientist at the Facebook AI Research lab. His most influential work includes the development of recurrent neural network language models and the discovery of semantic regularities in distributed word representations. These projects have been published as the open-source tools RNNLM and word2vec, which have since been widely used in both academia and industry. His main research interest is developing intelligent machines.
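The "semantic regularities" mentioned above are the well-known vector-arithmetic analogies (e.g. king − man + woman ≈ queen). A minimal sketch of the idea, using hand-picked toy vectors that are purely illustrative — real word2vec embeddings are learned from large corpora, not written by hand:

```python
import numpy as np

# Hand-picked toy vectors, purely illustrative; real embeddings are learned.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.0, 0.1]),
    "woman": np.array([0.1, 0.0, 0.9, 0.1]),
    "apple": np.array([0.0, 0.5, 0.5, 0.9]),  # unrelated distractor
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # closest remaining word: "queen"
```

With real trained embeddings the same arithmetic recovers many lexical and syntactic relations (capitals of countries, verb tenses, plurals).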

Chen-Yu Chiang
Assistant Professor, Department of Communication Engineering, National Taipei University

Chen-Yu Chiang was born in Taipei, Taiwan, in 1980. He received the B.S., M.S., and Ph.D. degrees in communication engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2002, 2004, and 2009, respectively. In 2009, he was a Postdoctoral Fellow at the Department of Electrical Engineering, NCTU, where he primarily worked on prosody modeling for automatic speech recognition and text-to-speech systems under the guidance of Prof. Sin-Horng Chen. In 2012, he was a Visiting Scholar at the Center for Signal and Image Processing (CSIP), Georgia Institute of Technology, Atlanta. He is currently the director of the Speech and Multimedia Signal Processing Lab and an assistant professor at the Department of Communication Engineering, National Taipei University. His main research interests are in speech processing, in particular prosody modeling, automatic speech recognition, and text-to-speech systems.

Jia-Ching Wang
Associate Professor, Department of Computer Science and Information Engineering, National Central University

Jia-Ching Wang received the M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 1997 and 2002, respectively. He was an Honorary Fellow with the Department of Electrical and Computer Engineering, University of Wisconsin-Madison in 2008 and 2009. Currently, he is an Associate Professor with the Department of Computer Science and Information Engineering, National Central University, Jhongli City, Taiwan. His research interests include signal processing and VLSI architecture design. Dr. Wang is an honorary member of Phi Tau Phi Scholastic Honor Society and a member of the Association for Computing Machinery and IEICE.

Björn W. Schuller
Full Professor, University of Passau

Björn W. Schuller is Full Professor and Chair of Complex and Intelligent Systems at the University of Passau/Germany, Reader (Associate Professor) in Machine Learning at Imperial College London/UK, and the co-founding CEO of audEERING. Further affiliations include HIT/China as Visiting Professor and the University of Geneva/Switzerland and Joanneum Research in Graz/Austria as an Associate. Previously, he was with the CNRS-LIMSI in Orsay/France and headed the Machine Intelligence and Signal Processing Group at TUM in Munich/Germany. There, he received his diploma in 1999, doctoral degree in 2006, and habilitation in 2012, and was named Adjunct Teaching Professor – all in electrical engineering and information technology. He is best known for his works advancing Intelligent Audio Analysis and Affective Computing. Dr. Schuller is President Emeritus of the AAAC, an elected member of the IEEE SLTC, and a Senior Member of the IEEE. He has (co-)authored five books and more than 500 peer-reviewed technical contributions (more than 10,000 citations, h-index 49). Selected activities include his role as Editor in Chief of the IEEE Transactions on Affective Computing and Associate Editor of Computer Speech and Language, IEEE Signal Processing Letters, IEEE Transactions on Cybernetics, and IEEE Transactions on Neural Networks and Learning Systems. Professor Schuller was General Chair of ACM ICMI 2014, Program Chair of ACM ICMI 2013, IEEE SocialCom 2012, and ACII 2015 and 2011, as well as organiser of the INTERSPEECH 2009-2016 annual Computational Paralinguistics Challenges and the 2011-2016 annual Audio/Visual Emotion Challenges. He has won several awards, including best results in research challenges such as CHiME and MediaEval, and at ACM Multimedia. In 2015 and 2016, he was honoured as one of 40 extraordinary scientists under the age of 40 by the World Economic Forum.

Chi-Chun (Jeremy) Lee
Assistant Professor, Department of Electrical Engineering, National Tsing Hua University

Chi-Chun (Jeremy) Lee is an Assistant Professor in the Department of Electrical Engineering at National Tsing Hua University (NTHU), Taiwan. He received his B.S. degree with honors, magna cum laude, in electrical engineering from the University of Southern California (USC) in 2007, and his Ph.D. degree in electrical engineering from USC in 2012. He was a data scientist at the id:a lab at ID Analytics in 2013. He was awarded the USC Annenberg Fellowship. He led a team to win the Emotion Challenge at Interspeech 2009 and is a coauthor of a best paper at Interspeech 2010. He is a member of the Tau Beta Pi, Phi Kappa Phi, and Eta Kappa Nu honor societies.

His research interests are in interdisciplinary human-centered behavioral signal processing, emphasizing the development of computational frameworks in recognizing and quantifying human behavioral attributes and interpersonal interaction dynamics using machine learning and signal processing techniques.

Ying-Hui Lai
Postdoctoral Fellow, Research Center for Information Technology Innovation, Academia Sinica

Ying-Hui Lai received the B.S. degree in industrial education from National Taiwan Normal University in 2005, and the Ph.D. degree in biomedical engineering from National Yang-Ming University in 2013. From January 2010 to June 2012, Dr. Lai was a research and development (R&D) engineer at Aescu Technology, Taipei, Taiwan, where he engaged in research and product development of hearing aids. Currently, Dr. Lai is a postdoctoral fellow at the Research Center for Information Technology Innovation, Academia Sinica. His research focuses on hearing aids, cochlear implants, speech enhancement, and pattern recognition.

Agenda

Time | Topic | Speaker
08:30 - 09:00 | Check-in | -
09:00 - 09:10 | Opening Remarks | Hung-Yi Lee, Assistant Professor
09:10 - 10:10 | Recurrent Networks and Beyond | Tomas Mikolov, Research Scientist
10:10 - 10:30 | Coffee Break | -
10:30 - 11:30 | Prosody Modeling and its Applications to Spoken Language Processing | Chen-Yu Chiang, Assistant Professor
11:30 - 12:30 | Robust Sound Event Recognition | Jia-Ching Wang, Associate Professor
12:30 - 14:00 | Lunch | -
14:00 - 15:00 | Say no more – the computer already deeply knows you? | Björn W. Schuller, Full Professor
15:00 - 16:00 | A window into you: BSP effort for quantifying human behaviors across domains of health, education, and psychology | Chi-Chun (Jeremy) Lee, Assistant Professor
16:00 - 16:20 | Coffee Break | -
16:20 - 17:20 | Improvement of the speech intelligibility for cochlear implantees by the adaptive compression strategy and deep learning based noise reduction approaches | Ying-Hui Lai, Postdoctoral Fellow
17:20 - 17:30 | Close | -

Opening Remarks

Hung-Yi Lee, Assistant Professor

Recurrent Networks and Beyond

Tomas Mikolov, Research Scientist

In this talk, I will give a brief overview of recurrent networks and their applications. I will then present several extensions that aim to help these powerful models learn more patterns from training data, including a simple modification of the architecture that allows them to capture longer context information, and an architecture that allows them to learn complex algorithmic patterns. The talk will conclude with a discussion of a long-term research plan for advancing machine learning techniques towards the development of artificial intelligence.

HOST: Hung-Yi Lee (Assistant Professor)
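For readers unfamiliar with the basic architecture the talk builds on, the following is a minimal forward pass of a vanilla (Elman-style) recurrent network in NumPy. The dimensions and random, untrained weights are invented for illustration; they are not from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dimensions, chosen only for illustration.
vocab, hidden = 5, 8
Wxh = 0.1 * rng.standard_normal((vocab, hidden))   # input-to-hidden weights
Whh = 0.1 * rng.standard_normal((hidden, hidden))  # hidden-to-hidden (recurrence)
Why = 0.1 * rng.standard_normal((hidden, vocab))   # hidden-to-output weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_forward(token_ids):
    """Run a simple Elman RNN over a token sequence, returning the
    next-token probability distribution after each step."""
    h = np.zeros(hidden)
    probs = []
    for t in token_ids:
        x = np.eye(vocab)[t]            # one-hot input for token t
        h = np.tanh(x @ Wxh + h @ Whh)  # hidden state carries context forward
        probs.append(softmax(h @ Why))
    return probs

out = rnn_forward([0, 3, 1, 4])
print(len(out), out[-1])
```

The key property, shared by the language models discussed in the talk, is that the hidden state `h` is reused across time steps, so in principle arbitrarily long context can influence each prediction.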

Prosody Modeling and its Applications to Spoken Language Processing

Chen-Yu Chiang, Assistant Professor

The term prosody refers to certain inherent suprasegmental properties that carry the melodic, timing, and pragmatic information of continuous speech, encompassing accentuation, intonation, rhythm, speaking rate, prominence, pauses, and the attitudes or emotions the speaker intends to express. Prosodic features are physically encoded in the variations in pitch contour, energy level, duration, and silence of spoken utterances. Prosodic studies have indicated that these features are not produced arbitrarily, but rather realized according to a hierarchically organized structure which demarcates speech flows into domains of varying lengths by boundary or break cues. It is also known that hierarchical prosodic structures are highly correlated with information sources of the linguistic features (lexical, syntactic, semantic, and pragmatic), the para-linguistic features (intentional, attitudinal, and stylistic), and the non-linguistic features (physical and emotional). Therefore, we can regard prosodic information as an interface between messages generated by humans and realized speech acoustic features. We may also regard prosody as a communication protocol between speakers. This talk will introduce some advances in prosody modeling jointly developed by the Speech and Multimedia Signal Processing Lab, NTPU, and the Speech Processing Lab, NCTU. The applications of prosody modeling to automatic speech recognition (ASR) and text-to-speech (TTS) systems will also be addressed.

HOST: Chung-Hsien Wu (Professor)

Robust Sound Event Recognition

Jia-Ching Wang, Associate Professor

Sound event recognition in home environments has become a new research issue in home automation and smart homes. Identifying sound classes can significantly help home environmental monitoring, and predefined home automation services can be triggered by associated sound classes. However, varying noises and interference always degrade recognition performance. These problems remain unsolved, and research to tackle them is greatly needed. In this talk, we will present several robust sound event recognition techniques, such as front-end processes that filter out noises and interference, and an approach to extracting robust audio features.

HOST: Hsin-Min Wang (Research Fellow)

Say no more – the computer already deeply knows you?

Björn W. Schuller, Full Professor

Recent advances in deep and weakly supervised learning have helped lend computers new socio-affective skills. Focusing on human speech analysis, this talk highlights current abilities and potential in the automatic characterisation of speakers in rich ways. This includes the acquisition of information on speakers’ sincerity, deception, native language and degree of nativeness, cognitive and physical load, emotion and personality, or health diagnostics, to name just a few. A corresponding modern architecture for holistic speech analysis will be shown, including cooperative on-line learning by efficient crowd-sourcing. Further, an approach for end-to-end learning will be featured, aiming at seamless speech modelling. Then, a low-resource implementation based on the openSMILE toolkit co-developed by the presenter is demonstrated, considering real-time on-device processing on smartphones and similar devices. Examples of application use-cases stem from a number of ongoing European projects; these will showcase the potential, but also current shortcomings. In an outlook, future avenues are laid out for how best to overcome these.

HOST: Yi-Hsuan Yang (Associate Research Fellow)

A window into you: BSP effort for quantifying human behaviors across domains of health, education, and psychology

Chi-Chun (Jeremy) Lee, Assistant Professor

The abstraction of humans within a signals and systems framework naturally brings a synergy between engineering and the behavioral sciences. Behavioral signal processing (BSP) offers a new frontier of interdisciplinary research between these communities. The core research in BSP is to model human behaviors, internal states, and perceptual judgements from observational data by using computational methods grounded in signal processing and machine learning. The outcome of BSP offers novel informatics for enhancing the capabilities of domain experts in facilitating better decision making.

In this talk, we will demonstrate the use of BSP techniques in various application domains: affective computing, mental health, and educational research. The heterogeneity in human behavior expression, the subjectivity in human perceptual judgement, and the complex non-linear interplay of multiple influencing factors require not only an advancement in algorithmic development but also a closer collaboration with domain experts. With this emerging effort of BSP, we strive not only to provide engineering solutions to domain experts but also to open up potential opportunities of novel insights in the applications with broad societal impact.

HOST: Jen-Tzung Chien (Professor)

Improvement of the speech intelligibility for cochlear implantees by the adaptive compression strategy and deep learning based noise reduction approaches

Ying-Hui Lai, Postdoctoral Fellow

Cochlear implants (CIs) are surgically implanted electronic devices that provide a sense of sound to patients with severe to profound hearing loss. The considerable progress of CI technologies over the past three decades has enabled many CI users to enjoy a high level of speech understanding in quiet conditions. For most CI users, however, understanding speech in noisy environments remains a challenge. In this talk, I will present two approaches that address this important issue to further improve speech intelligibility for CI recipients under noisy conditions. First, I will describe the proposed adaptive envelope compression (AEC) strategy, which is effective at enhancing the modulation depth of the envelope waveform by making the best use of its dynamic range, thus improving intelligibility for CI recipients compared with traditional static envelope compression. Second, I will introduce a deep learning based noise reduction (NR) approach, the deep denoising autoencoder (DDAE), whose effectiveness in improving speech intelligibility for CI recipients has been investigated. Experimental results indicate that, under challenging noisy listening conditions, the AEC strategy and DDAE NR yield higher intelligibility scores than conventional approaches for Mandarin-speaking listeners, suggesting that AEC and DDAE NR could potentially be integrated into a CI processor to overcome the speech perception degradation caused by noise.

HOST: Yu Tsao (Assistant Research Fellow)
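As background for the talk above: a denoising autoencoder is trained to map noisy inputs to their clean targets. Below is a heavily simplified, single-hidden-layer sketch on synthetic "frames" — the actual DDAE for CI speech is deeper and operates on speech spectral features, and all shapes, data, and hyperparameters here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: "clean" frames are sinusoids of random frequency; "noisy"
# frames add Gaussian noise. Shapes: (n_samples, frame_len).
n, d, h = 512, 16, 8
t = np.linspace(0, 2 * np.pi, d)
clean = np.stack([np.sin(f * t) for f in rng.uniform(1, 3, n)])
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

# One tanh hidden layer; trained by full-batch gradient descent on MSE.
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((h, d)); b2 = np.zeros(d)

lr = 0.05
for _ in range(2000):
    z = np.tanh(noisy @ W1 + b1)     # encoder
    out = z @ W2 + b2                # decoder
    err = out - clean                # denoising: the target is the CLEAN frame
    g_out = 2 * err / n              # gradient of mean squared error
    gW2 = z.T @ g_out; gb2 = g_out.sum(0)
    g_z = (g_out @ W2.T) * (1 - z * z)   # backprop through tanh
    gW1 = noisy.T @ g_z; gb1 = g_z.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

denoised = np.tanh(noisy @ W1 + b1) @ W2 + b2
print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoised - clean) ** 2))
```

The design point mirrored from the talk is only the training target: unlike an ordinary autoencoder that reconstructs its own input, a denoising autoencoder sees the noisy frame and regresses toward the clean one, so at test time it acts as a learned noise-reduction front end.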

Traffic Routes

National Taiwan University - No.1, Sec. 4, Roosevelt Rd., Da’an Dist., Taipei City 10617, Taiwan

To support environmental protection and the carbon reduction policy, we strongly recommend taking the MRT to visit National Taiwan University (NTU).

  • Songshan—Xindian Line - National Taiwan University (NTU) is located right next to Gongguan Station (Xindian Line). Exit No. 3 is only a 3-minute walk from the main entrance of NTU. Even more conveniently, Exit No. 2 links directly to the ZhouShan Road entrance (at the intersection of Roosevelt Road and ZhouShan Road).
  • Wenshan—Neihu Line - If you wish to reach the north-east side of the NTU main campus, you may consider taking the Wenshan—Neihu Line and getting off at Technology Building Station. Then walk south along FuXing South Road to reach the XinHai entrance of NTU.
  • Guideline: campus map

Contact Us

GENERAL CO-CHAIRS
Prof. Hung-Yi Lee
hungyilee@ntu.edu.tw

No.1, Sec. 4, Roosevelt Rd., Da’an Dist., Taipei City 10617, Taiwan

Prof. Lin-shan Lee
lslee@gate.sinica.edu.tw

No.1, Sec. 4, Roosevelt Rd., Da’an Dist., Taipei City 10617, Taiwan

WORKSHOP EMAIL
sws2016.ntuspeech@gmail.com

For any questions, please send an email to this address.

REGISTRATION
Ms. Qi Huang
aclclp@hp.iis.sinica.edu.tw

Association for Computational Linguistics and Chinese Language Processing

02-2788-3799 #1502